How Base64 Encoding Works (Step by Step)

By AZ Utils Editorial · 10 min read

Base64 can feel like magic — arbitrary bytes go in, tidy printable text comes out, and the equals signs at the end seem mysterious. But the algorithm is simple and elegant, and once you have walked through it by hand, you will understand exactly why the output looks the way it does. This guide explains how Base64 encoding works, step by step, with worked examples for both full and padded groups.

It is written for developers, students and engineers who want to understand the mechanics, not just call a library.

The Core Idea: 3 Bytes Become 4 Characters

Base64's whole design rests on a neat numerical coincidence: 3 bytes is 24 bits, and 24 bits divides evenly into four 6-bit groups. Since 6 bits can hold a value from 0 to 63, each 6-bit group maps perfectly onto one of the 64 characters in the Base64 alphabet.

So the algorithm is: take the input three bytes at a time, line up their 24 bits, split them into four groups of 6 bits, and convert each group into a character. Three bytes of input always produce four characters of output — which is exactly why Base64 is about 33% larger than the original (4 ÷ 3 ≈ 1.33).

In short: Base64 takes 3 input bytes (24 bits), splits them into four 6-bit groups, and maps each group (value 0–63) to a character in its alphabet. Padding handles inputs that are not a multiple of 3 bytes.

The Alphabet

Each 6-bit value (0–63) maps to a character:

Value  0–25  -> A–Z
Value 26–51  -> a–z
Value 52–61  -> 0–9
Value 62     -> +
Value 63     -> /
(padding)    -> =
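
As a small illustration, the table above can be written as a 64-character lookup string whose index is the 6-bit value. The names ALPHABET and char_for are made up for this sketch and are not part of any library.

```python
import string

# Standard Base64 alphabet: index = 6-bit value, entry = output character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def char_for(value: int) -> str:
    """Map a 6-bit value (0-63) to its Base64 character."""
    if not 0 <= value <= 63:
        raise ValueError("a 6-bit value must be in the range 0-63")
    return ALPHABET[value]

print(len(ALPHABET))                                   # 64
print(char_for(0), char_for(26), char_for(52), char_for(62), char_for(63))  # A a 0 + /
```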

A Full Worked Example: Encoding "Man"

Let's encode the three characters Man. First, the ASCII codes and their 8-bit binary:

M = 77  = 01001101
a = 97  = 01100001
n = 110 = 01101110

Concatenate the 24 bits, then regroup them into four 6-bit chunks:

8-bit:  01001101 01100001 01101110
6-bit:  010011 010110 000101 101110
value:    19     22      5      46

Now map each value to a character:

19 -> T
22 -> W
5  -> F
46 -> u

So Man encodes to TWFu. Four characters from three bytes, no padding needed because the input was an exact multiple of 3.
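
The hand calculation above can be sketched in a few lines of Python. This is an illustrative helper (encode_full_group is a hypothetical name) that handles exactly one full 3-byte group and nothing else; it is not how any particular library implements Base64.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_full_group(group: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(group) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    # Line the three bytes up as one 24-bit integer.
    bits = (group[0] << 16) | (group[1] << 8) | group[2]
    # Peel off four 6-bit values, most significant first.
    values = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[v] for v in values)

print(encode_full_group(b"Man"))  # TWFu
```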

How Padding Works

Most inputs are not a multiple of 3 bytes, so the final group is incomplete. Base64 handles this by padding the bits with zeros and the output with = characters.

Two leftover bytes → one "="

Encode Ma (two bytes = 16 bits). Pad the bits up to the next multiple of 6 with zeros, producing three 6-bit groups (only enough data for three characters), and add one = to fill the fourth slot:

M = 01001101  a = 01100001
16 bits:  01001101 01100001
+zeros:   010011 010110 0001(00)
value:      19     22     4
chars:      T      W      E   + "=" padding
result:  TWE=

One leftover byte → two "="

Encode M (one byte = 8 bits). Pad to two 6-bit groups (two characters) and add two =:

M = 01001101
8 bits:  01001101
+zeros:  010011 01(0000)
value:     19    16
chars:     T     Q   + "==" padding
result:  TQ==

So the number of padding characters tells you how many bytes were in the final group: no = means a full 3, one = means 2 bytes, and two == means 1 byte. That is why valid Base64 length is always a multiple of 4.
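
To make the padding rule concrete, here is a minimal hand-rolled encoder. It is for illustration only; in real code the standard library's base64.b64encode is the better choice. The function name encode is an assumption of this sketch.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode(data: bytes) -> str:
    """Hand-rolled Base64 encoder for illustration; prefer base64.b64encode."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)          # 0, 1 or 2 bytes short of a full group
        # Zero-fill the short group so it still forms 24 bits.
        bits = int.from_bytes(group + b"\x00" * missing, "big")
        chars = [ALPHABET[(bits >> shift) & 0b111111] for shift in (18, 12, 6, 0)]
        # Output slots that hold only fill bits are replaced by "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

print(encode(b"Man"), encode(b"Ma"), encode(b"M"))  # TWFu TWE= TQ==
```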

Decoding: Run It in Reverse

Decoding reverses the process. Each character is converted back to its 6-bit value, the bits are concatenated, and the stream is sliced back into 8-bit bytes. Padding characters are discarded, and the extra zero bits added during encoding are dropped. Because the mapping is exact and lossless, decoding always reproduces the original bytes precisely.

You can watch this happen live — paste a value into our Base64 Encoder/Decoder and toggle between encode and decode.

A Second Worked Example: Encoding "Hi!"

Let's encode another exact 3-byte input, Hi!, to reinforce the pattern:

H = 72  = 01001000
i = 105 = 01101001
! = 33  = 00100001

24 bits: 01001000 01101001 00100001
6-bit:   010010 000110 100100 100001
value:     18     6      36     33
chars:     S      G      k      h

So Hi! becomes SGkh — no padding, because the input was an exact multiple of three bytes. Try encoding the five-byte string Hello yourself: the first three bytes form one full group, and the remaining two bytes form a padded group, giving SGVsbG8= with a single trailing equals sign.

Calculating the Encoded Length

The output length is entirely predictable from the input size:

encodedLength = ceil(inputBytes / 3) * 4

For example, 1 byte → 4 characters (TQ==), 2 bytes → 4 (TWE=), 3 bytes → 4 (TWFu), 4 bytes → 8, 10 bytes → 16. Because the result is always rounded up to a multiple of four, valid Base64 length is always divisible by four — a quick sanity check when you suspect a value was truncated. The ratio of roughly 4 ÷ 3 is also where the famous ~33% size overhead comes from.
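
The formula translates directly into integer arithmetic. The sketch below (encoded_length is a hypothetical helper name) assumes padded standard Base64 and prints the lengths for the sizes used in the example.

```python
def encoded_length(input_bytes: int) -> int:
    """Padded Base64 length for n input bytes: ceil(n / 3) * 4."""
    return (input_bytes + 2) // 3 * 4   # integer form of ceil(n / 3) * 4

for n in (1, 2, 3, 4, 10):
    print(n, encoded_length(n))         # 1 4, 2 4, 3 4, 4 8, 10 16
```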

Base64url and Alternative Alphabets

The algorithm never changes — only the 64-character lookup table can. The two standard alphabets defined by RFC 4648 differ in just two characters:

Value     Standard Base64   Base64url
62        +                 -
63        /                 _
padding   =                 often omitted

Base64url exists because + and / are reserved or meaningful in URLs (a + can be read as a space, a / as a path separator), and = is the query-string assignment character. Swapping those two characters and dropping padding makes the output safe to drop into a URL or filename — which is exactly why JWTs and many opaque tokens use it. To decode Base64url with a standard decoder, you must first translate - back to + and _ back to / (and restore padding if the decoder requires it).
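
A short sketch of the translation in both directions, using Python's standard base64 module. The helper names b64url_encode and b64url_to_standard are made up for this example, and the sample bytes are chosen only because they produce the values 62 and 63.

```python
import base64

def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 with the trailing padding removed, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_to_standard(text: str) -> str:
    """Translate Base64url back to the standard alphabet and restore padding."""
    standard = text.replace("-", "+").replace("_", "/")
    return standard + "=" * (-len(standard) % 4)

raw = bytes([0xFB, 0xFF, 0xBE])           # 6-bit values 62, 63, 62, 62
print(base64.b64encode(raw).decode())     # +/++
print(b64url_encode(raw))                 # -_--
print(b64url_to_standard("TWE"))          # TWE=
```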

Line Wrapping in MIME

One more wrinkle appears in email. The MIME standard inserts a line break every 76 characters of Base64, so a long attachment is split into many short lines. This was originally to satisfy mail servers that disliked very long lines. The line breaks are not part of the data, and MIME-aware decoders skip them, but stricter decoders may reject embedded whitespace, so if you write your own parser, remember to strip newlines before decoding. Contexts like data URIs and JWTs do not wrap lines, so a single continuous string is expected there.
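
Python's standard library can show the wrapping directly: base64.encodebytes produces MIME-style output with a newline after every 76 characters. The 512-byte sample below is arbitrary test data, not anything from a real message.

```python
import base64

data = bytes(range(256)) * 2                  # 512 bytes of sample binary data

wrapped = base64.encodebytes(data)            # MIME-style: newline after every 76 characters
lines = wrapped.splitlines()
print(max(len(line) for line in lines))       # 76

# Removing the line breaks leaves the same text b64encode produces.
unwrapped = b"".join(lines)
print(unwrapped == base64.b64encode(data))    # True
print(base64.b64decode(unwrapped) == data)    # True
```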

Doing It in Code

You rarely implement the bit-shuffling yourself; standard libraries are well-tested and fast:

// JavaScript: encode bytes
// (spreading is fine for short inputs; a very large array can exceed the engine's argument limit)
const bytes = new TextEncoder().encode("Man");
const b64 = btoa(String.fromCharCode(...bytes)); // "TWFu"

# Python — encode bytes
import base64
base64.b64encode(b"Man")        # b'TWFu'
base64.b64encode(b"Ma")         # b'TWE='
base64.b64encode(b"M")          # b'TQ=='

Understanding the algorithm still pays off: it explains the 33% overhead, why output length is always a multiple of 4, and why Base64url simply swaps two alphabet characters without changing any of this logic.

Decoding, Step by Step

To see decoding concretely, let's reverse TWFu back to Man. First, convert each character to its 6-bit alphabet value:

T -> 19 -> 010011
W -> 22 -> 010110
F ->  5 -> 000101
u -> 46 -> 101110

Concatenate the 24 bits and re-slice them into 8-bit bytes:

24 bits: 010011 010110 000101 101110
regroup: 01001101 01100001 01101110
bytes:     77        97       110
chars:     M         a         n

For a padded value like TWE=, the decoder drops the =, converts T W E to bits, and keeps only the whole bytes that the original input contained — discarding the extra zero bits that padding added during encoding. The mapping is exact in both directions, which is what makes Base64 perfectly lossless.
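
The reverse walk can be sketched the same way. This minimal decoder assumes padded, standard-alphabet input and does very little validation; it is meant to mirror the steps above, not to replace base64.b64decode. The names VALUE_OF and decode are assumptions of the sketch.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}

def decode(text: str) -> bytes:
    """Hand-rolled decoder for padded standard Base64; illustration only."""
    if len(text) % 4:
        raise ValueError("padded Base64 length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = len(quad) - len(quad.rstrip("="))   # number of trailing "=" characters
        if pad > 2:
            raise ValueError("a group can carry at most two padding characters")
        # Rebuild the 24-bit group, treating "=" as six zero bits.
        bits = 0
        for ch in quad:
            bits = (bits << 6) | (0 if ch == "=" else VALUE_OF[ch])
        # Keep only the bytes that carried real data.
        out += bits.to_bytes(3, "big")[:3 - pad]
    return bytes(out)

print(decode("TWFu"), decode("TWE="), decode("TQ=="))  # b'Man' b'Ma' b'M'
```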

Full Worked Example: Encoding "Hello"

Let's encode all five bytes of Hello from start to finish, combining a full group and a padded group. The byte values are:

H = 72  = 01001000
e = 101 = 01100101
l = 108 = 01101100
l = 108 = 01101100
o = 111 = 01101111

Group 1 — the first three bytes (Hel):

01001000 01100101 01101100
-> 010010 000110 010101 101100
->   18     6      21     44
->   S      G      V      s        => "SGVs"

Group 2 — the last two bytes (lo), padded:

01101100 01101111            (16 bits)
-> 011011 000110 1111(00)    (zero-padded to 18 bits)
->   27     6      60
->   b      G      8     + "=" padding   => "bG8="

Concatenate the two groups and you get SGVsbG8=. The single trailing = tells any decoder that the final group held two bytes, not three — so it knows to keep exactly two bytes when reversing the process. Working through an example like this once removes all the mystery: the padding is just bookkeeping that records how many real bytes were in the last group.
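
If you want to check a hand calculation like this one, a small tracing helper can print the regrouping for each group. This sketch works on bit strings rather than integers so the output resembles the tables above; trace is a hypothetical name.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def trace(data: bytes) -> str:
    """Print the bit regrouping for each 3-byte group and return the encoded text."""
    result = ""
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        bits = "".join(f"{byte:08b}" for byte in group)
        bits += "0" * (-len(bits) % 6)            # zero-pad to a multiple of 6 bits
        sextets = [bits[j:j + 6] for j in range(0, len(bits), 6)]
        chars = "".join(ALPHABET[int(s, 2)] for s in sextets)
        chars += "=" * (4 - len(sextets))         # fill the unused output slots
        print(group, sextets, [int(s, 2) for s in sextets], chars)
        result += chars
    return result

print(trace(b"Hello"))  # final line printed: SGVsbG8=
```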

Why 64 Characters?

The choice of 64 is deliberate. The designers wanted a set of characters that virtually every system and character set agrees on and never alters in transit: essentially a "lowest common denominator" of printable ASCII. From that safe pool, 64 is the largest power of two available (the next, 128, would include unsafe and non-printable characters). A power of two matters because it maps cleanly onto a whole number of bits: 64 = 2^6, so each character carries exactly 6 bits. That 6-bit unit is what lines up so neatly with 3 bytes (24 bits = four 6-bit groups). A smaller alphabet like Base32 (5 bits per character) wastes more space; a larger one would need unsafe characters. Base64 is the sweet spot of efficiency and safety.
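
A quick way to see the trade-off is to encode the same bytes with alphabets of 16, 32 and 64 characters. The sketch below uses Python's standard base64 module. The 15-byte input is an arbitrary choice that avoids padding in all three encodings, so the lengths reflect bits per character alone.

```python
import base64
import math

data = bytes(15)   # 15 zero bytes; no padding needed in Base16, Base32 or Base64

for name, alphabet_size, encoder in (
    ("Base16", 16, base64.b16encode),
    ("Base32", 32, base64.b32encode),
    ("Base64", 64, base64.b64encode),
):
    bits_per_char = int(math.log2(alphabet_size))
    print(name, bits_per_char, len(encoder(data)))
# Base16 4 30
# Base32 5 24
# Base64 6 20
```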

Encoding Text vs Raw Binary

It is worth being precise about what Base64 operates on: bytes, always. When people talk about "Base64-encoding a string," there is a hidden first step — the text is converted to bytes using a character encoding, almost always UTF-8. Base64 then encodes those bytes. On the way out, the decoder gives you bytes back, and you interpret them as text using the same character encoding. Skipping or mismatching that text-to-bytes step is the root of most "the accents turned into garbage" bugs. For genuinely binary input — an image, a key, compressed data — there is no character-encoding step at all; the bytes go straight into the algorithm.
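
The hidden text-to-bytes step is easy to demonstrate. In this sketch the same four-character string produces different Base64 depending on the character encoding, and decoding with the wrong one changes the text. The sample word is an arbitrary choice.

```python
import base64

text = "caf\u00e9"   # "cafe" with an acute accent on the final letter

utf8_b64 = base64.b64encode(text.encode("utf-8"))      # the accented letter becomes two bytes
latin1_b64 = base64.b64encode(text.encode("latin-1"))  # the accented letter becomes one byte
print(utf8_b64, latin1_b64)                            # b'Y2Fmw6k=' b'Y2Fm6Q=='

right = base64.b64decode(utf8_b64).decode("utf-8")     # same encoding on the way out
wrong = base64.b64decode(utf8_b64).decode("latin-1")   # mismatched encoding
print(right == text)                                   # True
print(wrong == text)                                   # False
print(len(text), len(wrong))                           # 4 5
```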

Common Mistakes

  1. Assuming padding is optional. Some decoders require it; stripping = without telling the decoder causes failures.
  2. Confusing bits and bytes. Base64 groups by 6 bits for output but the input is 8-bit bytes — mixing these up leads to wrong hand calculations.
  3. Expecting a fixed output size. Output length depends on input length: ceil(n ÷ 3) × 4 characters.
  4. Hand-rolling an encoder. It is easy to get edge cases wrong; use the standard library.
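
The first mistake is easy to reproduce with Python's standard decoder, which rejects input whose padding has been stripped. The sketch below also passes validate=True so that characters outside the alphabet are rejected rather than silently skipped; try_decode is a hypothetical helper name.

```python
import base64
import binascii

def try_decode(text: str) -> str:
    """Report what the standard-library decoder does with a given input."""
    try:
        return repr(base64.b64decode(text, validate=True))
    except binascii.Error as exc:
        return f"rejected: {exc}"

print(try_decode("TWE="))    # b'Ma'
print(try_decode("TWE"))     # rejected: padding was stripped
print(try_decode("TW E="))   # rejected: space is not in the alphabet
```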

Putting the Algorithm to Work

Understanding the mechanics pays off in everyday debugging, even though you will almost never implement the bit-shuffling yourself. Once you know that three bytes always become four characters, several real-world behaviours stop being surprising. You immediately understand why a padded Base64 value's length is always a multiple of four, so a padded string that is not (perhaps 4n+2 characters) has probably been truncated or had its padding stripped somewhere. You know that one or two trailing equals signs are normal padding, not corruption, and that their count tells you whether the final group held two bytes or one. And you understand precisely where the roughly one-third size increase comes from, so you can budget for it when a payload needs to fit in a URL, a QR code or a database column.

The same knowledge demystifies the variants. Base64url is not a different algorithm; it is the identical process with two alphabet characters swapped so the output is safe in URLs and filenames. MIME's habit of wrapping the text every 76 characters is just cosmetic line-splitting that decoders ignore. And the reason text must be converted to bytes before encoding is simply that the algorithm operates on bytes, not characters — so the character set you choose (almost always UTF-8) determines the bytes that go in, and you must use the same character set to turn the decoded bytes back into text.

On the performance side, Base64 is fast: encoding and decoding are linear, branch-light operations that modern libraries handle at high throughput. The cost that actually matters is not CPU but size — every byte you encode grows by a third — which is why the algorithm's elegance does not change the guidance to reserve it for small values and use binary transfers for large ones.

Best Practices

  • Use built-in library functions rather than implementing the bit math yourself.
  • Keep padding intact unless a specific format (like JWT) requires it stripped.
  • Remember the overhead — budget for ~33% more size when storing or transmitting Base64.
  • Use Base64url when the output goes into URLs or filenames.

Frequently Asked Questions

How does Base64 encoding work?

Base64 takes input three bytes (24 bits) at a time, splits the bits into four 6-bit groups, and maps each group's value (0–63) to a character in the Base64 alphabet. Inputs that are not a multiple of three bytes are padded with equals signs.

Why does Base64 process 3 bytes at a time?

Because 3 bytes is 24 bits, which divides evenly into four 6-bit groups, and each 6-bit group maps to exactly one of the 64 alphabet characters.

What do the equals signs mean?

They are padding. One equals sign means the final group had 2 bytes; two equals signs mean it had 1 byte. No padding means the input was an exact multiple of 3 bytes.

How do I calculate the encoded length?

The output length is ceil(inputBytes ÷ 3) × 4 characters, which is always a multiple of four.

Is decoding lossless?

Yes. The character-to-bits mapping is exact, so decoding reproduces the original bytes perfectly, with padding and the extra zero bits discarded.

Does Base64url change the algorithm?

No. Base64url uses the same algorithm; it only swaps + for - and / for _ in the alphabet and often omits padding.

Summary

Base64 encoding is a tidy piece of bit manipulation: regroup 24 input bits into four 6-bit values, map each to a character, and pad with equals signs when the input does not divide evenly by three. That single idea explains the 33% size overhead, the multiple-of-four output length, and the meaning of the padding. Decoding simply runs it in reverse, losslessly. Walk through "Man" → "TWFu" once by hand and the encoding will never look mysterious again.

The deeper value of understanding the algorithm is that it turns Base64 from a black box into something predictable. You can glance at a string and reason about it — estimate its decoded size, spot a truncation from a bad length, explain a stray equals sign, or recognise a URL-safe variant by its - and _ characters. None of that requires implementing an encoder; it just requires knowing that three bytes become four characters, six bits at a time, with padding to round out the final group. Keep that one sentence in mind and every Base64 value you encounter becomes readable at a glance.

AZ Utils Editorial

Finance & web-tools writer

AZ Utils writes practical, plain-English guides on calculators, finance and everyday web tools, drawing on years of experience helping beginners and small businesses get the numbers right.
