What Is Base64? A Developer’s Guide to Encoding
By AZ Utils Editorial · 11 min read
If you have ever peeked at an email's raw source, a data URI in a stylesheet, or the middle section of a JWT, you have seen long runs of seemingly random letters and digits ending in one or two equals signs. That is Base64 — one of the most widely used encodings on the internet. This guide explains what Base64 is, why it exists, how to use it in code, and the misconceptions that get developers into trouble.
It is written for developers and engineers who want a precise mental model, students learning encodings, and technical beginners who keep running into Base64 and want to finally understand it.
What Is Base64?
Base64 is a binary-to-text encoding. It takes arbitrary binary data — an image, a file, encrypted bytes, anything — and represents it using only 64 "safe" printable ASCII characters. The result is text that can travel safely through systems that were designed for text and might mangle or choke on raw binary.
The 64-character alphabet is fixed: the uppercase letters A–Z, the lowercase letters a–z, the digits 0–9, and two symbols, + and /. A 65th character, =, is used only for padding at the end. That alphabet is where the name comes from: it is a numbering system with a base (radix) of 64.
Index 0..25  -> A..Z
Index 26..51 -> a..z
Index 52..61 -> 0..9
Index 62     -> +
Index 63     -> /
Padding      -> =
Here is a tiny example. The text Man encodes to TWFu; the text Hello encodes to SGVsbG8= (note the single padding character). The output is always plain, printable text.
In short: Base64 is a binary-to-text encoding that represents any binary data using 64 printable ASCII characters, so the data can be safely transmitted or embedded in text-based systems.
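To make the alphabet table concrete, here is a minimal sketch of how one 3-byte group maps to four characters. The helper name encode_group is made up for this illustration, and it deliberately ignores padding; the standard library call is included only to cross-check the result.

```python
import base64
import string

# The standard alphabet from the table above: A-Z, a-z, 0-9, then + and /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (hypothetical helper, no padding)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Pack the 3 bytes into a single 24-bit integer
    n = int.from_bytes(three_bytes, "big")
    # Slice it into four 6-bit indexes, most significant first
    indexes = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)


print(encode_group(b"Man"))                        # TWFu
print(base64.b64encode(b"Man").decode("ascii"))    # TWFu
```

The four indexes for Man are 19, 22, 5 and 46, which the table maps to T, W, F and u.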
Why Base64 Exists
Many of the internet's foundational protocols are text-based. Email (SMTP), for example, was historically designed to carry 7-bit ASCII text, not raw 8-bit binary. If you tried to push the bytes of a JPEG straight through such a channel, control characters and high bytes could be stripped, altered or interpreted as commands, corrupting the data.
Base64 solves this by re-expressing binary using only characters every text system agrees on. The trade-off is size: because it represents data with a restricted alphabet, Base64 output is about 33% larger than the original binary. In exchange, you get data that survives transmission through text-only channels intact. That bargain — a bit more size for guaranteed safe transport — is why Base64 is everywhere.
Standard Base64 and Base64url
There are two common variants of the alphabet:
- Standard Base64 uses + and /. This is what you see in email, PEM keys and data URIs.
- Base64url (URL- and filename-safe) replaces + with - and / with _, because + and / have special meanings in URLs. Padding is also often omitted. This is what JWTs use.
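The two are otherwise identical. If you feed Base64url to a standard decoder without converting the characters back, strict decoders raise an error, and lenient ones may silently discard the unfamiliar characters and return the wrong bytes. This is a common gotcha when working with tokens.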
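One way to handle the variant mismatch in Python is sketched below. The function names b64url_decode and b64url_to_standard are illustrative, not part of any library; the sketch assumes the only differences are the two swapped characters and possibly missing padding.

```python
import base64


def b64url_decode(value: str) -> bytes:
    """Decode Base64url text, tolerating omitted padding (hypothetical helper)."""
    # Restore the padding that Base64url producers often strip
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)


def b64url_to_standard(value: str) -> str:
    """Translate Base64url characters to the standard alphabet and re-pad."""
    standard = value.replace("-", "+").replace("_", "/")
    return standard + "=" * (-len(standard) % 4)


# These three bytes were chosen because they hit indexes 62 and 63
sample = base64.urlsafe_b64encode(b"\xfb\xff\xfe").decode("ascii")
print(sample)                       # -__-
print(b64url_to_standard(sample))   # +//+
print(b64url_decode("SGVsbG8"))     # b'Hello'
```

The expression -len(value) % 4 gives the number of = characters needed to reach the next multiple of four.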
Encoding and Decoding in Code
Most languages have Base64 built in. The key thing to get right is character encoding: Base64 works on bytes, so text must first be turned into bytes (almost always UTF-8) before encoding.
JavaScript
Browsers provide btoa (binary-to-ASCII, i.e. encode) and atob (ASCII-to-binary, i.e. decode). But these operate on "binary strings" and do not handle Unicode directly, so you must convert through UTF-8:
// Encode Unicode text safely
const text = "Héllo, 世界";
const bytes = new TextEncoder().encode(text); // UTF-8 bytes
const b64 = btoa(String.fromCharCode(...bytes));
console.log(b64);
// Decode back to text
const decodedBytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));
const decoded = new TextDecoder().decode(decodedBytes);
console.log(decoded); // "Héllo, 世界"
Calling btoa("世界") directly throws an error — forgetting the UTF-8 step is the single most common JavaScript Base64 bug.
Python
Python's base64 module works on bytes and is straightforward:
import base64
text = "Héllo, 世界"
b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(b64)
decoded = base64.b64decode(b64).decode("utf-8")
print(decoded) # "Héllo, 世界"
# URL-safe variant for tokens
url_b64 = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
The easiest way to encode or decode a one-off value without writing code is our Base64 Encoder/Decoder — it runs in your browser, so the data never leaves your device.
Practical Examples
- A data URI: data:image/png;base64,iVBORw0KGgo... embeds an image directly in HTML or CSS.
- A JWT: its three dot-separated parts are Base64url-encoded JSON (header and payload) plus a signature.
- HTTP Basic Auth: the header value is Basic followed by Base64 of username:password.
- Embedding binary in JSON: since JSON has no binary type, small blobs are sent as Base64 strings.
We cover these in depth in Base64 Use Cases, and the step-by-step algorithm in How Base64 Encoding Works.
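Two of the examples above can be sketched in a few lines of Python. The helper names basic_auth_header and data_uri, and the sample credentials, are invented for this illustration; the payload is just the 8-byte PNG signature rather than a complete image.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Auth header value (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def data_uri(payload: bytes, media_type: str) -> str:
    """Wrap raw bytes in a Base64 data URI (hypothetical helper)."""
    return f"data:{media_type};base64," + base64.b64encode(payload).decode("ascii")


print(basic_auth_header("user", "pass"))   # Basic dXNlcjpwYXNz

# Only the PNG file signature, so the prefix matches the data URI shown above
png_signature = b"\x89PNG\r\n\x1a\n"
print(data_uri(png_signature, "image/png"))  # data:image/png;base64,iVBORw0KGgo=

# Anyone can reverse the header value: it is encoding, not protection
print(base64.b64decode("dXNlcjpwYXNz"))    # b'user:pass'
```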
How Much Overhead Does Base64 Add?
Because every 3 bytes of input become 4 output characters, Base64 output is 4 ÷ 3 ≈ 1.333 times the size of the original — about a 33% increase. A 9 KB file becomes roughly 12 KB of Base64 text; a 3 MB image becomes about 4 MB.
There is sometimes a little extra on top. The MIME standard (used in email) limits Base64 lines to 76 characters, so encoders insert a line break (CRLF) after every 76 characters, which adds roughly 2.6% more. Other contexts, like data URIs and JWTs, do not wrap lines.
You can compute the length of padded Base64 output exactly with the formula ceil(n ÷ 3) × 4, where n is the number of input bytes. This always yields a multiple of four, which is why the length of valid padded Base64 is divisible by four. The practical takeaway: Base64 is cheap for small values but a poor fit for large binaries, where the extra third of size (and the loss of streaming and caching) actually matters.
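The length formula is easy to check against a real encoder. In this sketch, encoded_length is an illustrative name, and the integer expression (n + 2) // 3 stands in for ceil(n ÷ 3).

```python
import base64


def encoded_length(n: int) -> int:
    """Length of padded Base64 output for n input bytes: ceil(n / 3) * 4."""
    return (n + 2) // 3 * 4


# Compare the formula with the actual encoder for a few sizes
for n in (1, 2, 3, 4, 9 * 1024):
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)

# Padding in action: one leftover byte gets "==", two get "=", none get nothing
for raw in (b"M", b"Ma", b"Man"):
    print(raw, base64.b64encode(raw).decode("ascii"))  # TQ==, TWE=, TWFu
```

For the 9 KB case the formula gives 12288 characters, matching the "roughly 12 KB" figure above.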
Base64 Across Languages
Base64 is part of the standard library almost everywhere, so you rarely depend on a third-party package. The function names differ, but the concept — encode bytes, decode to bytes — is identical:
// Java
String b64 = Base64.getEncoder().encodeToString(data);
byte[] back = Base64.getDecoder().decode(b64);
// Go
import "encoding/base64"
s := base64.StdEncoding.EncodeToString(data)
b, _ := base64.StdEncoding.DecodeString(s)
// PHP
$b64 = base64_encode($data);
$back = base64_decode($b64);
// C#
string b64 = Convert.ToBase64String(data);
byte[] back = Convert.FromBase64String(b64);
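Java and Go also ship a URL-safe variant (Base64.getUrlEncoder() in Java, base64.URLEncoding in Go). PHP has no built-in Base64url function, so the usual approach there is to swap the two characters yourself after encoding; in C#, built-in support depends on the framework version. The one constant across all of them is that you encode bytes, so text must first be converted to a known encoding, almost always UTF-8.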
Base64, Base32 and Hexadecimal: The Encoding Family
Base64 is one of several binary-to-text encodings, and choosing between them is about balancing size against readability:
- Hexadecimal (Base16) uses 16 characters (0–9, a–f), two per byte. It doubles the size but is extremely readable byte-by-byte, which is why it dominates hashes and colour codes. See Base64 vs Hex for a full comparison.
- Base32 uses 32 characters and is case-insensitive, accepting more size overhead (about 60%) in exchange for human-friendliness (used in some OTP secrets and identifiers).
- Base64 is the most space-efficient of the three at ~33% overhead, which is why it wins for transport and embedding.
- URL/percent-encoding is different in kind — it only escapes unsafe characters in text, rather than representing arbitrary binary.
If size is your priority, Base64 is the usual choice; if per-byte readability matters more, hex wins.
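The size trade-off can be measured directly. This sketch encodes the same 30 bytes three ways; 30 was picked only because it divides evenly into both 3-byte and 5-byte groups, so no padding skews the comparison.

```python
import base64

data = bytes(range(30))  # 30 arbitrary bytes

sizes = {
    "hex": len(data.hex()),                   # 2 characters per byte
    "base32": len(base64.b32encode(data)),    # 8 characters per 5 bytes
    "base64": len(base64.b64encode(data)),    # 4 characters per 3 bytes
}
print(sizes)  # {'hex': 60, 'base32': 48, 'base64': 40}
```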
Base64 vs URL Encoding: A Common Confusion
Developers often lump Base64 together with URL encoding (also called percent-encoding), but they solve different problems. URL encoding takes text and escapes only the characters that are unsafe in a URL, replacing each with a % followed by its hex code — for example a space becomes %20. Most characters pass through untouched, so the output still looks largely like the original.
Base64, by contrast, takes arbitrary binary and re-expresses all of it using its 64-character alphabet, producing output that looks nothing like the input. URL encoding cannot safely carry raw binary; Base64 can. The two are sometimes even combined: a value is Base64url-encoded to turn binary into safe text, and that text may then be URL-encoded if it is placed somewhere with further restrictions. In short, reach for URL encoding to make text safe inside a URL, and Base64 to make binary safe inside a text channel.
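A short comparison shows the difference in practice. The sample strings here are arbitrary; quote is the percent-encoding function from Python's urllib.parse.

```python
import base64
from urllib.parse import quote

text = "a b&c"
# Percent-encoding escapes only the unsafe characters
print(quote(text))                                              # a%20b%26c
# Base64 re-expresses every byte, so nothing of the input is recognisable
print(base64.b64encode(text.encode("utf-8")).decode("ascii"))   # YSBiJmM=

# Combining the two: standard Base64 needs escaping in a URL, Base64url does not
blob = b"\xfb\xff\xfe"
standard = base64.b64encode(blob).decode("ascii")
url_safe = base64.urlsafe_b64encode(blob).decode("ascii")
print(quote(standard, safe=""))   # %2B%2F%2F%2B
print(quote(url_safe, safe=""))   # -__-
```

Passing safe="" tells quote to escape / as well, which it leaves alone by default.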
Base64 and Large or Streaming Data
Base64 is designed around fixed 3-byte groups, which makes it stream-friendly in principle — you can encode a large input in 3-byte chunks without holding it all in memory. Many libraries expose streaming encoders and decoders for exactly this reason. However, the 33% size penalty still applies to every byte, so even when streaming is possible, Base64 is rarely the right choice for genuinely large data. A 1 GB file becomes roughly 1.33 GB of text, with no compression benefit and extra CPU spent encoding and decoding. For large transfers, send the raw bytes over a binary-capable channel (a normal HTTP body, a multipart upload, or object storage) and keep Base64 for the small values — tokens, keys, thumbnails, config blobs — where its convenience clearly outweighs the overhead.
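Chunked encoding might look like the sketch below. The function encode_stream and its chunk size are illustrative choices, and the sketch assumes a buffered binary stream whose read(n) returns exactly n bytes until the end of the input; sockets and pipes can return short reads and would need extra buffering.

```python
import base64
import io

CHUNK_SIZE = 3 * 1024  # any multiple of 3 avoids padding in the middle of the output


def encode_stream(src, dst, chunk_size: int = CHUNK_SIZE) -> None:
    """Base64-encode src into dst chunk by chunk (hypothetical helper)."""
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Only the final chunk can be a non-multiple of 3, so only it gets padded
        dst.write(base64.b64encode(chunk))


data = bytes(range(256)) * 50          # 12,800 bytes, not a multiple of the chunk size
source, sink = io.BytesIO(data), io.BytesIO()
encode_stream(source, sink)
print(sink.getvalue() == base64.b64encode(data))  # True
```

Decoding can be streamed the same way by reading the text in chunks whose length is a multiple of 4.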
A Brief History of Base64
Base64 was not invented in one place; it grew out of the need to send binary through early text-only email. Its direct ancestors are Privacy-Enhanced Mail (PEM, RFC 1421, 1993), which used a Base64 alphabet to encode encrypted email, and MIME (RFC 2045, 1996), which standardised Base64 as a content-transfer encoding for attachments. For years, slightly different conventions floated around, which caused interoperability headaches.
In 2006, RFC 4648 tidied everything up by formally defining the family of related encodings — Base16 (hex), Base32 and Base64 — along with the URL- and filename-safe Base64url alphabet. Today, when a library or specification says "Base64," it almost always means the RFC 4648 definition. That shared standard is why a value encoded in Python can be decoded in Java or a browser without surprises.
How to Recognise a Base64 String
Spotting Base64 by eye is a useful skill when debugging. The tell-tale signs are:
- It uses only the Base64 alphabet: letters, digits, and + / (or - _ for Base64url).
- Its length is a multiple of four (for the padded, standard form).
- It may end in one or two = padding characters.
- It contains no spaces (unless it has been MIME-wrapped into 76-character lines).
A rough validation regex for standard, padded Base64 is:
^[A-Za-z0-9+/]+={0,2}$ // and length % 4 === 0
But be careful: these are heuristics, not proof. Plenty of ordinary strings — even some English words — happen to consist only of Base64 characters, so matching the pattern does not guarantee a string is "really" Base64 or that it decodes to anything meaningful. The only definitive test is to decode it and check that the result is the data you expected. The quickest way to do that is to paste it into the Base64 Encoder/Decoder and read the output.
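In Python, a stricter check than the regex is to ask the decoder itself to reject anything outside the alphabet. The function name is_strict_base64 is made up for this sketch; validate=True is the real base64.b64decode option that turns off the default lenient behaviour.

```python
import base64
import binascii


def is_strict_base64(value: str) -> bool:
    """Return True if value decodes as standard, padded Base64 (hypothetical helper)."""
    try:
        # validate=True rejects characters outside the alphabet instead of skipping them
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


print(is_strict_base64("SGVsbG8="))    # True
print(is_strict_base64("SGVsbG8"))     # False: padding is missing
print(is_strict_base64("SGVs bG8="))   # False: contains a space
print(is_strict_base64("test"))        # True, although it is just an English word
```

The last case is the point made above: passing the check says the string is well-formed, not that it was ever meant to be Base64.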
Common Mistakes
- Thinking Base64 is encryption. It is not. Anyone can decode Base64 instantly — there is no key and no secrecy. Never use it to "hide" passwords or secrets. (See Base64 Security Considerations.)
- Forgetting UTF-8. Encoding text without first converting to UTF-8 bytes corrupts non-ASCII characters or throws errors.
- Mixing up Base64 and Base64url. Decoding a token's URL-safe Base64 with a standard decoder fails on the - and _ characters.
- Mishandling padding. Stripping or adding = incorrectly produces invalid input for strict decoders.
- Using it for large files. The 33% size increase makes Base64 a poor choice for big assets where a binary transfer is possible.
When Should You Use Base64?
With the mechanics and trade-offs in mind, the decision of whether to reach for Base64 usually comes down to one question: are you moving binary data through a channel that only handles text? If the answer is yes, Base64 is almost certainly the right tool. Embedding a small icon in a stylesheet, attaching a file to an email, carrying a token in a URL, putting a thumbnail inside a JSON response, pasting a certificate into a config file — these all share that shape, and Base64 fits each of them naturally.
The decision flips when either half of that question changes. If the data is large, the 33% overhead and loss of caching and streaming efficiency mean you should prefer a real binary transfer. And if the channel can already carry binary — a normal HTTP body, a file upload, an object store — then encoding to text adds cost for no benefit. The most common mistake is using Base64 out of habit when one of these conditions has quietly changed, for instance encoding a multi-megabyte image into a data URI simply because that is how the small icons were handled.
There is also a category where Base64 looks tempting but is the wrong tool entirely: anything to do with secrecy. Because it is reversible by anyone, Base64 must never stand in for encryption, hashing or access control. Used within its proper lane — safe text transport of small binary values — it is simple, universal and dependable. Used outside it, it is either wasteful or actively misleading.
Best Practices
- Encode bytes, not assumptions. Convert text to UTF-8 first, every time.
- Pick the right variant. Use Base64url for anything that travels in URLs, filenames or tokens.
- Never treat Base64 as a security measure. Always pair sensitive data with real encryption and TLS.
- Reserve it for small payloads. Embed small icons or blobs; transfer large files as binary.
- Validate decoded data. Lenient decoders accept many malformed inputs without complaint, so check that the result is what you expect before using it.
Frequently Asked Questions
What is Base64 used for?
Base64 is used to represent binary data as text so it can be safely transmitted or embedded in text-based systems — for example in email attachments, data URIs, JWTs, HTTP Basic Auth and binary values inside JSON.
Is Base64 encryption?
No. Base64 is an encoding, not encryption. It has no key and provides no secrecy — anyone can decode it instantly. Never use it to protect sensitive data.
Why does Base64 end in equals signs?
The equals signs are padding. Base64 processes data in 3-byte groups; when the final group has only one byte, two = characters are added, and when it has two bytes, one = is added, so the output length is always a multiple of four.
Why is Base64 larger than the original data?
Base64 represents every 3 bytes of input with 4 output characters, so the encoded data is about 33% larger than the original binary.
What is the difference between Base64 and Base64url?
Base64url is a URL- and filename-safe variant that replaces + with - and / with _, and often omits padding. It is used where the standard + and / characters would cause problems, such as in URLs and JWTs.
How do I encode Unicode text in Base64?
Convert the text to UTF-8 bytes first, then Base64-encode those bytes. In JavaScript use TextEncoder; in Python encode the string with utf-8 before calling b64encode.
Summary
Base64 is a simple but essential tool: a binary-to-text encoding that lets arbitrary data ride safely through text-based systems, at the cost of roughly a third more size. It uses a fixed 64-character alphabet (with a URL-safe variant), pads with equals signs, and is built into virtually every language. The one rule to never forget is that Base64 is encoding, not encryption: it hides nothing. Get the UTF-8 handling right, choose the correct variant, and reach for the encoder/decoder tool when you need to inspect a value, and Base64 becomes second nature.
Related Resources
- Base64 Encoder/Decoder — encode and decode in your browser
- How Base64 Encoding Works — the algorithm step by step
- Base64 Use Cases — where it is used in practice
- Base64 Security Considerations — why it is not encryption
- Base64 vs Hex — encoding comparison