What Is URL Encoding? Percent-Encoding Explained

By AZ Utils Editorial · 12 min read

Type a search with a space and an ampersand into any website, then look at the address bar: your tidy query has turned into something like q=cats%20%26%20dogs. Those percent signs and hex digits are URL encoding at work — the quiet mechanism that lets arbitrary text travel safely inside a web address. This guide explains what URL encoding is, why it exists, exactly how it works, and the handful of distinctions that trip developers up.

It is written for developers building web applications and APIs, students learning how the web handles data, and technical beginners who keep seeing these percent codes and want to understand them.

What Is URL Encoding?

URL encoding, also called percent-encoding, is a method for representing characters in a URL that would otherwise be unsafe, reserved, or simply not allowed. It works by replacing such a character with a percent sign % followed by two hexadecimal digits that represent the character's byte value. A space, for instance, becomes %20; an ampersand becomes %26; a question mark becomes %3F.

The reason this is necessary comes down to a basic constraint: URLs are only permitted to contain a limited set of characters. The standard that governs them (RFC 3986) defines a small, safe alphabet, and anything outside it must be encoded. On top of that, certain characters have special meaning inside a URL — the ? that starts a query string, the & that separates parameters, the / that divides path segments — so when you want to include one of those characters as ordinary data rather than as a delimiter, you must encode it so it is not mistaken for its structural role.

In essence, URL encoding is a translation layer. It takes human-meaningful text that may contain spaces, symbols, accented letters or any other character, and converts it into a form that fits within the strict rules of a web address while preserving the original meaning exactly. When the URL reaches its destination, the process is reversed — decoded — to recover the original text. The encoding is completely lossless: whatever goes in comes back out unchanged.

In short: URL encoding (percent-encoding) replaces characters that are unsafe or reserved in a URL with a percent sign followed by two hexadecimal digits representing the character's byte value, so that arbitrary text can be carried safely inside a web address.
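As a quick illustration of that round trip, here is a minimal sketch using Python's standard urllib.parse module. The sample string is made up for the example, and safe="" is passed so that every reserved character is encoded.

```python
from urllib.parse import quote, unquote

original = "cats & dogs?"  # made-up sample text

# safe="" asks quote() to encode every reserved character, including "/"
encoded = quote(original, safe="")
decoded = unquote(encoded)

print(encoded)              # cats%20%26%20dogs%3F
print(decoded == original)  # True: decoding recovers the input exactly
```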

Why URL Encoding Exists

To appreciate why URL encoding is needed, consider what a URL actually is: a compact piece of structured text whose punctuation carries meaning. The colon separates the scheme from the rest, slashes divide the path, a question mark introduces the query, ampersands separate the query's key/value pairs, and a hash marks the fragment. The whole address is a tiny grammar, and that grammar only works if the structural characters are unambiguous.

Now imagine a user searches for the phrase "rock & roll." If you dropped the raw ampersand straight into the query string, the part after it would look like the start of a new parameter, and the URL would be misread. The space is equally problematic: spaces are not allowed in URLs at all, and many systems would truncate or mangle the address at the first one. URL encoding resolves both situations by turning the ampersand into %26 and the space into %20, so they are transmitted as data rather than interpreted as structure.
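The misreading described above can be reproduced in a few lines. This is a sketch that assumes the receiving side parses the query with Python's urllib.parse.parse_qs; other parsers may differ in detail, but the ampersand is split on in the same way.

```python
from urllib.parse import parse_qs, quote

phrase = "rock & roll"

raw_query = "q=" + phrase                   # ampersand and spaces left as-is
safe_query = "q=" + quote(phrase, safe="")  # q=rock%20%26%20roll

# keep_blank_values=True keeps the stray " roll" key visible in the result
raw_parsed = parse_qs(raw_query, keep_blank_values=True)
safe_parsed = parse_qs(safe_query)

print(raw_parsed)   # {'q': ['rock '], ' roll': ['']}
print(safe_parsed)  # {'q': ['rock & roll']}
```

With the raw query, the value of q is cut off at the ampersand and the remainder is treated as a second parameter name. With the encoded query, the whole phrase arrives as one value.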

The same need arises with any character outside the safe set — accented letters, emoji, symbols from non-Latin scripts, and so on. The early web was built around a limited ASCII character set, and although the modern web is thoroughly international, URLs themselves remain constrained for compatibility. Percent-encoding bridges that gap: it lets a URL carry the world's characters by expressing each one as a sequence of safe bytes. Without it, the web could not reliably pass search queries, file names, email addresses, or virtually any user-supplied text through links and forms.

Reserved and Unreserved Characters

The URL standard divides characters into groups, and understanding the two most important groups demystifies what gets encoded and what does not.

The unreserved characters are always safe and never need encoding. They are the uppercase and lowercase letters A–Z and a–z, the digits 0–9, and four symbols: hyphen -, underscore _, period . and tilde ~. Any of these can appear literally in a URL with no special treatment, and a good encoder will leave them untouched.

The reserved characters are the ones with structural meaning: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. These are allowed in a URL, but wherever they appear unencoded a parser is entitled to treat them as delimiters. When you want one of them to be part of your data instead of acting as a delimiter, you must percent-encode it. This is the crux of most encoding decisions: a slash that separates path segments stays a slash, but a slash inside a value that happens to contain one must become %2F so it is not read as a path divider.

Everything else — spaces, accented and non-Latin letters, most symbols — falls outside both groups and must be encoded whenever it appears. The mental model to carry is simple: letters, digits and the four unreserved symbols pass through freely; the reserved punctuation passes through only when used structurally; and anything else gets percent-encoded.
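The three groups can be written down directly. The sketch below is illustrative only: the classify helper and the two constant names are hypothetical, and the character sets are copied from the lists above.

```python
import string
from urllib.parse import quote

# Character groups as listed above (names are illustrative)
UNRESERVED = string.ascii_letters + string.digits + "-_.~"
RESERVED = ":/?#[]@!$&'()*+,;="

def classify(ch: str) -> str:
    """Return the group a single character belongs to."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "other"

for ch in ["a", "~", "/", "&", " ", "é"]:
    print(repr(ch), classify(ch))

# A standards-following encoder leaves the unreserved set untouched
print(quote(UNRESERVED, safe="") == UNRESERVED)  # True
```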

How Percent-Encoding Works

The encoding itself is mechanical. To encode a character, you take its byte value and write it as a percent sign followed by the two-digit hexadecimal representation of that byte. A space is byte value 32, which is 20 in hexadecimal, so a space becomes %20. An ampersand is byte 38, hex 26, giving %26. Because hexadecimal pairs run from 00 to FF, a single percent-encoded sequence can represent any byte from 0 to 255.
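The arithmetic above fits in a one-line format string. This is a minimal sketch; percent_encode_byte is a hypothetical helper name, not a library function.

```python
def percent_encode_byte(byte_value: int) -> str:
    """Write one byte (0-255) as '%' plus two uppercase hex digits."""
    if not 0 <= byte_value <= 255:
        raise ValueError("a byte must be between 0 and 255")
    return f"%{byte_value:02X}"

print(percent_encode_byte(ord(" ")))  # %20  (32 decimal is 20 hex)
print(percent_encode_byte(ord("&")))  # %26  (38 decimal is 26 hex)
print(percent_encode_byte(255))       # %FF  (the largest single byte)
```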

For characters beyond basic ASCII, the modern rule is to first encode the character as UTF-8 bytes, then percent-encode each of those bytes. The accented letter é, for example, is two bytes in UTF-8 — C3 and A9 — so it becomes %C3%A9. A character that takes three or four UTF-8 bytes, such as many emoji or CJK characters, becomes a correspondingly longer run of percent-encoded pairs. This UTF-8-first approach is what allows URLs to carry any character in the Unicode universe while still using only the safe ASCII alphabet on the wire.
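The UTF-8-first rule can be sketched as follows. The helper percent_encode_utf8 is hypothetical and, to keep the bytes visible, it encodes every byte, including ones a real encoder would leave alone; the last line compares it with the standard library's quote on a non-ASCII character.

```python
from urllib.parse import quote

def percent_encode_utf8(text: str) -> str:
    """Encode text as UTF-8, then percent-encode every resulting byte."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

print(percent_encode_utf8("é"))   # %C3%A9        (2 bytes)
print(percent_encode_utf8("€"))   # %E2%82%AC     (3 bytes)
print(percent_encode_utf8("😀"))  # %F0%9F%98%80  (4 bytes)

# The standard library follows the same UTF-8-first rule by default
print(quote("é") == percent_encode_utf8("é"))  # True
```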

Decoding simply reverses the steps: each %XX sequence is converted back to its byte, the bytes are reassembled, and the UTF-8 is interpreted to recover the original character. To see this happen on any value, paste it into our URL Encoder/Decoder, which encodes and decodes entirely in your browser.
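For readers who want to see the reversal spelled out, here is a hand-rolled sketch. The function percent_decode is hypothetical and assumes the input is plain ASCII with well-formed %XX sequences; in real code the standard library's unquote does the same job.

```python
import re
from urllib.parse import unquote

def percent_decode(encoded: str) -> str:
    """Turn each %XX back into its byte, then read the bytes as UTF-8."""
    raw = encoded.encode("ascii")  # assumption: input is pure ASCII
    decoded_bytes = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda match: bytes([int(match.group(1), 16)]),
        raw,
    )
    return decoded_bytes.decode("utf-8")

sample = "caf%C3%A9%20%26%20bar"
print(percent_decode(sample))                     # café & bar
print(percent_decode(sample) == unquote(sample))  # True
```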

The Space Problem: %20 vs +

One quirk causes endless confusion, so it deserves a clear explanation. A space can be represented two ways depending on context. In a URL path or as the general percent-encoding, a space is %20. But in the specific format used by HTML form submissions (application/x-www-form-urlencoded, which is how browsers serialize form fields into a query string and what most server-side query parsers assume), a space is traditionally encoded as a plus sign +. So q=cats+and+dogs and q=cats%20and%20dogs can both mean the same query.

The catch is that this makes the plus sign itself ambiguous. If a user's actual data contains a literal + (say, a phone number or a search for "C++"), it must be encoded as %2B, because an unencoded + in a query string is likely to be decoded back into a space. Forgetting this is a classic source of bugs where pluses mysteriously vanish from user input. The safe rule is to always percent-encode your data values fully and let the tooling handle the form conventions, rather than hand-writing pluses and hoping they survive.
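The vanishing plus is easy to reproduce. The sketch below assumes a receiver that parses queries with Python's parse_qs, which applies the form convention of reading + as a space.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hand-written query with raw plus signs: they are decoded as spaces
lost = parse_qs("q=C++")
print(lost)  # {'q': ['C  ']}

# Letting the tooling encode the value keeps the plus signs intact
query = urlencode({"q": "C++ tips"})
print(query)            # q=C%2B%2B+tips
print(parse_qs(query))  # {'q': ['C++ tips']}

# Plain percent-encoding also works and avoids "+" for spaces entirely
print(quote("C++ tips", safe=""))  # C%2B%2B%20tips
```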

URL Encoding in Code

Every language has built-in functions, and the most important thing is to choose the right one for what you are encoding.

JavaScript

// Encode a single component (a value) — encodes reserved characters too
encodeURIComponent("rock & roll?");   // "rock%20%26%20roll%3F"

// Encode a whole URL — leaves :/?#&= intact for structure
encodeURI("https://example.com/a b?x=1&y=2"); // "https://example.com/a%20b?x=1&y=2"

// Decode
decodeURIComponent("rock%20%26%20roll%3F");   // "rock & roll?"

Python

from urllib.parse import quote, quote_plus, unquote

quote("rock & roll?")        # "rock%20%26%20roll%3F"   (space -> %20)
quote_plus("rock & roll?")   # "rock+%26+roll%3F"       (space -> +, for forms)
unquote("rock%20%26%20roll%3F")  # "rock & roll?"

The recurring decision — encode a single value versus a whole URL — is the subject of How to Encode Special Characters in URLs, and the parallel but different world of HTML encoding is covered in URL Encoding vs HTML Encoding.

Try Our Free URL Encoder/Decoder

The fastest way to encode or decode a value, or to check why a URL is misbehaving, is our URL Encoder/Decoder.

  • ✅ Encode text for safe use in URLs, and decode percent-encoded values
  • ✅ Handles spaces, reserved characters and Unicode
  • ✅ Runs in your browser — nothing is uploaded

👉 Encode or decode a URL now →

A Mental Model That Makes It Click

If you want a single way to think about URL encoding that resolves most confusion, picture the URL as a sentence with very strict grammar, where a few punctuation marks have fixed jobs. The question mark always means "the query starts here," the ampersand always means "next parameter," the slash always means "next path segment." These marks are the grammar, and the grammar must remain readable for the URL to work. Your data, meanwhile, is the words you want to put into that sentence — and some of your words happen to contain the same characters the grammar uses.

URL encoding is the mechanism that lets you use those characters as words without the reader mistaking them for grammar. When you percent-encode an ampersand inside a value, you are telling the URL parser "this ampersand is part of my data, not a parameter separator." It is the same idea as escaping a quotation mark inside a quoted string in a programming language: you need a way to include the special character literally without ending the structure it normally delimits. Once you see URL encoding as escaping data so it cannot be confused with structure, every decision about what to encode becomes intuitive — you encode any character in your data that could otherwise be read as part of the URL's grammar, plus anything not allowed in a URL at all.

This mental model also explains why the same character is sometimes encoded and sometimes not. A slash that you intend as a path separator stays a literal slash, because there you want it to act as grammar. A slash inside a value you are inserting must be encoded, because there you want it to be data. The character is identical; what differs is the role you intend it to play, and encoding is how you make that intention unambiguous to the parser.
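Here is the slash example in code. The file identifier and the /files/ prefix are invented for illustration. Python's quote treats "/" as safe by default, so the role you intend has to be stated explicitly with the safe argument.

```python
from urllib.parse import quote, unquote

file_id = "reports/2024 Q1"  # hypothetical value that contains a slash

# Slash as data: safe="" forces it to be encoded along with the space
as_data = "/files/" + quote(file_id, safe="")
print(as_data)  # /files/reports%2F2024%20Q1

# Slash as grammar: the default safe="/" leaves it as a path separator
as_structure = "/files/" + quote(file_id)
print(as_structure)  # /files/reports/2024%20Q1

# Either way, decoding the encoded value returns the original text
print(unquote(quote(file_id, safe="")) == file_id)  # True
```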

Where URL Encoding Is Used

URL encoding is everywhere once you start looking. Search forms encode the user's query before placing it in the address. Links that carry an email address, a file path or any text with spaces or symbols rely on it. APIs encode query parameters and path segments so values containing slashes or ampersands do not break the request. Redirect URLs are encoded when passed as a parameter inside another URL. Analytics and tracking links encode campaign data. Even OAuth and other authentication protocols specify exact encoding rules so that signatures computed on both ends match. In all of these, the job is identical: carry data through the URL without letting it collide with the URL's own structure.
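The redirect case is worth seeing once, because the inner URL carries its own ? and & characters. This is a sketch with made-up example.com addresses and a hypothetical parameter name, next.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical redirect target that has its own query string
target = "https://example.com/welcome?lang=en&ref=mail"

# urlencode() percent-encodes the whole target so it travels as one value
login_url = "https://example.com/login?" + urlencode({"next": target})
print(login_url)
# https://example.com/login?next=https%3A%2F%2Fexample.com%2Fwelcome%3Flang%3Den%26ref%3Dmail

# The receiver sees a single "next" parameter holding the full target URL
received = parse_qs(urlsplit(login_url).query)
print(received["next"][0] == target)  # True
```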

Common Mistakes

  1. Confusing encodeURI and encodeURIComponent. Using the wrong one either over-encodes a whole URL or under-encodes a value, leaving reserved characters that break the structure.
  2. Forgetting that + means space in query strings. A literal plus in data must be %2B, or it will be decoded as a space.
  3. Double-encoding. Encoding an already-encoded value turns %20 into %2520, producing visibly broken text after one decode.
  4. Not encoding non-ASCII as UTF-8. Using a different character set produces mojibake on the other end.
  5. Encoding the entire URL when you meant a single value, which mangles the legitimate delimiters.
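Double-encoding in particular is easier to recognize once you have seen it happen. A minimal sketch with a made-up value:

```python
from urllib.parse import quote, unquote

value = "a b"

once = quote(value, safe="")  # a%20b
twice = quote(once, safe="")  # a%2520b  (the "%" itself became %25)

print(once, twice)
print(unquote(twice))           # a%20b  (one decode is not enough)
print(unquote(unquote(twice)))  # a b
```

If %25 shows up in a URL where you expected %20 or another ordinary code, that is usually the sign that a value was encoded twice somewhere in the pipeline.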

Best Practices

  • Encode individual components, not whole URLs, using a component encoder for each value you insert.
  • Build URLs with proper tools — a URL builder or query-string API — rather than concatenating strings by hand.
  • Always use UTF-8 as the character set before percent-encoding.
  • Encode once, decode once. Be deliberate about where in your pipeline encoding happens to avoid double-encoding.
  • Treat decoded input as untrusted and validate it, just as with any user data.
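Several of these practices can be combined in one small helper. The build_url function below is a hypothetical sketch, not a library API: it encodes each path segment and each query value separately with the standard library, and it assumes an https scheme and a host that needs no encoding.

```python
from urllib.parse import quote, urlencode, urlunsplit

def build_url(host: str, path_segments: list[str], params: dict) -> str:
    """Assemble a URL from parts, encoding each component exactly once."""
    # Encode every segment on its own so a "/" inside a segment stays data
    path = "/" + "/".join(quote(segment, safe="") for segment in path_segments)
    # urlencode() applies the form convention: spaces become "+", "+" becomes %2B
    query = urlencode(params)
    return urlunsplit(("https", host, path, query, ""))

url = build_url("example.com", ["search", "rock & roll"], {"q": "C++ & é", "page": 2})
print(url)
# https://example.com/search/rock%20%26%20roll?q=C%2B%2B+%26+%C3%A9&page=2
```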

Frequently Asked Questions

What is URL encoding?

URL encoding, or percent-encoding, replaces characters that are unsafe or reserved in a URL with a percent sign followed by two hexadecimal digits representing the character's byte value, so arbitrary text can be carried safely in a web address.

Why is a space shown as %20 or +?

A space is %20 in general percent-encoding and in URL paths, but in the form-urlencoded format used by many query strings it is traditionally a plus sign. Because of this, a literal plus in data must be encoded as %2B.

Which characters need to be URL encoded?

Anything outside the unreserved set (letters, digits, and - _ . ~) should be encoded when used as data. Reserved characters like / ? # & = must be encoded when they are data rather than acting as delimiters.

How are non-English characters URL encoded?

They are first converted to UTF-8 bytes, and each byte is then percent-encoded. The letter é, for example, is the two bytes C3 and A9, so it becomes %C3%A9.

What is the difference between encodeURI and encodeURIComponent?

encodeURIComponent encodes a single value, including reserved characters, and is right for query-string values. encodeURI encodes a whole URL and leaves structural characters like : / ? # & = intact.

What is double-encoding?

Double-encoding is encoding an already-encoded value, which turns %20 into %2520. It produces broken text after a single decode and usually means encoding was applied twice in a pipeline.

Summary

URL encoding is the web's way of carrying any text safely inside the strict grammar of a URL. By replacing unsafe and reserved characters with percent-encoded byte values — and by encoding non-ASCII characters as UTF-8 first — it lets spaces, symbols and the world's scripts travel through links, forms and APIs without colliding with the URL's structural punctuation. The concepts to keep straight are the unreserved-versus-reserved distinction, the %20-versus-+ space quirk, and the choice between encoding a single component and a whole URL. Get those right, lean on built-in functions and URL builders rather than hand-assembly, and keep a converter close for quick checks, and URL encoding becomes a dependable, invisible part of everything you build for the web.

👉 Encode and decode URLs with our free tool →

AZ Utils Editorial

Finance & web-tools writer

AZ Utils writes practical, plain-English guides on calculators, finance and everyday web tools, drawing on years of experience helping beginners and small businesses get the numbers right.
