What is Base64?
Base64 is an encoding scheme that represents arbitrary binary data using a 64-character ASCII alphabet: A–Z, a–z, 0–9, plus two symbols (+ and / in standard Base64, or - and _ in the URL-safe variant). A trailing = character is used for padding. Because every byte of output is a printable ASCII character, Base64-encoded data can pass through text-only transport channels that would otherwise corrupt binary payloads.
Base64 is not encryption. Anyone can decode a Base64 string with no secret required. The purpose is to change the representation of data, not to hide its content.
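As a minimal sketch of that point, the round trip below uses Python's standard base64 module. The sample bytes are arbitrary, and no key or secret appears anywhere in the process.

```python
import base64

raw = b"hello, world"                  # arbitrary sample bytes
encoded = base64.b64encode(raw)        # output contains only printable ASCII
decoded = base64.b64decode(encoded)    # decoding needs no key or secret

print(encoded.decode("ascii"))         # aGVsbG8sIHdvcmxk
print(decoded == raw)                  # True
```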
Why does Base64 exist?
A lot of internet infrastructure predates universal support for 8-bit binary data in transport. Email, for example, was originally designed to carry 7-bit ASCII. Sending a JPEG through SMTP without encoding could cause some bytes to be silently dropped and others rewritten, so the attachment would arrive corrupted. Base64 solves that by re-encoding those bytes into a safe subset of characters that every system agrees on.
Modern systems still run into the same class of problem:
- URLs — certain characters need percent-escaping, and binary bytes are not valid at all.
- HTTP headers — must be ASCII; authentication tokens and credentials typically get Base64-encoded.
- JSON and XML — these formats cannot carry raw binary, so images, certificates, or keys are embedded as Base64 strings.
- Configuration files — Kubernetes Secrets, for example, store their data fields as Base64.
How the algorithm works
Base64 takes input three bytes (24 bits) at a time and splits those 24 bits into four 6-bit groups. Each 6-bit group indexes into the 64-character alphabet, giving four output characters per three input bytes. The output is therefore 4/3 the size of the input, an overhead of roughly 33%.
When the input length is not a multiple of three, Base64 pads the final group with zero bits and appends = characters to signal how many bytes were missing from that group. One = means the final group had two bytes of real data; two = means it had one.
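The padding rule can be checked with a short sketch using Python's standard base64 module. The helper name encoded_length is hypothetical and only illustrates the 4-characters-per-3-bytes arithmetic for padded output.

```python
import base64
import math

def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * math.ceil(n_bytes / 3)

# One, two, and three input bytes: two, one, and zero "=" characters.
for data in (b"M", b"Ma", b"Man"):
    text = base64.b64encode(data).decode("ascii")
    print(len(data), text, encoded_length(len(data)))
```

With these inputs, M encodes to TQ==, Ma to TWE=, and Man to TWFu, each four characters long.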
For example, encoding the ASCII string Man:
Input:   M        a        n
Binary:  01001101 01100001 01101110
Groups:  010011   010110   000101   101110
Index:   19       22       5        46
Output:  T        W        F        u
Final:   TWFu
No padding needed because the input was exactly three bytes.
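The steps in this walkthrough can be written out by hand. The following is an illustrative from-scratch sketch, not how any particular library implements it; the names ALPHABET and encode_base64 are made up for this example, and real code should use the standard library instead.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_base64(data: bytes) -> str:
    """Hand-rolled standard Base64 encoder, for illustration only."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Pack the chunk into a 24-bit integer, zero-filling missing bytes.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Slice the 24 bits into four 6-bit indexes, most significant first.
        indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[j] for j in indexes]
        # Characters that would carry only zero fill are replaced with "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(encode_base64(b"Man"))  # TWFu
# Cross-check against the standard library.
print(encode_base64(b"Ma") == base64.b64encode(b"Ma").decode("ascii"))  # True
```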
Unicode and UTF-8
Base64 works on bytes, not characters. If you want to Base64-encode a string that contains non-ASCII characters, you need to decide on a byte representation first; in practice, that is almost always UTF-8. Most modern Base64 tools (including our Base64 encoder) UTF-8-encode the input string before running the Base64 algorithm, so round-trips through emojis, accented letters, and CJK characters work cleanly.
Skipping the UTF-8 step is a classic bug: naively Base64-encoding a JavaScript string with btoa will throw on any character outside Latin-1. The fix is to UTF-8-encode first (for example, btoa(unescape(encodeURIComponent(s))), or a TextEncoder-based equivalent, since unescape is deprecated) and to reverse the steps on decode.
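The same idea in Python, as a sketch: the string is turned into UTF-8 bytes explicitly before encoding, and the bytes are turned back into a string after decoding. The helper names encode_text and decode_text are hypothetical.

```python
import base64

def encode_text(text: str) -> str:
    """UTF-8-encode the string, then Base64-encode the resulting bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode_text(encoded: str) -> str:
    """Mirror image: Base64-decode to bytes, then interpret them as UTF-8."""
    return base64.b64decode(encoded).decode("utf-8")

sample = "naïve café ☕"
assert decode_text(encode_text(sample)) == sample

# The byte representation matters: same character, different bytes,
# and therefore different Base64 output.
print(base64.b64encode("é".encode("utf-8")))    # b'w6k='  (bytes C3 A9)
print(base64.b64encode("é".encode("latin-1")))  # b'6Q=='  (byte E9)
```

Both sides of a round trip have to agree on the byte encoding; decoding Latin-1 bytes as UTF-8, or the reverse, fails or produces garbled text.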
Base64 vs Base64URL
Standard Base64 uses + and / in its alphabet. Both of those characters have special meaning inside URLs, filenames, and some header formats — + can be interpreted as a space, / is a path separator. To avoid escaping, a variant known as Base64URL swaps them for - and _ respectively, and usually drops the trailing = padding.
If a string you are trying to decode contains - or _, or if it has no padding but its length is not a multiple of four, it is probably Base64URL. Convert the alphabet back and add the missing = characters, then decode with a standard Base64 tool.
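That conversion can be sketched in a few lines of Python. The function names are hypothetical; base64.urlsafe_b64decode is the standard-library call that accepts the - and _ alphabet directly.

```python
import base64

def base64url_to_standard(value: str) -> str:
    """Swap the alphabet back and restore stripped padding."""
    standard = value.replace("-", "+").replace("_", "/")
    return standard + "=" * (-len(standard) % 4)

def base64url_to_bytes(value: str) -> bytes:
    """Decode Base64URL directly; the decoder still expects padding."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)

print(base64url_to_standard("-_8"))  # +/8=
print(base64url_to_bytes("-_8"))     # b'\xfb\xff'
```

The expression -len(value) % 4 gives the number of = characters needed to reach a multiple of four. A remainder of one (three = characters) cannot occur in valid input, and the decoder rejects it.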
JSON Web Tokens (JWTs) are a common example — each of the three dot-separated sections is Base64URL-encoded JSON. You can decode the header and payload with any Base64URL-aware tool to inspect the claims, though the signature section only makes sense in conjunction with the signing key.
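A sketch of that inspection step follows. The token is built inside the example from made-up claims and a placeholder signature, so nothing here is a real credential, and the helper names are hypothetical. The code only decodes; it does not verify the signature, so the claims it prints must not be trusted for authorization.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """Base64URL without padding, the form used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def inspect_jwt(token: str) -> tuple[dict, dict]:
    """Return (header, payload) as dicts. Does NOT verify the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload

# Hypothetical demo token: made-up claims and a placeholder signature.
demo_header = {"alg": "HS256", "typ": "JWT"}
demo_payload = {"sub": "user-123", "admin": False}
token = ".".join([
    b64url_encode(json.dumps(demo_header).encode("utf-8")),
    b64url_encode(json.dumps(demo_payload).encode("utf-8")),
    b64url_encode(b"not-a-real-signature"),
])

header, payload = inspect_jwt(token)
print(header)
print(payload)
```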
When you should use Base64
Reach for Base64 when you need to move opaque bytes through a text-only channel and the ~33% size increase is acceptable. Typical fits:
- Embedding a small image inline in an HTML or CSS document via a data: URI.
- Generating an HTTP Basic Authentication header (Authorization: Basic <base64>).
- Storing a PEM-encoded certificate or private key inside a JSON or YAML document.
- Including a binary blob in a message queue payload whose format is JSON.
- Encoding a short binary identifier for inclusion in a URL.
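Two of these cases can be sketched in Python as follows. The function names are hypothetical, and the credentials are the well-known sample pair from the HTTP Basic Authentication specification, not real ones.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Value for an Authorization header using HTTP Basic Authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

def data_uri(payload: bytes, media_type: str) -> str:
    """Inline data: URI carrying the payload as Base64."""
    return f"data:{media_type};base64," + base64.b64encode(payload).decode("ascii")

print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(data_uri(b"Man", "text/plain"))
# data:text/plain;base64,TWFu
```

The Basic header is only encoded, not protected: anyone who sees it can recover the password, so it belongs on TLS-protected connections.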
When you should NOT use Base64
- To hide data. Base64 offers zero confidentiality. Use real encryption if you need secrecy.
- For large files. The 33% overhead matters at scale. Prefer a binary transport (multipart uploads, binary WebSocket frames, gRPC) when you can.
- For IDs that appear in analytics or URLs and need to be stable, unless you fix the variant. Encoding is deterministic within a single variant, but different variants (standard vs URL-safe alphabet, with vs without padding) can produce different strings for the same bytes. Pick one variant and stick with it.
- As a checksum. Base64 changes encoding, not content — two semantically identical payloads can have different Base64 representations if they differ in whitespace or byte order.
Common pitfalls
Even a simple encoding scheme has a few reliable ways to fail.
Missing padding. If you strip = characters to save a few bytes in a URL, be sure the decoder you use at the other end accepts unpadded input.
Whitespace in the encoded string. Some tools wrap Base64 output at 76 characters per line (an old MIME convention). Most decoders tolerate this, but some strict parsers do not. When in doubt, collapse whitespace before decoding.
Wrong alphabet. Pasting a Base64URL string into a standard Base64 decoder (or vice versa) will usually fail partway through with an invalid-character error. Check for - or _ first.
Double-encoding. Base64-encoding an already-Base64 string is legal but rarely what anyone intends. If the decoded output still looks like a run of Base64 characters, possibly ending in one or two = characters, decode once more.
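The first three pitfalls (missing padding, stray whitespace, and the wrong alphabet) can be handled by normalizing input before decoding. The sketch below is one possible approach, with a hypothetical function name; note that it deliberately accepts either alphabet, which may be too permissive for some applications.

```python
import base64
import binascii

def decode_lenient(value: str) -> bytes:
    """Normalize common Base64 variations, then decode strictly."""
    # Collapse whitespace such as MIME-style line wrapping.
    compact = "".join(value.split())
    # Map the Base64URL alphabet onto the standard one.
    standard = compact.replace("-", "+").replace("_", "/")
    # Restore padding that was stripped by the producer.
    padded = standard + "=" * (-len(standard) % 4)
    # validate=True makes the decoder reject any remaining invalid character
    # instead of silently discarding it, which is Python's default behavior.
    return base64.b64decode(padded, validate=True)

print(decode_lenient("TW\nFu"))  # b'Man'
print(decode_lenient("-_8"))     # b'\xfb\xff'

try:
    decode_lenient("TW*u")
except binascii.Error as exc:
    print("rejected:", exc)
```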
Try it yourself
Our free Base64 Encoder/Decoder runs entirely in your browser — nothing you paste leaves your device. It handles Unicode automatically and works for anything from a single word up to large configuration files.
Summary
Base64 is the default answer whenever you need to carry binary data through a text channel. It is simple, ubiquitous, and well-supported across every language and framework. It is also not encryption, adds ~33% overhead, and has a few dialects (notably Base64URL) that trip people up when mixed. Understand what it does and what it does not do, and you will reach for it confidently — and stop reaching for it when a real binary transport would serve you better.