PyMOTW: base64

By Doug Hellmann
July 20, 2008

The base64 module contains functions for translating binary data into a subset of ASCII suitable for transmission using plaintext protocols.

Module: base64
Purpose: Encode binary data into ASCII characters.
Python Version: 1.4 and later

Description:

The base64, base32, and base16 encodings convert 8 bit bytes to values with 6, 5, or 4 bits of useful data per byte, allowing non-ASCII bytes to be encoded as ASCII characters for transmission over protocols that require plain ASCII, such as SMTP. The "base" values correspond to the length of the alphabet used in each encoding. There are also URL-safe variations of the original encodings that use slightly different results.

Base 64 Encoding:

A basic example of encoding some text looks like this:

import base64

initial_data = open(__file__, 'rt').read()

encoded_data = base64.b64encode(initial_data)

num_initial = len(initial_data)
padding = { 0:0, 1:2, 2:1 }[num_initial % 3]

print '%d bytes before encoding' % num_initial
print 'Expect %d padding bytes' % padding
print '%d bytes after encoding' % len(encoded_data)
print
print encoded_data


The output shows the 529 bytes of the original source expand to 708 bytes after being encoded. The encoding process looks at each sequence of 24 bits in the input (3 bytes) and encodes those same 24 bits spread over 4 bytes in the output. The last two characters, the "==", are padding because the number of bits in the original string was not evenly divisible by 24.

There are no carriage returns in the output produced by the library, but I have inserted them in the output here to make this page render more cleanly in HTML.


$ python base64_b64encode.py
529 bytes before encoding
Expect 2 padding bytes
708 bytes after encoding

IyEvdXNyL2Jpbi9lbnYgcHl0aG9uCiMgZW5jb2Rpbmc
6IHV0Zi04CiMKIyBDb3B5cmlnaHQgKGMpIDIwMDggRG
91ZyBIZWxsbWFubiBBbGwgcmlnaHRzIHJlc2VydmVkL
gojCiIiIgoiIiIKCl9fdmVyc2lvbl9fID0gIiRJZDog
cHltb3R3LnB5IDEyMzkgMjAwOC0wMS0xNiAxMDo1NTo
xOVogZGhlbGxtYW5uICQiCgppbXBvcnQgYmFzZTY0Cg
ppbml0aWFsX2RhdGEgPSBvcGVuKF9fZmlsZV9fLCAnc
nQnKS5yZWFkKCkKCmVuY29kZWRfZGF0YSA9IGJhc2U2
NC5iNjRlbmNvZGUoaW5pdGlhbF9kYXRhKQoKbnVtX2l
uaXRpYWwgPSBsZW4oaW5pdGlhbF9kYXRhKQpwYWRkaW
5nID0geyAwOjAsIDE6MiwgMjoxIH1bbnVtX2luaXRpY
WwgJSAzXQoKcHJpbnQgJyVkIGJ5dGVzIGJlZm9yZSBl
bmNvZGluZycgJSBudW1faW5pdGlhbApwcmludCAnRXh
wZWN0ICVkIHBhZGRpbmcgYnl0ZXMnICUgcGFkZGluZw
pwcmludCAnJWQgYnl0ZXMgYWZ0ZXIgZW5jb2RpbmcnI
CUgbGVuKGVuY29kZWRfZGF0YSkKcHJpbnQKcHJpbnQg
ZW5jb2RlZF9kYXRhCg==


Base 64 Decoding:

The encoded string can be converted back to the original form by taking 4 bytes and converting them to the original 3, using a reverse lookup. The b64decode() function does that for you.

import base64

original_string = 'This is the data, in the clear.'
print 'Original:', original_string

encoded_string = base64.b64encode(original_string)
print 'Encoded :', encoded_string

decoded_string = base64.b64decode(encoded_string)
print 'Decoded :', decoded_string



$ python base64_b64decode.py
Original: This is the data, in the clear.
Encoded : VGhpcyBpcyB0aGUgZGF0YSwgaW4gdGhlIGNsZWFyLg==
Decoded : This is the data, in the clear.


URL-safe Variations:

Because the default base64 alphabet may use + and /, and those two characters are used in URLs, it became necessary to specify an alternate encoding with substitutes for those characters. The + is replaced with a -, and / is replaced with underscore (_). Otherwise, the alphabet is the same.

import base64

for original in [ '\xfb\xef', '\xff\xff' ]:
print 'Original :', repr(original)
print 'Standard encoding:', base64.standard_b64encode(original)
print 'URL-safe encoding:', base64.urlsafe_b64encode(original)
print



$ python base64_urlsafe.py
Original : '\xfb\xef'
Standard encoding: ++8=
URL-safe encoding: --8=

Original : '\xff\xff'
Standard encoding: //8=
URL-safe encoding: __8=


Other Encodings:

Besides base 64, the module provides functions for working with base 32 and base 16 (hex) encoded data.

import base64

original_string = 'This is the data, in the clear.'
print 'Original:', original_string

encoded_string = base64.b32encode(original_string)
print 'Encoded :', encoded_string

decoded_string = base64.b32decode(encoded_string)
print 'Decoded :', decoded_string



$ python base64_base32.py
Original: This is the data, in the clear.
Encoded : KRUGS4ZANFZSA5DIMUQGIYLUMEWCA2LOEB2GQZJAMNWGKYLSFY======
Decoded : This is the data, in the clear.


The base 16 functions work with the hexadecimal alphabet.

import base64

original_string = 'This is the data, in the clear.'
print 'Original:', original_string

encoded_string = base64.b16encode(original_string)
print 'Encoded :', encoded_string

decoded_string = base64.b16decode(encoded_string)
print 'Decoded :', decoded_string



$ python base64_base16.py
Original: This is the data, in the clear.
Encoded : 546869732069732074686520646174612C20696E2074686520636C6561722E
Decoded : This is the data, in the clear.


References:

RFC 3548 - The Base16, Base32, and Base64 Data Encodings
Python Module of the Week Home
Download Sample Code


Technorati Tags:
,



You might also be interested in:


Popular Topics

Archives

Or, visit our complete archives.

Recommended for You

Got a Question?