PyMOTW: uuid

By Doug Hellmann
July 27, 2008 | Comments: 2

The uuid module implements Universally Unique Identifiers as described in RFC 4122.

Module: uuid
Purpose: Generate unique identifiers for objects.
Python Version: 2.5

Description:

RFC 4122 defines a system for creating universally unique identifiers for resources in a way that does not require a central registrar. UUID values are 128 bits long and "can guarantee uniqueness across space and time". They are useful as ids for documents, hosts, application clients, and in other situations where a unique value is necessary. The RFC is specifically geared toward creating a Uniform Resource Name namespace.

Three main algorithms are covered by the spec:

  • Using IEEE 802 MAC addresses as a source of uniqueness
  • Using pseudo-random numbers
  • Using well-known strings combined with cryptographic hashing


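In the MAC-based case (version 1), the seed value is combined with the system clock and a clock sequence value (to maintain uniqueness in case the clock was set backwards). The random and name-based versions take their bits from the random numbers or the hash value alone.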
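As a rough map from those three approaches to the module's functions, here is a small sketch that builds one value with each generator and reports its version and size. The name 'example.com' is only a placeholder, and the function form of print is used so the code should run on Python 3 as well as late Python 2 releases.

```python
import uuid

# One value per generation function; 'example.com' is a placeholder name.
samples = {
    'uuid1 (MAC address + time)': uuid.uuid1(),
    'uuid3 (namespace + MD5)': uuid.uuid3(uuid.NAMESPACE_DNS, 'example.com'),
    'uuid4 (random)': uuid.uuid4(),
    'uuid5 (namespace + SHA-1)': uuid.uuid5(uuid.NAMESPACE_DNS, 'example.com'),
}

for label in sorted(samples):
    u = samples[label]
    # Every UUID is a 128-bit value, so it always occupies 16 bytes.
    print('%-28s version=%d bytes=%d' % (label, u.version, len(u.bytes)))
```

The version attribute is how you can tell, after the fact, which algorithm produced a given value.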

UUID 1 - IEEE 802 MAC Address:

UUID version 1 values are computed using the MAC address of the host. The uuid module uses getnode() to retrieve the MAC value on a given system:

import uuid

print hex(uuid.getnode())



$ python uuid_getnode.py
0x1ec200d9e0L


If a system has more than one network card, and so more than one MAC, any one of the values may be returned.
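The integer returned by getnode() is a 48-bit value. If you would rather see it in the familiar colon-separated MAC notation, a small helper along these lines should do (format_node is a name made up for this sketch, not part of the uuid module):

```python
import uuid


def format_node(node):
    """Format a 48-bit node value as six colon-separated hex octets."""
    digits = '%012x' % node
    return ':'.join(digits[i:i + 2] for i in range(0, 12, 2))


# The node value from the sample output above.
print(format_node(0x1ec200d9e0))

# The value for the current host, which differs from machine to machine.
print(format_node(uuid.getnode()))
```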

To generate a UUID for a given host, identified by its MAC address, use the uuid1() function. You can pass a node identifier, or omit the argument to use the value returned by getnode().

import uuid

u = uuid.uuid1()

print u
print type(u)
print 'bytes :', repr(u.bytes)
print 'hex :', u.hex
print 'int :', u.int
print 'urn :', u.urn
print 'variant :', u.variant
print 'version :', u.version
print 'fields :', u.fields
print '\ttime_low : ', u.time_low
print '\ttime_mid : ', u.time_mid
print '\ttime_hi_version : ', u.time_hi_version
print '\tclock_seq_hi_variant: ', u.clock_seq_hi_variant
print '\tclock_seq_low : ', u.clock_seq_low
print '\tnode : ', u.node
print '\ttime : ', u.time
print '\tclock_seq : ', u.clock_seq


The components of the UUID object returned can be accessed through read-only instance attributes. Some attributes, such as hex, int, and urn, are different representations of the UUID value.


$ python uuid_uuid1.py
c8425fd2-5bec-11dd-b385-001ec200d9e0
<class 'uuid.UUID'>
bytes : '\xc8B_\xd2[\xec\x11\xdd\xb3\x85\x00\x1e\xc2\x00\xd9\xe0'
hex : c8425fd25bec11ddb385001ec200d9e0
int : 266190234244921474865513442896659929568
urn : urn:uuid:c8425fd2-5bec-11dd-b385-001ec200d9e0
variant : specified in RFC 4122
version : 1
fields : (3359793106L, 23532L, 4573L, 179L, 133L, 132103854560L)
    time_low : 3359793106
    time_mid : 23532
    time_hi_version : 4573
    clock_seq_hi_variant: 179
    clock_seq_low : 133
    node : 132103854560
    time : 134364636421185490
    clock_seq : 13189
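Since hex, int, bytes, fields, and urn all describe the same 128 bits, any of them can be handed back to the UUID constructor to rebuild the value. This sketch starts from the sample value printed above and also shows what "read-only" means in practice:

```python
import uuid

# The sample value from the output above.
u = uuid.UUID('c8425fd2-5bec-11dd-b385-001ec200d9e0')

# Each representation can be fed back to the constructor.
rebuilt = [
    uuid.UUID(hex=u.hex),
    uuid.UUID(int=u.int),
    uuid.UUID(bytes=u.bytes),
    uuid.UUID(fields=u.fields),
    uuid.UUID(u.urn),
]
print(all(r == u for r in rebuilt))

# Attributes are read-only: assigning to one raises TypeError.
try:
    u.node = 0
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)
```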


Because of the time component, each time uuid1() is called a new value is returned.

import uuid

for i in xrange(3):
    print uuid.uuid1()


Notice in this output that only the time component (at the beginning of the string) changes.


$ python uuid_uuid1_repeat.py
f48683ca-5bec-11dd-9168-001ec200d9e0
f4868a8c-5bec-11dd-9168-001ec200d9e0
f4868c3a-5bec-11dd-9168-001ec200d9e0
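The time component can be turned back into a calendar date. RFC 4122 counts version 1 timestamps in 100-nanosecond intervals since October 15, 1582, so a conversion might look like the following sketch, which decodes the first sample value shown earlier:

```python
import datetime
import uuid

# The first uuid1() sample value shown earlier in this article.
u = uuid.UUID('c8425fd2-5bec-11dd-b385-001ec200d9e0')

# u.time counts 100-nanosecond intervals since 1582-10-15 00:00:00 UTC.
gregorian_epoch = datetime.datetime(1582, 10, 15)

# Ten 100-ns intervals make one microsecond.
created = gregorian_epoch + datetime.timedelta(microseconds=u.time // 10)
print(created)
```

The result falls on July 27, 2008 (UTC), the day this article was published.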


Of course, since your computer has a different MAC address than mine, you will see different values if you run the examples: the node identifier at the end of the UUID changes along with the time component. You can also pass the node value to uuid1() explicitly:

import uuid

node1 = uuid.getnode()
print hex(node1), uuid.uuid1(node1)

node2 = 0x1e5274040e
print hex(node2), uuid.uuid1(node2)



$ python uuid_uuid1_othermac.py
0x1ec200d9e0L 79e13e17-5bed-11dd-83e5-001ec200d9e0
0x1e5274040eL 79e1c1b0-5bed-11dd-9e3c-001e5274040e
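To confirm that the node you pass in really is what ends up in the last group of the UUID, a quick check such as this one can help (the node value is the second MAC address from the output above):

```python
import uuid

# An arbitrary node value: the second MAC address from the output above.
node = 0x1e5274040e
u = uuid.uuid1(node)

# The node is stored unchanged and appears as the final group of the string.
print(u.node == node)
print(str(u).split('-')[-1])
```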


UUID 3 and 5 - Name-Based Values:

It is also useful in some contexts to create UUID values from names instead of random or time-based values. Versions 3 and 5 of the UUID specification use cryptographic hash values (MD5 or SHA-1) to combine namespace-specific seed values with "names" (DNS hostnames, URLs, object ids, etc.). There are several well-known namespaces, identified by pre-defined UUID values, for working with DNS, URLs, ISO OIDs, and X.500 Distinguished Names. You can also define your own application-specific namespaces by generating and saving UUID values.
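One possible way to set up an application-specific namespace is to derive it once from a well-known namespace and then reuse it for every name in your application. This is only a sketch: 'app.example.com' and the invoice names are placeholders, and a saved uuid4() value would serve as a namespace just as well.

```python
import uuid

# Derive an application namespace once and keep it; the DNS name is a
# placeholder for this sketch.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, 'app.example.com')

invoice_id = uuid.uuid5(APP_NAMESPACE, 'invoice/1001')
same_again = uuid.uuid5(APP_NAMESPACE, 'invoice/1001')
other_ns = uuid.uuid5(uuid.NAMESPACE_URL, 'invoice/1001')

# Same namespace and name: identical value.
print(invoice_id == same_again)

# Same name in a different namespace: different value.
print(invoice_id == other_ns)
```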

To create a UUID from a DNS name, pass uuid.NAMESPACE_DNS as the namespace argument to uuid3() or uuid5():

import uuid

hostnames = ['www.doughellmann.com', 'blog.doughellmann.com']

for name in hostnames:
    print name
    print '\tMD5 :', uuid.uuid3(uuid.NAMESPACE_DNS, name)
    print '\tSHA-1 :', uuid.uuid5(uuid.NAMESPACE_DNS, name)



$ python uuid_uuid3_uuid5.py
www.doughellmann.com
    MD5 : bcd02e22-68f0-3046-a512-327cca9def8f
    SHA-1 : e3329b12-30b7-57c4-8117-c2cd34a87ce9
blog.doughellmann.com
    MD5 : 9bdabfce-dfd6-37ab-8a3f-7f7293bcf111
    SHA-1 : fa829736-7ef8-5239-9906-b4775a5abacb
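To make the "hash of namespace plus name" idea concrete, here is a rough reconstruction of the version 3 calculation using hashlib. The helper name name_based_md5 is invented for this sketch, and in real code you would simply call uuid3().

```python
import hashlib
import uuid


def name_based_md5(namespace, name):
    """Sketch of a version 3 UUID: MD5 over namespace bytes plus the name."""
    digest = hashlib.md5(namespace.bytes + name.encode('utf-8')).digest()
    # version=3 overwrites the version and variant bits of the 16-byte digest.
    return uuid.UUID(bytes=digest[:16], version=3)


name = 'www.doughellmann.com'
print(name_based_md5(uuid.NAMESPACE_DNS, name))
print(uuid.uuid3(uuid.NAMESPACE_DNS, name))
```

Both lines should print the same value. Version 5 works the same way with SHA-1, keeping only the first 16 bytes of the longer digest.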


The UUID value for the same name in a namespace is always the same, no matter when or where it is calculated. Values for the same name in different namespaces are different, of course.

import uuid

for i in xrange(3):
    print uuid.uuid3(uuid.NAMESPACE_DNS, 'www.doughellmann.com')



$ python uuid_uuid3_repeat.py
bcd02e22-68f0-3046-a512-327cca9def8f
bcd02e22-68f0-3046-a512-327cca9def8f
bcd02e22-68f0-3046-a512-327cca9def8f


UUID 4 - Random Values:

Sometimes host-based and namespace-based UUID values are not "different enough". Values generated with uuid1() on a single host, for example, share most of their digits. In cases where you want to use the UUID as a lookup key, or need to tell ids apart at a glance, a more random sequence of values with more differentiation is desirable. In these situations, use uuid4() to generate UUIDs from random values.

import uuid

for i in xrange(3):
    print uuid.uuid4()



$ python uuid_uuid4.py
1b53263a-d6a7-4bb9-8930-c3acf48ba354
a224a8ee-de50-4aa1-8808-76a1b1b1a227
2fb446fa-afe8-4911-aebb-511ec06ad0ef
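A quick way to see how much more the random values vary is to generate a batch and look at the final group, which for uuid1() would be the same node value every time. The batch size of 1000 is arbitrary.

```python
import uuid

values = [uuid.uuid4() for _ in range(1000)]

# Every value reports version 4.
print(set(u.version for u in values))

# No duplicates show up in the sample.
print(len(set(values)) == len(values))

# Count the distinct final groups; with uuid1() on one host there is only one.
print(len(set(str(u).split('-')[-1] for u in values)))
```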


Working with UUID Objects:

In addition to generating new UUID values, you can parse strings in various formats to create UUID objects. This makes it easier to compare them, sort them, etc.

import uuid

def show(msg, l):
    print msg
    for v in l:
        print '\t', v
    print

input_values = [
    'urn:uuid:f2f84497-b3bf-493a-bba9-7c68e6def80b',
    '{417a5ebb-01f7-4ed5-aeac-3d56cd5037b0}',
    '2115773a-5bf1-11dd-ab48-001ec200d9e0',
    ]

show('input_values', input_values)

uuids = [ uuid.UUID(s) for s in input_values ]
show('converted to uuids', uuids)

uuids.sort()
show('sorted', uuids)



$ python uuid_uuid_objects.py
input_values
    urn:uuid:f2f84497-b3bf-493a-bba9-7c68e6def80b
    {417a5ebb-01f7-4ed5-aeac-3d56cd5037b0}
    2115773a-5bf1-11dd-ab48-001ec200d9e0

converted to uuids
    f2f84497-b3bf-493a-bba9-7c68e6def80b
    417a5ebb-01f7-4ed5-aeac-3d56cd5037b0
    2115773a-5bf1-11dd-ab48-001ec200d9e0

sorted
    2115773a-5bf1-11dd-ab48-001ec200d9e0
    417a5ebb-01f7-4ed5-aeac-3d56cd5037b0
    f2f84497-b3bf-493a-bba9-7c68e6def80b
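Parsing is also what makes comparisons reliable, since the same value can be spelled several ways as a string. This sketch parses three spellings of one of the values above (URN, braces, and bare hex digits) and checks equality, hashing, and ordering:

```python
import uuid

# Three spellings of the same value: URN, braces, and bare hex digits.
a = uuid.UUID('urn:uuid:417a5ebb-01f7-4ed5-aeac-3d56cd5037b0')
b = uuid.UUID('{417a5ebb-01f7-4ed5-aeac-3d56cd5037b0}')
c = uuid.UUID('417a5ebb01f74ed5aeac3d56cd5037b0')

print(a == b == c)

# Equal UUIDs hash alike, so they collapse to one set member or dict key.
print(len(set([a, b, c])))

# Ordering follows the 128-bit integer value.
low = uuid.UUID('2115773a-5bf1-11dd-ab48-001ec200d9e0')
print((low < a) == (low.int < a.int))
```

Comparing the raw strings instead would treat all three spellings as different values.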



References:

RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace



2 Comments

"Sometimes host-based and namespace-based UUID values are not "different enough". In cases where you want to use the UUID as a lookup key, a more random sequence of values with more differentiation is desirable. In these situations, use uuid4() to generate UUIDs from random values."

Why?

@Chris - In one case I was generating unique identifiers for objects all on the same host. Using host-based UUIDs meant that most of the ID was the same for each object. By using uuid4(), I was able to expand the range of unique ids so it was easier to find object references in log files. Using random values also provides better performance when the UUIDs are meant as hash keys, since you end up with fewer collisions.
