Don't UUID Yourself

Published on 2021-12-10

For a long time, sequential IDs were good enough for everything. Just use AUTO_INCREMENT or SERIAL, job done, right? But then you realize that your competitor might now figure out how many customers you have, because they just have to create a new user and look at their ID. Or someone might enumerate all of your customers, starting at one and adding one until they are finished.

As an alternative, UUIDs (especially UUID v4) have become increasingly popular, and the reasons are pretty obvious: the space is 128 bits wide (122 of them random in v4), so guessing an ID is much harder, and the IDs are not sequential, so you can’t enumerate them. And most importantly, they make your software look so much more enterprisey.

UUIDs embody a trade-off: they are made for machines, but are also meant to be human-readable. I’m not sure that they are always the right choice, though.

128 bits are just 16 bytes, but due to the hex encoding and the separation into groups, a UUID is in fact 36 characters long. That is a 125% overhead, which is not too bad considering that the format is meant for human consumption. About that last point, though: I find it rather difficult to skim through lists of UUIDs:

[Image: UUIDs]

They don’t give any indication of the kind of data they refer to. When you’re developing or debugging, you try to make sense of things and find patterns, but to my brain, at least, UUIDs are mostly noise.

It also gets difficult when you have a large collection and start to see UUIDs that look similar. The following IDs might look equal:

c0d72339-f3fd-438f-934f-283821683226
c0d72339-f3f9-4e8f-934f-283821683226

but they’re different. In a heated debugging situation this might turn out to be very annoying.

Could there be an alternative?

If you want to keep human-readability and have a similar address space, an alternative might be random, prefixed alphanumeric strings. Prefixes help distinguish different kinds of identifiers: user IDs could be prefixed with “usr-”, payment IDs with “pay-” and so forth.
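To illustrate what the prefix buys you, here is a minimal sketch of telling identifier kinds apart. The `kinds` table and the `kindOf` helper are hypothetical names made up for this example, and the tokens are invented sample values.

```go
package main

import (
	"fmt"
	"strings"
)

// kinds maps hypothetical prefixes to the entity they identify.
var kinds = map[string]string{
	"usr": "user",
	"pay": "payment",
}

// kindOf returns the entity kind encoded in the prefix of id.
// The second result is false if the prefix is missing or unknown.
func kindOf(id string) (string, bool) {
	prefix, _, found := strings.Cut(id, "-")
	if !found {
		return "", false
	}
	kind, ok := kinds[prefix]
	return kind, ok
}

func main() {
	fmt.Println(kindOf("usr-K7M2QX9TB4")) // user true
	fmt.Println(kindOf("pay-Z3HN8WPA6D")) // payment true

	// A UUID carries no such hint, so the lookup fails.
	_, ok := kindOf("c0d72339-f3fd-438f-934f-283821683226")
	fmt.Println(ok) // false
}
```

With a UUID, the same question can only be answered by looking the ID up in every table.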

The 32-character alphabet ABCDEFGHJKLMNPQRSTUVWXYZ23456789 encodes 5 bits per ASCII character, is case insensitive, and even eliminates confusing characters like 0 and O as well as 1 and I. To cover the full 128 bits of a UUID, you’d need 26 characters from that alphabet (25 characters give 125 bits, which already exceeds the 122 random bits of a UUID v4). Adding a type-specific prefix with a length of four characters (as in the example above) results in 30 characters instead of the UUID’s 36. Such a string would be more readable and distinguishable, and it would also have a smaller overhead.
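The numbers are easy to double-check. The following sketch counts the symbols in the alphabet and computes how many characters are needed to cover a given number of bits, plus the overhead relative to the 16 raw bytes. `charsNeeded` and `overheadPercent` are helper names invented for this example.

```go
package main

import (
	"fmt"
	"math"
)

const alphanum = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

// charsNeeded returns how many characters from an alphabet of the given
// size are required to encode at least the given number of bits.
func charsNeeded(bits, alphabetSize int) int {
	bitsPerChar := math.Log2(float64(alphabetSize))
	return int(math.Ceil(float64(bits) / bitsPerChar))
}

// overheadPercent compares an encoded length in characters
// with the raw size in bytes.
func overheadPercent(chars, rawBytes int) float64 {
	return float64(chars-rawBytes) / float64(rawBytes) * 100
}

func main() {
	fmt.Println(len(alphanum)) // 32 symbols, i.e. 5 bits per character

	n := charsNeeded(128, len(alphanum))
	fmt.Println(n) // 26

	fmt.Println(overheadPercent(36, 16))  // 125 (UUID with dashes)
	fmt.Println(overheadPercent(4+n, 16)) // 87.5 (four-char prefix + token)
}
```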

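If you’re already using UUIDs, stick with them; they’re nice. But when you start something new, consider whether you want to add twelve SLOC to your code:

import "crypto/rand"

// 32 symbols, so each character carries 5 bits.
const alphanum = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

// randomAlphaNumToken returns prefix followed by length random characters.
// It reads from crypto/rand, because math/rand output is predictable.
func randomAlphaNumToken(prefix string, length int) (string, error) {
	b := make([]byte, length)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	for i := range b {
		// 256 is a multiple of 32, so the modulo does not bias the result.
		b[i] = alphanum[int(b[i])%len(alphanum)]
	}
	return prefix + string(b), nil
}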
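
Since the alphabet is case insensitive, code that accepts such tokens would probably want to normalize them before comparing or looking them up. A minimal sketch of that idea follows; `normalizeToken` is a hypothetical helper, and it assumes the prefix itself is matched verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

const alphanum = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

// normalizeToken upper-cases the random part of token and reports whether
// the token has the expected prefix and uses only characters from alphanum.
func normalizeToken(token, prefix string) (string, bool) {
	if !strings.HasPrefix(token, prefix) || len(token) == len(prefix) {
		return "", false
	}
	body := strings.ToUpper(token[len(prefix):])
	for _, r := range body {
		if !strings.ContainsRune(alphanum, r) {
			return "", false
		}
	}
	return prefix + body, true
}

func main() {
	fmt.Println(normalizeToken("usr-k7m2qx9tb4", "usr-")) // usr-K7M2QX9TB4 true

	// 0 and O are not part of the alphabet, so this one is rejected.
	_, ok := normalizeToken("usr-K0M2QX9TB4", "usr-")
	fmt.Println(ok) // false
}
```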