Ayende @ Rahien

Refunds available at head office

Abolishing guids

This seems like a minor thing, but it was raised during the design phase of SilverQueues/AgQueues/LuminQ (can someone spot the jokes?).

How does a client identify itself to the server? Consider the fact that there are likely to be clients popping up all the time, so we can’t pre-assign them meaningful names. The suggestion was brought up to use GUIDs to identify the client. That has the benefit of simplicity. It is easy to write, easy to implement and easy to understand at a conceptual level.

It got shot down quickly. While it is easy, I have never met a GUID that I could honestly say I had seen before. Recognizing clients by GUID would make the system much harder to work with, because GUIDs are so opaque.

Instead of doing that, we will probably go with “clients/329392” or something similar, because that one is human readable. In the end, if you can make it easier to work with, it pays off, big time.

Comments

08/16/2010 09:48 AM by Jason

The human-readable approach has a security risk: clients can now claim to be someone else. Non-predictable GUIDs have the advantage that one client cannot really get at the data intended for another client. Perhaps using both would be the best scenario. The int would be used to identify the client, the GUID would ensure they are who they say they are.

08/16/2010 09:51 AM by Ayende Rahien

Jason,

Guids as a security measure is another application of security through obscurity.

It is pretty easy to sniff them on the network, after all.

08/16/2010 10:21 AM by Jason

Agreed; SSL/TLS mitigates this to some extent.

08/16/2010 10:59 AM by Richard Dingwall

The difficulty of reading and recognizing GUIDs is one of the reasons I like using the hilo ID generator in NHibernate.

08/16/2010 11:38 AM by Harry M

I keep thinking it would be fun to make a natural language key generator, by mixing up adjectives, verbs, nouns, tenses and stuff. Obviously it would only work for small sets, or it would have really long names.

e.g. angrybluepanther500, oddnaturalsoap123
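
A minimal sketch of this idea, assuming a counter is mapped onto word lists in mixed-radix fashion so that each number yields a different name. The `WordKeys` class and its tiny word lists are hypothetical, chosen only to echo the examples above.

```csharp
using System;

public static class WordKeys
{
    // Hypothetical word lists; a real generator would need far larger ones.
    private static readonly string[] Adjectives = { "angry", "odd", "calm", "brave" };
    private static readonly string[] Modifiers = { "blue", "natural", "red", "quiet" };
    private static readonly string[] Nouns = { "panther", "soap", "river", "lamp" };

    // Maps a non-negative counter to a name such as "angrybluepanther0".
    public static string FromNumber(long n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));

        // Peel off one "digit" per word list, like writing n in a mixed radix.
        string adjective = Adjectives[(int)(n % Adjectives.Length)];
        n /= Adjectives.Length;
        string modifier = Modifiers[(int)(n % Modifiers.Length)];
        n /= Modifiers.Length;
        string noun = Nouns[(int)(n % Nouns.Length)];
        n /= Nouns.Length;

        // Whatever is left becomes the numeric suffix.
        return adjective + modifier + noun + n;
    }
}
```

With three lists of four words there are only 64 word combinations before the numeric suffix has to grow, which illustrates the trade-off between small sets and long names.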

08/16/2010 11:46 AM by James Arendt

Have you considered taking a GUID, converting it to bytes, then base-32 encoding the bytes as a string? Base-32 would result in a shorter string than the hex-based GUID format while still including only characters that are human-readable. Another note: some Base-32 alphabets (Crockford's, for example) omit characters that could be confused with other characters when reading, such as I, which looks like 1. They also exclude characters (e.g. U) that could accidentally spell obscene words.

08/16/2010 12:34 PM by Chris Marisic

@James are you aware of a clean guid -> base 32 encoding algorithm? I haven't ever been able to find one.

I found an encoding-conversion project on CodeProject that lets you specify basically any type of encoding, but it suffers from arithmetic overflows with GUIDs. I tried manually splitting the numbers, but then the results seemed to be non-deterministic, which is obviously not acceptable for identifiers.
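
One way to sidestep the overflow is to avoid treating the GUID as one large number: stream its 16 bytes through a small bit buffer and emit five bits at a time. The sketch below assumes Crockford's Base-32 alphabet (no I, L, O or U); `GuidBase32` is a hypothetical helper name, not an existing library.

```csharp
using System;
using System.Text;

public static class GuidBase32
{
    // Crockford-style alphabet: digits plus letters, without I, L, O and U.
    private const string Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // Note: Guid.ToByteArray() stores the first three fields little-endian,
    // so the byte order differs from the printed hex form.
    public static string Encode(Guid id)
    {
        return Encode(id.ToByteArray());
    }

    public static string Encode(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        var result = new StringBuilder((data.Length * 8 + 4) / 5);
        int buffer = 0;        // bits read but not yet emitted
        int bitsInBuffer = 0;  // never exceeds 12, so an int cannot overflow

        foreach (byte b in data)
        {
            buffer = (buffer << 8) | b;
            bitsInBuffer += 8;
            while (bitsInBuffer >= 5)
            {
                bitsInBuffer -= 5;
                result.Append(Alphabet[(buffer >> bitsInBuffer) & 31]);
            }
            // Keep only the leftover bits.
            buffer &= (1 << bitsInBuffer) - 1;
        }

        if (bitsInBuffer > 0)
        {
            // Pad the final group with zero bits on the right.
            result.Append(Alphabet[(buffer << (5 - bitsInBuffer)) & 31]);
        }
        return result.ToString();
    }
}
```

The same input always produces the same output, and a 128-bit GUID comes out as 26 characters instead of 32 hex digits.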

@Harry that's an interesting approach. It should be fairly straightforward to program, since you would be implementing exactly the hilo algorithm, except that for the hi key you concatenate words together. It might even be better to split it into three keys, word - hi - lo, so you don't lose a set of words on each hi creation. Instead, you would keep a table of all current word combinations and their hi values, and any time you generate a new unique set of words, its hi value starts at 0 or 1.

08/16/2010 12:54 PM by Mike

Generate a GUID and then generate a message digest (hash) from the GUID.

08/16/2010 01:04 PM by Fero

Try this one; it can generate nice ids. Found on CodeProject.

    // Requires: using System.Security.Cryptography; using System.Text;
    private string RNGCharacterMask()
    {
        const int size = 8;
        char[] chars =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".ToCharArray();

        // Fill the buffer with cryptographically strong, non-zero random bytes.
        byte[] data = new byte[size];
        RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
        crypto.GetNonZeroBytes(data);

        StringBuilder result = new StringBuilder(size);
        foreach (byte b in data)
        {
            // Use the whole table: "chars.Length - 1" could never pick the last character.
            // The byte range is not a multiple of 62, so the modulo is not perfectly uniform.
            result.Append(chars[b % chars.Length]);
        }
        return result.ToString();
    }

08/16/2010 02:04 PM by Paul Hatcher

One thing to be careful of is that the generator can produce embarrassing/obscene words, e.g. your nice new corporate client gets assigned an id of fart123 or worse :-)

It depends on how many you are generating (the client was producing >10m values), but you can get around this by dropping vowels and a few more characters.

08/16/2010 02:19 PM by tobi

"Jason,

Guids as a security measure is another application of security through obscurity.

It is pretty easy to sniff them on the network, after all."

This goes for many password transmission systems. Whether a network sniffing attack is feasible is the important question (and most of the time it isn't).

SSL does not prevent clients from faking their id. You could go with 6-8 random base64 chars instead of a guid. Those would have 30-40 bits of security.

08/16/2010 03:30 PM by Jeremy

@Chris - check out the tpz-base-32 project on github.

I haven't had a chance to really package it up nicely so that others could use it easily but it is there and basically BSD licensed. It uses the z-base-32 encoding which uses a really nice set of characters but which is somewhat under-specified, so check my readme to see how my implementations interpret the spec.

There are two implementations in the project at present, one I call the reference implementation that can handle all sorts of data types, even full-on never-ending streams of bytes. The downside is that it isn't the fastest version. The fastest version handles just 32 bit unsigned integers (unless I have forgotten my own code :) but is ripe for extension with other types since it is super easy to automate tests against the reference implementation. Both implementations have good test coverage.

The integer implementation has also been ported to JavaScript, and that is on github as well. This week, I'll be adding a Ruby port.

08/16/2010 04:35 PM by tobi

Fero, in your implementation some characters are more likely to appear than others.

I miscalculated the amount of entropy: for every base64 char there are 6 bits of entropy so it is 36-48 for 6 to 8 chars. Humans can remember 5-9 chars in one go for a short time without any training.
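
The bias comes from mapping byte values onto 62 characters with a modulo, since the byte range does not divide evenly among them. A minimal sketch of one common fix, rejection sampling, together with the entropy arithmetic above; `RandomIds` and its members are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class RandomIds
{
    // The 64 characters of the URL-safe base64 alphabet.
    public const string Base64Url =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    public static string Generate(int length, string alphabet)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (string.IsNullOrEmpty(alphabet) || alphabet.Length > 256)
            throw new ArgumentException("Alphabet must have 1 to 256 characters.", nameof(alphabet));

        // Largest multiple of alphabet.Length that fits in the 256 byte values.
        // Bytes at or above it are discarded so every character is equally likely.
        int limit = 256 - (256 % alphabet.Length);

        var result = new StringBuilder(length);
        var buffer = new byte[16];
        using (var rng = RandomNumberGenerator.Create())
        {
            while (result.Length < length)
            {
                rng.GetBytes(buffer);
                foreach (byte b in buffer)
                {
                    if (b >= limit) continue; // rejected: would skew the distribution
                    result.Append(alphabet[b % alphabet.Length]);
                    if (result.Length == length) break;
                }
            }
        }
        return result.ToString();
    }

    // Bits of entropy for a uniformly random id: length * log2(alphabet size).
    public static double EntropyBits(int length, int alphabetSize)
    {
        return length * Math.Log(alphabetSize, 2);
    }
}
```

With a 64-character alphabet nothing is ever rejected, and `EntropyBits(6, 64)` and `EntropyBits(8, 64)` work out to roughly 36 and 48 bits.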

08/16/2010 05:16 PM by Ryan Heath

@ spot the joke

Do you mean Ag equals Silver? I do not know what Lumin means.

// Ryan

08/16/2010 05:53 PM by Tuna Toksoz

The other one seems like Lumin=>luminescence which means light :)

08/17/2010 06:30 PM by Jeff

Why not take the JMS route and make the client specify the name? JMS uses client name + consumer name to identify clients. Who cares what it is, as long as the combination is unique?

I don't really understand the security concerns mentioned above if your goal is simply to uniquely identify clients... security is another concern entirely.

08/17/2010 10:37 PM by Steve Py

It depends on the intended use. It seems that in a document system the client ID would be a meaningful key, in that it's something that might be presented on screen or printed on a form and used to pull up an individual. Guids are definitely a bad fit for that.

In a distributed environment you're not going to get a 100% reliable unique identifier out of a central store unless you lock the store, query+increment an ID and unlock it. The question is, if you want something like a 6-digit number, how to reliably generate it without blocking the server?
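
The hilo approach mentioned earlier in the thread is one common answer: the central store is locked only to reserve a whole block of ids, and each node then hands out ids from its block without contacting the store. A minimal in-process sketch, where `BlockAllocator` stands in for the central store and all names are hypothetical:

```csharp
using System;

// Stand-in for the central store; the lock is held only while reserving a block.
public sealed class BlockAllocator
{
    private readonly object _gate = new object();
    private long _nextBlock;

    public long ReserveBlock()
    {
        lock (_gate) { return _nextBlock++; }
    }
}

// One instance per node; contacts the allocator once per blockSize ids.
public sealed class HiLoClientIdGenerator
{
    private readonly BlockAllocator _allocator;
    private readonly int _blockSize;
    private readonly object _gate = new object();
    private long _hi;
    private int _lo;

    public HiLoClientIdGenerator(BlockAllocator allocator, int blockSize)
    {
        if (allocator == null) throw new ArgumentNullException(nameof(allocator));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        _allocator = allocator;
        _blockSize = blockSize;
        _lo = blockSize; // forces a block reservation on first use
    }

    public string Next()
    {
        lock (_gate) // local lock only; protects this node's counters
        {
            if (_lo == _blockSize)
            {
                _hi = _allocator.ReserveBlock();
                _lo = 0;
            }
            long id = _hi * _blockSize + _lo + 1;
            _lo++;
            return "clients/" + id;
        }
    }
}
```

Ids produced this way are unique and human readable, but they are neither gap-free nor ordered across nodes: a node that stops early leaves the rest of its block unused.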

Comments have been closed on this topic.