Ayende @ Rahien


Guids are evil nasty little creatures that make me cry

You might have noticed that I don’t like Guids all that much. Guids seem like a great solution when you need to generate an id for something. Then reality intervenes, and you end up with a system problem that nobody can understand.

Leaving aside the size of a Guid and the fact that it is not sequential (two pretty major issues for an identifier), the biggest problem is that it is pretty much opaque to users.

This was recently thrown in my face again as part of a question on the RavenDB mailing list. Take a look at the following documents. Do you think that those two documents belong to the same category or not?

[Screenshot: the two documents from the mailing list question]

One of the problems we discovered was that the user was searching for category 4bf58dd8d48988d1c5941735, while the document's category was 4bf58dd8d48988d14e941735. It drove everyone crazy trying to work out why this wasn’t working.

Here are those Guids again:

  • 4bf58dd8d48988d1c5941735
  • 4bf58dd8d48988d14e941735

Do you see it? I’m going to put in some visual space and then show you the difference.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Here they are:

  • 4bf58dd8d48988d1[c5]941735
  • 4bf58dd8d48988d1[4e]941735

And if that isn’t enough to make you despise Guids, try reading them to someone else over the phone, or finding them in a log file, especially when you have to deal with several of those dastardly things at once.

I have a cloud machine dedicated to generating and disposing of Guids; I hope that in a few thousand years I can kill them all.

Posted By: Ayende Rahien

Comments

Scooletz, 08/05/2014 09:29 AM

Unfortunately, you won't kill them: http://blogs.msdn.com/b/ericlippert/archive/2012/04/24/guid-guide-part-three.aspx

The problem is not the Guid itself, but its encoding and presentation. You can choose another encoding, like this one: http://stackoverflow.com/questions/2827627/what-is-the-most-efficient-way-to-encode-an-arbitrary-guid-into-readable-ascii Besides this, there are other generators, like Snowflake from Twitter, which uses half the bits of a Guid.
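To make the encoding point concrete, here is a minimal Python sketch that writes the 16 raw bytes of a Guid as URL-safe Base64 instead of hex. The helper names `guid_to_short` and `short_to_guid` are hypothetical, and this is just one possible encoding, not necessarily the one from the linked answer. It shortens the 36-character hyphenated form to 22 characters, though the result is no more meaningful to a human.

```python
import base64
import uuid


def guid_to_short(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe Base64, dropping the '==' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def short_to_guid(s: str) -> uuid.UUID:
    """Reverse guid_to_short: restore the padding and decode back to 16 bytes."""
    padded = s + "=" * (-len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    u = uuid.uuid4()
    short = guid_to_short(u)
    print(str(u), len(str(u)))  # 36-character hyphenated hex form
    print(short, len(short))    # 22-character Base64 form
    assert short_to_guid(short) == u
```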

Marcel Popescu, 08/05/2014 09:32 AM

I find it extremely likely that the reason for those GUIDs being so close is that you haven't used standard GUIDs at all, but instead some home-brewed scheme of sequential GUIDs or something like that... you know, to handle the second major issue: "the fact that it is not sequential". I normally search for the first four digits in a GUID and I seldom see collisions.
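Marcel's explanation fits any time-ordered scheme. In a Snowflake-style layout (the generator mentioned in the previous comment), the timestamp occupies the high bits, so ids minted close together share most of their leading digits. This Python sketch uses Twitter's published split of 41 timestamp bits, 10 worker bits and 12 sequence bits; the epoch constant and the function names are assumptions for illustration, not Twitter's code.

```python
# Bit widths of a Snowflake-style 63-bit id; EPOCH_MS is an assumed custom epoch.
EPOCH_MS = 1_288_834_974_657
TIMESTAMP_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 10, 12


def make_id(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack time, worker and per-millisecond sequence into one integer."""
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    # Timestamp in the high bits makes ids sort (and look alike) by creation time.
    return (elapsed << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def split_id(value: int) -> tuple[int, int, int]:
    """Unpack an id back into (timestamp_ms, worker_id, sequence)."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (value >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    elapsed = value >> (WORKER_BITS + SEQUENCE_BITS)
    return elapsed + EPOCH_MS, worker_id, sequence


if __name__ == "__main__":
    a = make_id(1_407_230_940_000, worker_id=3, sequence=0)
    b = make_id(1_407_230_940_001, worker_id=3, sequence=0)
    # Printed in hex, the two ids share their high-order digits.
    print(f"{a:016x}")
    print(f"{b:016x}")
```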

Sam, 08/05/2014 09:52 AM

I mean, two things happened at once here.

First, the UUIDs are almost equal, which I find pretty unlikely, and second, you guys just didn't check the value. I think it's your fault, but of course it doesn't make life easier...

markrendle, 08/05/2014 09:53 AM

The thing I like best there is that you copied one of the GUIDs wrong. The screenshot has 4bf58dd8d48988d16d941735 but you've written 4bf58dd8d48988d14e941735. You'd be hard-pushed to find a better example of why they're hideous for any kind of identifier that is supposed to be readable.

They're still excellent for session keys, though.

Brent Jenkins, 08/05/2014 10:58 AM

If hyphens had been included then it would have been much easier to spot the difference:

4BF58DD8-D489-88D1-4E94-1735 4BF58DD8-D489-88D1-C594-1735
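Brent's point, that grouping makes the difference easier to spot, is easy to push one step further with tooling. This hypothetical Python helper splits two equal-length hex ids into 4-character chunks (a fixed grouping chosen for simplicity, not the 8-4-4-4-4 grouping used above) and brackets the chunks that differ.

```python
def group(hex_id: str, size: int = 4) -> list[str]:
    """Split a hex id into fixed-size chunks so it can be compared chunk by chunk."""
    return [hex_id[i:i + size] for i in range(0, len(hex_id), size)]


def show_difference(a: str, b: str, size: int = 4) -> str:
    """Return a's chunks joined by '-', with chunks that differ from b in brackets."""
    a, b = a.lower(), b.lower()
    if len(a) != len(b):
        raise ValueError("ids must have the same length to compare chunk by chunk")
    chunks = []
    for x, y in zip(group(a, size), group(b, size)):
        chunks.append(x if x == y else f"[{x}]")
    return "-".join(chunks)


if __name__ == "__main__":
    print(show_difference("4bf58dd8d48988d1c5941735", "4bf58dd8d48988d14e941735"))
```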

Thomas Levesque, 08/05/2014 12:37 PM

Sure, similar GUIDs look similar, obviously... but how often do you need to manually compare them? Or read them over the phone? When I need to send an ID to a colleague, I use instant messaging for that, not the phone... even for relatively small integer identifiers (7 digits).

Rik Hemsley, 08/05/2014 01:58 PM

How do you do replication without GUIDs? You need a way to handle ID conflicts, in that case.

How do you create an object and give it an identifier before checking with a central authority whether that identifier is in use? You might want to do this if you're generating a large collection of objects which all need IDs, e.g. in a UI where you're building something new - which you'll then send to a server for persistence. GUIDs solve this. Just generate as you go.

They are clunky, yes, but I think tooling support can fix this. I had a go a long time ago here: http://www.rikkus.info/guids-in-colour

To make this truly useful, it would need to ensure that the colours were very different even when the guids had only one different byte. If anyone would like to have a go at that, please do!
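One way to get the property Rik asks for is to hash the id before turning it into a colour. This Python sketch uses SHA-256, a swapped-in technique that is not necessarily what the linked page does, and the name `id_colour` is hypothetical. Because a cryptographic hash changes roughly half of its output bits when any input byte changes, ids that differ in a single byte get unrelated colours; two different ids can still, rarely, land on visually similar colours.

```python
import hashlib


def id_colour(identifier: str) -> str:
    """Map an id to a '#rrggbb' colour taken from the first 3 bytes of its SHA-256."""
    # Lower-casing first means 'ABC' and 'abc' get the same colour.
    digest = hashlib.sha256(identifier.lower().encode("utf-8")).digest()
    return "#" + digest[:3].hex()


if __name__ == "__main__":
    for guid in ("4bf58dd8d48988d1c5941735", "4bf58dd8d48988d14e941735"):
        print(guid, id_colour(guid))
```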

What I'd also like is better tooling support. Currently copying and pasting GUIDs is painful because they're represented as hex strings in most places I see them. They should be first class objects.

Ayende Rahien, 08/05/2014 02:01 PM

Replication can work just fine without Guids. In RavenDB, we do it just like that. Pretty much the only guids you'll see in RavenDB are the database ids, and those are very rarely used by users.

For generating many ids on the client, you can use hilo.

Rik Hemsley, 08/05/2014 02:11 PM

"Replication can work just fine without Guids. In RavenDB, we do it just like that" You don't specify what 'that' is... I'd be interested to know. SQL Server seems to demand GUIDs for replication, which is what forced us to use them initially.

HiLo looks like you tell a client which range it's allowed to generate in and it sticks to that. That's fair enough, but I think Guid.NewGuid() is less code ;)
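Rik's reading matches the usual HiLo scheme: a client reserves a block (the "hi" value) from a shared counter, then hands out the "lo" values inside that block without further round trips. This Python sketch assumes an in-memory counter standing in for the server. `HiLoGenerator`, `reserve_hi` and the `orders/` prefix are hypothetical names, and this is not RavenDB's actual client code.

```python
import itertools
import threading
from typing import Callable


class HiLoGenerator:
    """Generates readable ids like 'orders/1' from blocks reserved up front.

    reserve_hi is a hypothetical stand-in for the server round trip: it must
    return a hi value that no other client will ever receive. All generators
    sharing one counter must use the same capacity, or their blocks overlap.
    """

    def __init__(self, reserve_hi: Callable[[], int], capacity: int = 32,
                 prefix: str = "orders/") -> None:
        self._reserve_hi = reserve_hi
        self._capacity = capacity
        self._prefix = prefix
        self._lock = threading.Lock()
        self._next = 1
        self._max = 0  # empty block, so the first call reserves one

    def next_id(self) -> str:
        with self._lock:
            if self._next > self._max:
                hi = self._reserve_hi()  # the only call that leaves the client
                self._next = hi * self._capacity + 1
                self._max = (hi + 1) * self._capacity
            value = self._next
            self._next += 1
        return f"{self._prefix}{value}"


if __name__ == "__main__":
    counter = itertools.count()  # hypothetical server-side hi counter
    a = HiLoGenerator(lambda: next(counter), capacity=3)
    b = HiLoGenerator(lambda: next(counter), capacity=3)
    print([a.next_id() for _ in range(4)])  # ['orders/1', 'orders/2', 'orders/3', 'orders/4']
    print([b.next_id() for _ in range(2)])  # ['orders/7', 'orders/8']
```

Since each block is handed out exactly once, generators sharing the counter never collide, and the ids stay short and readable.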

Ayende Rahien, 08/05/2014 06:34 PM

Rik, HiLo generates human-readable ids. That is important.

Normal person, 08/05/2014 08:01 PM

Why are you killing innocent children?

Geoff Thornburrow, 08/05/2014 09:28 PM

GUIDs suck for performance. They do however provide protection from a certain class of developer brain-melts: They make it impossible to accidentally use IDs out of context.

Let's say you're like everyone else these days and are building some multi-user online product, with all customers' data in the one database. Let's say a bug is introduced where an Order is queried where OrderID = CustomerID (instead of CustomerID = CustomerID). Using int sequences for the primary key means that it's very likely there is an order with the same ID as a customer, so you've just shown a customer somebody else's order. If both OrderID and CustomerID were GUIDs, there would be no possibility of a collision.

Ayende Rahien, 08/05/2014 09:30 PM

Geoff, that assumes that you have just integer keys. In RavenDB, that error can't happen, and you have readable keys.
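Geoff's bug and Ayende's answer can be contrasted in a few lines. This is a hypothetical Python sketch: the data and the `load` helper are made up, and the prefix check only mimics the idea behind RavenDB-style ids such as `orders/7`; it is not RavenDB's API.

```python
from typing import Optional

# Hypothetical data: with integer keys, customer 7 and order 7 both exist.
orders_by_int = {7: {"customer": 3, "total": 99}}

# The same data with collection-prefixed string keys in one document store.
docs = {
    "customers/7": {"name": "Alice"},
    "orders/7": {"customer": "customers/3", "total": 99},
}


def load(store: dict, doc_id: str, collection: str) -> Optional[dict]:
    """Return a document only if its id belongs to the expected collection."""
    if not doc_id.startswith(collection + "/"):
        return None
    return store.get(doc_id)


if __name__ == "__main__":
    customer_int_id = 7
    # Bug: a customer id used as an order id silently returns someone's order.
    print(orders_by_int.get(customer_int_id))
    # Same bug with prefixed keys: the id says "customers/", so nothing comes back.
    print(load(docs, "customers/7", "orders"))
```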

Geoff Thornburrow, 08/05/2014 09:40 PM

Yeah, I should have said I was talking about DBs in general, eg using identity PKs in SQL Server.

Having a database inherently prevent these problems is awesome.

João Bragança, 08/06/2014 02:41 AM

Meh. TRWTF here is that they're using a number instead of a string to identify the category. Unless the set of categories is open-ended...

Amin, 08/06/2014 05:38 AM

They say hilo has a SPOF (single point of failure) issue; what does Raven do about it?

Ayende Rahien, 08/06/2014 07:23 AM

Amin, with RavenDB you don't have to worry about a single point of failure. You can have a cluster-wide hilo, and as long as a single node is up, we can handle that.

Zuba Lama, 08/06/2014 09:45 AM

Waste A Guid ( http://wasteaguid.info/ )

Phillip Haydon, 08/06/2014 01:02 PM

@Rik Hemsley

You do not need GUIDs for replication in SQL Server, you can use HiLo to create identifiers, which is the best approach. You should never generate IDs in the database itself. That's the worst thing you can let SQL Server do.

Kijana Woodard, 08/07/2014 03:37 PM

@Geoff another way to prevent those "brain melts" is to use value objects as identifiers. Even better than guids in that the code won't compile.

Yet another way is to use decent variable names so looking at the code makes the error obvious.

And testing of course... :-D
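Kijana's value-object suggestion relies on the C# compiler. Python has no compile step, so this hypothetical sketch gets the same separation from distinct frozen dataclasses: a static checker such as mypy would flag `find_order(CustomerId(7))`, and the explicit `isinstance` check rejects it at runtime. The type names, `orders` and `find_order` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CustomerId:
    value: int


@dataclass(frozen=True)
class OrderId:
    value: int


# Keyed by OrderId, so a CustomerId with the same number never matches:
# dataclass equality requires the same class, not just the same fields.
orders: Dict[OrderId, str] = {OrderId(7): "order #7 for customer 3"}


def find_order(order_id: OrderId) -> Optional[str]:
    """Look up an order, refusing any id that is not an OrderId."""
    if not isinstance(order_id, OrderId):
        raise TypeError(f"expected OrderId, got {type(order_id).__name__}")
    return orders.get(order_id)


if __name__ == "__main__":
    print(find_order(OrderId(7)))
    print(orders.get(CustomerId(7)))  # None: same number, different type
```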

Kijana Woodard, 08/07/2014 06:19 PM

Fwiw, writing tests is far simpler and more readable with strings vs guids.

Kijana Woodard, 08/09/2014 12:10 AM

Since this post, the use of guids has shown itself to be more and more problematic for me:

Try editing a guid in the debugger vs editing a string.*

Try creating/editing records manually in the db. It's possible, but you can't remember guids so you have to keep referring back to other records for reference ids.

*Pro tip: set the variable in the immediate window with System.Guid.Parse("");

Brianary, 08/09/2014 02:35 AM

Using serial IDs to determine order seems like a violation of normalization principles. When order is needed, what's wrong with a "created" datetime field? You must hate git. 😉

Brianary, 08/09/2014 02:36 AM

Oops, looks like posts aren't enforcing validation.

Brianary, 08/09/2014 02:37 AM

Never mind, something else is going on, causing duplicate replies, or at least duplicate display of them.

Richard Tallent, 08/10/2014 10:17 PM

If users are directly searching for GUIDs, that's a UX problem, not a problem with the data type.

NEWSEQUENTIALID() provides a decent solution for overcoming the sequential issue (without the constraints of monotonically-increasing numeric IDs). Another option is to use a COMB, which gives you a timestamp for "free."
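A COMB combines random bytes with a timestamp so new values sort roughly by creation time. This Python sketch follows the commonly described SQL Server-oriented layout, putting a 48-bit millisecond timestamp in the last 6 bytes because SQL Server compares uniqueidentifier values starting from that group. It is an illustration under that assumption, not NHibernate's or SQL Server's exact algorithm; `new_comb` and `comb_timestamp_ms` are hypothetical names.

```python
import os
import time
import uuid
from typing import Optional


def new_comb(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a COMB-style id: 10 random bytes + a 48-bit millisecond timestamp.

    The timestamp goes in the last 6 bytes (the group SQL Server orders by
    first). Version/variant bits are not set, so this is not an RFC 4122
    version 4 UUID.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return uuid.UUID(bytes=os.urandom(10) + now_ms.to_bytes(6, "big"))


def comb_timestamp_ms(value: uuid.UUID) -> int:
    """Read the embedded millisecond timestamp back out of the last 6 bytes."""
    return int.from_bytes(value.bytes[10:], "big")


if __name__ == "__main__":
    first, second = new_comb(), new_comb()
    print(first, comb_timestamp_ms(first))
    print(second, comb_timestamp_ms(second))
```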

128 bits just isn't that big a deal for modern computers to store or compare, so while I use GUIDs heavily as surrogate primary keys, I don't notice a performance issue compared to the old days when I used 32-bit integers.

Granted, using GUIDs by default is just silly. Using them for Category IDs seems like overkill; there can't be that many "categories."

Kijana Woodard, 08/11/2014 10:22 PM

@Richard, it's not users searching for guids [omg what a nightmare], it's operationally dealing with guids.

And dev debugging and unit testing and ...

Comments have been closed on this topic.