Guids are evil nasty little creatures that make me cry
You might have noticed that I don’t like Guids all that much. Guids seem like a great solution when you need to generate an id for something. And then reality intervenes, and you have a problem in your system that no one can understand.
Leaving aside the size of a Guid and the fact that it is not sequential (two pretty major issues for an identifier), the biggest problem is that it is pretty much opaque to users.
This was recently thrown in my face again as part of a question in the RavenDB mailing list. Take a look at the following documents. Do you think that those two documents belong to the same category or not?
One of the problems that we discovered was that the user was searching for category 4bf58dd8d48988d1c5941735, while the document's category was 4bf58dd8d48988d14e941735. It drove everyone crazy trying to figure out why this wasn’t working.
Here are those Guids again:
- 4bf58dd8d48988d1c5941735
- 4bf58dd8d48988d14e941735
Do you see it? I’m going to put in some visual space and then show you the difference.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Here they are:
- 4bf58dd8d48988d1c5941735
- 4bf58dd8d48988d14e941735
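For anyone who would rather not squint, here is a minimal Python sketch that points at the characters where the two ids diverge. The function names are made up for illustration; nothing like this was part of the original mailing list thread.

```python
def diff_positions(a: str, b: str) -> list:
    """Return the zero-based indexes at which two equal-length ids differ."""
    if len(a) != len(b):
        raise ValueError("ids must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def underline_difference(a: str, b: str) -> str:
    """Stack both ids and put a caret under every differing character."""
    marker = "".join("^" if x != y else " " for x, y in zip(a, b))
    return f"{a}\n{b}\n{marker}"


searched = "4bf58dd8d48988d1c5941735"
stored = "4bf58dd8d48988d14e941735"

print(underline_difference(searched, stored))
print(diff_positions(searched, stored))  # [16, 17]
```

The two ids differ only in the 17th and 18th characters (`c5` versus `4e`), which is exactly the kind of thing the eye slides over.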
And if that isn’t enough for you to despise Guids, feel free to read them to someone else over the phone, or try to find them in a log file. Especially when you have to deal with several of those dastardly things.
I have a cloud machine dedicated to generating and disposing of Guids. I hope that in a few thousand years, I can kill them all.
Comments
Unfortunately, you won't kill them: http://blogs.msdn.com/b/ericlippert/archive/2012/04/24/guid-guide-part-three.aspx
The problem is not the Guid itself, but its encoding and presentation. You can choose another one like this http://stackoverflow.com/questions/2827627/what-is-the-most-efficient-way-to-encode-an-arbitrary-guid-into-readable-ascii Besides this, there are other generators, like Snowflake from Twitter, which takes half the bits needed for a Guid.
I find it extremely likely that the reason for those GUIDs being so close is that you haven't used standard GUIDs at all, but instead some home-brewed scheme of sequential GUIDs or something like that... you know, to handle the second major issue: "the fact that it is not sequential". I normally search for the first four digits in a GUID and I seldom see collisions.
I mean, two things happened at once here.
First, the uuids are almost equal, which I find pretty unlikely, and second, you guys just didn't check the value. I think it's your fault, but of course it doesn't make life easier...
The thing I like best there is that you copied one of the GUIDs wrong. The screenshot has 4bf58dd8d48988d16d941735 but you've written 4bf58dd8d48988d14e941735. You'd be hard-pushed to find a better example of why they're hideous for any kind of identifier that is supposed to be readable.
They're still excellent for session keys, though.
If hyphens had been included then it would have been much easier to spot the difference:
4BF58DD8-D489-88D1-4E94-1735
4BF58DD8-D489-88D1-C594-1735
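The hyphenation idea is easy to try. Below is a rough Python sketch that breaks a hex id into upper-case groups; the helper name `group_hex` and the uniform group size of four are assumptions for illustration, not the exact 8-4-4-4-4 layout used in the comment above.

```python
def group_hex(identifier: str, size: int = 4) -> str:
    """Split a hex id into hyphen-separated, upper-case groups of `size` characters."""
    groups = [identifier[i:i + size] for i in range(0, len(identifier), size)]
    return "-".join(groups).upper()


print(group_hex("4bf58dd8d48988d1c5941735"))  # 4BF5-8DD8-D489-88D1-C594-1735
print(group_hex("4bf58dd8d48988d14e941735"))  # 4BF5-8DD8-D489-88D1-4E94-1735
```

With the groups lined up, only the fifth group differs between the two ids.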
Sure, similar GUIDs look similar, obviously... but how often do you need to manually compare them? or read them over the phone? When I need to send an ID to a colleague, I use instant messaging for that, not the phone... even for relatively small integer identifiers (7 digits).
How do you do replication without GUIDs? You need a way to handle ID conflicts, in that case.
How do you create an object and give it an identifier before checking with a central authority whether that identifier is in use? You might want to do this if you're generating a large collection of objects which all need IDs, e.g. in a UI where you're building something new - which you'll then send to a server for persistence. GUIDs solve this. Just generate as you go.
They are clunky, yes, but I think tooling support can fix this. I had a go a long time ago here: http://www.rikkus.info/guids-in-colour
To make this truly useful, it would need to ensure that the colours were very different even when the guids had only one different byte. If anyone would like to have a go at that, please do!
What I'd also like is better tooling support. Currently copying and pasting GUIDs is painful because they're represented as hex strings in most places I see them. They should be first class objects.
Replication can work just fine without Guids. In RavenDB, we do it just like that. Pretty much the only guids you'll see in RavenDB are the database ids, and that is very rarely used by users.
For generating many ids on the client, you can use hilo.
"Replication can work just fine without Guids. In RavenDB, we do it just like that" You don't specify what 'that' is... I'd be interested to know. SQL Server seems to demand GUIDs for replication, which is what forced us to use them initially.
HiLo looks like you tell a client which range it's allowed to generate in and it sticks to that. That's fair enough, but I think Guid.NewGuid() is less code ;)
Rik, HiLo generates human readable stuff. That is _important_.
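To make the HiLo exchange above concrete, here is a minimal Python sketch of the general idea: each client asks a central authority for a "hi" value, then hands out ids from that range locally until it runs out. The class names, the range size of 32, and the `collection/number` id format are all assumptions made for this example; this is not RavenDB's actual implementation.

```python
import threading


class RangeServer:
    """Hypothetical stand-in for the central authority that hands out 'hi' values."""

    def __init__(self) -> None:
        self._next_hi = 0
        self._lock = threading.Lock()

    def next_hi(self) -> int:
        with self._lock:
            hi = self._next_hi
            self._next_hi += 1
            return hi


class HiLoGenerator:
    """Client-side generator: one server round trip per `capacity` ids."""

    def __init__(self, server: RangeServer, collection: str, capacity: int = 32) -> None:
        self._server = server
        self._collection = collection
        self._capacity = capacity
        self._hi = 0
        self._lo = capacity  # range is "exhausted", so the first call fetches a hi
        self._lock = threading.Lock()

    def next_id(self) -> str:
        with self._lock:
            if self._lo >= self._capacity:
                self._hi = self._server.next_hi()
                self._lo = 0
            number = self._hi * self._capacity + self._lo + 1
            self._lo += 1
            return f"{self._collection}/{number}"


server = RangeServer()
client_a = HiLoGenerator(server, "orders")
client_b = HiLoGenerator(server, "orders")

print(client_a.next_id())  # orders/1
print(client_b.next_id())  # orders/33
print(client_a.next_id())  # orders/2
```

Two clients never overlap because they never share a hi value, and the resulting ids are short enough to read out loud.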
Why are you killing innocent children?
GUIDs suck for performance. They do however provide protection from a certain class of developer brain-melts: They make it impossible to accidentally use IDs out of context.
Let's say you're like everyone else these days and are building some multi-user online product, with all customers' data in the one database. Let's say a bug is introduced where an Order is queried where OrderID = CustomerID (instead of CustomerID = CustomerID). Using int sequences for the primary key means that it's very likely there is an order with the same ID as a customer, so you've just shown a customer somebody else's order. If both OrderID and CustomerID were GUIDs, there would be no possibility of a collision.
Geoff, that assumes that you have just integer keys. In RavenDB, that error can't happen, and you have readable keys.
Yeah, I should have said I was talking about DBs in general, eg using identity PKs in SQL Server.
Having a database inherently prevent these problems is awesome.
Meh. TRWTF here is that they're using a number instead of a string to identify the category. Unless the set of categories is open ended..
They say HiLo has a SPOF issue. What does Raven do about it?
Amin, With RavenDB, you don't have to worry about a single point of failure. You can have a hilo cluster wide, and as long as a single node is up, we can handle that.
Waste A Guid ( http://wasteaguid.info/ )
@Rik Hemsley
You do not need GUIDs for replication in SQL Server, you can use HiLo to create identifiers, which is the best approach. You should never generate IDs in the database itself. That's the worst thing you can let SQL Server do.
@Geoff another way to prevent those "brain melts" is to use value objects as identifiers. Even better than guids in that the code won't compile.
Yet another way is to use decent variable names so looking at the code makes the error obvious.
And testing of course... :-D
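The value object suggestion can be sketched roughly like this in Python. Python has no compile step, so the "won't compile" guarantee becomes "a static type checker such as mypy would flag it", and at runtime the mismatched id simply never matches. The `CustomerId`, `OrderId`, and `find_order` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerId:
    value: int


@dataclass(frozen=True)
class OrderId:
    value: int


# A tiny in-memory "orders table", keyed by OrderId rather than by a bare int.
orders = {OrderId(1): "order #1, placed by customer 7"}


def find_order(order_id: OrderId) -> Optional[str]:
    return orders.get(order_id)


print(find_order(OrderId(1)))     # order #1, placed by customer 7
# The bug from the comments above: passing a customer id where an order id belongs.
# A type checker would reject this call; at runtime it finds nothing because
# dataclass equality is False across different classes, even with equal values.
print(find_order(CustomerId(1)))  # None
```

With plain integers, both calls would have returned the same order.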
Fwiw, writing tests is far simpler and more readable with strings vs guids.
Since this post, the use of guids has shown itself to be more and more problematic for me:
Try editing a guid in the debugger vs editing a string.*
Try creating/editing records manually in the db. It's possible, but you can't remember guids so you have to keep referring back to other records for reference ids.
*Pro tip: set the variable in the immediate window with System.Guid.Parse("");
Using serial IDs as a determination for order seems like a violation of normalization principles. When order is needed, what's wrong with a "created" datetime field? You must hate git. 😉
Oops, looks like posts aren't enforcing validation.
Never mind, something else is going on, causing duplicate replies, or at least duplicate display of them.
If users are directly searching for GUIDs, that's a UX problem, not a problem with the data type.
NEWSEQUENTIALID() provides a decent solution for overcoming the sequential issue (without the constraints of monotonically-increasing numeric IDs). Another option is to use a COMB, which gives you a timestamp for "free."
128 bits just isn't that big a deal for modern computers to store or compare, so while I use GUIDs heavily as surrogate primary keys, I don't notice a performance issue compared to the old days when I used 32-bit integers.
Granted, using GUIDs by default is just silly. Using them for Category IDs seems like overkill, there can't be that many "categories."
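As a rough illustration of the COMB idea mentioned above, the Python sketch below keeps 10 random bytes and overwrites the last 6 bytes with a millisecond timestamp. The byte layout, the use of Unix milliseconds, and the function names are assumptions for this example; real COMB implementations differ in how they encode the time, and where the timestamp must sit depends on how the database orders its GUID type.

```python
import time
import uuid
from typing import Optional


def new_comb(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a GUID whose last 6 bytes are a big-endian millisecond timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    random_part = uuid.uuid4().bytes[:10]
    timestamp_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + timestamp_part)


def comb_timestamp_ms(comb: uuid.UUID) -> int:
    """Recover the embedded timestamp, i.e. the "free" creation time."""
    return int.from_bytes(comb.bytes[10:], "big")


comb = new_comb()
print(comb)
print(comb_timestamp_ms(comb))
```

Ids generated later carry a larger trailing value, so an index that sorts on those bytes sees roughly append-only inserts, while the random prefix still keeps ids from different machines apart.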
@Richard, it's not users searching for guids [omg what a nightmare], it's operationally dealing with guids.
And dev debugging and unit testing and ...
Hate never helped.