Guids are evil nasty little creatures that make me cry

You might have noticed that I don’t like Guids all that much. Guids seem like a great solution when you need to generate an id for something. And then reality intervenes, and you have a system problem that nobody can understand.

Leaving aside the size of the Guid and the fact that it is not sequential, two pretty major issues with an identifier, the biggest problem is that it is pretty much opaque to the users.

This was recently thrown in my face again as part of a question in the RavenDB mailing list. Take a look at the following documents. Do you think that those two documents belong to the same category or not?

One of the problems that we discovered was that the user was searching for category 4bf58dd8d48988d1c5941735, while the document's category was 4bf58dd8d48988d14e941735. And it drove everyone crazy trying to figure out why this wasn’t working.

Here are those Guids again:

4bf58dd8d48988d1c5941735
4bf58dd8d48988d14e941735

Do you see it? I’m going to add some visual space to show you the difference.

Here they are:

4bf58dd8d48988d1 c5 941735
4bf58dd8d48988d1 4e 941735
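
Spotting that kind of difference by eye is error prone, so it can help to let code do the comparison. This is a minimal Python sketch; the helper name `diff_positions` is made up for illustration and is not part of any library:

```python
def diff_positions(a: str, b: str) -> list[int]:
    """Return the zero-based indexes at which two equal-length ids differ."""
    if len(a) != len(b):
        raise ValueError("ids must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


searched = "4bf58dd8d48988d1c5941735"  # the category the user searched for
stored = "4bf58dd8d48988d14e941735"    # the category on the document

positions = diff_positions(searched, stored)
print(positions)  # [16, 17]
```

Only two characters out of 24 differ, and they sit in the middle of the string, which is exactly where the eye tends to skip.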

And if that isn’t enough for you to despise Guids, feel free to read them to someone else over the phone, or try to find them in a log file. Especially when you have to deal with several of those dastardly things.

I have a cloud machine dedicated to generating and disposing of Guids. I hope that in a few thousand years, I can kill them all.

Tags:

raven

Comments

05 Aug 2014
09:29 AM

Scooletz

Unfortunately, you won't kill them: http://blogs.msdn.com/b/ericlippert/archive/2012/04/24/guid-guide-part-three.aspx

The problem is not the Guid itself, but its encoding and presentation. You can choose another one, like this: http://stackoverflow.com/questions/2827627/what-is-the-most-efficient-way-to-encode-an-arbitrary-guid-into-readable-ascii Besides this, there are other generators, like Snowflake from Twitter, which takes half the bits needed for a Guid.

05 Aug 2014
09:32 AM

Marcel Popescu

I find it extremely likely that the reason for those GUIDs being so close is that you haven't used standard GUIDs at all, but instead some home-brewed scheme of sequential GUIDs or something like that... you know, to handle the second major issue: "the fact that it is not sequential". I normally search for the first four digits in a GUID and I seldom see collisions.

05 Aug 2014
09:52 AM

Sam

I mean, two things happened at once here.

First, the uuids are almost equal, which I find pretty unlikely, and second, you guys just didn't check the value. I think it's your fault, but of course it doesn't make life easier...

05 Aug 2014
09:53 AM

markrendle

The thing I like best there is that you copied one of the GUIDs wrong. The screenshot has 4bf58dd8d48988d16d941735 but you've written 4bf58dd8d48988d14e941735. You'd be hard-pushed to find a better example of why they're hideous for any kind of identifier that is supposed to be readable.

They're still excellent for session keys, though.

05 Aug 2014
10:58 AM

Brent Jenkins

If hyphens had been included then it would have been much easier to spot the difference:

4BF58DD8-D489-88D1-4E94-1735
4BF58DD8-D489-88D1-C594-1735
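
The grouping above can be produced mechanically. This is a small Python sketch, assuming the 8-4-4-4-4 chunk sizes seen in the two hyphenated ids; the function name `group_id` is hypothetical:

```python
def group_id(raw: str, sizes=(8, 4, 4, 4, 4)) -> str:
    """Upper-case a hex id and split it into hyphen-separated chunks."""
    if len(raw) != sum(sizes):
        raise ValueError("id length does not match the chunk sizes")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(raw[start:start + size])
        start += size
    return "-".join(chunks).upper()


print(group_id("4bf58dd8d48988d14e941735"))  # 4BF58DD8-D489-88D1-4E94-1735
print(group_id("4bf58dd8d48988d1c5941735"))  # 4BF58DD8-D489-88D1-C594-1735
```

With the chunks in place, the mismatch is confined to the fourth group, which is far easier to spot than two characters in a 24-character run.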

05 Aug 2014
12:37 PM

Thomas Levesque

Sure, similar GUIDs look similar, obviously... but how often do you need to manually compare them? or read them over the phone? When I need to send an ID to a colleague, I use instant messaging for that, not the phone... even for relatively small integer identifiers (7 digits).

05 Aug 2014
13:58 PM

Rik Hemsley

How do you do replication without GUIDs? You need a way to handle ID conflicts, in that case.

How do you create an object and give it an identifier before checking with a central authority whether that identifier is in use? You might want to do this if you're generating a large collection of objects which all need IDs, e.g. in a UI where you're building something new - which you'll then send to a server for persistence. GUIDs solve this. Just generate as you go.

They are clunky, yes, but I think tooling support can fix this. I had a go a long time ago here: http://www.rikkus.info/guids-in-colour

To make this truly useful, it would need to ensure that the colours were very different even when the guids differed by only one byte. If anyone would like to have a go at that, please do!

What I'd also like is better tooling support. Currently copying and pasting GUIDs is painful because they're represented as hex strings in most places I see them. They should be first class objects.

05 Aug 2014
14:01 PM

Ayende Rahien

Replication can work just fine without Guids. In RavenDB, we do it just like that. Pretty much the only guids you'll see in RavenDB are the database ids, and that is very rarely used by users.

For generating many ids on the client, you can use hilo.

05 Aug 2014
14:11 PM

Rik Hemsley

"Replication can work just fine without Guids. In RavenDB, we do it just like that" You don't specify what 'that' is... I'd be interested to know. SQL Server seems to demand GUIDs for replication, which is what forced us to use them initially.

HiLo looks like you tell a client which range it's allowed to generate in and it sticks to that. That's fair enough, but I think Guid.NewGuid() is less code ;)
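
To make the range idea concrete, here is a minimal Python sketch of a HiLo-style generator. It is not RavenDB's implementation; the class name `HiLoGenerator`, the `reserve_range` callable, and the in-process counter standing in for the central authority are all assumptions made for illustration:

```python
import itertools
import threading


class HiLoGenerator:
    """Hands out ids locally from a reserved range, reserving a new range when it runs out."""

    def __init__(self, reserve_range, capacity: int = 32):
        self._reserve_range = reserve_range  # callable returning the next "hi" number
        self._capacity = capacity
        self._lock = threading.Lock()
        self._hi = None
        self._lo = capacity  # forces a reservation on first use

    def next_id(self) -> int:
        with self._lock:
            if self._lo >= self._capacity:
                # Only this step needs to talk to the central authority.
                self._hi = self._reserve_range()
                self._lo = 0
            self._lo += 1
            return self._hi * self._capacity + self._lo


# Stand-in for the central authority: a shared in-process counter (assumption).
_counter = itertools.count()
_counter_lock = threading.Lock()


def reserve_range() -> int:
    with _counter_lock:
        return next(_counter)


client_a = HiLoGenerator(reserve_range, capacity=4)
client_b = HiLoGenerator(reserve_range, capacity=4)

ids_a = [client_a.next_id() for _ in range(2)]   # client A reserves range 0
ids_b = [client_b.next_id() for _ in range(2)]   # client B reserves range 1
ids_a += [client_a.next_id() for _ in range(3)]  # A exhausts range 0, reserves range 2

print(ids_a)  # [1, 2, 3, 4, 9]
print(ids_b)  # [5, 6]
```

Each client contacts the authority once per range rather than once per id, and the resulting ids are small integers that can be read out loud.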

05 Aug 2014
18:34 PM

Ayende Rahien

Rik, Hilo generates human readable stuff. That is _important_.

05 Aug 2014
20:01 PM

Normal person

Why are you killing innocent children?

05 Aug 2014
21:28 PM

Geoff Thornburrow

GUIDs suck for performance. They do however provide protection from a certain class of developer brain-melts: They make it impossible to accidentally use IDs out of context.

Let's say you're like everyone else these days and are building some multi-user online product, with all customers' data in the one database. Let's say a bug is introduced where an Order is queried where OrderID = CustomerID (instead of CustomerID = CustomerID). Using int sequences for the primary key means that it's very likely there is an order with the same ID as a customer, so you've just shown a customer somebody else's order. If both OrderID and CustomerID were GUIDs, there would be no possibility of a collision.

05 Aug 2014
21:30 PM

Ayende Rahien

Geoff, that assumes that you have just integer keys. In RavenDB, that error can't happen, and you have readable keys.
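
A small Python sketch of why readable, prefixed keys avoid that class of bug. It assumes string keys of the form `orders/1` and `customers/1`; the dictionaries and the `load_order` helper are hypothetical stand-ins for a document store, not RavenDB's API:

```python
# Documents keyed by collection-prefixed ids (assumed format).
orders = {
    "orders/1": {"customer": "customers/2", "total": 40},
    "orders/2": {"customer": "customers/1", "total": 15},
}


def load_order(order_id: str):
    """Return the order document for the given id, or None if there is none."""
    return orders.get(order_id)


# The bug: a customer id is passed where an order id is expected.
customer_id = "customers/2"
print(load_order(customer_id))  # None

# With bare integer keys the same bug silently returns somebody else's order.
int_orders = {1: {"customer": 2, "total": 40}, 2: {"customer": 1, "total": 15}}
customer_key = 2
print(int_orders.get(customer_key))  # {'customer': 1, 'total': 15}
```

The prefixed lookup fails loudly with no result, while the integer lookup hands customer 2 an order that belongs to customer 1.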

05 Aug 2014
21:40 PM

Geoff Thornburrow

Yeah, I should have said I was talking about DBs in general, eg using identity PKs in SQL Server.

Having a database inherently prevent these problems is awesome.

06 Aug 2014
02:41 AM

João Bragança

Meh. TRWTF here is that they're using a number instead of a string to identify the category. Unless the set of categories is open ended..

06 Aug 2014
05:38 AM

Amin

They say hilo has a SPOF issue. What does Raven do about it?

06 Aug 2014
07:23 AM

Ayende Rahien

Amin, With RavenDB, you don't have to worry about a single point of failure. You can have a hilo cluster wide, and as long as a single node is up, we can handle that.

06 Aug 2014
09:45 AM

Zuba Lama

Waste A Guid ( http://wasteaguid.info/ )

06 Aug 2014
13:02 PM

Phillip Haydon

@Rik Hemsley

You do not need GUIDs for replication in SQL Server, you can use HiLo to create identifiers, which is the best approach. You should never generate IDs in the database itself. That's the worst thing you can let SQL Server do.

07 Aug 2014
15:37 PM

Kijana Woodard

@Geoff another way to prevent those "brain melts" is to use value objects as identifiers. Even better than guids in that the code won't compile.

Yet another way is to use decent variable names so looking at the code makes the error obvious.

And testing of course... :-D
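
As a rough illustration of value objects as identifiers, here is a Python sketch. Python has no compile step, so the "won't compile" guarantee becomes "a static type checker such as mypy would reject the call"; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OrderId:
    value: int


@dataclass(frozen=True)
class CustomerId:
    value: int


def find_order(orders: Dict[OrderId, str], order_id: OrderId) -> Optional[str]:
    """Look up an order; the annotation lets a type checker reject a CustomerId here."""
    return orders.get(order_id)


orders = {OrderId(7): "two keyboards"}

print(find_order(orders, OrderId(7)))     # two keyboards
print(OrderId(7) == CustomerId(7))        # False
print(find_order(orders, CustomerId(7)))  # None (and flagged by a type checker)
```

Even at runtime the two id types never compare equal, so a mixed-up id finds nothing instead of finding the wrong record.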

07 Aug 2014
18:19 PM

Kijana Woodard

Fwiw, writing tests is far simpler and more readable with strings vs guids.

09 Aug 2014
00:10 AM

Kijana Woodard

Since this post, the use of guids has shown itself to be more and more problematic for me:

Try editing a guid in the debugger vs editing a string.*

Try creating/editing records manually in the db. It's possible, but you can't remember guids so you have to keep referring back to other records for reference ids.

*Pro tip: set the variable in the immediate window with System.Guid.Parse("");

09 Aug 2014
02:35 AM

Brianary

Using serial IDs as a determination for order seems like a violation of normalization principles. When order is needed, what's wrong with a "created" datetime field? You must hate git. 😉

09 Aug 2014
02:36 AM

Brianary

Oops, looks like posts aren't enforcing validation.

09 Aug 2014
02:37 AM

Brianary

Never mind, something else is going on, causing duplicate replies, or at least duplicate display of them.

10 Aug 2014
22:17 PM

Richard Tallent

If users are directly searching for GUIDs, that's a UX problem, not a problem with the data type.

NEWSEQUENTIALID() provides a decent solution for overcoming the sequential issue (without the constraints of monotonically-increasing numeric IDs). Another option is to use a COMB, which gives you a timestamp for "free."

128 bits just isn't that big a deal for modern computers to store or compare, so while I use GUIDs heavily as surrogate primary keys, I don't notice a performance issue compared to the old days when I used 32-bit integers.

Granted, using GUIDs by default is just silly. Using them for Category IDs seems like overkill, there can't be that many "categories."
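
To show what "a timestamp for free" can look like, here is a Python sketch of a COMB-style identifier. The byte layout (10 random bytes followed by a 6-byte millisecond timestamp) and the function names are assumptions for illustration, not a reference implementation of any particular COMB variant:

```python
import os
import time
import uuid
from typing import Optional


def comb_guid(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Build a 128-bit id from 10 random bytes plus a 6-byte big-endian timestamp."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    random_part = os.urandom(10)
    time_part = timestamp_ms.to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


def comb_timestamp_ms(value: uuid.UUID) -> int:
    """Recover the millisecond timestamp stored in the last 6 bytes."""
    return int.from_bytes(value.bytes[-6:], "big")


earlier = comb_guid(1_407_708_000_000)
later = comb_guid(1_407_708_000_001)

print(comb_timestamp_ms(earlier))                             # 1407708000000
print(comb_timestamp_ms(earlier) < comb_timestamp_ms(later))  # True
```

The id stays 128 bits wide and mostly random, but the trailing bytes carry creation time, so ids can be ordered by when they were generated.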

11 Aug 2014
22:22 PM

Kijana Woodard

@Richard, it's not users searching for guids [omg what a nightmare], it's operationally dealing with guids.

And dev debugging and unit testing and ...

31 Aug 2014
20:56 PM

Andrei Rînea

Hate never helped.

Oren Eini

CEO of RavenDB