Ayende @ Rahien

Refunds available at head office

Modeling reference data in RavenDB

The question came up in the Mailing list, and I thought it would be a good thing to post about.

How do you handle reference data in RavenDB? The typical example in most applications would be something like the list of states, their names and abbreviations. You might want to refer to a state by its abbreviation, but still allow for something like this:

How do you handle this in RavenDB? Using a relational database, you would probably have a separate table just for states, and it is certainly possible to create something similar to that in RavenDB:

image

The problem with this approach is that is it a very wrong headed approach one for a document database. In a relational database, you have no choice but to threat anything that has many items as a table with many rows. In a document database, we have much better alternatives.

Instead of threating each state as an individual document, we can treat them as a whole value, like this:

image

In the sample, I included just a couple of states, but I think that you get the idea. Note the name of this document “Config/States”.

What are the benefits of this approach?

  • We only have to retrieve a single document.
  • We are almost always going to treat the states as a list, never as individual items.
  • There is never a transactional boundary for modifying just a single state.

In fact, I would usually say that this isn’t good enough. I would try to centralized any/all of the reference data that we use in the application into a single document. There is a high likelihood that it would be a fairly small document even after doing so, and it would be insanely easy to cache.

Tags:

Posted By: Ayende Rahien

Published at

Originally posted at

Comments

Rafal
08/31/2011 09:08 AM by
Rafal

This is a nice approach but has a drawback: sometimes you need to modify your dictionaries or let your customer modify them. In case of sql database it's a simple insert/update, but in Raven you'll have to edit Json document that can get huge and unmaintainable if there are many dictionaries. Of course you could provide a CRUD gui for that purpose but it's an additional burden for develpoers.

Matt Warren
08/31/2011 09:22 AM by
Matt Warren

@Rafal

With the Raven Client API it's not that hard, you certainly don't have to edit the Json directly. You just do something like this:

var stateLookup = session.Load<List("Config/states"); ... make changes.. session.Store(stateLookup)

bavo
08/31/2011 09:36 AM by
bavo

How would you handle localization of those values?

Matt Warren
08/31/2011 09:48 AM by
Matt Warren

@braco

I guess one way is to have several docs, "Config/en-gb/States", "Config/fr-ca/States", "Config/en-ca/States" etc

Rafal
08/31/2011 10:27 AM by
Rafal

@Matt Great, you have shown that Raven has API. And? Am I supposed to tell my customer that they can write a simple program to update a list? Almost everyone in IT can use SQL to modify data and it's quite common practice, but with Raven it's not so easy. Especially if all static data is in a huge Json document.

Frank Quednau
08/31/2011 10:38 AM by
Frank Quednau

Please don't threat all those poor many-item'ed tables and states, they are but artifacts of sick minds.

Daniel Lang
08/31/2011 10:50 AM by
Daniel Lang

Rafal,

As a serious developers you wouldn't want your customers to insert and update rows in a SQL-table manually but instead have a simple UI for that. In a RDBMS situation it is even more critical since you have to maintain forey-key-relations and I would doubt that someone who gives his customers database-access can assure that he has all the key-constraints set-up correctly in SQL-server...

From the developer perspective, the only drawback I find here is the work needed to maintain the relation, but because the names of states aren't likely to change it doesn't matter here.

Giedrius
08/31/2011 10:57 AM by
Giedrius

not familiar with ravendb, so my question may look dumb, but I'm wondering how mailing list entity looks like (reference to state) and how you load state for each of mailing list item, what displaying grid of mailing list items.

Nieve
08/31/2011 12:17 PM by
Nieve

@Matt What happens if you've got a few documents using the same States document and you need to update one or a few states? Will that still work: var stateLookup = session.Load("Config/states"); ... make changes.. session.Store(stateLookup)

Jason Meckley
08/31/2011 12:25 PM by
Jason Meckley

@Giedrius this is document DB, not a relational DB. there isn't a relationship between the states config and the mailing list. Instead you would save the state information directly in the the mailing list document.

This is the most difficult concept for developers coming from a relational data background. they ask "how does data persistance work without relations?" or "how do I make raven work like a RDBMS?". for more information on this check out Ayende's nosql tag.

Matt Warren
08/31/2011 12:47 PM by
Matt Warren

@Rafal

I didn't realize that you were talking about letting the customer edit database tables by hand. In that case, is it really much easier for them to write SQL insert/update statements than using the RavenDB management studio and editing a Json doc?

See http://blogs.hibernatingrhinos.com/5/ravendb-in-practice-part-1-an-introduction-to-ravendb for screen-shots.

BTW I'm not pretending that RavenDB and doc-databases are the answer to everything, just that they make some things easy.

Matt Warren
08/31/2011 12:51 PM by
Matt Warren

@Nieve

RavenDB has transactions and writing a doc is fully ACID, so the next time you load the doc you'll get the new values. The line I missed out from the code sample was: session.SaveChanges()

This writes the doc to the store as an atomic transaction, see http://ravendb.net/documentation/client-api/docs-documentsession-characteristics for more info.

Ayende Rahien
08/31/2011 01:29 PM by
Ayende Rahien

Rafal, What is the problem with modifying the document? That is what RavenDB is for.

Ayende Rahien
08/31/2011 01:30 PM by
Ayende Rahien

Bavo,

{ "States": [ { "Name": [ { "Culture": "en", "Value": "Texas"}, { "Culture": "fr", "Value": "French Texas" } ], "Abbrevation": "TX" } ] }

Ayende Rahien
08/31/2011 01:30 PM by
Ayende Rahien

Rafal, I still fail to see the problem. RavenDB has a management UI that you can use. Beside, if you are advocating FOR having random people execute SQL against the production db, you lost me

Robert Pickering
08/31/2011 02:12 PM by
Robert Pickering

What’s the cut off limit for this approach? I thinking what if you’re reference data was very city and town in the US rather than just a list of states? I guess this would still fit in memory without too much of a problem, but if you were going to provide some sort of auto-complete on this then you’d need to provide some sort of efficient way of search this, if each was a document in RavenDb then you could just use an index to ensure an efficient search time on the fields you’re interested in.

Ayende Rahien
08/31/2011 02:18 PM by
Ayende Rahien

Robert, You can still use an index to search inside a single document. Cut off point would be greater than a few megs, but I would probably do is something like: Config/States Config/Texas/Cities Config/Florida/Cities

Etc

Marco
08/31/2011 03:01 PM by
Marco

And what is the best approach to store this inside a NoSql "db"?

Address Street Country (reference) State (reference) Country Code Name-En/Name-Nl/etc. States (one or more states)

And when showing an address to the user it should show: Street Country.Name-Nl (current culture) State.Name-Nl (current culture)

Thanks

Rafal
08/31/2011 03:36 PM by
Rafal

Ayende, The problem i see is general lack of tools for modifying database contents, especially for modifying large Json documents, and general unfamiliarity of db admins with this new database type. But maybe the problem can be alleviated by providing scripts that automate basic tasks and show how to perform typical operations. This is not advocating to allow random people do random things to database, I'm talking about adoption of Raven by IT - usually they are the only people allowed to touch production databases.

Winston Junior
08/31/2011 05:50 PM by
Winston Junior

@Rafal, The problem with the DB Admins is that relational database was the silver bullet for more than 20 years and now we have new solutions that can be mixed with relational databases or not, they need to get back to the book and learn like all the people who write systems do every single day. I don't know if this is possible with RavenDB but with Mongo theres a command line where you can modify things using "scripts". A kind of javascript if I'm not wrong. In RavenDB I saw that there are a silverlight app, but I don't use this a lot yet, I really don't know all the resources available, you can modify documents there, but I don't know if you can do this using "scripts".

Winston Junior
08/31/2011 06:39 PM by
Winston Junior

@Marco As far as I'm concerned, the best approach for domain data is replicate this information, since this information almost never change (how many times you see an state or city that changed name?), there will be no problem with this approach.

Justin A
09/01/2011 01:00 AM by
Justin A

@Matt or @Ayende ...

can you please elaborate how you would use these states, in a simple document? eg. a User poco/document.

public class { public string Id { get; set; } public string Name { get; set; } public ????? State { get; set; } // <-- state this person was born in. }

Also, why the namespace Config/States ??? is 'Config/' something special?

Ayende Rahien
09/01/2011 05:14 AM by
Ayende Rahien

Marco, The Country / State usually just contains the id of the relevant item, not the actual value. You load that item and then access its name. We have very efficient ways of doing that.

Ayende Rahien
09/01/2011 05:16 AM by
Ayende Rahien

Rafal, Scripts are really easy to write against RavenDB. Hell, you can provide pure JS solution in a few minutes that requires nothing else. As for "general lack of tools", that isn't an issue for RavenDB, it comes already with such a tool to do just that.

Ayende Rahien
09/01/2011 05:17 AM by
Ayende Rahien

Justin, "Config/" is just my conventions for reference data, nothing special about it.

public class Address { public string Id { get; set; } public string Name { get; set; } public string State { get; set; } // <-- state this person was born in. }

Justin A
09/01/2011 05:28 AM by
Justin A

@Ayende .. so the value of that Address class is a string .. meaning .. there's no reference to the Config/States document? Just happens to use a value, from there?

Ayende Rahien
09/01/2011 05:33 AM by
Ayende Rahien

Justin, That value can uniquely identify the state in question in that document, but yes. There are no references in RavenDB

Justin A
09/01/2011 05:34 AM by
Justin A

Cheers! As always, much appreciated!

Rasmus
11/21/2011 12:29 PM by
Rasmus

Would you model this as a Config class with a single list? What about nexttime you need som "basic data" for at list? Would you then just extend this existing class/document?

Germán
01/17/2012 09:26 AM by
Germán

Hi!

I was loading yesterday existing reference/read-most data from sql server into a document. I ended up loading data from SQL Server with EF, shaping it with linq and persisting the shaped data to ravendb in one single document, following this post's ideas.

Is this the recommended way of loading existing data to RavenDB? Wouldn't it be nice to have this kind of one-time loading of data built-in into RavenDB Studio? A tool that would let you specify a sql data source, a linq query to shape/filter results, a destination document and done!

Thanks!

Germán

Ayende Rahien
01/17/2012 10:16 AM by
Ayende Rahien

German, That is how we do it, yes. Having such a tool isn't really useful from our perspective, you already have those tools outside of RavenDB, no need to replicate them inside RavenDB.

Germán
02/16/2012 03:09 AM by
Germán

Hi!

Is this approach also advisable if you have a lot of information? For example, I have at least 12000 locations with a lot of textual data, country data and so on, but in some parts of this application only id and name are needed as I want to display the name to the user after a webservice gives me the id.

With this in mind I could come up with two solutions: -make one document for each location with all the data and a map/reduce index just to get the subset of information needed in order to populate a select list or decode the id

or

-make one document for each location and another big document that has just the id and name. I would need to get them in sync.

In general, as you said, with sql there are just columns, rows, and relations. With RavenDb I think that we have more choices regarding how to shape our data and for me is just a little bit more diffcullt.

Any chance we get a good blog posts series on data modelling in document databases?

Germán

Ayende Rahien
02/16/2012 03:13 AM by
Ayende Rahien

Germán, I would go with the map/reduce approach, because you are always going to want to search this, never actually show the entire data set.

Comments have been closed on this topic.