Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL open source document database.

time to read 3 min | 409 words

There are two methods in StoreManagerController that we haven’t touched yet. Dealing with them is going to be just a little bit different:

image

Can you figure out why?

The answer is that when we started, we decided that there really isn’t any reason to store artists as individual documents. They are just reference data, after all. Now, however, we need to reference them.

We could, of course, create a set of artists documents, at which point it would be very easy to port the code:

image

But I still think that artists don’t really exist in this model as an independent entity. So instead of going this route, we are going to project them.

We define the “Arists” index using the following map/reduce Linq queries:

// map 
from album in docs.Albums
select new { album.Artist.Id, album.Artist.Name }

// reduce 
from artist in results
group artist by new { artist.Id, artist.Name } into g
select new { g.Key.Id, g.Key.Name }

If you look carefully, you’ll notice that this is essentially doing a distinct over all the artists across all albums.
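
To see why this amounts to a distinct, here is a minimal in-memory sketch that runs the same two queries with plain LINQ to Objects instead of a Raven index. The Artist and Album records and the DistinctArtists helper are hypothetical names made up for illustration; only the shape of the map and reduce queries comes from the index above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the album documents (not Raven types).
public record Artist(string Id, string Name);
public record Album(string Id, string Title, Artist Artist);

public static class ArtistsIndexSketch
{
    // Mirrors the map and reduce queries, evaluated with LINQ to Objects.
    public static List<Artist> DistinctArtists(IEnumerable<Album> albums)
    {
        // map: project every album to its artist's id and name
        var mapped = from album in albums
                     select new { album.Artist.Id, album.Artist.Name };

        // reduce: grouping by (Id, Name) collapses repeated artists into one entry
        var reduced = from artist in mapped
                      group artist by new { artist.Id, artist.Name } into g
                      select new Artist(g.Key.Id, g.Key.Name);

        return reduced.ToList();
    }
}
```

Two albums by the same artist produce two identical map entries, and the group-by in the reduce step folds them into a single result.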

And that means that we can now write the code for those two methods like this:

image

There is one very important thing to remember here: in Raven, queries are cheap. Raven only allows you to query indexes, and those indexes are built in the background, so queries tend to be very fast.

That changes the way that you think about designing your system and data model. You want to move a lot of your processing to indexes and queries upon those indexes, because it tends to be cheaper all around.

time to read 2 min | 348 words

The final part of the port of the MVC Music Store to Raven is the administration section, implemented in StoreManagerController. I am going to show comparisons of all the methods where the port doesn’t offer anything new, and then focus on an interesting conceptual difference between the implementations.

image image

Please note that the main reason that the Raven code is so much shorter is that I threw away the nonsensical error handling (or lack thereof).

  image   image

Again, throwing away the error handling that isn’t accounts for a lot of the difference in the code.

image image

Now we get to an interesting difference. The old code will delete orders if they include the deleted album. Raven’s code does no such thing.

It is important to understand that there is no such thing as referential integrity in Raven (or document databases in general). This can be a plus or a minus, but in this case, we are turning that into a plus, because we can delete an album without losing orders.  I don’t know about you, but I like the idea of keeping the orders around. :-)

A bit more formally, documents in Raven are independent; they aren’t affected by changes to other documents.

There are two more methods to discuss with regard to the StoreManagerController, but I’ll discuss them in my next post.

time to read 2 min | 272 words

The checkout process in the MVC Music Store is composed of two parts: adding address & payment options, and completing the order.

The old code for address & payment is on the left, the new on the right.

image image

As you can see, they are quite similar. Raven’s code isn’t complete yet, though.

If you recall, we stated that we are going to store the CountSold property inside the Album document, to allow us to easily sort by that count. We now need to write that logic; I put it directly after the call to CreateOrder:

image

It is important to note that we are loading all the album documents in a single query. And when we save, Raven is going to make a single (batched) call to the server.
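
As a rough illustration of that logic, here is a minimal in-memory sketch. It is not Raven code: a dictionary stands in for the document store, and the Album, OrderLine and ApplyOrder names are assumptions invented for this example. The idea is simply to fetch every album that the order references in one go, then bump each album's CountSold by the quantity on the matching line.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape; only CountSold comes from the post.
public class Album
{
    public string Id { get; set; } = "";
    public int CountSold { get; set; }
}

public record OrderLine(string AlbumId, int Quantity);

public static class CountSoldSketch
{
    // 'store' stands in for the database: album id -> album document.
    // Throws KeyNotFoundException if a line references an unknown album.
    public static void ApplyOrder(IDictionary<string, Album> store, IEnumerable<OrderLine> lines)
    {
        var lineList = lines.ToList();

        // Fetch every album referenced by the order in one pass (one "load").
        var loaded = lineList.Select(l => l.AlbumId)
                             .Distinct()
                             .ToDictionary(id => id, id => store[id]);

        // Bump each album's counter by the quantity on the matching line.
        foreach (var line in lineList)
            loaded[line.AlbumId].CountSold += line.Quantity;
    }
}
```

In the real port the session tracks the loaded albums, so the equivalent of the final write is the single batched save mentioned above.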

And now, merely for completion’s sake (pun intended), let us look at the Complete method:

image image

I think by now you can tell what is going on in each system. The next post will cover the administration section.

time to read 3 min | 558 words

We will start with the Index() method:

image

There is one thing in this code that bothers me: it is going to perform two DB queries. But that is beside the point, since we are going to modify the whole thing.

And here is my port:

image

As you can see, it is pretty much the same, and not really that interesting. Let us see what else we have:

image

Something that is important to note here is that we are doing a search on the name of a genre. The problem is that the genre name isn’t the primary key; worse, there isn’t even an index on the name column. Now, admittedly, the genre table contains ten rows, but it is the principle of the thing. (If you are smart, you only have to be read the riot act by the DBA about non-indexed queries in production once.)

Now, it would be trivial for us to implement this in Raven using the same approach, but I don’t see a reason to do this. The genre that we get in the Browse method is dependent on the data that we return from the Index method, so there is no reason not to pass the id of the genre directly. I modified the Index() action to pass the entire genre, not just the genre name, and to pass the id back to the Browse() action, not the name.

I got started implementing this, but I got stuck on the association from the genre to its albums.

image

Document databases don’t normally have associations, and they don’t have joins. So how can we do this?

By now, you should be pretty familiar with the answer: we need to define an index :-)

// AlbumsByGenre
from album in docs.Albums
where album.Genre != null
select new { Genre = album.Genre.Id }

And this index allows us to write this code:

image

And finally, we have the GenreMenu:

image

Which we can port very easily:

image

And that is all for the StoreController.

time to read 3 min | 548 words

I noticed that I had a typo when I inserted the albums data: the artist data was stored as “Arist”. This gives me a chance to show you how we can do a migration that is a bit more advanced.

using (var documentStore = new DocumentStore { Url = "http://localhost:8080" })
{
    documentStore.Initialise();

    var count = 0;

    do
    {
        var queryResult = documentStore.DatabaseCommands.Query("Raven/DocumentsByEntityName", new IndexQuery
        {
            Query = "Tag:`Albums`",
            PageSize = 128,
            Start = count
        });


        if (queryResult.Results.Length == 0)
            break;

        count += queryResult.Results.Length;
        var cmds = new List<ICommandData>();
        foreach (var result in queryResult.Results)
        {
            var arist = result.Value<JObject>("Arist");
            if(arist == null)
                continue;
                        
            result["Artist"] = arist;
            result.Remove("Arist");

            cmds.Add(new PutCommandData
            {
                Document = result,
                Metadata = result.Value<JObject>("@metadata"),
                Key = result.Value<JObject>("@metadata").Value<string>("@id"),
            });
        }

        documentStore.DatabaseCommands.Batch(cmds.ToArray());

    } while (true);
    
}

The code itself shouldn’t be hard to follow, I think. It shows how we can manipulate documents by working with the JSON document directly, instead of having to go through an object layer.
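
The core of that migration is the property rename on the raw JSON. Here is a minimal standalone sketch of just that step. Note the assumptions: it uses System.Text.Json.Nodes from current .NET rather than the JObject type used by the Raven client above, and RenameProperty is a hypothetical helper name.

```csharp
using System.Text.Json.Nodes;

public static class RenamePropertySketch
{
    // Renames a top-level property in place.
    // Returns false (and changes nothing) if the old name is absent or null.
    public static bool RenameProperty(JsonObject doc, string oldName, string newName)
    {
        var value = doc[oldName];
        if (value is null)
            return false;

        // Detach first: a JsonNode can only have one parent at a time.
        doc.Remove(oldName);
        doc[newName] = value;
        return true;
    }
}
```

Calling RenameProperty(doc, "Arist", "Artist") on a document that was already fixed returns false, which plays the same role as the null check and continue in the loop above.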

time to read 5 min | 807 words

In my last post, I mentioned that we need to add a CountSold property to all the albums. In most SQL systems, something like that can be pretty painful. The syntax for adding a new column is easy, but actually getting it done, and deployed, and versioned, is pretty hard. With Raven, if you add a new property, it will automatically be added to your document when you next save it. There is no action required on your part. The same, by the way, would happen when you remove a property. Raven will clean it up after you.

The question is what happens when we want to set that value to something, not just to the default value? We need to provide that logic somehow, and here is a simple way of doing so:

using (var documentStore = new DocumentStore { Url = "http://localhost:8080" })
{
    documentStore.Initialise();
    using (var session = documentStore.OpenSession())
    {
        IDictionary<string,int> albumToSoldCount = new Dictionary<string, int>();
        int count = 0;

        do
        {
            var results = session.Query<SoldAlbum>("SoldAlbums")
                .Skip(count)
                .Take(128)
                .ToArray();

            if (results.Length == 0)
                break;
            count += results.Length;
            foreach (var soldAlbum in results)
            {
                albumToSoldCount[soldAlbum.Album] = soldAlbum.Quantity;
            }
        } while (true);

        count = 0;
        do
        {
            var albums = session.Query<Album>()
                .Skip(count)
                .Take(128)
                .ToArray();
            if (albums.Length == 0)
                break;

            foreach (var album in albums)
            {
                int value;
                albumToSoldCount.TryGetValue(album.Id, out value);

                album.CountSold = value;
            }

            count += albums.Length;

            session.SaveChanges();
            session.Clear();
        } while (true);
    }
}

To those of you who haven’t bothered to read the code, this is reading the index that we previously created and remembering its values. Then we start reading batches of albums and update their counts. All in all, it is quite simple.

An additional nice property of this script is that it is safe to run it multiple times.
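
The reason it is safe is that each album's CountSold is assigned from the index value, not incremented. A minimal in-memory sketch of the second loop shows this; the Backfill helper, the list standing in for the albums collection, and the dictionary standing in for the index results are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape; only CountSold comes from the post.
public class Album
{
    public string Id { get; set; } = "";
    public int CountSold { get; set; }
}

public static class BackfillSketch
{
    // Walks the albums page by page and sets CountSold from the lookup.
    // Returns the number of albums visited.
    public static int Backfill(
        IReadOnlyList<Album> albums,
        IReadOnlyDictionary<string, int> soldByAlbum,
        int pageSize = 128)
    {
        int count = 0;
        while (true)
        {
            var page = albums.Skip(count).Take(pageSize).ToArray();
            if (page.Length == 0)
                break;

            foreach (var album in page)
            {
                // Albums that never sold are missing from the lookup and get 0.
                soldByAlbum.TryGetValue(album.Id, out var value);
                // Assignment, not increment: running this twice gives the same result.
                album.CountSold = value;
            }

            count += page.Length;
        }
        return count;
    }
}
```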

time to read 2 min | 350 words

As I mentioned, we can solve the GetTopSellingAlbums() problem using map/reduce, but that isn’t really a good way of doing it. The problem with doing that (aside from the scared looks and pained sounds that you get when you mention it) is that it is trying to solve the problem in a relational way. Indeed, the previous solution was a near duplication of how a relational database would process that query. So, what is the doc db approach for solving this issue?

The answer is quite simple: remember that documents are independent, and think about the question. What we are asking is what are the top selling albums. If we add a CountSold property to the album, we would suddenly find it so much easier to handle this problem. This means that we would need to update all the albums that are part of a given order when an order is submitted, but that is acceptable (this exact same operation is commonly done in SQL databases as well).

For now, let us wave away how we create the CountSold property and fill it with the right values (I’ll discuss it in my next post). Assuming that this has happened, how can we solve the GetTopSellingAlbums() problem?

Well, that is easy enough. All we need to do is define an index for CountSold.

// AlbumsByCountSold
from album in docs.Albums
select new { album.CountSold };

With that, we can implement GetTopSellingAlbums like this:

image

And now it is done: very simple, very efficient and quite elegant, even if I say so myself.

time to read 4 min | 621 words

The current HomeController looks like this:

image

I really don’t like the fact that the controller issues queries like that, but we will let it go for now.

This query (thanks to EF Prof) looks like this:

image

And here we run into a very interesting problem: we can’t really replicate this query. The reason is that this query runs over multiple tables, which our model says would be in different documents.

There are several ways in which we can fix this. One way of doing this would be to define a map / reduce index on top of orders.

Note: Yes, I am familiar with this comic.

The way that I am about to show you isn’t the way I would recommend going for real, but I want to show it anyway. I’ll discuss the idiomatic Raven way of handling this feature in my next post.

Map/reduce in Raven is just a couple of Linq queries, so it is nothing to be worried about. As a reminder, we have the following order documents in our database:

image_thumb9 image image

We define the index “SoldAlbums” using the following queries.

// map
from order in docs.Orders
from line in order.Lines
select new{ line.Album, line.Quantity }

// reduce
from result in results
group result by result.Album into g
select new{ Album = g.Key, Quantity = g.Sum(x=>x.Quantity) }

As you can see, those are two very simple Linq queries.

The result of which would be:

image
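
For readers who want to trace the map and reduce steps by hand, here is a minimal in-memory sketch that evaluates the same two queries with LINQ to Objects. It does not touch Raven; the Order, OrderLine and SoldAlbum records and the Compute helper are assumptions shaped after the index definition above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the order documents (not Raven types).
public record OrderLine(string Album, int Quantity);
public record Order(string Id, List<OrderLine> Lines);
public record SoldAlbum(string Album, int Quantity);

public static class SoldAlbumsSketch
{
    public static List<SoldAlbum> Compute(IEnumerable<Order> orders)
    {
        // map: one entry per order line, carrying the album id and quantity
        var mapped = from order in orders
                     from line in order.Lines
                     select new { line.Album, line.Quantity };

        // reduce: group the entries by album and sum the quantities
        var reduced = from result in mapped
                      group result by result.Album into g
                      select new SoldAlbum(g.Key, g.Sum(x => x.Quantity));

        return reduced.ToList();
    }
}
```

The difference with the real index is that Raven computes this in the background and keeps the result up to date as orders come in, while this sketch recomputes everything on each call.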

Once we have that, it is trivial to derive the answer to GetTopSellingAlbums. Indeed, the following function implements the exact same logic and has the same output as the previous implementation:

image

The way it works is pretty simple: we get the most sold albums (by sorting on descending quantity), then load them from the database. Because we might have fewer than count top selling albums, we need to top it off from regular albums.

This means that this code executes 2 to 3 queries. I don’t really like it, but on my machine it takes less than 10 ms to do all three requests, which is livable.
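
The top-off logic described above can be sketched in memory like this. Everything here is an assumption for illustration: the lists stand in for the albums collection and the SoldAlbums index results, and skipping albums that were already picked during the top-off is my choice for the sketch, not something confirmed by the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins (not Raven types).
public record Album(string Id, string Title);
public record SoldAlbum(string Album, int Quantity);

public static class TopSellingSketch
{
    public static List<Album> GetTopSellingAlbums(
        IReadOnlyList<Album> albums,
        IEnumerable<SoldAlbum> sold,
        int count)
    {
        var byId = albums.ToDictionary(a => a.Id);

        // Step 1: best sellers first, then "load" the matching albums by id.
        var top = sold.OrderByDescending(s => s.Quantity)
                      .Take(count)
                      .Select(s => byId[s.Album])
                      .ToList();

        // Step 2: if too few albums have sales, top off from the regular albums.
        if (top.Count < count)
        {
            var chosen = new HashSet<string>(top.Select(a => a.Id));
            top.AddRange(albums.Where(a => !chosen.Contains(a.Id))
                               .Take(count - top.Count));
        }

        return top;
    }
}
```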

The reason that I am posting this solution is that I want to show this as an approach to a problem, not as the recommended approach for how to solve it; I’ll do that in my next post.

time to read 3 min | 495 words

Here is the code required to take the data in the MVC Music Store database and turn it into the appropriate Raven documents:

using (var documentStore = new DocumentStore { Url = "http://localhost:8080" })
{
    documentStore.Initialise();
    using (var session = documentStore.OpenSession())
    {
        foreach (var album in storeDB.Albums.Include("Artist").Include("Genre"))
        {
            session.Store(new
            {
                Id = "albums/" + album.AlbumId,
                album.AlbumArtUrl,
                Arist = new { album.Artist.Name, Id = "artists/" + album.Artist.ArtistId },
                Genre = new { album.Genre.Name, Id = "genres/" + album.Genre.GenreId },
                album.Price,
                album.Title,
            });
        }
        foreach (var genre in storeDB.Genres)
        {
            session.Store(new
            {
                genre.Description,
                genre.Name,
                Id = "genres/" + genre.GenreId
            });
        }
        session.SaveChanges();
    }
}

As you can see, it is pretty simple and, even if I say so myself, pretty slick.

I use anonymous types here because I am only concerned with porting the data; I don’t really care about how to deal with types right now.

time to read 3 min | 449 words

Just a few words about the way that I set up Raven to be used in the MVC Music Store application before we get to the actual code.

  • The model is (intentionally) very close to the one used by NHibernate. We initialize the document store in the application start.
  • We then open/close the session on request boundary, and create a way to access the current session.
  • If the application supported a container, I would make sure that the controllers got the session instance through that, but it doesn’t, so I just used a static gateway.
    • If you don’t like it, feel free to submit a patch.
public class MvcApplication : System.Web.HttpApplication
{
    private const string RavenSessionKey = "Raven.Session";
    private static DocumentStore _documentStore;

    protected void Application_Start()
    {
        _documentStore = new DocumentStore { Url = "http://localhost:8080/" };
        _documentStore.Initialise();

        AreaRegistration.RegisterAllAreas();
        RegisterRoutes(RouteTable.Routes);
    }

    public MvcApplication()
    {
        BeginRequest += (sender, args) =>
            HttpContext.Current.Items[RavenSessionKey] = _documentStore.OpenSession();
        EndRequest += (o, eventArgs) =>
        {
            var disposable = HttpContext.Current.Items[RavenSessionKey] as IDisposable;
            if (disposable != null)
                disposable.Dispose();
        };
    }

    public static IDocumentSession CurrentSession
    {
        get { return (IDocumentSession) HttpContext.Current.Items[RavenSessionKey]; }
    }
}

This is pretty much it, as far as Raven’s initialization is concerned.
