Ayende @ Rahien

filter by tags archive

architecture (606) rss
bugs (450) rss
challanges (123) rss
community (377) rss
databases (481) rss
design (893) rss
development (640) rss
hibernating-practices (71) rss
miscellaneous (592) rss
performance (397) rss
programming (1085) rss
raven (1442) rss
ravendb.net (526) rss
reviews (184) rss

2025
- May (7)
- April (10)
- March (10)
- February (7)
- January (12)
2024
- December (3)
- November (2)
- October (1)
- September (3)
- August (5)
- July (10)
- June (4)
- May (6)
- April (2)
- March (8)
- February (2)
- January (14)
2023
- December (4)
- October (4)
- September (6)
- August (12)
- July (5)
- June (15)
- May (3)
- April (11)
- March (5)
- February (5)
- January (8)
2022
- December (5)
- November (7)
- October (7)
- September (9)
- August (10)
- July (15)
- June (12)
- May (9)
- April (14)
- March (15)
- February (13)
- January (16)
2021
- December (23)
- November (20)
- October (16)
- September (6)
- August (16)
- July (11)
- June (16)
- May (4)
- April (10)
- March (11)
- February (15)
- January (14)
2020
- December (10)
- November (13)
- October (15)
- September (6)
- August (9)
- July (9)
- June (17)
- May (15)
- April (14)
- March (21)
- February (16)
- January (13)
2019
- December (17)
- November (14)
- October (16)
- September (10)
- August (8)
- July (16)
- June (11)
- May (13)
- April (18)
- March (12)
- February (19)
- January (23)
2018
- December (15)
- November (14)
- October (19)
- September (18)
- August (23)
- July (20)
- June (20)
- May (23)
- April (15)
- March (23)
- February (19)
- January (23)
2017
- December (21)
- November (24)
- October (22)
- September (21)
- August (23)
- July (21)
- June (24)
- May (21)
- April (21)
- March (23)
- February (20)
- January (23)
2016
- December (17)
- November (18)
- October (22)
- September (18)
- August (23)
- July (22)
- June (17)
- May (24)
- April (16)
- March (16)
- February (21)
- January (21)
2015
- December (5)
- November (10)
- October (9)
- September (17)
- August (20)
- July (17)
- June (4)
- May (12)
- April (9)
- March (8)
- February (25)
- January (17)
2014
- December (22)
- November (19)
- October (21)
- September (37)
- August (24)
- July (23)
- June (13)
- May (19)
- April (24)
- March (23)
- February (21)
- January (24)
2013
- December (23)
- November (29)
- October (27)
- September (26)
- August (24)
- July (24)
- June (23)
- May (25)
- April (26)
- March (24)
- February (24)
- January (21)
2012
- December (19)
- November (22)
- October (27)
- September (24)
- August (30)
- July (23)
- June (25)
- May (23)
- April (25)
- March (25)
- February (28)
- January (24)
2011
- December (17)
- November (14)
- October (24)
- September (28)
- August (27)
- July (30)
- June (19)
- May (16)
- April (30)
- March (23)
- February (11)
- January (26)
2010
- December (29)
- November (28)
- October (35)
- September (33)
- August (44)
- July (17)
- June (20)
- May (53)
- April (29)
- March (35)
- February (33)
- January (36)
2009
- December (37)
- November (35)
- October (53)
- September (60)
- August (66)
- July (29)
- June (24)
- May (52)
- April (63)
- March (35)
- February (53)
- January (50)
2008
- December (58)
- November (65)
- October (46)
- September (48)
- August (96)
- July (87)
- June (45)
- May (51)
- April (52)
- March (70)
- February (43)
- January (49)
2007
- December (100)
- November (52)
- October (109)
- September (68)
- August (80)
- July (56)
- June (150)
- May (115)
- April (73)
- March (124)
- February (102)
- January (68)
2006
- December (95)
- November (53)
- October (120)
- September (57)
- August (88)
- July (54)
- June (103)
- May (89)
- April (84)
- March (143)
- February (78)
- January (64)
2005
- December (70)
- November (97)
- October (91)
- September (61)
- August (74)
- July (92)
- June (100)
- May (53)
- April (42)
- March (41)
- February (84)
- January (31)
2004
- December (49)
- November (26)
- October (26)
- September (6)
- April (10)

Apr 02 2025

RavenDB.NET Aspire integration

time to read 2 min | 373 words

Tweet Share Share 0 comments

Tags:

.NET Aspire is a framework for building cloud-ready distributed systems in .NET. It allows you to orchestrate your application along with all its dependencies, such as databases, observability tools, messaging, and more.

RavenDB now has full support for .NET Aspire. You can read the full details in this article, but here is a sneak peek.

Defining RavenDB deployment as part of your host definition:

using Projects;


var builder = DistributedApplication.CreateBuilder(args);


var serverResource = builder.AddRavenDB(name: "ravenServerResource");
var databaseResource = serverResource.AddDatabase(
    name: "ravenDatabaseResource", 
    databaseName: "myDatabase");


builder.AddProject<RavenDBAspireExample_ApiService>("RavenApiService")
    .WithReference(databaseResource)
    .WaitFor(databaseResource);


builder.Build().Run();

And then making use of that in the API projects:

var builder = WebApplication.CreateBuilder(args);


builder.AddServiceDefaults();
builder.AddRavenDBClient(connectionName: "ravenDatabaseResource", configureSettings: settings =>
{
    settings.CreateDatabase = true;
    settings.DatabaseName = "myDatabase";
});
var app = builder.Build();


// here we’ll add some API endpoints shortly…


app.Run();

You can read all the details here. The idea is to make it easier & simpler for you to deploy RavenDB-based systems.

Feb 25 2022

RavenDBDomain Modeling and Data Persistency

time to read 1 min | 23 words

Tweet Share Share 0 comments

Tags:

Dejan Miličić is talking with André Baltieri about data modeling and data persistency.

Feb 07 2022

RavenDBPractical Considerations for ACID/MVCC Storage Engines

time to read 1 min | 29 words

Tweet Share Share 1 comments

Tags:

My talk at the Carnegie Mellon Database Group about the internals of Voron and how we build a transactional storage engine.

Nov 21 2013

RavenDBThe Road to Release

time to read 5 min | 870 words

Tweet Share Share 5 comments

Tags:

raven

This post is here to answer several queries in the mailing list, and some questions that were raised in this blog post. I think that this is important enough to warrant a post here, instead of an email to the list, or just a comment.

To summarize, we had a few issues recently that impacted our users’ systems. Those are usually (but not always) cases where a combination of features wasn’t working properly (feature intersection), or just actual bugs. That led to some questions that are worth answering. You can find all the details below, but I would like to talk about what we are actually doing.

In the past 4 or 5 years, we have managed to create a NoSQL database for the .NET platform, and it has been doing nothing but picking up steam ever since we released it. We have been working hard to provide performance, features and stability for our users. On a personal note, it has been quite an amazing ride, seeing more people put RavenDB to use and creating interesting applications and features.

First, there seems to be some concerns about the new things that we are doing. Voron, in particular, appears to be a cause for concern. We have relied on Esent as our storage engine for the past four or five years, to great success. Not least of its properties is the fact that Esent has been around the block for a while now, and is proven to be robust and safe in the simplest of methods, high and constant use over multiple decades. Esent also have its share of problems, but we didn’t forget why we chose it in the first place. Indeed, I still think that that was an excellent choice. With Voron, the only change you’ll see is that it won’t be the only choice.

Voron is meant to allow us to run on Linux machines, and to provide us with a fully owned stack, so we can do more interesting things across the board. But we aren’t letting go of Esent, and in any way you care to name, Esent is still going to be the core (and default) option we have for storage in RavenDB. With RavenDB 3.0, you’ll have the option to make an informed choice about selecting Voron as a storage engine, with a list of pros & cons.

Second, we do acknowledge that we suffer from a typical blindness for how we approach RavenDB. Since we built it, we know how things are supposed to be, and that is how we usually test them. Even when we try to go for the edge cases, we are constrained by our own thinking. We are currently working on getting an external testing team to do just that. Actively work to make use of RavenDB in creative ways specifically to try to break it.

Third, our own internal policies for releasing RavenDB need to be adjusted slightly. In particular, we are usually faced with two competing pressures: Release Already and Super Stable. We have always tried to release both unstable and stable versions, and the process for moving from unstable to stable is a pretty good one, I think. We have:

The test suite, now clocking at just over 3,000 tests.
A separate test suite that is meant to stress test the database.
Performance test suite, to make sure that we are in line for general performance.
Longevity tests, making sure that we don’t have any issues in long term usage.
Finally, as an act of dog fooding, we upgrade our own servers to the new build, and let it run in production for a while, just to make absolutely sure.

We are going to add additional tests (see the 2nd point) to the process, and we are going to extend the duration of all of those steps. I think that in the past few months we have have leaned too far toward the “Release Already” mode, so we are going to try to lean back (hopefully not too much) the other way.

Fourth, with regards to licensing. It has been our policy to provide anyone with a free trail license of RavenDB if they want to test it on a temporary basis. We require permanent non developer servers to have a license. I think that this strikes the appropriate balance.

Fifth, we are going to be working on additional tooling round deployment and upgrades. For customers that jump multiple versions (moving from 1.x to 2.5, for example), the update process of the RavenDB internal storage data during upgrades can be lengthy and there is too little visibility into it at the moment. We are also working on building tools that help figure out what is going on with a production instance (more ops endpoint, more visibility into internal operations, etc).

In summary, we are grateful for our users for bringing any issues to our attention. We are trying hard to have a very responsive feedback cycle, and we can usually resolve most issues within 24 – 48 hours. But I know we need to do better in making sure that users have a more streamlined experience.

Mar 26 2012

RavenDBSelf optimizing Ids

time to read 3 min | 582 words

Tweet Share Share 15 comments

Tags:

Raven

One of the things that is really important for us in RavenDB is the notion of Safe by Default and Zero Admin. What this means is that we want to make sure that you don’t really have to think about what you are doing for the common cases, RavenDB will understand what you mean and figure out what is the best way to do things.

One of the cases where RavenDB does that is when we need to generate new ids. There are several ways to generate new ids in RavenDB, but the most common one, and the default, is to use the hilo algorithm. It basically (ignoring concurrency handling) works like this:

var currentMax = GetMaxIdValueFor("Disks");
var limit = currentMax + 32;
SetMaxIdValueFor("Disks");

And now we can generate ids in the range of currentMax to currentMax+32, and we know that no one else can generate those ids. Perfect!

The good thing about it is that now we have a reserved range, we can create ids without going to the server. The bad thing about it is that we now reserved a range of 32. If we create just one or two documents and then restart, we would need to request a new range, and the rest of that range would be lost. That is why the default range value is 32. It is small enough that gaps aren’t that important*, but it since in most applications, you usually create entities on an infrequent basis and when you do, you usually generate just one, then it is big enough to still provide a meaningful optimization with regards to the number of times you have to go to the server.

* What does it means, “gaps aren’t important”? The gaps are never important to RavenDB, but people tend to be bothered when they see disks/1 and disks/2132 with nothing in the middle. Gaps are only important for humans.

So this is perfect for most scenarios. Except one very common scenario, bulk import.

When you need to load a lot of data into RavenDB, you will very quickly note that most of the time is actually spent just getting new ranges. More time than actually saving the new documents takes, in fact.

Now, this value is configurable, so you can set it to a higher value if you care for it, but still, that was annoying.

Hence, what we have now. Take a look at the log below:

It details the requests pattern in a typical bulk import scenario. We request an id range for disks, and then we request it again, and again, and again.

But, notice what happens as times goes by (and not that much time) before RavenDB recognizes that you need bigger ranges, and it gives you them. In fact, very quickly we can see that we only request a single range per batch, because RavenDB have optimized itself based on our own usage pattern.

Kinda neat, even if I say so myself.

Feb 27 2012

RavenDBIt was the weekend before the wedding…

time to read 2 min | 302 words

Tweet Share Share 6 comments

Tags:

raven

…when all through the house. Not a creature was stirring, not even a mouse. Because everyone was away on a Stag/Hen weekend.

Phil Jones has just notified me that a site he has been working on for a while went live. Escape Trips is a site that provides Stag and Hen trips in the UK. A Stag/Hen weekend is apparently a bigger version of a stag party, I assume that this is the origin for movies like this (well, not really).

At any rate, this site is actually powered by RavenDB throughout. We actually have several videos in the pipeline of me and Phil hashing out some details about the site when it was built.

The site is fast, and Phil was kind enough to give me some interesting stats.

Some interesting performance stats compared to SQL Express which was running on the VPS. SQL Express was eating ~500MB of RAM (limited), RavenDB has been sat at ~80MB since launch last night! I think EF was eating CPU as well, CPU usage is way down as well.

Performance comment wise, I don’t know if EF was to blame but IIS process CPU usage is down even though traffic has doubled since the launch (mainly crawlers and a new adwords campaign). After running since the launch on Thursday night, the RAM usage has increased to 100MB, still a really great number though as I plan to scale down the VPS’s RAM saving money, RavenDB will actually be paying for itself!

The website looks quite simple from the public side but most of the development has gone into the private administrative website for dealing with sales, customer support and content editing. Performance wise, the system is more responsive and users are very pleased!

Pretty cool!

Feb 17 2012

RavenDBIndex Boosting

time to read 6 min | 1084 words

Tweet Share Share 14 comments

Tags:

raven

Recently we added a really nice feature, boosting the results while indexing.

Boosting is a way to give documents or attributes in a document weights. Attribute level boosting is a way to tell RavenDB that a certain attribute in a document is more important than the others, so it will show up higher in queries when other properties are involved in a query. A document level boosting means that a certain document is more important than another (when using multi maps).

Let us see a few examples where this is happening. The simplest scenario is when we have a multi field search, and we want one of the fields to be the more important one. For example, we decided that when you make a search for first name and last name, a match on the first name has higher relevance than a match on the last name. We can define this requirement with the following index:

public class Users_ByName : AbstractIndexCreationTask<User>
{
    public Users_ByName()
    {
        Map = users => from user in users
                       select new
                       {
                           FirstName = user.FirstName.Boost(3),
                           user.LastName
                       };
    }
}

And we can query the index using:

var matches = session.Query<User,UsersByName>()
      .Where(x=>x.FirstName == "Ayende" || x.LastName == "Eini")
      .ToList()

Assuming that we have a user with the first name “Ayende” and another user with the last name “Eini”, this will find both of them, but will rank the user with the name “Ayende” first.

Let us see another variant, we have a multi map index for users and accounts, both are searchable by name, but we want to ensure that accounts are more important than users. We can do that using the following index:

public class UsersAndAccounts : AbstractMultiMapIndexCreationTask
{
    public UsersAndAccounts()
    {
        AddMap<User>(users =>
                     from user in users
                     select new {Name = user.FirstName}
            );
        AddMap<Account>(accounts =>
                        from account in accounts
                        select new {account.Name}.Boost(3)
            );
    }
}

If we have query that has matches for users and accounts, this will make sure that the account comes first.

And finally, a really interesting use case is that based on the entity itself, you decide to rank it higher. For example, we want to rank customers that ordered a lot from us higher than other customers. We can do that using the following index:

public class Accounts_Search : AbstractIndexCreationTask<Account>
{
    public Accounts_Search()
    {
        Map = accounts =>
              from account in accounts
              select new
              {
                  account.Name
              }.Boost(account.TotalIncome > 10000 ? 3 : 1);
    }
}

This way, we get the more important customers first. And this is really one of those things that brings up the polish in the system, the things that makes the users sit up and take notice.

Sep 12 2011

RavenDBMulti Maps / Reduce indexes

time to read 18 min | 3538 words

Tweet Share Share 14 comments

Tags:

Raven

If you thought that map/reduce was complex, wait until we introduce the newest feature in RavenDB:

Multi Maps / Reduce Indexes

Okay, to be frank, they aren’t complex at all, they are actually quite simple, when you sit down to think about them. Again, I have to credit to Frank Schwieterman, who came up with the idea.

Wait! Let us back track a bit and try to explain what the actual problem is that we are trying to solve. The problem with Map/Reduce is that you can only gather information from a single set of documents. Let us look at the following documents as an example:

{// users/ayende 
   "Name": "Ayende Rahien" 
} 

{ // posts/1234 
  "Title": "Why RavenDB?", 
  "Author": "users/ayende" 
} 
{ // posts/1235 
  "Title": "It is awesome!", 
  "Author": "users/ayende" 
} 

We want to get an list of users with the count of posts that they made. That is trivially easy, as shown in the following map/reduce index:

from post in docs.Posts
select new { post.Author, Count = 1 }

from result in results
group result by result.Author into g
select new
{
   Author = g.Key,
   Count = g.Sum(x=>x.Count)
}

The output of this index would be something like this:

{ Author: "users/ayende", Count: 2 }

And we can load it efficiently using Includes:

session.Query<AuthorPostStats>("Posts/ByUser/Count")
     .Include(x=>x.Author)
     .ToList();

This will load all the users statistics, and also load all of the associated users, in a single query to the database. So far, fairly simple and routine.

The problem begins when we want to be able to query this index using the user’s name. As you can deduce from the documents shown previously, the user name isn’t available on the post document, which means that we can’t index it. That, in turn, means that we can’t search it.

We are left with several bad options:

De-normalize the User’s Name property to the Post document, solely for indexing purposes.
Don’t implement this feature.
Write the following scary query:

from doc in docs.WhereEntityIs("Users","Posts") 
let user = doc.IfEntityIs("Users") 
let post = doc.IfEntityIs("Posts") 
select new 
{ 
  Count = user == null ? 1 : 0, 
  Author = user.Name, 
  UserId = user.Id ?? post.Author 
} 

from result in results 
group result by result.UserId into g 
select new 
{ 
   Count = g.Sum(x=>x.Count), 
   Author = g.FirstNotNull(x=>x.Author), 
   UserId = g.Key 
}

This is actually pretty straightforward, when you sit down and think about it. But there is a whole lot of ceremony involved, and it is actually more than a bit hard to figure out what is going on in more complex scenarios.

This is where Frank’s suggestion came in:

…if I were try to support linq-based indexes that can map multiple types, it might look like:
public class OverallOpinion : AbstractIndexCreationTask<?>
{
   public OverallOpinion()
   {
       Map<Foo>(docs => from doc in docs select new { Id = doc.Id, LastUpdated = doc.LastUpdated }
       Map<OpinionOfFoo>(docs => from doc in docs select new { Id = Doc.DocId, Rating = doc.Rating, Count = 1}
       Reduce = docs => from doc in docs
                        group doc by doc.Id into g
                        select new {
                           Id = g.Key,
                           LastUpdated = g.Values.Where(f => f.LastUpdated != null).FirstOrDefault(),
                           Rating = g.Values.Rating.Sum(),
                           Count = g.Values.Count.Sum()
                        }
   }
}
It seems like some clever code could combine the different map expressions into one.

This is part of a longer discussion, but basically, it got me thinking about how we can implement multi maps, and I came up with the following:

// Map from posts
from post in docs.Posts
select new { UserId = post.Author, Author = (string)null, Count = 1 }

// Map from users
from user in docs.Users
select new { UserId = user.Id, Author = user.Name, Count = 0 }

// Reduce takes results from both maps
from result in results
group result by result.UserId into g
select new
{
   Count = g.Sum(x=>x.Count),
   Author = g.Where(x=>x!=null).First(),
   UserId = g.Key
}

The only thing to understand now is that we have multiple map functions, getting data from multiple sources. We can then take those sources and reduce them together. The only requirements that we have is that the output of all of the map functions would be identical (and obviously, match the output of the reduce function). Then we can just treat this information as normal map/reduce index, which means that all of the usual RavenDB features kick in. Let us see what this actually means, using code. We have the following classes:

public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Post
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string AuthorId { get; set; }
}

public class UserPostingStats
{
    public string UserName { get; set; }
    public string UserId { get; set; }
    public int PostCount { get; set; }
}

And we have the following index:

public class PostCountsByUser_WithName : AbstractMultiMapIndexCreationTask<UserPostingStats>
{
    public PostCountsByUser_WithName()
    {
        AddMap<User>(users => from user in users
                              select new
                              {
                                  UserId = user.Id,
                                  UserName = user.Name,
                                  PostCount = 0
                              });

        AddMap<Post>(posts => from post in posts
                              select new
                              {
                                  UserId = post.AuthorId,
                                  UserName = (string)null,
                                  PostCount = 1
                              });

        Reduce = results => from result in results
                            group result by result.UserId
                            into g
                            select new
                            {
                                UserId = g.Key,
                                UserName = g.Select(x => x.UserName).Where(x => x != null).First(),
                                PostCount = g.Sum(x => x.PostCount)
                            };

        Index(x=>x.UserName, FieldIndexing.Analyzed);
    }
}

As you can see, we are getting the values from two different collections. We need to make sure that they are actually using the same output, which is what caused us the null casting for posts and the filtering that we need to do on the reduce.

But that is it! It is ridiculously easy compared to the previous alternative. Moreover, it follows quite naturally from both the exposed API and the internal implementation inside RavenDB. It took me roughly half a day to make it work, and some of that was dedicated to lunch Smile . In truth, most of that time is actually just handling the error conditions nicely, but… anyway, you get the point.

Even more interesting than the rest is the fact that for all intents and purposes, what we have done here is a join between two different collections. We were never able to really resolve the problems associated with joins before, update notifications were always too complex to figure out, but going the route of multi map makes things so easy.

Just for fun, you might have noticed that we marked the UserName property as analyzed, which means that we can issue full text queries against it. Let us assume that we want to provide users with the following UI:

It is now just a matter of writing the following code:

using (var session = store.OpenSession())
{
    var ups= session.Query<UserPostingStats, PostCountsByUser_WithName>()
        .Where(x => x.UserName.StartsWith("rah"))
        .ToList();

    Assert.Equal(1, ups.Count);

    Assert.Equal(5, ups[0].PostCount);
    Assert.Equal("Ayende Rahien", ups[0].UserName);
}

So you can do a cheap full text search over joins quite easily. For that matter, joins are cheap now, because they are computed on the background and queried directly from the pre-computed index.

Okay, enough blogging for now, going to implement all the proper error handling and then push an awesome new build.

Oh, and a final thought, Multi Map was shown in this blog only in the context of Multi Maps/Reduce, but we also support just the ability to use multi map on its own. This is quite useful if you want to enable search over a large number of entities that reside in different collections. I’ll just drop a bit of code here to show how it works:

public class CatsAndDogs : AbstractMultiMapIndexCreationTask
{
    public CatsAndDogs()
    {
        AddMap<Cat>(cats => from cat in cats
                         select new {cat.Name});

        AddMap<Dog>(dogs => from dog in dogs
                         select new { dog.Name });
    }
}

[Fact]
public void CanQueryUsingMutliMap()
{
    using (var store = NewDocumentStore())
    {
        new CatsAndDogs().Execute(store);

        using(var documentSession = store.OpenSession())
        {
            documentSession.Store(new Cat{Name = "Tom"});
            documentSession.Store(new Dog{Name = "Oscar"});
            documentSession.SaveChanges();
        }

        using(var session = store.OpenSession())
        {
            var haveNames = session.Query<IHaveName, CatsAndDogs>()
                .Customize(x => x.WaitForNonStaleResults(TimeSpan.FromMinutes(5)))
                .OrderBy(x => x.Name)
                .ToList();

            Assert.Equal(2, haveNames.Count);
            Assert.IsType<Dog>(haveNames[0]);
            Assert.IsType<Cat>(haveNames[1]);
        }
    }
}

All together, a great day’s work.

Apr 24 2011

RavenDBLet us write our own JSON Parser, NOT

time to read 8 min | 1543 words

Tweet Share Share 11 comments

Tags:

Raven

One accusation that has been leveled at me often is that I keep writing my own implementation of Xyz (where Xyz is just about anything). The main problem is that I can get overboard with that, but for the most part, I think that I managed to strike the right balance. Wherever possible, I re-use existing, but when I run into problems that are easier to solve by creating my own solution, I would go with that.

A case in point is the JSON parser inside RavenDB. From the get go, I used Newtonsoft.Json.dll. There wasn’t much to think of, this is the default implementation from my point of view. And indeed, it has been an extremely fine choice. It is a rich library, it is available for .NET 3.5, 4.0 & Silverlight and it meant that I had opened up a lot of extensibility for RavenDB users.

Overall, I am very happy. Except… there was just one problem, with large JSON documents, the library showed some performance issues. In particular, a 3 MB JSON file took almost half a second to parse. That was… annoying. Admittedly, most documents tends to be smaller than that, but it also reflected on overall performance when batching, querying, etc. When you are querying, you are also building large json documents (a single document that contains a list of results, for example), so that problem was quite pervasive for us.

I set out to profile things, and discovered that the actual cost wasn’t in the JSON parsing itself, that part was quite efficient. The costly part was actually in building the JSON DOM (JObject, JArray, etc). When people usually think about JSON serialization performance, they generally think about the perf from and to .NET objects. The overriding cost in that sort of serialization is actually how fast you can call the setters on the objects. Indeed, when looking at perf metrics on the subject, most of the comparisons were concentrated on that aspect almost exclusively.

That make sense, since for the most part, that is how people use it. But for RavenDB, we are using JSON DOM for pretty much everything. This is how we are representing a document, after all, and that idea is pretty central to a document database.

Before setting out to write our own, I looked at other options.

ServiceStack.Json - that one was problematic for three reasons:

It wasn’t really nearly as rich in terms of API and functionality.
It was focused purely on reading to and from .NET objects, with no JSON DOM supported.
The only input format it had was a string.

The last one deserves a bit of explanation. We cannot afford to use a JSON implementation that accepts a string as input, because that JSON object we are reading may be arbitrarily large. Using a string means that we have to allocate all of that information up front. Using a stream, however, means that we can allocate far less information and reduce our overall memory consumption.

System.Json – that one had one major problem:

Only available on Silverlight, FAIL!

I literally didn’t even bother to try anything else with it. Other stuff we have looked on had those issues or similar as well, mostly, the problem was no JSON DOM available.

That sucked. I was seriously not looking to writing my own JSON Parser, especially since I was going to add all the bells & whistles of the Newtonsoft.Json. :-(

Wait, I can hear you say, the project is open source, why not just fix the performance problem? Well, we have looked into that as well.

The actual problem is pretty much at the core of how the JSON DOM is implemented in the library. All of the JSON DOM are basically linked lists, and all operations on the DOM are O(N). With large documents, that really starts to hurt. We looked into what it would take to modify that, but it turned out that it would have to be a breaking change (which pretty much killed the notion that it would be accepted by the project) or a very expensive change. That is especially true since the JSON DOM is quite rich in functionality (from dynamic support to INotifyPropertyChanged to serialization to… well, you get the point).

Then I thought about something else, can we create our own JSON DOM, but rely on Newtonsoft.Json to fill it up for us? As it turned out, we could! So we basically took the existing JSON DOM, stripped it out of everything that we weren’t using. Then we changed the linked list support to a List and Dictionary, wrote a few adapters (RavenJTokenReader, etc) and we were off to the races. We were able to utilize quite a large fraction of the things that Newtonsoft.Json already did, we resolved the performance problem and didn’t have to implement nearly as much as I feared we would.

Phew!

Now, let us look at the actual performance results. This is using a 3 MB JSON file:

Newtonsoft Json.NET - Reading took 413 ms
Using Raven.Json - Reading took 140 ms

That is quite an improvement, even if I say so myself :-)

The next stage was actually quite interesting, because it was unique to how we are using JSON DOM in RavenDB. In order to save the parsing cost (which, even when optimized, is still significant), we are caching in memory the parsed DOM. The problem with caching of mutable information is that you have to return a clone of the information, and not the actual information (because then it would be mutated by the called, corrupting the cached copy).

Newtonsoft.Json supports object cloning, which is excellent. Except for one problem. Cloning is also an O(N) operation. With Raven.Json, the cost is somewhat lower. But the main problem is that we still need to copy the entire large object.

In order to resolve this exact issue, we introduced a feature called snapshots to the mix. Any object can be turned into a snapshot. A snapshot is basically a read only version of the object, which we then wrap around another object which provide local mutability while preserving the immutable state of the parent object.

It is much easier to explain in code, actually:

public void Add(string key, RavenJToken value)
{
    if (isSnapshot)
        throw new InvalidOperationException("Cannot modify a snapshot, this is probably a bug");

    if (ContainsKey(key))
        throw new ArgumentException("An item with the same key has already been added: " + key);

    LocalChanges[key] = value; // we can't use Add, because LocalChanges may contain a DeletedMarker
}

public bool TryGetValue(string key, out RavenJToken value)
{
    value = null;
    RavenJToken unsafeVal;
    if (LocalChanges != null && LocalChanges.TryGetValue(key, out unsafeVal))
    {
        if (unsafeVal == DeletedMarker)
            return false;

        value = unsafeVal;
        return true;
    }

    if (parentSnapshot == null || !parentSnapshot.TryGetValue(key, out unsafeVal) || unsafeVal == DeletedMarker)
        return false;

    value = unsafeVal;

    return true;
}

If the value is on the local changes, we use that, otherwise if the value is in the parent snapshot, we use that. We have the notion of local deletes, but that is about it. All changes happen to the LocalChanges.

What this means, in turn, is that for caching scenarios, we can very easily and effectively create a cheap copy of the item without having to copy all of the items. Where as cloning the 3MB json object in Newtonsoft.Json can take over 100 ms to clone, we can create a snapshot (it involves a clone, so the first time it is actually expensive, around the same cost as Newtonsoft.Json is) and from the moment we have a snapshot, we can generate children for the snapshot at virtually no cost.

Overall, I am quite satisfied with it.

Oh, and in our tests runs, for large documents, we got 100% performance improvement from this single change.

Apr 17 2011

RavenDBSafe by default design – it works!

time to read 1 min | 80 words

Tweet Share Share 4 comments

Tags:

Raven

Originally posted at 4/4/2011

One of the things that I really tried to do with RavenDB is to make sure that it is safe by default, which means that it will automatically detect common errors and warn you about it.

Today I run into the following Stack Overflow question.

What can I say, it works!

Oren Eini

Oren Eini

CEO of RavenDB

RavenDB.NET Aspire integration

RavenDBDomain Modeling and Data Persistency

RavenDBPractical Considerations for ACID/MVCC Storage Engines

RavenDBThe Road to Release

RavenDBSelf optimizing Ids

RavenDBIt was the weekend before the wedding…

RavenDBIndex Boosting

RavenDBMulti Maps / Reduce indexes

RavenDBLet us write our own JSON Parser, NOT

RavenDBSafe by default design – it works!

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed