Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 1 min | 100 words

One of the things that some people fear in a distributed source control is that they might run into conflicts all the time.

My experience has shown that this isn’t the case, but even when it is, there isn’t anything really scary about that.

Here is an example of a merge conflict in Git:

image

Double click the conflicting file, and you get the standard diff dialog. You can then resolve the conflict, and then you are done.

time to read 3 min | 444 words

I found myself reading this post, and at some point, I really wanted to cry:

We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!

We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.

Let me do the math for a second, okay?

A two terabyte hard drive now costs 120 USD. By my math, that makes:

  • 1 TB = 60 USD
  • 1 GB = 0.058 USD

In other words, that massive saving that they are talking about? 5 cents!

Let me do another math problem, okay?

A developer costs about 75,000 USD per year.

  • (52 weeks – 2 vacation weeks) x 40 work hours = 2,000 work hours per year.
  • 75,000 / 2,000 = 37.5 $ / hr
  • 37.5 / 60 minutes = 62.5 cents per minute.

In other words, assuming that this change cost a single minute of developer time, the entire saving is worse than moot.

And it is going to take a lot more than one minute.
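The two calculations above can be put side by side in a few lines of C#. This is only a sketch of the arithmetic: the figures are the ones quoted in the post, the variable names are made up for illustration, and the saved space is taken as the difference between the ~900MB and ~140MB quoted earlier.

```csharp
using System;

// Figures quoted in the post; the variable names are illustrative.
const double driveCostUsd = 120.0;      // price of a 2 TB drive
const double driveSizeGb = 2 * 1024.0;  // 2 TB expressed in GB
const double savedMb = 900.0 - 140.0;   // ~900MB of field names cut to ~140MB

double costPerGb = driveCostUsd / driveSizeGb;     // ~0.0586 USD
double savingUsd = savedMb / 1024.0 * costPerGb;   // ~0.043 USD

const double salaryUsd = 75_000.0;
const double hoursPerYear = (52 - 2) * 40;             // 2,000 work hours
double costPerMinute = salaryUsd / hoursPerYear / 60;  // 0.625 USD

Console.WriteLine($"Disk saving: {savingUsd:F3} USD");
Console.WriteLine($"Developer minute: {costPerMinute:F3} USD");
Console.WriteLine($"Break-even: {savingUsd / costPerMinute * 60:F1} seconds of developer time");
```

Under these assumptions the disk saving is paid for by a few seconds of developer time, which is the point of the comparison.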

Update: Fixed decimal placement error in the cost per minute. Fixed mute/moot issue.

To those of you pointing out that the cost of real server storage space is much higher: you are correct, of course. I am trying to make a point. Even assuming that it costs two orders of magnitude more than what I said, that is still only 5 USD. Are you going to tell me that saving the price of a single cup of coffee is actually meaningful?

To those of you pointing out that MongoDB effectively stores the entire DB in memory. The post talked about disk size, not about memory, but even so, that is still not relevant. Mostly because MongoDB only requires indexes to fit in memory, and (presumably) indexes don't really need to store the field name per each indexed entry. If they do, then there is something very wrong with the impl.

time to read 3 min | 557 words

One of the more powerful features of RavenDB is the notion of indexes. Indexes allow you to query RavenDB efficiently and they stand at the core of RavenDB’s philosophy of “we shall never make you wait” and they are the express way to a lot of RavenDB’s capabilities (spatial searches and full text searches, to name just a few).

Unfortunately, indexes require that you pre-define them before you can use them. That led to a problem. A big one. It meant that in addition for simply deploying RavenDB, you had to manage additional assets, the index definitions. In essence, that is very similar to having to worry about the database schema.  That, in turn, made deploying that bit more complex, and that was unacceptable.

So we implemented the Code Only Indexes, which allow you to define your indexes as part of your project, and have them automatically created the first time the application starts.

Defining the indexes is pretty simple:

public class Movies_ByActor : AbstractIndexCreationTask<Movie>
{
    public Movies_ByActor()
    {
        Map = movies => from movie in movies
                        select new {movie.Name};
        Index(x=>x.Name, FieldIndexing.Analyzed);
    }
}

public class Users_CountByCountry : AbstractIndexCreationTask<User>
{
    public Users_CountByCountry()
    {
        Map = users => from user in users
              select new {user.Country, Count = 1};
        Reduce = results => from result in results
                            group result by result.Country into g
                            select new { Country = g.Key, Count = g.Sum(x=>x.Count) };
    }
}

But this is just the first step, the next is to tell RavenDB about them:

IndexCreation.CreateIndexes(typeof(Movies_ByActor).Assembly, store);

If you put this in your Global.asax, you would never have to think about indexing or deployment issues again.
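To see what the Users_CountByCountry definition above computes, here is a minimal sketch that runs the same Map and Reduce expressions with plain LINQ to Objects. It does not touch RavenDB at all; the sample users are made up, and the in-memory array stands in for the documents the server would feed to the index.

```csharp
using System;
using System.Linq;

// Sample data is made up for illustration.
var users = new[]
{
    new { Name = "Oren", Country = "Israel" },
    new { Name = "Rob", Country = "UK" },
    new { Name = "Arava", Country = "Israel" },
};

// Map: one entry per document.
var mapped = from user in users
             select new { user.Country, Count = 1 };

// Reduce: group the mapped entries by country and sum the counts.
var reduced = (from result in mapped
               group result by result.Country into g
               select new { Country = g.Key, Count = g.Sum(x => x.Count) }).ToList();

foreach (var entry in reduced)
    Console.WriteLine($"{entry.Country}: {entry.Count}");
```

The reduce output has the same shape as the map output (Country, Count), which is what lets a map/reduce index re-reduce partial results.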

time to read 3 min | 534 words

One of the pieces of feedback that we got from people frequently enough to be annoying is that the requirement to define indexes upfront before you can query is annoying. For a while, I thought that yes, it is annoying, but so is the need to sleep. And that there isn’t anything much you can do about it.

Then Rob asked, why do we require users to define indexes when we can gather enough information to do this ourselves? And since I couldn’t think of a good reason why, he went ahead and implemented this.

var activeUsers = from user in session.Query<User>()
                  where user.IsActive == true
                  select user;

You don’t have to define anything, it just works.

There are a few things to note, though.

First, unindexed queries are notorious for working on small data sets and killing systems in production. That was one of the main reasons that I didn’t want to run

Moreover, since RavenDB’s philosophy is that it should be pretty hard to shoot yourself in the foot, we figured out how to make this efficient.

  • RavenDB will look at the query, see that there is no matching index, and create one for you.
    • Currently, it will create an index per each query type. In the future (as in, by the time you read this, most probably) we will have a query optimizer that can deal with selecting the appropriate index.
  • Those indexes are going to be temporary indexes, maintained for a short amount of time.
  • However, if RavenDB notices that you are making a large number of queries to a particular index, it will materialize it and turn it into a permanent index.

In other words, based on your actual system behavior, RavenDB will optimize itself for you :-) !
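The promotion rule described in the list above can be sketched as a small piece of bookkeeping. This is not RavenDB's implementation: the threshold, the index naming scheme, and the ResolveIndex helper are all hypothetical, and the sketch leaves out the expiry of temporary indexes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical threshold: the post does not say how many queries trigger promotion.
const int promoteAfterQueries = 3;

var temporaryIndexes = new Dictionary<string, int>();  // index name -> query count
var permanentIndexes = new HashSet<string>();

// Returns the name of the index serving the query, creating or promoting it as needed.
string ResolveIndex(string entityName, string fieldName)
{
    string indexName = $"Temp/{entityName}/By{fieldName}";  // made-up naming scheme
    if (permanentIndexes.Contains(indexName))
        return indexName;

    // A missing entry leaves count at 0, which models creating the temporary index.
    temporaryIndexes.TryGetValue(indexName, out int count);
    count++;
    if (count >= promoteAfterQueries)
    {
        // Queried often enough: turn the temporary index into a permanent one.
        temporaryIndexes.Remove(indexName);
        permanentIndexes.Add(indexName);
    }
    else
    {
        temporaryIndexes[indexName] = count;
    }
    return indexName;
}

for (int i = 0; i < 3; i++)
    ResolveIndex("Users", "IsActive");  // third query promotes this index
ResolveIndex("Users", "Country");       // queried once, stays temporary
```

After these calls the IsActive index is in the permanent set, while the Country index is still temporary with a count of one.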

This feature actually had us stop and think, because it fundamentally changed the way that you work with RavenDB. We actually had to stop and think why you would want to create indexes manually.

As it turned out, there are still a number of reasons why you would want to do that, but they become far more rare:

  • You want to apply complex filtering logic or do something to the index output. (For example, you may be interested in aggregating several fields into one searchable item.)
  • You want to use RavenDB’s spatial support.
  • Aggregations still require defining a map/reduce query.
  • You want to use the Live Projections feature, which I’ll discuss in my next post.
time to read 5 min | 839 words

I ran into this question, and I thought that it is an important enough topic to put on the blog as well.

A cursory dig suggests CouchDB is more mature, with a larger community to support it. That aside, what do you consider to be the significant differences?

RavenDB was heavily inspired by CouchDB. But when I sat down to build it, I tried to find all the places where you would have friction in using CouchDB and eliminate them, as well as build a product that would be a natural fit for the .NET ecosystem. That isn't just being able to run easily on Windows, btw. It is about being a product that fits the thought processes, requirements and environment in which it is used.

Here are some of the things that distinguish RavenDB:

  • Transactions - support for single document, document batch, multi request, multi node transactions. Includes support for DTC. To my knowledge, CouchDB supports transactions only on a single document.
  • Patching - you can perform a PATCH op against a document, instead of having to send the entire document to the server.
  • Set based operations - basically, a way to do things like: "update active = false where last_login < '2010-10-01'"
  • Deployment options - can run embedded, separate executable, windows service, iis, windows azure.
  • Client API - comes with a client API for .NET that is very mature. Supports things like unit of work, change tracking, etc.
  • Safe by default - both the server and the client have builtin limits (overridable) that prevent you from doing things that will kill your app.
  • Queries - Support the following querying options:
    • Indexes - similar to couch's views. Define by specifying a linq query.
    • Just do a search - doesn't have to have an index. RavenDB will analyze the query and create a temporary index for you. However, unlike couch temp views, this is meant for production use, and those temp indexes will automatically become permanent ones based on usage. Note that you don't have to define anything, just issue the actual query: Name:ayende will give you back the correct result. Tags,Name:raven will also do the same, including when you have to deal with extracting information directly from the composite docs.
    • Run a linq query - this is similar to the way temp view works, it is an O(n) operation, but it allows you to do whatever you want with the full power of linq. (For the non .NET guys, it allows you to run a SQL query against the data store) Mostly meant for testing.
  • Index backing store - Raven puts the index information in Lucene, which means we get full text searching OOTB. We can also do spatial queries OOTB.
  • Searching - It is very easy to say "index users by first name and last name", then search for them by either one. (As I understand it, I would have to define two separate views in couch for this).
  • Scaling - Raven comes with replication builtin, including master/master. Sharding is natively supported by the client API and requires you to simply define your sharding strategy.
  • Authorization - Raven has an auth system that allows defining queries based on user / role on document, set of documents (based on the doc data) and globally. You can define something like: "Only Senior Engineers can Support Strategic Clients"
  • Triggers - Raven gives you the option to register triggers that will run on document PUT/READ/DELETE/INDEX
  • Extensibility - Raven is intended to be customized by the user for typical deployment. That means that you would typically have some sort of customization, such as triggers or additional operations that the DB can do.
  • Includes  & Live projections -  Let us say that we have the following set of documents: { "name": "ayende", "partner": "docs/123" }, { "name": "arava" }
    • Includes means that you can load the "ayende" document, while asking RavenDB to load the document referred to by the partner property. That means that you have only a single request to make, vs. 2 of them without this feature.
    • Live projections means that we can ask for the document name and the partner's name, effectively joining the two together.
    • Those two features will only work on local data, obviously.

And I am probably forgetting some stuff, to be honest. Oh, and I am naturally the most unbiased of observers :-)

reDiverse.NET

time to read 2 min | 277 words

This post is a reply to this post, you probably want to read that one first.

Basically, the problem is pretty simple. It is the chicken & the egg problem. There is a set of problems where it doesn’t matter. Rhino Mocks is a good example where it doesn’t really matter how many users there are for the framework. But there are projects where it really does matter.

A package management tool is almost the definition of the chicken & egg problem. Having a tool coming from Microsoft pretty much solves this, because you get a fried chicken pre-prepared.

If you look at other projects, you can see that the result has been interesting.

  • Unity / MEF didn’t have a big impact on the OSS containers.
  • ASP.Net MVC pretty much killed a lot of the interest in MonoRail.
  • Entity Framework had no impact on NHibernate.

In NHibernate’s case, it is mostly because it already moved beyond the chicken & egg problem, I think. In MonoRail’s case, it was that there wasn’t enough outside difference, and most people bet on the MS solution. For Unity / MEF, there wasn’t any push to use something else, because you really didn’t depend on that.

In short, it depends :-)

There are some projects that really need critical mass to succeed. And for those projects, having Microsoft get behind them and push is going to make all the difference in the world.

And no, I don’t really see anything wrong with that.

time to read 2 min | 305 words

There is always a tension between giving users an interface that is simple to use and giving them something that is powerful.

With RavenDB, we ran into a problem with the session interface. I mean, just take a look…

image

Look how many operations you can do here! Yes, it is powerful, but it is also very complex to talk about & explain. For that matter, look at the interface:

image

We have 4(!) different ways of querying (actually, we have more, but that is beside the point).

Something had to be done, but we didn’t want to lose any power. So we decided on the following method:

image

Basically, we grouped all the common methods into a super simple session interface. This means that when you need to use the session, you see only:

image

4 of those methods (Equals, GetHashCode, GetType, ToString) are from System.Object and one is from IDisposable. Users are pretty much not going to see those methods. The CRUD interface is there, as well as the starting point for querying. And that is it.

I think that it makes it very easy to get started with RavenDB. And all those operations that are important but uncommonly used are still there, they just require an explicit step to access them.

time to read 2 min | 240 words

One of the things that I really wanted to do with RavenDB is to create something that is really easy to use for .NET developers. I think that I have managed to do that. But the one big challenge that we still had was running everything using managed storage.

As you can imagine, building transactional, crash-safe data stores isn’t particularly easy, but we actually did that, and now RavenDB can run in managed code. That has other implications, like being able to run completely in memory. Which means that you can test your RavenDB code simply by using:

var store = new DocumentStore { RunInMemory = true };

And just use this as you normally would. For that matter, you can ask the RavenDB server to run in memory as well (extremely useful for demos):

Raven.Server.exe /ram

As a side note, I am going to be posting a lot about the recent storm of features that were just added to RavenDB.

time to read 7 min | 1252 words

This feature has me really excited, because it solves a pretty big problem and it does so in a really elegant fashion.

Let us start from the beginning. Documents are independent, which means that processing a single document should not require loading additional documents. This, in turn, leads to denormalization, so we can keep the data that we need about our associations in the same document.

The problem, of course, is that there are many cases where this denormalization is annoying. In particular, it means that you have to take responsibility for handling the updates to the denormalized data. There are good reasons to want to do that, particularly if you are working on a sharded data store. And yet… many people don’t run in a sharded store, so why make them pay for that?

I found it hard to answer, especially since we already introduced the includes feature. For a while, I thought that just doing the include was enough, but then I got into a discussion with Rob about this. And we had the chance to talk about projections vs. documents. We agreed that we wanted a solution, and we started throwing a lot of crazy ideas (multi sourced, background updated, materialized views – to name one) around. Until we finally realized that we were being really stupid. The data is already there!

It was just that I was too blind to see how we can push it out.

Let us talk in code for a minute, since it would be easier to demonstrate how things work:

using (var s = ds.OpenSession())
{
    var entity = new User { Name = "Ayende" };
    s.Store(entity);
    s.Store(new User { Name = "Oren", AliasId = entity.Id });
    s.SaveChanges();
}

This creates two documents and links between them.

Now, let us say that we want to display the following grid:

User Alias
Oren Ayende

Well, we need to query for all the users that have an alias, then include the associated document. Something like this:

var usersWithAliases = from user in session.Query<User>().Include(x=>x.AliasId)
                       where user.AliasId != null
                       select user;


var results = new List<UserAndAlias>();

foreach(var user in usersWithAliases)
{
    results.Add(
        new UserAndAlias
        {
            User = user.Name,
            Alias = session.Load<User>(user.AliasId).Name
        }
    );
}

Here is the deal: this is very efficient in terms of calling the database only once, but it does mean that we are passing the full documents back, which is something that we may not want to do.

Not to mention that there is a whole lot of code here.

Okay, so far we have introduced the problem. Let us see how we can solve it. We can do that by applying a live projection at the server side. A live projection transforms the results of a query on the server side, and it has access to other documents as well. Let us see what I mean by that:

public class Users_ByAlias : AbstractIndexCreationTask<User>
{
    public Users_ByAlias()
    {
        Map =
            users => from user in users
                     select new {user.AliasId};

        TransformResults =
            (database, users) => from user in users
                                 let alias= database.Load<User>(user.AliasId)
                                 select new {Name = user.Name, Alias = alias.Name};
    }
}

It is important to understand exactly what is going on here. The TransformResults will be executed on the results of the query, which gives it the chance to modify, extend or filter them. In this case, it gives you the ability to look at data from another document.

For the DB guys among you, this performs a nested loop join.
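For readers who want to see the shape of that join, here is a minimal sketch in plain LINQ to Objects. It does not use RavenDB: a dictionary keyed by document id stands in for the database, the ids are made up, and the lookup by id plays the role of database.Load in the TransformResults above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the document store, keyed by document id (ids are made up).
var documents = new Dictionary<string, (string Name, string AliasId)>
{
    ["users/1"] = ("Ayende", null),
    ["users/2"] = ("Oren", "users/1"),
};

// Outer loop: the query results. Inner step: one lookup by id per result.
var rows = (from user in documents.Values
            where user.AliasId != null
            let alias = documents[user.AliasId]
            select new { User = user.Name, Alias = alias.Name }).ToList();

foreach (var row in rows)
    Console.WriteLine($"{row.User} {row.Alias}");  // prints "Oren Ayende"
```

Each outer result costs one extra lookup, which is cheap when the lookup is by id and the data is local.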

Now, we can just write:

var usersWithAliases = 
     (from user in session.Query<User, Users_ByAlias>()
     where user.AliasId != null
     select user).As<UserAndAlias>();

This will query the index, transform the results on the server side, and give us the UserAndAlias collection that we can just use.

Did I mention that I am really excited about this feature?
