Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 5 min | 847 words

In my previous post, I discussed some of the problems that you run into when you try to have a single source of truth with regards to an entity definition. The question here is: how do we manage something like a Customer across multiple applications / modules?

For the purpose of discussion, I am going to assume that all of the data is either:

  • All sitting in the same physical database (common if we are talking about different modules in the same application).
  • Spread across multiple databases, with some data being replicated to all databases (common if we are talking about different applications).

We will focus on the customer entity as an example, and we will deal with billing and help desk modules / applications. There are some things that everyone can agree on with regards to the customer. Most often, a customer has an id, which is shared across the entire system, as well as some descriptive details, such as a name.

But even things that you would expect to be easily agreed upon aren’t really that easy. For example, what about contact information? The person handling billing at a customer is usually different than the person that we contact for help desk inquiries. And that is the stuff that we are supposed to agree on. We have much bigger problems when we have to deal with things like the customer’s payment status vs. outstanding help desk calls this month.

The way to resolve this is to forget about trying to shove everything into a single entity. Or, to be rather more exact, we need to stop thinking about the Customer entity as a single physical thing. Instead, we are going to have the following:

image

There are several things to note here:

  • There is no inheritance relationship between the different aspects of a customer.
  • We don’t give in and try to put what appear to be shared properties (ContactDetails) in the root Customer. Those details have a different meaning for each entity.
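A minimal sketch of what those separate aspects might look like. Note that the class and property names here are my own assumptions for illustration, not taken from the diagram:

```csharp
// Each module owns its own view of the customer. There is no shared
// base class, and each aspect has its own ContactDetails, because the
// billing contact and the help desk contact are usually different people.
public class Customer            // the root: shared and minimal
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class BillingCustomer     // owned by the billing module
{
    public string Id { get; set; }          // e.g. "customers/1/billing"
    public string CustomerId { get; set; }  // points back to the root
    public ContactDetails BillingContact { get; set; }
    public string PaymentStatus { get; set; }
}

public class HelpDeskCustomer    // owned by the help desk module
{
    public string Id { get; set; }          // e.g. "customers/1/helpdesk"
    public string CustomerId { get; set; }
    public ContactDetails SupportContact { get; set; }
    public int OpenCallsThisMonth { get; set; }
}

public class ContactDetails
{
    public string Name { get; set; }
    public string Email { get; set; }
    public string Phone { get; set; }
}
```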

There are several ways to handle actually storing this information. If we are using a single database, then we will usually have something like:

image

The advantage of that is that it makes it very easy to actually look at the entire customer entity for debugging purposes. I say for debugging specifically because for production usage, there really isn’t anything that needs to look at the entire thing; every part of the system only cares about its own details.

You can easily load the root customer document and your own customer document whenever you need to.
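With a document database like RavenDB, that can be a couple of loads in the same session. A minimal sketch (the document id convention and the aspect types are my assumptions):

```csharp
// Assumes an already-initialized IDocumentStore; the billing module
// loads the shared root plus its own aspect document.
using (var session = documentStore.OpenSession())
{
    var root = session.Load<Customer>("customers/1");
    var billing = session.Load<BillingCustomer>("customers/1/billing");

    // Billing only ever writes to its own document; concurrent help
    // desk changes go to a different document and cannot conflict.
    billing.PaymentStatus = "Overdue";
    session.SaveChanges();
}
```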

More to the point, because they are different physical things, that solves a lot of the problems that we had with the shared model.

Versioning is not an issue: if billing needs to make a change, they can just go ahead and change things. They don’t need to talk to anyone, because no one else is touching their data.

Concurrency is not an issue: if you make a concurrent modification to billing and help desk, that is not a problem, since they are stored in two different locations. That is actually what you want, because it is perfectly all right to have those concurrent changes.

It frees us from having to get everyone’s acceptance on any change, for everything except the root document. But as you can probably guess, the amount of information that we put on the root is minimal, precisely to avoid those sorts of situations.

This is how we handle things with a shared database, but what is going on when we have multiple applications, with multiple databases?

As you can expect, we are going to have one database which contains all of the definitions of the root Customer (or other entities), and from there we replicate that information to all of the other databases. Why not have the applications access two databases? Simple: it makes things so much harder. It is easier to have a single database to access and let replication take care of the rest.

What about updates in that scenario? Well, updates to the local part are easy, you just do them, but updates to the root customer details have to be handled differently.

The first thing to ask is whether there really is any need for any of the modules to actually update the root customer details. I can’t see any reason why you would want to do that (billing shouldn’t update the customer name, for example). But even if you have this, the way to handle it is to have a part of the system that is responsible for the root entities database, and have it do the update, from where it will replicate to all of the other databases.

time to read 2 min | 320 words

I was at a customer site, and we were talking about a problem they had with modeling their domain. Actually, we were discussing a proposed solution, a central and definitive definition for all of their entities, so all of the applications could use that.

I had a minor seizure upon hearing that, but after I recovered, I was able to articulate my objections to this approach.

To start with, it breaks the Single Responsibility Principle, the Open Closed Principle and the Interface Segregation Principle. It also makes versioning hard, and introduces a central place that everyone must coordinate with. Think about the number of people that have to be involved whenever you make a change.

Let us take the customer as the representative entity for this discussion. We can all agree that a customer has to have a name, an email and an id. But billing also needs to know his credit card information, help desk needs to track what support contracts he has, and sales needs to know what sort of products we sold the guy, so we can sell him upgrades.

Now, would you care to be the guy who has to mediate between all of those different concerns?

And what about changes and updates? Whenever you need to make a change, you have to wait for all of those teams and applications to catch up, update and deploy their apps.

And what about actual usage? You actually don’t want the help desk system to be able to access the billing information, and you most certainly don’t want them to change anything there.

And does it matter if we have concurrent modifications to the entity by both help desk and billing?

All of those things argue very strongly against having a single source of truth about what an entity is. In my next post, I’ll discuss a solution for this problem, Composite Entities.

time to read 1 min | 122 words

This was brought up in the mailing list, and I thought it was an interesting solution; hence, this post.

image

A couple of things to note here. I would actually rather use the Initialize() / Dispose() methods for this, but the problem is that at Dispose() we don’t really have a way to know whether the action threw an exception. Hence the need to capture the ExecuteAsync() operation.

For fun, you can also use the async session as well, which will integrate very nicely into the async nature of most of the Web API.
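The idea can be sketched like this. To be clear, this is my own reconstruction of the approach being discussed, not the code from the mailing list, and the class and member names are assumptions:

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Http;
using System.Web.Http.Controllers;
using Raven.Client;

// A base controller that opens a RavenDB session per request and only
// saves changes if the action completed successfully. We override
// ExecuteAsync because Dispose() has no way of knowing whether the
// action threw an exception.
public abstract class RavenController : ApiController
{
    public static IDocumentStore DocumentStore { get; set; }

    protected IDocumentSession Session { get; private set; }

    public override Task<HttpResponseMessage> ExecuteAsync(
        HttpControllerContext controllerContext,
        CancellationToken cancellationToken)
    {
        Session = DocumentStore.OpenSession();
        return base.ExecuteAsync(controllerContext, cancellationToken)
            .ContinueWith(task =>
            {
                using (Session)
                {
                    if (task.Status == TaskStatus.RanToCompletion)
                        Session.SaveChanges(); // skipped if the action threw
                }
                return task.Result; // rethrows if the action faulted
            }, cancellationToken);
    }
}
```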

time to read 1 min | 76 words

Sharding allows you to horizontally partition your database. RavenDB includes built-in support for sharding, and in this webinar, we will discuss in detail how to utilize it, how it works and how you can best use it in your own projects.

Date: Wednesday, March 28, 2012

Time: 9:00 AM - 10:00 AM PDT

After registering you will receive a confirmation email containing information about joining the Webinar.

Space is limited.
Reserve your Webinar seat now at:
https://www2.gotomeeting.com/register/729216138

time to read 1 min | 169 words

NuGet is a wonderful system, and I am very happy to be able to use and participate in it.

Unfortunately, it has a problem that I don’t know how to solve. In a word, it is a matter of granularity. With RavenDB, we currently have the following packages:

  • RavenDB
  • RavenDB.Embedded
  • RavenDB.Client

The problem is that we have some features that use F#, we have some features that use MVC, we have a debug visualizer for VS, we have… Well, I think you get the point. The problem is that if we split things too granularly, we end up with something like:

  1. RavenDB.Client.FSharp
  2. RavenDB.MvcIntegration
  3. RavenDB.DebugSupport
  4. RavenDB
  5. RavenDB.Core
  6. RavenDB.Embedded
  7. RavenDB.Client
  8. RavenDB.Sharding
  9. RavenDB.NServiceBus
  10. RavenDB.WebApiIntegration
  11. RavenDB.Etl
  12. RavenDB.Replication
  13. RavenDB.IndexReplication
  14. RavenDB.Expiration
  15. RavenDB.MoreLikeThis
  16. RavenDB.Analyzers
  17. RavenDB.Versioning
  18. RavenDB.Authorization
  19. RavenDB.OAuth
  20. RavenDB.CascadeDelete

And there are probably more.

It gets more complex because we don’t really have a good way to decide which assemblies we should add to which packages.

As I said, I don’t have an answer, but I would sure appreciate suggestions.

time to read 6 min | 1143 words

Nitpicker corner: Yes, I know about Open Rasta, NancyFX, FubuMVC and all of the other cool frameworks out there. I am not here today to talk about them. I am not interested in talking about them and why they are so much better in this post.

You might be aware that I am doing a lot of stuff at the HTTP level, owing to the fact that both RavenDB and RavenFS are REST based servers.

As such, I had to become intimately familiar with the HTTP spec, how things work at the low level in ASP.NET, etc. I even had to write my own abstractions to be able to run both inside IIS and as a service. Suffice to say, I feel that I have a lot of experience in building HTTP based systems. That said, I am also approaching things from a relatively different angle than most people. I am not aiming to build a business application, I am actually building infrastructure servers.

There was a lot of buzz about the ASP.Net Web API, and I took a brief look at the demo, marked it as “nice, need to check it out some day” in my head and moved on. Then I ran into a strange problem in RavenFS. RavenFS is a sibling to RavenDB. Whereas RavenDB is a document database, RavenFS is a distributed & replicated file server for (potentially) very large files. (It is currently in beta testing, and when we are done giving it all the bells and whistles of a real product, we will show it to the world; it isn’t really important to this post.)

What is important is that I ran into a problem with RavenFS, and I felt that there was a strong likelihood that I was doing something wrong in the HTTP layer that was causing it. Despite its outward simplicity, HTTP is pretty complex when you get down to business. So I decided to see what would happen if I replaced the HTTP layer for RavenFS with ASP.Net Web API.

That means that I have been using it in anger for the last week, and here is what I think about it so far.

First, it is a beta. That is something that is important to remember, because it means that it isn’t done yet.

Second, I am talking strictly about the server API. I haven’t even touched the client API as of now.

Third, and most important. I am impressed. It is a really clean API, nice interface, well thought out and quite nice to work with.

More than that, I had to do a bunch of stuff that really isn’t trivial. And there are very few docs for it as of now. I was able to do pretty much everything I wanted by just walking the API and figuring things out on my own.

Things that I particularly liked:

  • The API guides you to do the right thing. For example, different headers have different meanings, and you can see that when you look at the different header collections. You have headers that go in the response, headers that go in the request, headers that go with the content, and so on. It really guides you to using this as you should.
  • A lot of the stuff that is usually hard is now pretty easy to do. Multi part responses, for example. Ranged requests, or proper routing.
  • I was able to plug in DI for what I was doing in a couple of minutes without really knowing anything about how things work. And I could do that by providing a single delegate, rather than implement a complex interface.
  • It provides support for self hosting, which is crucial for doing things like unit testing the server.
  • It is Async to the core.
  • I really like the ability to return a value, or a task, or a task of a value, or return an HttpResponseMessage which I can customize to my heart content.
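That last point is worth a sketch. All three of these action shapes work side by side (the controller and file names here are made up for illustration):

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;

public class FilesController : ApiController
{
    // Return a plain value and let Web API content-negotiate it...
    public string[] Get()
    {
        return new[] { "readme.txt", "data.bin" };
    }

    // ...or return a Task when the work is asynchronous...
    public Task<string> Get(int id)
    {
        return Task.Factory.StartNew(() => "file #" + id);
    }

    // ...or build the HttpResponseMessage yourself for full control.
    public HttpResponseMessage Delete(int id)
    {
        var response = new HttpResponseMessage(HttpStatusCode.NoContent);
        response.Headers.Add("X-Deleted-Id", id.ToString());
        return response;
    }
}
```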

Overall, it just makes sense. I get how it works, and it doesn’t seem like I have to fight anything to get things done.

Please note that this is porting a major project to a completely new platform, and doing some really non trivial things in there while doing this.

Things that I didn’t like with it:

Put simply, errors. To be fair, this isn’t a complaint about the standard error handling, this works just fine. The issue is with infrastructure errors.

For example, if you try to push a 5MB request to the server, by default the request will just die. No error message, and the status code is 503 (Service Unavailable). This can be pretty frustrating to figure out, because there is nothing to tell you what the problem is, and I didn’t look at the request size at first. It just seemed that some requests worked, and some didn’t. Even after I figured out that it was the size that mattered, it was hard to figure out where we needed to fix that (and the answer to that is different depending on where you are running!).

Another example is using PUT or DELETE in your requests. As long as you are running in SelfHost, everything will work just fine. If you switch to IIS, you will get an error (405, Method Not Allowed), again, with no idea how to fix this or why this is happening. This is something that you can fix in the config (sometimes), but it is another error that has horrible usability.

Those are going to be pretty common errors, I am guessing, and any error like that is actually a road block for the users. Having an error code and nothing else thrown at you is really frustrating, and this is something that could really use a good error report, including details about how to fix the problem.

There are a bunch of other issues that I ran into (an NRE when misconfiguring the routing that was really confusing, and other stuff like that), but this is beta software, and those are things that will be fixed.

The one thing that I miss as a feature is good support for nested resources (/accounts/1/people/2/notes), which can be a great way to provide additional context for the application. What I actually want to use this for is to be able to do things like: /folders <- FoldersController.Get, Post, Put, Delete and then have: /folders/search <- FoldersController.GetSearch, PostSearch, etc.

So I can get the routing by http method even when I am doing controller and action calls.

Final thoughts: RavenFS is now completely working using this model, and I like it. It is a really nice API, it works, but most importantly, it makes sense.

time to read 4 min | 619 words

I enjoy reading code, and I decided that, for a change, I want to read code that isn’t in .NET. The following is a review of Postman:

Postman is a little JavaScript library (well it's actually Coffeescript but the Cakefile handles a build for me) which is similar to a traditional pub/ sub library, just a whole lot smarter.

This is actually the first time that I have looked at CoffeeScript code (beyond a cursory glance at the tutorial a time or two). I got to say, it looks pretty neat. Take a look at the definition of a linked list:

image

Pretty, readable and to the point. I like that.

Then I got to a head scratching piece:

image

There are various things that refer to postie, but it wasn’t until I got to the bottom of the code that I saw:

image

So I guess that postie line is actually defining a null argument, so it can be captured by the Postman class methods.

I’ll be the first to admit that I am not a JS / CoffeeScript guy, so sometimes I am a little slow to figure things out; this method gave me pause:

image

It took a while to figure out what is going on there.

The first few lines basically say, skip the first argument and capture the rest, then call all the subscriptions with the new msg.

Note that this is preserving history. So we can do something with this.

There is also an async version of this, confusingly called deliverSync.

Getting the notification is done via:

image

This is quite elegant, because it means that you don’t lose out on messages that have already been published.
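The same idea, transliterated into a small C# sketch (my own code, not Postman’s):

```csharp
using System;
using System.Collections.Generic;

// A pub/sub channel that keeps every delivered message and replays the
// history to late subscribers, which is the behavior described above.
public class Channel
{
    private readonly List<object[]> history = new List<object[]>();
    private readonly List<Action<object[]>> subscribers =
        new List<Action<object[]>>();

    public void Deliver(params object[] message)
    {
        history.Add(message);
        foreach (var subscriber in subscribers)
            subscriber(message);
    }

    public void Receive(Action<object[]> subscriber)
    {
        subscribers.Add(subscriber);
        // Replay everything already published, so a late subscriber
        // does not lose out on earlier messages.
        foreach (var message in history)
            subscriber(message);
    }
}
```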

I guess that you might need to worry about memory usage, but there seems to be some mechanism to sort that out too, so you can explicitly clean things out. Which works well enough, I guess, but I would probably add some sort of built-in limit on how many messages it can hold at any one time, just to be on the safe side. I don’t actually know how you would debug a memory leak in such a system, but I am guessing it can’t be fun.

image

This code makes my head hurt a bit, because of the ability to pass a date or a function. I would rather have an options argument here, rather than overloading the parameter. It might be that I am a bad JS / CoffeeScript coder and try to impose standards of behavior from C#, though.

All in all, this seems to be a fairly nice system, there is a test suite that is quite readable, and it is a fun codebase to read.

time to read 3 min | 490 words

This is a review of the S#arp Lite project, the version from Nov 4, 2011.

This project is significantly better than the S#arp Arch project that I reviewed a while ago, but that doesn’t mean that it is good. There is a lot to like, but frankly, the insistence on once again abstracting the data access behind complex base classes and repositories makes things much harder in the long run.

If you are writing an application and you find yourself writing abstractions on top of CUD operations, stop, you are doing it wrong.

I quite like S#arp approach for querying, though. You expose things directly, and if it is ugly, you just wrap it in a dedicated query object. That is how you should be handling things.
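For instance, a dedicated query object might look like this sketch (the names and the entity properties here are my inventions, not S#arp Lite’s):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using NHibernate;
using NHibernate.Linq;

// Wraps one specific, ugly query behind an intention-revealing name,
// exposing the ORM directly instead of hiding it behind a repository.
public class OverdueCustomersQuery
{
    private readonly ISession session;

    public OverdueCustomersQuery(ISession session)
    {
        this.session = session;
    }

    public IList<Customer> Execute(int daysOverdue)
    {
        var cutoff = DateTime.UtcNow.AddDays(-daysOverdue);
        return session.Query<Customer>()
            .Where(c => c.IsOverdue && c.LastPaymentDate < cutoff)
            .ToList();
    }
}
```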

Finally, whenever possible, push things to the infrastructure, it is usually pretty good and that is the right level of handling things like persistence, validation, etc. And no, you don’t have to write that, it is already there.

A lot of the code in the sample project was there simply to manage persistence and validation (in fact, there was an entire project for that), and it could be safely deleted in favor of:

public class ValidationListener : NHibernate.Event.IPreUpdateEventListener, NHibernate.Event.IPreInsertEventListener
{
    public bool OnPreUpdate(PreUpdateEvent @event)
    {
        if (!DataAnnotationsValidator.TryValidate(@event.Entity)) 
            throw new InvalidOperationException("Updated entity is in an invalid state");

        return false;
    }

    public bool OnPreInsert(PreInsertEvent @event)
    {
        if (!DataAnnotationsValidator.TryValidate(@event.Entity))
            throw new InvalidOperationException("Inserted entity is in an invalid state");

        return false;
    }
}

Register that with NHibernate, and it will do that validation work for you, for example. Don’t try too hard, it should be simple; if it ain’t, you are either doing something very strange or you are doing it wrong, and I am willing to bet on the latter.
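Registering it is a one-time infrastructure concern, roughly like this (a sketch against NHibernate’s Configuration API):

```csharp
// Wire the listener into the NHibernate configuration once, at startup.
var cfg = new NHibernate.Cfg.Configuration();
var listener = new ValidationListener();
cfg.EventListeners.PreInsertEventListeners =
    new NHibernate.Event.IPreInsertEventListener[] { listener };
cfg.EventListeners.PreUpdateEventListeners =
    new NHibernate.Event.IPreUpdateEventListener[] { listener };
```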

To be clear, the problems that I had with the codebase were mostly with regards to the data access portions. I didn’t have any issues with the rest of the architecture.

time to read 2 min | 286 words

This is a review of the S#arp Lite project, the version from Nov 4, 2011.

Okay, after going over all of the rest of the application, let us take a look at the parts that actually do something.

The following is from the CustomerController:

image

It is fairly straightforward, all in all. Of course, the problem is that it isn’t doing much. The moment that it does, we are going to run into problems. Let us move to a different controller, ProductController, and the Index action:

image

Seems fine, right? Except that in the view…

image

As you can see, we got a Select N+1 here. I’ll admit, I actually had to spend a moment or two to look for it (hint: look for @foreach in the view, that is usually an indication of a place that requires attention).

The problem is that there isn’t much we can do about it. If we wanted to resolve this, we would have to create our own query object to completely encapsulate the query. All we actually need is to add a FetchMany, and we would be done, except that there is that nasty OR/M abstraction in the way, which doesn’t do much except make our life harder.
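With direct access to the session, the fix would be something like this sketch (I am assuming a Product with a child collection; the property name is a guess at the sample model):

```csharp
// Requires: using NHibernate.Linq;
// One round trip: eagerly fetch the child collection that the view
// iterates over, instead of issuing one query per product.
var products = session.Query<Product>()
    .FetchMany(p => p.Categories)
    .ToList();
```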
