Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 2 min | 383 words

This one was a real pain to figure out. Can you imagine what would be the result of this code?

    var queue = new MessageQueue(queuePath);
    queue.Dispose();
    var peekAsyncResult = queue.BeginPeek();
    peekAsyncResult.AsyncWaitHandle.WaitOne();

If you guessed that we would get ObjectDisposedException, you are sadly mistaken. If you guessed that this would lead to a deadlock, you won.

Figuring out the behavior in a multi threaded system where one thread was beginning to listen and another was disposing the queue and waiting for pending operations to complete is… not fun.

Update: For some strange reason, I am not able to reproduce the problem shown above. I know that I did before I posted this, but I posted it as one of the last things that I did that day. I think that this is somehow related to the actual queue used and whether or not it has messages.

time to read 10 min | 1931 words

I got into a discussion today about how we are dealing with concurrency, and I came up with a few good examples that I think are worth putting in writing. The first of them is the phone billing system. This is, by nature, a distributed and concurrent system, and it is pretty easy to understand, I think.

We store the billing information for each customer (keyed by the phone number) in the DHT. The initial state looks like this:

image

The balance is what the account has; the call and SMS entries are the actions on the account. For the purpose of discussion, sending an SMS costs $2 and a one-minute call costs $5.

And then the following happens. A phone call is made at the same time that a couple of SMSes are sent and a bill is paid. You can see that in the following picture:

image

Each of those actions is handled by a different node. We will deal with them in sequence, because writing parallel hard be is.

A phone call is made, so we need to record that it happened. We get the current billing information from the DHT and add a new action:

image

At the same time, we also send a couple of SMS messages. Again, we get the current billing information (and we get version 42), add the action and save it back. However, our update is not based on the most current version, so the DHT accepts the update alongside the existing one, and now we have two versions for key 555-5421. This is expected and normal behavior.

image

You should also note that we have an overdraft charge, for going over our account balance. This is something that was added to the account as part of the business logic of processing the call. Being responsible adults, we pay the bill at that exact same time to avoid an overdraft charge. That one is handled according to the same approach: get the billing information from the DHT (and again we get version 42), modify it and save.

Now we have the following situation:

image

All three are valid, I have to say. When we ask the DHT to get a value by key, we will get all three versions back, and we need to merge them into a coherent vision of what actually happened.

First, I should mention that this is not a generic solution for all problems. There are likely to be problems that you’ll not be able to resolve using this approach.

One thing that you might have noticed is that each of the items is tagged with a number. In real life, it would be a guid, but no one can remember a guid by looking at it, so I made it a number that is easy to remember. This id can uniquely identify an item across multiple machines and concurrent versions.

The algorithm for merging those three versions together is actually quite simple. It goes something like this:

    public BillingStatementState Merge(BillingStatementState[] states)
    {
        var mergedState = new BillingStatementState();

        // Copy every balance item we have not seen yet, identified by its unique id.
        foreach (var balanceItem in states.SelectMany(b => b.Balances))
        {
            if (mergedState.HasBalanceItem(balanceItem.Id) == false)
                mergedState.AddBalanceItem(balanceItem);
        }

        // Do the same for the actions (calls, SMS, payments).
        foreach (var item in states.SelectMany(s => s.ActionItems))
        {
            if (mergedState.HasActionItem(item.Id))
                continue;
            mergedState.AddActionItem(item);
        }

        mergedState.RecalculateCharges();

        return mergedState;
    }

RecalculateCharges is responsible for adding / removing overdraft charges based on the new information.

What we are basically doing is quite simple, we copy all the new information to the new state, and we know that it is new because we have a unique id that can identify each item. The only remaining bit of complexity is that we now need to recalculate the charges.
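To make the idea concrete, here is a minimal, self-contained sketch of the same union-by-id merge. Everything in it is an assumption made for illustration: the LedgerItem shape, the "Overdraft" kind, the fee amount and the rule that an overdraft is charged whenever the merged balance is negative. It is not the real billing model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item shape: a unique id plus a signed amount
// (payments are positive, charges are negative).
public record LedgerItem(Guid Id, string Kind, decimal Amount);

public static class BillingMerge
{
    public const decimal OverdraftFee = 10m; // hypothetical fee

    public static List<LedgerItem> Merge(params IEnumerable<LedgerItem>[] versions)
    {
        var seen = new HashSet<Guid>();

        // Overdraft charges are derived data, so drop them here and re-derive below.
        // HashSet.Add returns false for an id we already copied.
        var merged = versions.SelectMany(v => v)
            .Where(i => i.Kind != "Overdraft" && seen.Add(i.Id))
            .ToList();

        // Stand-in for RecalculateCharges: charge only if the merged picture is negative.
        if (merged.Sum(i => i.Amount) < 0)
            merged.Add(new LedgerItem(Guid.NewGuid(), "Overdraft", -OverdraftFee));

        return merged;
    }
}
```

In this sketch, a version that charged an overdraft on its own partial view loses that charge once the payment from a sibling version is merged in.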

As you’ll see in a future post, “recalculating” isn’t really it, you usually have to perform some compensating actions as well, but that is beside the point for now.

Given the above code, we can safely merge the three versions, and make them into a single big version.

image

The DHT will notice that the new value is the child of all current valid versions, accept the update and remove all other versions.
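The versioning behavior described here can be modeled with a toy, in-memory store for a single key. This is only a sketch of the idea, not Rhino DHT's API: the VersionedSlot class, its Get/Put methods and the integer version numbers are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one key in a versioned store.
public class VersionedSlot<T>
{
    private readonly Dictionary<int, T> live = new Dictionary<int, T>();
    private int nextVersion = 1;

    // Returns a copy of every live version; more than one means a conflict.
    public IReadOnlyDictionary<int, T> Get() => new Dictionary<int, T>(live);

    // Writes a value derived from the given parent versions.
    // The parents are superseded; unrelated siblings stay around.
    public int Put(T value, params int[] parents)
    {
        foreach (var parent in parents)
            live.Remove(parent);

        var version = nextVersion++;
        live[version] = value;
        return version;
    }
}
```

Three writers that all started from the same version leave three siblings behind, and a single write that names all three as parents collapses them back into one.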

As I said, it is not something that can fit any scenario, but it can fit a surprisingly wide area of them.

time to read 5 min | 962 words

It continues to amaze me, the lengths some people will go to in order to add additional complexity. Let us take this article: Two-Tier Service Application Scenario (REST). I will leave aside the arguments about this article's gross misrepresentations of terms like domain modeling, entities and REST. Greg already called out the DDD argument, and Colin is working on the problems with the representation of REST.

What I want to talk about is friction. I looked at the code that is shown in the article and I cringed. Hard. Do you want to tell me that it is recommended that people write code like this?

image

I am assuming that I would need to write one of those for each of my “entities”. I honestly cannot figure out why. Why create specialized behavior when it would be easier, simpler, cheaper and better to handle this generically?

Is there any sort of value in this code? I don’t think so.

It gets better when you see what you need to do in your Application_Start.

image

So now, not only do I have to create those handlers by hand, I also need to take care to register each of them. I’ll leave aside the bug in the routing code vs. the employee route handler code (‘employee’ is not a route value), to focus on a more important subject. Even if I decided that I want to code my way to insanity, why do I give a single class two responsibilities (creating EmployeeHandler and EmployeesHandler)?

I am not even trying to ask why I need the route handler in the first place. It seems like it is there just so there would be another layer in the architecture diagram. It looks like that was a common requirement, because here comes the next step toward the road of useless code:

image

Except for creating additional work for the developer, I cannot think of a single reason why I would want to write this type of code unless I was paid by the line count.

Let me see how many things I can find here that are going to add friction:

  • As written, you are going to need one of those for each of the “entities” that you have. I assume that this is so that all the other per “entity” types wouldn’t feel lonely. At last count, we had:
    • EmployeeRepository
    • EmployeeHandler
    • EmployeesHandler
    • EmployeeRouteHandler
    • EmployeeTranslator
    • EmployeeScript
    • EmployeeFacade
  • Mapping “command” whatever that is, to method calls? So every time that I have to add a new method, I have to touch how many places?
  • Why on earth do I need to do explicit error handling? That is what I have exceptions for. Those should do the Right Thing and I should not have to explicitly manage errors unless I know about something specific happening.

Oh, and I saved the best for last. Please take a look at the beast. And unlike the story, there is no beauty.

image

I truly find it hard to find a place to start.

  • Magic numbers all over the place.
  • Promote(object[] data) ?! Are we in the stone age again? I really hoped that by 2009 we would be able to get to grips with the notion of meaningful method parameters! For crying out loud, you can use the ASP.Net MVC binder to do the work, you don’t have to do it yourself.
  • Null reference exceptions that are just waiting to happen.
  • Unless PositionEnum is not an enum (a WTF all on its own), then the code wouldn’t even compile! Enums are value types, you cannot use ‘as’ with them.
  • busErrors ?
    • First of all, what bus?
    • More importantly, are we back in the good ole days of return codes? I thought we were beyond that already!
  • Really bad resource management. C# has try/catch/finally for a reason. If an exception is thrown, you are going to leak the transactions. This is truly sad since the text is very careful to point out that you MUST dispose of those resources before you return.
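As a rough illustration of the last two complaints, here is how I would expect such a method to look, under some loud assumptions: FakeTransaction is a hypothetical stand-in for whatever transaction type the article uses, and the Promote signature is made up for this sketch. The point is only the shape: typed parameters instead of object[], exceptions instead of error lists, and a using block so the resource is released even when the body throws.

```csharp
using System;

// Hypothetical stand-in for a transaction; it only records that it was disposed.
public sealed class FakeTransaction : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

public static class Promotions
{
    // Named, typed parameters instead of object[] data.
    public static void Promote(FakeTransaction tx, int employeeId, string position)
    {
        // `using` compiles down to try/finally, so Dispose runs on every exit path.
        using (tx)
        {
            if (string.IsNullOrEmpty(position))
                throw new ArgumentException("position is required", nameof(position));

            // ... the actual promotion logic would go here ...
        }
    }
}
```

With this shape there is nothing to remember before returning: leaving the block, normally or through an exception, disposes the transaction.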

As I said, I am not going to even approach the actual guidance that is offered there. I think that it is invalid at best and likely to be harmful.

From the code sample shown, I would surmise that no one sat down and actually coded any sort of system with this. Even the most basic system would crumble under the sheer weight of architecture piled on top of the poor system.

I am saddened and disappointed to see such a thing being presented as guidance from the P&P.

time to read 1 min | 157 words

After a lot more study, it looks like there are two separate issues that are causing the problem here.

  1. During AppDomain unload, it is permissible for the GC to collect reachable objects. I am fine with that and I certainly agree that this makes sense.
  2. Application_End occurs concurrently with the AppDomain unload.

Looking at the docs (and there are surprisingly few about this), it seems like 1 is expected, but 2 is a bug. The docs state:

Application_End  - Called once per lifetime of the application before the application is unloaded.

It may be my English, but to me before doesn’t suggest at the same time as.

So I do believe it is a bug in the framework, but not in the GC, it is in the shutdown code for ASP.Net. This is still a very nasty problem.

time to read 3 min | 437 words

image

I just spent several hours tracking down a crashing bug in my current project.

The  issue was, quite clearly, a problem with releasing unmanaged resources. So I tightened my control over resources and made absolutely sure that I am releasing everything properly.

I simply could not believe what was going on. I knew what the code was doing, and I knew that what I was getting was flat out impossible.

Yes, I know that we keep saying that, but this bug really is not possible!

The situation is quite clear, during the shutdown process of the application, an unmanaged resource’s finalizer would throw an exception because it wasn’t properly disposed.

The problem? It most assuredly should have been disposed. When debugging through the problem, I found out something extremely strange and worrying. The managed object’s finalizer was running while there were strong references to the object.

You can see that in the attached screenshot.

That is, by the way, when you have the root in a static field, so it cannot be that the whole graph is free.

This is somehow related to threading, because debugging this would often change the way this works, but running without a debugger consistently fails.

The CLR semantics for finalizers clearly state that they can only be run after there are no more strong references to the instance. Clearly, this is not what is going on here.

The only thing that I can think of that can affect this is that there is something really strange going on with app domain unloads.

Now, I can’t figure if this is me being extremely stupid or if this is a real problem. I did manage to create a reproduction of the issue, however, which you can download here.

This is on VMWare Fusion, running Windows 2008 x64, with .Net 3.5 SP1.

To reproduce, start the application in WebDev.WebServer, wait for the page to load, and then close the WebDev.WebServer. If it crashes, you have successfully reproduced the problem.

Update – This is really interesting. Both stack traces are operating on the same object, by the way.

image

Adding locking around the finalizer and dispose seems to have made the problem go away.
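For reference, this is roughly the shape of that workaround. It is a sketch rather than the code from the reproduction: NativeResourceHolder, the sync object and ReleaseCount are hypothetical names, and the counter merely stands in for freeing the real unmanaged handle. Taking a lock inside a finalizer is usually something to avoid, so treat this as a description of the workaround, not a recommendation.

```csharp
using System;

public class NativeResourceHolder : IDisposable
{
    private readonly object sync = new object();
    private bool released;

    // Stands in for the unmanaged handle; counts how often it was freed.
    public int ReleaseCount { get; private set; }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // nothing left for the finalizer to do
    }

    ~NativeResourceHolder()
    {
        Release();
    }

    private void Release()
    {
        // Dispose and the finalizer may race during shutdown, so only
        // the first caller is allowed to free the resource.
        lock (sync)
        {
            if (released)
                return;
            released = true;
            ReleaseCount++;
        }
    }
}
```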

time to read 21 min | 4061 words

Concurrency is a tough topic, fraught with problems, pitfalls and nasty issues. This is especially the case when you try to build distributed, inherently parallel systems. I am dealing with the topic quite a lot recently and I have created several solutions (none of them are originally mine, mind you).

There aren’t that many good solutions out there; most of them boil down to: “suck it up and deal with the complexity.” In this case, I want to try to deal with the complexity in a consistent fashion (no one-off solutions) and in a way that I can deal with without first meditating on the import of socks.

Let us see if I can come up with a good example. We have a saga that we use to check whether a particular user has acceptable credit to buy something from us. The logic is that we need to verify with at least 2 credit card bureaus, and the average must be over 700. (This logic has nothing to do with the real world, since I just dreamt it up, by the way). Here is a simple implementation of a saga that can deal with those requirements:

    public class AccpetableCreditSaga : ISaga<AccpetableCreditState>,
      InitiatedBy<IsAcceptableAsCustomer>,
      Orchestrates<CreditCardScore>,
      Orchestrates<MergeSagaState>
    {
      IServiceBus bus;
      public bool IsCompleted {get;set;}
      public Guid Id {get;set;}
      // The saga's state, supplied by the bus before a message is consumed.
      public AccpetableCreditState State {get;set;}

      public AccpetableCreditSaga (IServiceBus bus)
      {
        this.bus = bus;
      }

      // Starts the saga: ask all three bureaus for a score.
      public void Consume(IsAcceptableAsCustomer message)
      {
        bus.Send(
          new Equifax.CheckCreditFor{Card = message.Card},
          new Experian.CheckCreditFor{Card = message.Card},
          new TransUnion.CheckCreditFor{Card = message.Card}
          );
      }

      // A bureau replied: record the score and see if we can finish.
      public void Consume(CreditCardScore message)
      {
        State.Scores.Add(message);

        TryCompleteSaga();
      }

      // The state was merged after a conflict: check again with the full picture.
      public void Consume(MergeSagaState message)
      {
        TryCompleteSaga();
      }

      public void TryCompleteSaga()
      {
        // We need replies from at least two bureaus before deciding.
        if(State.Scores.Count < 2)
          return;

        bus.Publish(new CreditScoreAcceptable
        {
          CorrelationId = Id,
          IsAcceptable = State.Scores.Average(x=>x.Score) > 700
        });
        IsCompleted = true;
      }
    }

We have this strange MergeSagaState message, but other than that, it should be pretty obvious what is going on in here. It should be equally obvious that we have a serious problem here. Let us say that we get two reply messages with credit card scores at the same time. We will create two instances of the saga that will run in parallel, each of them getting its own copy of the saga’s state. Each instance sees only the one score that it added, so neither of them meets the end condition for the saga. Even though in practice we have gotten all the messages we need, because we handled them in parallel, we had no chance to actually see both changes at the same time. This means that any logic that requires us to have a full picture of what is going on isn’t going to work.

Rhino Service Bus solves the issue by putting the saga’s state into Rhino DHT. This means that a single saga may have several states at the same time. Merging them together is also something that the bus will take care of. Merging the different parts is inherently an issue that cannot be solved generically. There is no generic merge algorithm that you can use. Rhino Service Bus defines an interface that will allow you to deal with this issue in a clean manner and supply whatever business logic is required to merge different versions.

Here is an example of how we can merge the different versions together:

    public class AccpetableCreditStateMerger : ISagaStateMerger<AccpetableCreditState>
    {
      public AccpetableCreditState Merge(AccpetableCreditState[] states)
      {
        return new AccpetableCreditState
        {
          // Keep a single entry per bureau, holding the highest score reported.
          Scores = states.SelectMany(x=>x.Scores)
            .GroupBy(x=>x.Bureau)
            .Select(x => new CreditCardScore
            {
              Bureau = x.Key,
              Score = x.Max(y=>y.Score)
            }).ToList()
        };
      }
    }

Note that this is notepad code, so it may contain errors, but the actual intention should be clear. We accept an array of states that need to be merged, find the highest score from each bureau and return the merged state.
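If you want something that compiles on its own, here is a small sketch of the same merge without any of the Rhino Service Bus types. BureauScore, CreditStateMerge and IsAcceptable are names I made up for the sketch; the rules (one entry per bureau with its highest score, at least two bureaus, average over 700) are the ones described above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, stripped-down score entry.
public record BureauScore(string Bureau, int Score);

public static class CreditStateMerge
{
    // One entry per bureau, keeping the highest score seen in any version.
    public static List<BureauScore> Merge(IEnumerable<IEnumerable<BureauScore>> states) =>
        states.SelectMany(s => s)
              .GroupBy(s => s.Bureau)
              .Select(g => new BureauScore(g.Key, g.Max(s => s.Score)))
              .ToList();

    // null means "cannot decide yet": fewer than two bureaus have replied.
    public static bool? IsAcceptable(IReadOnlyCollection<BureauScore> scores) =>
        scores.Count < 2 ? (bool?)null : scores.Average(s => s.Score) > 700;
}
```

Two concurrent saga instances that each recorded one score cannot decide anything on their own, but the merged state holds both scores and the end condition can finally be evaluated.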

Whenever Rhino Service Bus detects that the saga is in a conflicted state, it will post a MergeSagaState message to the saga. This will merge the saga’s state and call Consume(MergeSagaState), in which the saga gets to decide what it wants to do about this (usually inspect the state to see if we missed anything). This also works for completing a saga, by the way: you cannot complete a saga in an inconsistent state, you will get called again with Consume(MergeSagaState) to deal with that.

The state merger is also a good place to try to deal with concurrency compensating actions, for example if we notice in the merger that we performed some action twice and need to revert one of them. In general, it is better to avoid having to do so, but that is the place for this logic.

time to read 5 min | 918 words

I have talked about Rhino DHT at length, and the versioning story that exists there. What I haven’t talked about is why I built it. Or, to be rather more exact, the actual use case that I had in mind.

Jason Diamond had pointed out a problem with the way sagas work with Rhino Service Bus.

Are BaristaSaga objects instantiated per message? If so, can two different instances be consuming different messages concurrently?

The reason I ask is because it looks like handling the PrepareDrink message could take some time. Is it possible that a PaymentComplete message could come in before the PrepareDrink message is finished being handled?

If the two instances of BaristaSaga have their own instance of BaristaState, I can see the GotPayment value set by handling the PaymentComplete message getting lost.

If the two instances of BaristaSaga share the same instance of BaristaState, do I now have to worry about synchronizing changes to the state across all of the sagas? Also, wouldn't this prevent having multiple barista "servers" handling messages since they wouldn't be able to share instances across processes/machines.

The answer to that is that yes, a saga can execute concurrently. Not only that, but it can execute concurrently on different machines. That puts us in somewhat of a problem regarding consistent state.

There are several options that we can use to resolve the issue. One of them is to ensure that this cannot happen by locking on a shared resource when executing the saga (commonly done by opening a transaction on the saga’s row). That can significantly limit the system's scalability. Another option is to persist the saga’s state in a way that ensures that we have no conflicts. One way of doing that is to persist the actual state change itself, which allows us to replay the object to a consistent state. Concurrent updates don’t bother us because we aren’t actually modifying the data.

That might require some careful thinking, however, to avoid a case where a saga that is concurrently executing steps on its own feet without paying attention. I strongly dislike anything that requires careful thinking. It is like saying that C++ has no memory leak issues, it just requires some careful thinking.

For RSB, I wanted to be able to do better than that. I selected Rhino DHT as the default persistence store for the saga’s state (you can still do other things, of course). That means that concurrency is very explicit. If you got to a point where there were two concurrently executing instances of the saga, their state is going to go to Rhino DHT. Since they are both going to be from the same version, Rhino DHT is going to keep both state changes around.

The next time that we need the state for that particular saga, we are actually going to get both states. At that point, we introduce the ISagaStateMerger:

    public interface ISagaStateMerger<TState>
        where TState : IVersionedSagaState
    {
        TState Merge(TState[] states);
    }

This allows us to handle the notion of concurrency resolution in a very explicit manner. We get the appropriate state merger from the container and use that to merge the states back to a consistent state, which we then pass to the saga to continue its execution.

There is just one additional twist. A saga cannot complete until it is in a consistent state, so if the saga completes while it is in an inconsistent state, we will call the saga again (after resolving the conflict) and let it handle the final state conflict before performing the actual completion.

time to read 7 min | 1326 words

Yes, I am aware that I said that I would only have two more features for NH Prof before releasing. But I am currently being held hostage by the new features fairy, and negotiations over a feature freeze seem to have reached a standstill. Besides, it is a neat feature.

The actual feature is quite simple. Let us say that we have the following model:

image 

Notice that this is a very common case of bidirectional association, and this is mapped to the following table model:

image

Notice that while on the object model this is a bidirectional association and is maintained by two different places, it is maintained on a single place in the database.

This is a very common case, and quite a few people get it wrong. By default, NHibernate has to assume that it must update the column on both sides, so creating a new post and adding it to the Blog’s Posts collection will result in two statements being written to the database:

    INSERT INTO Posts(Title,
                      Text,
                      PostedAt,
                      BlogId,
                      UserId)
    VALUES     ('vam' /* @p0 */,
                'abc' /* @p1 */,
                '1/17/2009 5:28:52 PM' /* @p2 */,
                1 /* @p3 */,
                1 /* @p4 */);
    select SCOPE_IDENTITY ( )

    UPDATE Posts
    SET    BlogId = 1 /* @p0_0 */
    WHERE  Id = 22 /* @p1_0 */

As you can see, we are actually setting the BlogId to the same value, twice.

Now, there is a very easy fix for this issue: all you have to do is to tell NHibernate on the Blog’s Posts mapping that this is a collection where the responsibility for actually updating the column value is on the other side. This is also something that I tend to check in code reviews quite often. The fix is literally just specifying inverse=’true’ on the Posts collection mapping (the element that contains the <one-to-many>).

And now NH Prof will detect and warn about such cases:

image

Beautiful!

This is also the first case in which I am starting to do much more in depth analysis of what is actually going on with NHibernate. I planned to do this sort of thing after the v1.0 release, but as I said, I am held hostage by the new features fairy, and this is my negotiation technique :-)

time to read 1 min | 78 words

I am working with a client at the moment to upgrade some of the techniques and processes that they have. Yesterday I realized something very important:

The most advanced, radical, game changing technology that I introduced to this client is… static HTML files.

And yes, I am talking about .html files that go through absolutely no server side processing. And yes, I am serious, and no, I don’t think that I’ll explain.

time to read 2 min | 384 words

The most requested feature for NH Prof is the ability to view the query results inline. I created a mockup just to show what this might look like. The real discussion is below the image.

image

On the face of it, this feature looks very simple. Query the database, throw the results into a grid, done.  It is actually not that simple. Let me enumerate all the things that complicate this feature.

The simplest thing is a big stumbling block. How do I get the connection string?

I can make the profiler figure out what the connection string is on the profiled application. That is a relatively simple task from a technical perspective. It also opens up a whole can of worms. Connection strings are considered to be highly valuable, and getting the connection string from a running application is a big No! from the security & regulatory perspective.

And that is assuming that I can actually make use of the connection string. In many cases, the connection string points to a database that is not accessible from a developer machine, or not accessible as the database user.

So we remove the “auto detect connection string” feature and move to the next issue. I am not supporting just SQL Server. I am aiming to support all major databases that NHibernate supports. But you cannot connect to a MySQL database (to take a non random example) without referencing MySQL’s dlls. Am I supposed to take a dependency on all possible drivers? For that matter, I simply cannot distribute those drivers with NH Prof. MySQL’s drivers are GPL. As such, they are inherently incompatible with commercial software.

So now I need to dynamically discover and load dlls. And that probably involves some sort of UI to search for them, and error handling for all sorts of interesting problems.
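Just to show the kind of plumbing this implies, here is a sketch of discovering an ADO.NET provider in a dll that the user points us at. It is not how NH Prof does it. DriverLoader and its methods are hypothetical names, and the sketch leans on the ADO.NET convention that a provider exposes its factory through a public static field named Instance.

```csharp
using System;
using System.Data.Common;
using System.Reflection;

public static class DriverLoader
{
    // Looks for a concrete DbProviderFactory in the assembly and returns
    // its singleton, or null if the assembly is not an ADO.NET provider.
    public static DbProviderFactory FindProviderFactory(Assembly assembly)
    {
        // Note: GetTypes can throw ReflectionTypeLoadException when the
        // driver's own dependencies are missing; real code has to handle that.
        foreach (var type in assembly.GetTypes())
        {
            if (type.IsAbstract || !typeof(DbProviderFactory).IsAssignableFrom(type))
                continue;

            var field = type.GetField("Instance", BindingFlags.Public | BindingFlags.Static);
            if (field?.GetValue(null) is DbProviderFactory factory)
                return factory;
        }
        return null;
    }

    // dllPath comes from the user, e.g. from a "browse for driver" dialog.
    public static DbProviderFactory LoadFrom(string dllPath) =>
        FindProviderFactory(Assembly.LoadFrom(dllPath));
}
```

Once you have a factory, CreateConnection gives you a connection without compiling against the driver, but all the failure modes (wrong dll, wrong bitness, missing dependencies) still need a UI and error handling around them.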

Then we have additional issues. What happens if you try to see the results of a DELETE statement? Or execute some sensitive stored procedure?

None of those issues are impossible to resolve, they are just things that complicate a relatively simple feature and make it much more complex.
