Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 2 min | 376 words

Design Patterns: Elements of Reusable Object-Oriented Software

Amazon tells me that I purchased this book in Sep 2004, and I have since then misplaced it, for some reason. I remember how important this book was to shaping how I thought about software. For the first time, I actually had the words to discuss what I was doing, and proven pathways to success. Of course, we all know that… it didn’t end up being quite so good.

In particular, it led to Cargo Cult Programming. From my perspective, it looks like a lot of people assumed that their application was good because it had design patterns, not that design patterns would result in simpler code.

Now, this book came out in 1994, and quite a bit has changed in the world of software since then. In this series, I am going to take a look at all of those design patterns and see how they hold up to the test of time. Remember, the design patterns were written at a time when most software was single user client applications (think WinForms, then reduce by nine orders of magnitude), with no web or internet, no multi threading, very little networking, very slow upgrade cycles and far slower machines. None of those assumptions are relevant to how we build software today, but they formed the core of the environment that was relevant when the book was written. I think it is going to be interesting to see how those things hold up.

And because I know the nitpickers, let me set up the context. We are talking about applying design patterns within the context of a .NET application, aimed at either system software (like RavenDB) or enterprise applications. This is the context in which I am talking. Bringing arguments / opinions from other contexts is not relevant to the discussion.

I am also not going to discuss the patterns themselves at any depth; if you want that, go and read the book. It is still a very good one, even though it came out almost 20 years ago.

time to read 1 min | 180 words

Well, we worked quite a bit on this, but Uber Prof (NHibernate Profiler, Entity Framework Profiler, Linq to SQL Profiler, etc.) version 2.0 is now out for public beta.

We made a lot of improvements, including performance, stability and responsiveness, but probably the most important thing from the user’s perspective is that we now support running the profiler in production, and even in the cloud.

We will have the full listing of all the new goodies up on the company site soon, including detailed instructions on how to enable production profiling and on cloud profiling, but I just couldn’t wait to break the news to you.

In fact, along with V2.0 of the profilers, we have a brand new site for our company, which you can check here: http://hibernatingrhinos.com/.

To celebrate the fact that we are going on beta, we also offer a 20% discount for the duration of the beta.

Nitpicker corner, please remember that this is a beta, there are bound to be problems, and we will fix them as soon as we can.

time to read 4 min | 631 words

In this post, I want to talk about libraries that want or need to not only support being run in multiple threads, but actually want to use multiple threads themselves. Remember, you are a library, not a framework. You are a guest in someone else’s home, and you shouldn’t litter.

The first thing to remember is error handling. That actually comes in two parts. First, unhandled exceptions from a thread will kill the application. There are very few things that people will find more annoying with your library than your errors killing their application. Second, and almost as important, you should have a way to report those errors.

Even more annoying than killing my application is failing silently, in a way that is really hard to debug; that is going to cause major hair loss all around.

There are several scenarios that we need to consider:

  • Long running threads – I need to do something in a background thread that would usually live as long as the application itself.
  • Short term threads – I need to do something that requires a lot of threads, just for a short time.
  • Timeouts / delays / expirations – I need to do something every X amount of time.

In the first case, long running threads, there isn’t much that can be done. You want to handle errors, obviously, and you want to make it crystal clear when you spin up your threads and when / how you tear them down again. Another important aspect is that you should name your threads. This is important because it means that, when debugging, we can figure out what this or that thread is doing much more easily.
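
A minimal sketch of that approach follows. The class name, the OnError event and the work loop are all illustrative, not the API of any real library:

```csharp
using System;
using System.Threading;

public class BackgroundWorker : IDisposable
{
    private readonly Thread thread;
    private readonly CancellationTokenSource cts = new CancellationTokenSource();

    public int Iterations; // visible progress, for demonstration only

    // Report errors to the host instead of letting them kill the application.
    public event Action<Exception> OnError = delegate { };

    public BackgroundWorker()
    {
        thread = new Thread(Run)
        {
            // Named, so whoever is debugging can tell whose thread this is.
            Name = "MyLibrary.Background",
            IsBackground = true
        };
        thread.Start(); // spun up explicitly, in one obvious place
    }

    private void Run()
    {
        while (cts.IsCancellationRequested == false)
        {
            try
            {
                Iterations++; // the actual background work would go here
                Thread.Sleep(50);
            }
            catch (Exception e)
            {
                OnError(e); // never let an exception escape the thread
            }
        }
    }

    public void Dispose()
    {
        // Teardown is just as explicit and visible as startup.
        cts.Cancel();
        thread.Join();
    }
}
```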

The next scenario is much more common: you just need some way to execute some code in parallel. The easiest thing to do is to reach for new Thread(), ThreadPool.QueueUserWorkItem or Task.Factory.StartNew(). Sure, this is easy to do, and it is also perfectly wrong.

Why is that, you say?

Quite simply, it ain’t your app. You don’t get to make such decisions for the application that is hosting your library. Maybe the app needs to conserve threads to serve requests? Maybe it is trying to use fewer threads to reduce CPU load and save power on a laptop running on batteries? Maybe they are trying to debug something, and all those threads popping up are driving them crazy?

The polite thing to do when you recognize that you have a threading requirement in your library is to:

  • Give the user a way to control that.
  • Provide a default implementation that works.

A good example of that can be seen in RavenDB’s sharding implementation.

public interface IShardAccessStrategy
{
    event ShardingErrorHandle<IDatabaseCommands> OnError;

    T[] Apply<T>(IList<IDatabaseCommands> commands, ShardRequestData request, Func<IDatabaseCommands, int, T> operation);
}

As you can see, we abstracted the notion of making multiple requests. We provide you out of the box with sequential and parallel implementations for this.
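
Stripped of the RavenDB specifics, the shape of that abstraction is roughly this. This is a sketch; the interface and class names here are mine, simplified from the real IShardAccessStrategy:

```csharp
using System;
using System.Collections.Generic;

// The library exposes the threading decision as a strategy the user can replace.
public interface IAccessStrategy
{
    T[] Apply<T>(IList<Func<T>> operations);
}

// The safe default: everything runs on the caller's thread, one item at a time.
// A parallel implementation could be swapped in by users who want it.
public class SequentialAccessStrategy : IAccessStrategy
{
    public T[] Apply<T>(IList<Func<T>> operations)
    {
        var results = new T[operations.Count];
        for (var i = 0; i < operations.Count; i++)
            results[i] = operations[i](); // no threads created on the host's behalf
        return results;
    }
}
```

The point of the interface is that the host application, not the library, gets the final say on how much parallelism is used.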

The last item, timeouts / expirations / delays, is also something that you want to give the user of your library control over. Ideally, use something like the strategy above. By all means, provide a default implementation and wire it up without requiring anything.

But it is important to have control over those things. The expert users for your library will want and need it.

time to read 3 min | 419 words

Next on the agenda for writing correct multi threaded libraries, how do you handle shared state?

The easiest way to handle that is to use the same approach that NHibernate and the RavenDB Client API use. You have a factory / builder / fizzy object that you use to construct all of your state. This is done on a single thread, and then you call a method that effectively freezes this state from then on.

All future accesses to this state are read only. This is really good for doing things like reflection lookups, loading configuration, etc.
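
A minimal sketch of that freeze pattern follows; the class and method names are illustrative, not the actual NHibernate or RavenDB API:

```csharp
using System;
using System.Collections.Generic;

public class Configuration
{
    private readonly Dictionary<string, string> settings = new Dictionary<string, string>();
    private bool frozen;

    public void Set(string key, string value)
    {
        if (frozen)
            throw new InvalidOperationException("Configuration is frozen and now read-only");
        settings[key] = value;
    }

    public string Get(string key)
    {
        return settings[key];
    }

    // Called once, on a single thread, after all setup is done.
    // From this point on the state is read-only, hence safe to share across threads.
    public Configuration Freeze()
    {
        frozen = true;
        return this;
    }
}
```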

But what happens when you actually need shared mutable state? A common example is a cache, or global statistics. This is where you actually need to pull out your copy of Concurrent Programming on Windows and very carefully write true multi threaded code.

It is over a thousand pages, you say? Sure, and you need to know all of this crap to get multi threading working properly. Multi threading is scary, hard and should not be used.

In general, even if you actually need shared mutable state, you really want to make sure that there are clear boundaries between the things that can be shared among multiple threads and the things that cannot. And you want to do most of the work in the parts where you don’t have to worry about multi threading.

It also means that your users have a much easier time figuring out what the expected behavior of the system is. This is very important with the advent of C# 5.0, since async APIs are going to be a lot more common. Sure, you use the underlying async primitives, but did you consider what may happen when you issue multiple concurrent async requests? Is that allowed?

With C# 5.0, you can usually treat async code as if it was single threaded, but that breaks down if you are allowing multiple concurrent async operations.

In RavenDB and NHibernate, we use the notion of Document Store / Session Factory – which are created once, safe for multi threading and are usually singletons. And then we have the notion of sessions, which are single threaded, easy & cheap to create and follow the notion of one per thread (actually, one per work unit, but that is beside the point).

On my next post, I’ll discuss what happens when your library actually wants to go beyond just being safe for multi threading, when the library wants to use threading directly.

time to read 2 min | 274 words

The major difference between libraries and frameworks is that a framework is something that runs your code, and is in general in control of its own environment, while a library is something that you use in your own code, where you control the environment.

Examples of frameworks: ASP.NET, NServiceBus, WPF, etc.

Examples of libraries: NHibernate, RavenDB Client API, JSON.Net, SharpPDF, etc.

Why am I talking about the distinction between frameworks and libraries in a post about multi threaded design?

Simple: there are vastly different rules for multi threaded design with frameworks and with libraries. In general, frameworks manage their own threads and will let your code use one of them. Libraries, on the other hand, run on your threads.

The simple rule for multi threaded design for libraries? Just don’t do it.

Multi threading is hard, and you are going to cause issues for people if you don’t know exactly what you are doing. Therefore, just write for a single threaded application and make sure you hold no shared state.

For example, JSON.Net pretty much does this. The sole place where it does use multi threading is in handling caching, and it must be doing that really well, because I never paid it any mind and we got no error reports about it.

But the easiest thing to do is to simply not support multi threading for your objects. If users want to use the code from multiple threads, they are welcome to instantiate multiple instances and use one per thread.
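
One way a user can do that, assuming a hypothetical non-thread-safe Parser class from some library, is ThreadLocal<T>:

```csharp
using System;
using System.Threading;

// Stand-in for any library object that is not safe to share between threads.
public class Parser
{
    public int Parsed;
    public void Parse(string input) { Parsed++; }
}

public static class PerThreadUsage
{
    // One Parser per thread: the library stays single threaded,
    // and the user decides how to fan out the work.
    private static readonly ThreadLocal<Parser> parser =
        new ThreadLocal<Parser>(() => new Parser(), trackAllValues: true);

    public static int Run()
    {
        var t1 = new Thread(() => parser.Value.Parse("a"));
        var t2 = new Thread(() => parser.Value.Parse("b"));
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // Each thread got its own instance, so each parsed exactly one input.
        var total = 0;
        foreach (var p in parser.Values) total += p.Parsed;
        return total;
    }
}
```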

In my next post, I’ll talk about what happens when you actually do need to hold some shared state.

time to read 2 min | 254 words

A common complaint from users of RavenDB 1.0 has been that we release too often, which meant that they had to upgrade RavenDB too often.

We made a clean break with 1.2 (which allowed us to handle several very large issues cleanly) and we have been releasing unstable versions at a steady rate of 5 – 8 builds a week. Now, of course, we get complaints about not releasing enough.

At any rate, the following is a rough plan for where we will be in the near future. RavenDB 1.2 is currently running in production (running our internal infrastructure) and we have been doing a lot of compatibility, stability and performance work. There are still a lot of little tweaks that we want to put into the product, but major features are likely to be deferred until after 1.2.

We have just completed triage of the things that we intend to do for 1.2, and we currently have ~60 – 70 items to go through in terms of features to implement, bug fixes to make, etc. Toward the end of this month, we intend to stop all new feature development and focus on bug fixes, stability and performance.

By mid November, we want to have an RC version out that you can take out to town, and the release is scheduled for late November or early December. Post 1.2, we will go back to a stable release every 4 – 6 weeks.

time to read 1 min | 191 words

Greg Young has a comment on my Rhino Events post that deserves to be read in full. Go ahead, read it, I’ll wait.

Since you didn’t, I’ll summarize. Greg points out numerous faults and issues that aren’t handled or could be handled better in the code.

That is excellent, from my point of view, if only because it gives me more stuff to think about for the next time.

But the most important thing to note here is that Greg is absolutely correct about something:

I have always said an event store is a fun project because you can go anywhere from an afternoon to years on an implementation.

Rhino Events is a fun project, and I’ve learned some stuff there that I’ll likely use again later on. But above everything else, this is not production-worthy code. It is just some fun code that I liked. You may take it and do whatever you like with it, but mostly I was concerned with finding the right ways to actually get things done, not with considering all of the issues that might arise in a real production environment.

time to read 3 min | 529 words

After talking so often about how much I consider OSS work to be indicative of passion, I got bummed when I realized that I hadn’t actually done any OSS work in a while, if you exclude RavenDB.

I was recently at lunch with a client when something he said triggered a bunch of ideas in my head. I am afraid that I made for poor lunch conversation, because all I could see in my head was code and IO blocks moving around in interesting ways.

At any rate, I sat down at the end of the day and wrote a few spikes, then I decided to write the actual thing in a way that would actually be useful.

What is Rhino Events?

It is a small .NET library that gives you an embeddable event store. Also, it is freakishly fast.

How fast is that?

[Image: screenshot of the benchmark code]

Well, this code writes a fairly non trivial event 10,000,000 times (that is ten million) to disk.

It does this at a rate of about 60,000 events per second. And that includes the full life cycle (serializing the data, flushing to disk, etc.).

Rhino.Events has the following external API:

[Image: screenshot of the Rhino.Events external API]

As you can see, we have several ways of writing events to disk, always associating them with a stream, or just writing the latest snapshot.

Note that the write methods actually return a Task. You can ignore that Task, if you wish, but this is part of how Rhino Events gets to be so fast.

When you call EnqueueEventAsync, we register the value in a queue and have a background process write all of the pending events to disk. This means that we have only one thread actually doing writes, which means that we can batch all of those writes and get really nice performance out of it.

We can also reduce the number of times that we have to actually flush to disk (fsync), so we only do that when we run out of things to write or at predefined times (usually after a full 200 ms of non stop writes). Only after the information has been fully flushed to disk will we set the task status to completed.

This is actually a really interesting approach from my point of view, and it makes the entire thing transactional, in the sense that you can wait to be sure that the event has been persisted to disk (and yes, Rhino Events is fully ACID) or you can fire & forget it, and move on with your life.
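
The core mechanics described above can be sketched like this. This is heavily simplified; the real Rhino.Events code is far more involved, and the class name and internals here are assumptions:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class BatchingEventWriter
{
    private readonly BlockingCollection<Tuple<string, TaskCompletionSource<bool>>> queue =
        new BlockingCollection<Tuple<string, TaskCompletionSource<bool>>>();

    public BatchingEventWriter()
    {
        // The single writer thread: all disk writes funnel through here.
        new Thread(WriteLoop) { IsBackground = true, Name = "Events.Writer" }.Start();
    }

    public Task EnqueueEventAsync(string data)
    {
        var tcs = new TaskCompletionSource<bool>();
        queue.Add(Tuple.Create(data, tcs));
        return tcs.Task; // completes only once the event has been flushed
    }

    private void WriteLoop()
    {
        var pending = new List<TaskCompletionSource<bool>>();
        foreach (var item in queue.GetConsumingEnumerable())
        {
            // A real implementation would serialize item.Item1 and write it here.
            pending.Add(item.Item2);

            if (queue.Count == 0) // nothing left to batch: flush now
            {
                // fsync / Stream.Flush would happen here, once for the whole batch.
                foreach (var tcs in pending)
                    tcs.SetResult(true);
                pending.Clear();
            }
        }
    }
}
```

Callers that await the returned Task get the transactional behavior; callers that ignore it get fire & forget.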

A few words before I let you go off and play with the bits.

This is a Rhino project, which means that it is a fully OSS one. You can take the code and do pretty much whatever you want with it. But I, or Hibernating Rhinos, will not be providing support for it.

You can get the bits here: https://github.com/ayende/Rhino.Events

time to read 15 min | 2926 words

This post came out of a stack overflow question. The user had the following code:

public void StoreUser(User user)
{
    // Some validation logic
    if (string.IsNullOrWhiteSpace(user.Name))
        throw new Exception("User name can not be empty");

    Session.Store(user);
}

But he noted that this will not work for other approaches, such as this:

var u1 = Session.Load<User>(1);
u1.Name = null; // change is tracked and will persist on the next SaveChanges
Session.SaveChanges();

This is because RavenDB tracks the entity and will persist it if there have been any changes when SaveChanges is called.

The question was:

Is there someway to get RavenDB to store only a snapshot of the item that was stored and not track further changes?

The answer is, as is often the case when you run into hardship with RavenDB, that you are doing something wrong. In this particular case, the wrongness is the fact that you are trying to do validation manually. This means that you always have to remember to call it, and that you can’t use a lot of the good stuff that RavenDB gives you, like change tracking. Instead, RavenDB contains the hooks to do it once, and do it well.

public class ValidationListener : IDocumentStoreListener
{
    readonly Dictionary<Type, List<Action<object>>> validations = new Dictionary<Type, List<Action<object>>>();

    public void Register<T>(Action<T> validate)
    {
        List<Action<object>> list;
        if (validations.TryGetValue(typeof(T), out list) == false)
            validations[typeof(T)] = list = new List<Action<object>>();

        list.Add(o => validate((T)o));
    }

    public bool BeforeStore(string key, object entityInstance, RavenJObject metadata, RavenJObject original)
    {
        List<Action<object>> list;
        if (validations.TryGetValue(entityInstance.GetType(), out list))
        {
            foreach (var validation in list)
            {
                validation(entityInstance);
            }
        }
        return false;
    }

    public void AfterStore(string key, object entityInstance, RavenJObject metadata)
    {
    }
}

This will be called by RavenDB whenever we save to the database. We can now write the validation / registration code like this:

var validationListener = new ValidationListener();
validationListener.Register<User>(user =>
    {
        if (string.IsNullOrWhiteSpace(user.Name))
            throw new Exception("User name can not be empty");
    });
store.RegisterListener(validationListener);

And that is all that she wrote.

time to read 16 min | 3110 words

In the mailing list, we got asked about an issue with code that looked like this:

public abstract class Parameter
{
    public String Name { get; set; }
}

public class IntArrayParameter : Parameter
{
    public Int32[,] Value { get; set; }
}

I fixed the bug, but it seemed like a strange thing to do, I thought. Happily, the person asking the question was actually taking part in a RavenDB course, so I could sit with him and understand the whole question.

It appears that in their system, they have a lot of things like that:

  • IntParameter
  • StringParameter
  • BoolParameter
  • LongParameter

And along with that, they also have a coordinating class:

public class ListOfParams
{
    public List<Parameter> Values { get; set; }
}

The question was: could they keep using the same approach with RavenDB? They were quite anxious about this, since they had a real need for these capabilities in their software.

This is why I hate Hello World questions. I could have answered just the question that was asked and left it at that. But the real problem is quite different.

You might have recognized it by now: what they have here is an Entity Attribute Value system, a well known anti-pattern in the relational database world, and one of the few ways to actually get a dynamic schema in that world.

In RavenDB, you don’t need all of those things. You can just get things done. Here is the code that we wrote to replace the above monstrosity:

public class Item : DynamicObject
{
    private Dictionary<string, object> vals = new Dictionary<string, object>();

    public string StaticlyDefinedProp { get; set; }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return vals.TryGetValue(binder.Name, out result);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        if (binder.Name == "Id")
            return false;
        vals[binder.Name] = value;
        return true;
    }

    public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
    {
        var key = (string)indexes[0];
        if (key == "Id")
            return false;
        vals[key] = value;
        return true;
    }

    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        return vals.TryGetValue((string)indexes[0], out result);
    }

    public override IEnumerable<string> GetDynamicMemberNames()
    {
        return GetType().GetProperties().Select(x => x.Name).Concat(vals.Keys);
    }
}

Not only will this class handle the dynamic behavior quite well, it also serializes to idiomatic JSON, which means that querying it is about as easy as you can ask for.

The EAV schema was created because RDBMSs aren’t suitable for dynamic work, and like many other things from the RDBMS world, this problem just doesn’t exist for us in RavenDB.
