Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database


Yes, it is somewhat of a blast from the past, but I just got asked how to create a good Union All operation for Rhino ETL.

The obvious implementation is:

public class UnionAllOperation : AbstractOperation
{
    private readonly List<IOperation> _operations = new List<IOperation>();

    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        foreach (var operation in _operations)
            foreach (var row in operation.Execute(null))
                yield return row;
    }

    public UnionAllOperation Add(IOperation operation)
    {
        _operations.Add(operation);
        return this;
    }
}

The problem is that this does everything synchronously. The following code is a better implementation, but note that this is notepad code, with all the implications of that.

public class UnionAllOperation : AbstractOperation
{
    private readonly List<IOperation> _operations = new List<IOperation>();

    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        var blockingCollection = new BlockingCollection<Row>();
        var tasks = _operations.Select(currentOp => Task.Factory.StartNew(() =>
        {
            foreach (var row in currentOp.Execute(null))
            {
                blockingCollection.Add(row);
            }
            blockingCollection.Add(null); // free the consumer thread
        }));

        Row r;
        while (true)
        {
            if (tasks.All(x => x.IsFaulted || x.IsCanceled || x.IsCompleted)) // all done
                break;
            r = blockingCollection.Take();
            if (r == null)
                continue;
            yield return r;
        }
        while (blockingCollection.TryTake(out r))
        {
            if (r == null)
                continue;
            yield return r;
        }
        Task.WaitAll(tasks.ToArray()); // raise any exceptions that were raised during execution
    }

    public UnionAllOperation Add(IOperation operation)
    {
        _operations.Add(operation);
        return this;
    }
}

Usual caveats apply: notepad code, never actually ran it, much less tested / debugged it.

Feel free to rip into it, though.
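One easy thing to rip into, as it turns out: `tasks` is a lazy LINQ query, so every enumeration of it starts a brand new batch of tasks. A minimal sketch of the deferred execution trap (my code, not Rhino ETL's):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LazyTasksDemo
{
    public static int Run()
    {
        int started = 0;
        // A lazy LINQ query: StartNew only runs when the query is enumerated.
        var tasks = Enumerable.Range(0, 3)
            .Select(i => Task.Factory.StartNew(() => Interlocked.Increment(ref started)));

        // First enumeration (like the tasks.All(...) check) starts 3 tasks...
        Task.WaitAll(tasks.ToArray());
        // ...and a second one (like the final Task.WaitAll) starts 3 more.
        Task.WaitAll(tasks.ToArray());

        return started; // 6, not 3 - materializing the query once fixes this
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Appending a single `.ToArray()` to the `Select` makes the operations run exactly once.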

Dale Newman did some improvements, the most important one being to make sure that we aren’t going to evaluate the tasks several times (oops! I told ya it was notepad code), and now it looks like this:

/// <summary>
/// Combines rows from all operations.
/// </summary>
public class UnionAllOperation : AbstractOperation {

    private readonly List<IOperation> _operations = new List<IOperation>();

    /// <summary>
    /// Executes the added operations in parallel.
    /// </summary>
    /// <param name="rows"></param>
    /// <returns></returns>
    public override IEnumerable<Row> Execute(IEnumerable<Row> rows) {

        var blockingCollection = new BlockingCollection<Row>();

        Debug("Creating tasks for {0} operations.", _operations.Count);

        var tasks = _operations.Select(currentOp => Task.Factory.StartNew(() => {
            Trace("Executing {0} operation.", currentOp.Name);
            foreach (var row in currentOp.Execute(null)) {
                blockingCollection.Add(row);
            }
            blockingCollection.Add(null); // free the consumer thread
        })).ToArray();

        Row r;
        while (true) {
            if (tasks.All(x => x.IsFaulted || x.IsCanceled || x.IsCompleted)) {
                Debug("All tasks have been canceled, have faulted, or have completed.");
                break;
            }

            r = blockingCollection.Take();
            if (r == null)
                continue;

            yield return r;
        }

        while (blockingCollection.TryTake(out r)) {
            if (r == null)
                continue;
            yield return r;
        }

        Task.WaitAll(tasks); // raise any exceptions that were raised during execution
    }

    /// <summary>
    /// Initializes this instance
    /// </summary>
    /// <param name="pipelineExecuter">The current pipeline executer.</param>
    public override void PrepareForExecution(IPipelineExecuter pipelineExecuter) {
        foreach (var operation in _operations) {
            operation.PrepareForExecution(pipelineExecuter);
        }
    }

    /// <summary>
    /// Add operation parameters
    /// </summary>
    /// <param name="ops">operations delimited by commas</param>
    /// <returns></returns>
    public UnionAllOperation Add(params IOperation[] ops) {
        foreach (var operation in ops) {
            _operations.Add(operation);
        }
        return this;
    }

    /// <summary>
    /// Add operations
    /// </summary>
    /// <param name="ops">an enumerable of operations</param>
    /// <returns></returns>
    public UnionAllOperation Add(IEnumerable<IOperation> ops) {
        _operations.AddRange(ops);
        return this;
    }
}
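As an aside, `BlockingCollection` can express this pattern without the null sentinel or the polling `while (true)` loop: have a continuation call `CompleteAdding` once every producer is done, and let the consumer drain `GetConsumingEnumerable`. A sketch of that shape (my plumbing, not Rhino ETL code):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class UnionAllSketch
{
    // Merges several lazily produced sequences into one stream of results.
    public static IEnumerable<T> UnionAll<T>(params Func<IEnumerable<T>>[] sources)
    {
        var results = new BlockingCollection<T>();

        var producers = sources.Select(source => Task.Run(() =>
        {
            foreach (var item in source())
                results.Add(item);
        })).ToArray(); // materialize once - the tasks must not be re-created

        // When every producer is done, unblock the consumer for good.
        Task.WhenAll(producers).ContinueWith(_ => results.CompleteAdding());

        // Blocks until an item arrives, ends cleanly after CompleteAdding.
        foreach (var item in results.GetConsumingEnumerable())
            yield return item;

        Task.WaitAll(producers); // surface any producer exceptions
    }
}
```

The sentinel dance disappears because `GetConsumingEnumerable` terminates on its own once the collection is marked complete.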

We just had a failing test:

[image]

As you can see, we assumed that Fiddler is running, when it isn’t. Here is the bug:

[image]

Now, this is great when I am testing things out and want to check what is going on over the wire using Fiddler, but I always have to remember to revert this change; otherwise we will have a failing test and a failing build.

That isn’t very friction free, so I added the following:

[image]

Now the code is smart enough to not fail the test if we didn’t do things right.
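The screenshots didn’t survive, but the shape of the check is easy to sketch. Something like the hypothetical helper below (the names and approach are mine, not the actual RavenDB code) only opts into the proxy when something is actually listening on Fiddler’s default endpoint, 127.0.0.1:8888:

```csharp
using System;
using System.Net.Sockets;

public static class FiddlerCheck
{
    // Returns true when something accepts connections on Fiddler's
    // default proxy endpoint (127.0.0.1:8888).
    public static bool IsFiddlerRunning()
    {
        try
        {
            using (var client = new TcpClient())
            {
                var async = client.BeginConnect("127.0.0.1", 8888, null, null);
                if (!async.AsyncWaitHandle.WaitOne(TimeSpan.FromMilliseconds(250)))
                    return false; // nothing answered in time
                client.EndConnect(async);
                return true;
            }
        }
        catch (SocketException)
        {
            return false; // connection refused - Fiddler is not up
        }
    }
}
```

With a guard like this, the test can fall back to the direct URL when Fiddler is down, instead of failing the build.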


As you probably know, I get called quite a lot to customers to “assist” in failing or problematic software projects. Maybe the performance isn’t nearly what it should be, maybe it is so very hard to make changes, maybe it is… one of the thousand and one things that can go wrong, and usually does.

Internally, I divide those projects into two broad categories: The stupid and the nail guns.

I rarely get called to projects that fall under the stupid category. When it happens, it is usually because someone new came in, looked at the codebase and called for help. I love working with stupid code bases. They are easy to understand, if hard to work with, and it is pretty obvious what is wrong. And the team is usually very receptive about getting advice on how to fix it.

But I usually am called for nail gun projects, and those are so much more complex…

But before I can talk about them, I need to explain first what I meant when I say “nail gun projects”. Consider an interesting fact. Absolutely no one will publish an article saying “we did nothing special, we had nothing out of the ordinary, and we shipped roughly on time, roughly on budget and with about the expected feature set. The client was reasonably happy.” And even if someone would post that, no one would read it.

Think about your life, as an example. You wake up, walk the dogs, take kids to school, go to work, come back from work, fall asleep reading this sentence, watch some TV, eat along the way, sleep. Rinse, repeat.

Now, let us go and look at the paper. At the time of this writing, those were the top stories at CNN:

Hopefully, there is a big disconnect between your life and that sort of news.

Now, let us think about the sort of posts, articles and books that you have been reading. You won’t find any book called “Delivering OK Projects”.

And most of the literature about software projects is on one of two ends: we did something incredibly hard, and we did it well, or we did something (obvious, usually) and we failed really badly. People tend to look at those books (either kind) and almost blindly adopt the suggested practices. Usually without looking at that section called “When it is appropriate to do what we do”.

Probably the best example is the waterfall methodology, which originated in the 1970 paper "Managing the Development of Large Software Systems" by Winston W. Royce.

From the paper:

…the implementation described above is risky and invites failure

As you can imagine, no one actually listened, and the rest is history.

How about those nail guns again?

Well, imagine that you are a contractor, and here are your tools of the trade:

They are good tools, and they served you well for a while. But now you are reading about “Nail gun usage for better, faster and more effective framing or roofing". In the study, you read how there was a need to nail 3,000 shingles, and how, using a nail gun, the team completed the task far more efficiently than with a standard hammer.

Being a conscientious professional, you heed the advice and immediately buy the best nail gun you can find:

(This is just a random nail gun picture, I don’t know what brand, nor really care.)

And indeed, a nail gun is a great tool when you need to nail a lot of things very fast. But it is a highly effective tool that is extremely limited in what it can do.

But you know that a nail gun is 333% more efficient than the hammer, so you throw the hammer away. And then you get a request: Can you hang this picture on the wall, please?

It would be easy with a hammer, but with a nail gun:

It isn’t the stupid / lazy / ignorant people that go for the nail gun solutions.

It is the really hard working people, the guys who really try to make things better. Of course, what usually happens is this:

 

And here we get back to the projects that I usually get called for. Those are projects that were created by really smart people, with the best of intentions, and with the clear understanding that they want to get quality stuff done.

The problem is that they are using Nail Guns for the architecture. For example, let us just look at this post. And the end is already written.


Just about a month after the 2.0 release, we have a minor stable release for RavenDB containing a lot of bug fixes, a few minor features and some new cool stuff.

Before I move on to anything else, everyone who is using 2.0 should be upgrading to 2.01. There have been a number of issues that were fixed between 2.0 and 2.01, and some of them are pretty important.

You can see the full list here; the highlights are below:

http://hibernatingrhinos.com/builds/ravendb-stable/2260

Bug Fixes:

  • Fixed a race condition with replication when doing rapid updates to the same document / attachment.
  • Fixed issues with using the RavenDB Profiler alongside a different version of jQuery.
  • Fixed a bug where disposing of one changes subscription would also dispose others.
  • Fixed replication not working when using an API key.
  • HTTP spec compliance: support quoted ETags in headers.
  • Fixed a problem where map/reduce indexes moving between single step and multi step reduce would get duplicate data.
  • Fixed an error when using encryption and reducing the document size.
  • Support facets on spatial queries.
  • Fixed unbounded load when loading items to reduce in multi step reduce.
  • Fixed bulk insert on IIS with authentication.
  • Fixed the Last-Modified date not being updated in embedded mode.

Features

  • Added SQL Replication support.
  • Will use multiple index batches if we have a slow index.
  • More aggressive behavior with regard to releasing memory on the client under low memory conditions.
  • Added the ability to debug authentication issues.
  • Implemented server side full text search results highlighting.
  • Moved the expensive suggestions operation init to a separate thread.
  • Allow defining suggestions as part of the index creation.
  • Better facets paging.
  • Expose a better view of internal storage concerns.
  • Support TermVector options on index fields.
  • RavenDB-894: When authenticating from the studio using an API key, set a cookie. This allows us to use standard browser commands after authenticating using the studio just once.
  • Periodic backup can now back up to a local path as well.
  • Added debug information for when we filter documents for replication.
  • Async API improvements:
    • AnyAsync
    • StoreAsync

On failing tests


I made a change (deep in the guts of RavenDB), and then I ran the tests, and I got this:

[image]

I love* it when this happens, because it means that there is one root cause that I need to fix, really obvious and in the main code path.

I hate it when there is just one failing test, because it means that this is an edge condition or something freaky like that.

* obviously I would love it more if there were no failing tests.


One of the things that I routinely get asked is how we design things. And the answer is that we usually do not. Most things do not require complex design. The requirements we set pretty much dictate how things are going to work. Sometimes, users make suggestions that turn into a light bulb moment, and things shift very rapidly.

But sometimes, usually with the big things, we actually do need to do some design upfront. This is usually true in the complex / user facing parts of our projects. The Map/Reduce system, for example, was mostly re-written in RavenDB 2.0, and that only happened after multiple design sessions internally, a full stand alone spike implementation and a lot of coffee, curses and sweat.

In many cases, when we can, we will post a suggested design on the mailing list and ask for feedback. Here is an example of such a scenario:

In this case, we didn’t get to this feature in time for the 2.0 release, but we kept thinking and refining the approach for that.

The interesting thing is that in those cases, we usually “design” things by doing the high level user visible API and then just letting it percolate. There are a lot of additional things that we would need to change to make this work (backward compatibility being a major one), but that work can be done along the way. Right now we can let it sit, get users’ feedback on the proposed design and get the current minor release out of the door.


We actually pair quite a lot, either physically (most of our stations have two keyboards & mice for that exact purpose) or remotely (Skype / Team Viewer).

[image]

And yet, I would say that for the vast majority of cases, we don’t pair. Pairing is usually called for when we need two pairs of eyes to look at a problem, or for non-trivial debugging, and that is about it.

Testing is something that I deeply believe in, at the same time that I distrust unit testing. Most of our tests are actually system tests that test the system end to end. Here is an example of such a test:

[Fact]
public void CanProjectAndSort()
{
    using(var store = NewDocumentStore())
    {
        using(var session = store.OpenSession())
        {
            session.Store(new Account
            {
                Profile = new Profile
                {
                    FavoriteColor = "Red",
                    Name = "Yo"
                }
            });
            session.SaveChanges();
        }
        using(var session = store.OpenSession())
        {
            var results = (from a in session.Query<Account>()
                           .Customize(x => x.WaitForNonStaleResults())
                           orderby a.Profile.Name
                           select new {a.Id, a.Profile.Name, a.Profile.FavoriteColor}).ToArray();


            Assert.Equal("Red", results[0].FavoriteColor);
        }
    }
}

Most of our new features are usually built first, then get tests for them. Mostly because it is more efficient to get things done by experimenting a lot without having tests to tie you down.

Decision making is something that I am trying to work on. For the most part, I have things that I feel very strongly about. Production worthiness is one such scenario, and I get annoyed if something is obviously stupid, but a lot of the time decisions fall into the either / or category, or are truly preference issues. I still think that too much goes through me, including things that probably should not. I am trying to arrange things so that I won’t be in the loop so much. We are making progress, but we aren’t there yet.

Note that this post is mostly here to serve as a point of discussion. I am not really sure what to put in here, the practices we do are pretty natural, from my point of view. And I would appreciate any comments asking for clarifications.
