Ayende @ Rahien

Refunds available at head office

Hibernating Rhinos Practices: Development Workflow

The development workflow refers to how a developer decides what to do next, how tasks are organized, assigned and worked on.

Typically, we dedicate a lot of the Israeli team’s time to ongoing support and maintenance tasks, so a lot of the work is things that show up on the mailing lists. We usually triage them to one of four levels:

  • Interesting stuff that is outside of core competencies, or stuff that is nice to have that we don’t have resources for. We would usually handle that by requesting a pull request, or creating a low priority issue.
  • Feature requests / ideas – usually go to the issue tracker and wait there until assigned / there is time to do them.
  • Bugs in our products – depending on severity, usually they are fixed on the spot, sometimes they are low priority and get to the issue tracker.
  • Priority Bugs – usually get to the top of the list over anything and everything else.

It is obviously a bit more complex, because if we are working on a particular area already, we usually also take the time to cover the easy-to-do stuff from the issue tracker.

Important things:

  • We generally don’t pay attention to releases, unless we have one pending for a product (for example, upcoming stable release for RavenDB).
  • We don’t usually try to prioritize issues. Most of them are just there, and get picked up by whoever gets them first.

We follow slightly different workflows for Uber Prof & RavenDB. With Uber Prof, every single push generates a client visible build, and we have auto update to make sure that most people run on the very latest.

With RavenDB, we have the unstable builds, which is what every single push translates to, and the stable builds, which have a much more involved release process.

We tend to emphasize getting things out the door over the Thirteen Steps to Properly Release Software.

An important rule of thumb: if you are still in the office at 7 PM, you had better have shown up at 11 or so; just because zombies are cool nowadays doesn’t mean you have to be one. I am personally exempted from the rule, though.

Next, I’ll discuss pairing, testing and decision making.

Hibernating Rhinos Practices: Intro

I was asked to comment a bit on our internal practices in Hibernating Rhinos. Before I can do that, I have to explain about how we are structured.

  • The development team in Israel composes the core of the company.
  • There are additional contractors that do work in Poland, the States and the UK.

We rarely make distinctions between locations for work, although obviously we have specializations. Samuel is our go-to guy for “Make things pretty” and “Silverlight hair loss”, for example, Arek is really good at pointing in the right direction when there is a problem, and so on.

We currently have the following projects in place:

  • RavenDB
  • Uber Profiler
  • RavenFS
  • License / Orders Management
  • RavenDB.Net
  • HibernatingRhinos.com
  • ayende.com

Note that this is probably a partial list. And you might have noticed that I also included internal stuff, because that is also work, and something that we do.

In general, there isn’t a lot of “you work on this, or you work on that”, although again, there are areas of specialization. Fitzchak has been doing a lot of the work on Uber Prof, and Daniel is spending a lot of time on the RavenDB Studio. That doesn’t mean that tomorrow you wouldn’t find Fitzchak hacking on RavenDB indexes or Pawel working on exporting the profiler data to Excel, and so on.

Next, I’ll discuss how we deal with the development workflow.

Courses, courses! All you can learn!

So here is my near term schedule:

  • A 2-day NHibernate course in London on Feb 25 – an intense 2 days where I am going to jump you from novice to master in NHibernate, including understanding what is going on and, much more importantly, why?!
  • A 3-day RavenDB course in London on Feb 27 – in this course we are going to go over RavenDB 2.x from top to bottom, from how it works to all the little tricks and hidden details that you wish you knew. We are going to cover architecture, design, implementation and a lot of goodies. Also, there is going to be a lot of fun, and I promise that you will go back to work with a lot less load on your shoulders.
  • A 2-day condensed RavenDB course in Stockholm on Mar 4 – like the previous one, but with a sharper focus, and the guarantee that you will go Wow and Oh! on a regular basis.

RavenDB Webinar: Thursday 28 Jan

You can register for that here: https://www2.gotomeeting.com/register/582294746

We have enough room for 100 people. We are going to cover:

What is new in RavenDB 2.0, operational concerns, new features and things that you can do with RavenDB.
This is an open webinar, which means we will be taking questions from the audience, so come prepared!

Lots of demo, and if we are lucky, some features that are even newer than 2.0.

Yes, it will be recorded and posted to our YouTube channel.


RavenDB on the Cloud


About 3 years ago, RavenDB launched its 1.0 release. A few weeks ago, we had our 2.0 release.

And now… well, it is with great pleasure that I can announce that we have an additional RavenDB as a Service provider: in addition to RavenHQ, there is also CloudBird.

You can get hosted RavenDB (both 1.0 and 2.0), directly or via AppHarbor (if you are already using it).

This is very exciting, and I am very happy to see 3rd parties coming into the RavenDB ecosystem and offering an easy way for you to get up and running with basically no effort.

And just to clear things up, because I get it a lot: both RavenHQ and CloudBird are separate companies; they aren’t subsidiaries of Hibernating Rhinos, and while we obviously worked closely with both to get this offering out, they are distinct from us. This is mainly an attempt to stop “My DB says it is too big, what happened” emails from landing in my mailbox.

Go and try it right now, you can sign up for a free database in just a few seconds…


RavenDB new feature: Highlights

Before anything else, I need to thank Sergey Shumov for this feature. This is one of the features that we got as a pull request, and we were very happy to accept it.

What are highlights? Highlights are important when you want to give the user better search UX.

For example, let us take the Google Code data set and write the following index for it:

public class Projects_Search : AbstractIndexCreationTask<Project, Projects_Search.Result>
{
    public class Result
    {
        public string Query { get; set; }
    }

    public Projects_Search()
    {
        Map = projects =>
              from p in projects
              select new
              {
                  Query = new[]
                  {
                      p.Name,
                      p.Summary
                  }
              };
        Store(x => x.Query, FieldStorage.Yes);
        Index(x=>x.Query, FieldIndexing.Analyzed);
    }
}

And now, we are going to search it:

using(var session = store.OpenSession())
{
    var prjs = session.Query<Projects_Search.Result, Projects_Search>()
        .Search(x => x.Query, q)
        .Take(5)
        .OfType<Project>()
        .ToList();

    var sb = new StringBuilder().AppendLine("<ul>");

    foreach (var project in prjs)
    {
        sb.AppendFormat("<li>{0} - {1}</li>", project.Name, project.Summary).AppendLine();
    }
    var s = sb
        .AppendLine("</ul>")
        .ToString();
}

The value of q is: source

Using this, we get the following results:

  • hl2sb-src - Source code to Half-Life 2: Sandbox - A free and open-source sandbox Source engine modification.
  • mobilebughunter - BugHunter Platfrom is am open source platform that integrates with BugHunter Platform is am open source platform that integrates with Mantis Open Source Bug Tracking System. The platform allows anyone to take part in the test phase of mobile software proj
  • starship-troopers-source - Starship Troopers: Source is an open source Half-Life 2 Modification.
  • static-source-analyzer - A Java static source analyzer which recursively scans folders to analyze a project's source code
  • source-osa - Open Source Admin - A Source Engine Administration Plugin

And this makes sense, and it is pretty easy to work with. Except that it would be much nicer if we could go further than this, and let the user know why we selected those results. Here is where highlights come into play. We will start with the actual output first, because it is more impressive:

  • hl2sb-src - Source code to Half-Life 2: Sandbox - A free and open-source sandbox Source engine modification.
  • mobilebughunter - open source platform that integrates with BugHunter Platform is am open source platform that integrates with Mantis Open Source
  • volumetrie - code source - Volumetrie est un programme permettant de récupérer des informations sur un code source - Volumetrie is a p
  • acoustic-localization-robot - s the source sound and uses a lego mindstorm NXT and motors to point a laser at the source.
  • bamboo-invoice-ce - The source-controlled Community Edition of Derek Allard's open source "Bamboo Invoice" project

And here is the code to make this happen:

using(var session = store.OpenSession())
{
    var prjs = session.Query<Projects_Search.Result, Projects_Search>()
        .Customize(x=>x.Highlight("Query", 128, 1, "Results"))
        .Search(x => x.Query, q)
        .Take(5)
        .OfType<Project>()
        .Select(x=> new
        {
            x.Name,
            Results = (string[])null
        })
        .ToList();

    var sb = new StringBuilder().AppendLine("<ul>");

    foreach (var project in prjs)
    {
        sb.AppendFormat("<li>{0} - {1}</li>", project.Name, string.Join(" || ", project.Results)).AppendLine();
    }
    var s = sb
        .AppendLine("</ul>")
        .ToString();
}

For that matter, here is me playing with things, searching for: lego mindstorm

  • acoustic-localization-robot - ses a lego mindstorm NXT and motors to point a laser at the source.
  • dpm-group-3-fall-2011 - Lego Mindstorm Final Project
  • hivemind-nxt - k for Lego Mindstorm NXT Robots
  • gsi-lego - with Lego Mindstorm using LeJos
  • lego-xjoysticktutorial - l you Lego Mindstorm NXT robot with a joystick

You can play around with how it highlights the text, but as you can see, I am pretty happy with this new feature.


FLOSS Moling with RavenDB

There is the FLOSS Mole data set, which provides a lot of interesting information about open source projects. As I am always interested in testing RavenDB with different data sets, I decided that this would be a great opportunity to do that, and get some additional information about how things are working as well.

The data is provided in a number of formats, but most of them aren’t really easy to access: SQL statements and raw text files that I assume are tab separated, but couldn’t really figure out quickly.

I decided that this would be a great example of actually migrating content from a SQL System to a RavenDB System. The first thing to do was to install MySQL, as that seems to be the easiest way to get the data out. (As a note, MySQL Workbench is really not what I would call nice.)

The data looks like this; this is the Google Code projects table, and you can also see that a lot of the data is driven by the notion of a project.

image

I explored the data a bit, and I came to the conclusion that this is pretty simple stuff, overall. There are a few many-to-one associations, but all of them were capped (the max was 20 or so).

That meant, in turn, that the import process would be really simple. I started by creating the actual model which we will use to save to RavenDB:

image
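
Since the image didn’t carry over well here, this is roughly what that model looks like. It is a sketch inferred from the import code below, not the exact classes from the original screenshot:

public class Project
{
    public string Name { get; set; }
    public string CodeLicense { get; set; }
    public string CodeUrl { get; set; }
    public string ContentLicense { get; set; }
    public string ContentUrl { get; set; }
    public string Description { get; set; }
    public string Summary { get; set; }
    public List<string> Labels { get; set; }
    public List<Blog> Blogs { get; set; }
    public List<Group> Groups { get; set; }
    public List<Link> Links { get; set; }
    public List<Person> People { get; set; }
}

public class Blog
{
    public string Link { get; set; }
    public string Title { get; set; }
}

public class Group
{
    public string Name { get; set; }
    public string Url { get; set; }
}

public class Link
{
    public string Url { get; set; }
    public string Title { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public string Role { get; set; }
    public string UserId { get; set; }
}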

The rest was just a matter of reading from MySQL and writing to RavenDB. I chose to use PetaPoco for the SQL access, because it is the easiest. The following code sucks. It is written with the assumption that I know what the data sizes are, that the cost of making so many queries (roughly 1,500,000 queries) is acceptable, etc.

using (var docStore = new DocumentStore
    {
        ConnectionStringName = "RavenDB"
    }.Initialize())
using (var db = new PetaPoco.Database("MySQL"))
using (var bulk = docStore.BulkInsert())
{
    foreach (var prj in db.Query<dynamic>("select * from gc_projects").ToList())
    {
        string name = prj.proj_name;
        bulk.Store(new Project
            {
                Name = name,
                CodeLicense = prj.code_license,
                CodeUrl = prj.code_url,
                ContentLicense = prj.content_license,
                ContentUrl = prj.content_url,
                Description = prj.project_description,
                Summary = prj.project_summary,
                Labels = db.Query<string>("select label from gc_project_labels where proj_name = @0", name)
                                .ToList(),
                Blogs = db.Query<dynamic>("select * from gc_project_blogs where proj_name = @0", name)
                            .Select(x => new Blog { Link = x.blog_link, Title = x.blog_title })
                            .ToList(),
                Groups = db.Query<dynamic>("select * from gc_project_groups where proj_name = @0", name)
                            .Select(x => new Group { Name = x.group_name, Url = x.group_url })
                            .ToList(),
                Links = db.Query<dynamic>("select * from gc_project_links where proj_name = @0", name)
                            .Select(x => new Link { Url = x.link, Title = x.link_title })
                            .ToList(),
                People = db.Query<dynamic>("select * from gc_project_people where proj_name = @0", name)
                    .Select(x => new Person
                        {
                            Name = x.person_name,
                            Role = x.role,
                            UserId = x.user_id
                        })
                    .ToList(),
            });
    }
}

But, it does the work, and it was simple to write. Using this code, I was able to insert 299,949 projects in just under 13 minutes. Most of the time went to making those 1.5 million queries to the db, by the way.

Everything is cool, and it is quite nice. In the next post, I’ll talk about why I wanted a new data set. Don’t worry, it is going to be cool.


Elections

So today we had elections, and by tonight you will have a lot of people doing a lot of electoral math.

I don’t like elections, because of an assumption problem. It isn’t linear. This is how we usually portray the choices in elections. You pick a candidate / party that fits where you are on this line.

image

In reality, this isn’t nearly as simple, mostly because this one line assumes that there is a central idea that is more important than anything else. But let us take a few examples:

  • Tax policy
  • Security policy
  • Gay marriage
  • Religion
  • Social justice
  • Climate change

Now, they don’t fit on a single line. Your position on gay marriage doesn’t impact what you want with regards to tax policy, for example. The real scenario is:

Now, usually there is some concentration of ideas, so it is typical that if you give me your idea about gay marriage, I can guess what your ideas about climate change are.

By the way, I am taking gay marriage and climate change as examples that are common in more than a single country.

But that is guessing. And in many cases, people are a lot more complex than that. We are limited to choosing a candidate, but what happens when we have someone who we support on issue X and oppose on issue Y? We have to make tradeoffs.

So you are limited to one vote, and have to choose something on this line. Yes, as a result of that you get commonalities; a lot of people that like position X also like position Y, but not always, and sometimes I find it abhorrent that someone with whom I share a position on X also holds an opposing view on Y.

Design patterns in the test of time: Mediator

The mediator pattern defines an object that encapsulates how a set of objects interact. This pattern is considered to be a behavioral pattern due to the way it can alter the program's running behavior.

More about this pattern can be found here.

Like the Façade pattern, I can absolutely see the logic of wanting to use a mediator. It is supposed to make it easier to work with a set of objects, because it hides their interactions.
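
For reference, here is roughly what the textbook form of the pattern looks like. The chat room example and all the names in it are mine, invented for illustration, not lifted from any real codebase:

public interface IChatRoom
{
    void Send(string from, string message);
}

// The mediator: participants never talk to each other directly,
// all interaction goes through the room.
public class ChatRoom : IChatRoom
{
    private readonly List<Participant> participants = new List<Participant>();

    public void Join(Participant participant)
    {
        participant.Room = this;
        participants.Add(participant);
    }

    public void Send(string from, string message)
    {
        foreach (var participant in participants)
            participant.Receive(from, message);
    }
}

public class Participant
{
    public string Name { get; set; }
    public IChatRoom Room { get; set; }

    public void Say(string message)
    {
        Room.Send(Name, message);
    }

    public void Receive(string from, string message)
    {
        Console.WriteLine("{0} to {1}: {2}", from, Name, message);
    }
}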

In practice, almost all the cases I know of are bad ones. In fact, in most systems that I have seen, anything carrying the name bears very little resemblance to the actual pattern it is supposed to represent.

The differences between façade and mediator are minute, and you would think the same advice would apply. However, while you can find a lot of usages of facades (or at least things people would call facades), there are very few real world examples of the mediator pattern in use. And almost all of them carry the marks that say: “Just read the GoF book, @w$0m3!!!”

Design patterns in the test of time: Iterator

In object-oriented programming, the iterator pattern is a design pattern in which an iterator is used to traverse a container and access the container's elements. The iterator pattern decouples algorithms from containers; in some cases, algorithms are necessarily container-specific and thus cannot be decoupled.

More about this pattern.

It is really hard to think of any other pattern that has been more successful. In particular, patterns have long been about overcoming shortcomings of the language or platform.

In this case, iterators have become part of both the language and the platform in most modern systems.

  • System.Collections.IEnumerable
  • java.util.Iterator
  • Python’s __iter__()

Basically, it is so good, it is everywhere.
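
To make that concrete, here is a minimal C# sketch of my own; the whole pattern is hiding behind IEnumerable<T>, foreach and yield return:

// The compiler turns this method into an iterator object that produces
// values lazily, one MoveNext() call at a time.
public static IEnumerable<int> Fibonacci(int count)
{
    int a = 0, b = 1;
    for (int i = 0; i < count; i++)
    {
        yield return a;
        int next = a + b;
        a = b;
        b = next;
    }
}

// And foreach is just sugar over GetEnumerator() / MoveNext() / Current.
foreach (var n in Fibonacci(10))
{
    Console.WriteLine(n);
}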

Design patterns in the test of time: Interpreter

In computer programming, the interpreter pattern is a design pattern that specifies how to evaluate sentences in a language. The basic idea is to have a class for each symbol (terminal or nonterminal) in a specialized computer language. The syntax tree of a sentence in the language is an instance of the composite pattern and is used to evaluate (interpret) the sentence.

More about this pattern.

Along with the visitor pattern, this is still a very useful pattern, but in a small and important context: parsing and executing code. We see those quite often. In particular, JavaScript is probably the most common place where we see interpreters.

That said, unless you are actually dealing with executing code, there is very little reason for you to want to apply this pattern. In fact, I have seen people go for it several times, for purposes that I really can’t explain.

Interpreter is for code execution. It has some interesting differences from compiling the code. For one, it is a lot easier to write, and for the most part, performance is great. This is especially true because the hard parts (the text parsing) are usually done up front and then you are just executing the AST.
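
Here is a minimal sketch of what that looks like; the expression types are invented for illustration, they aren’t from Jint or RavenDB. The parser builds the tree once, and evaluation is just walking it:

// Each node type knows how to evaluate itself.
public abstract class Expression
{
    public abstract int Evaluate();
}

public class NumberExpression : Expression
{
    private readonly int value;
    public NumberExpression(int value) { this.value = value; }
    public override int Evaluate() { return value; }
}

public class AddExpression : Expression
{
    private readonly Expression left;
    private readonly Expression right;

    public AddExpression(Expression left, Expression right)
    {
        this.left = left;
        this.right = right;
    }

    public override int Evaluate() { return left.Evaluate() + right.Evaluate(); }
}

// (1 + 2) + 40, normally produced by the parser, built by hand here:
Expression ast = new AddExpression(
    new AddExpression(new NumberExpression(1), new NumberExpression(2)),
    new NumberExpression(40));

Console.WriteLine(ast.Evaluate()); // 43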

From my perspective, we use Jint, a JS interpreter, in RavenDB because compiling to IL and running that was complex. Any bugs there were complex to figure out, and most importantly from our point of view, we really needed to be able to place limits on what you could do: the number of steps that can be taken, the recursion depth, and so on. Doing so with compiled code requires you to have kernel level access, or to do things like Thread.Abort.

So interpreters are good, but watch out when you use them: if it ain’t code that you are going to run, why are you writing one in the first place?

Bug fixes & nightmares

This just happened. I was working on clearing my mailbox when I got a repro for a high CPU load from a user.

Spent an hour trying to figure out what is going on. Found that the problem is something with map/reduce under extreme load.

Couldn’t figure it out. Went to bed depressed after midnight.

Got up at 4 AM, sat down on the same problem. Solved it (root cause, that @ayende guy doing O(N Scary) stuff) in under 15 minutes.

Now I need to figure out what to do with the rest of the night.

Also, currently in Denmark (Copenhagen) and it is bloody cold out here. This is what I look like when I am cold:

image


RavenDB 2.1 Features: Robust SQL Replication

We have had the ability to replicate things to SQL for a long while, but I was never really happy with the implementation. It was awkward, hard to use/configure/set up, and didn’t really give us a lot of options in general.

After we finished working on 2.0, I had some free time and used that to create the new SQL Replication bundle, which is much nicer.

Basically, we go from this guy:

image

To those guys:

image

image

And the fun part is that we no longer require you to do this using indexing, or have to worry about connectivity issues. RavenDB will automatically handle disruptions in connectivity and heal rifts automatically.

But probably the best part is that this is how we define the transformation function between RavenDB and your RDBMS of choice:

image


What is next for RavenDB?

After 2.0, what comes next? RavenDB 2.0 is a massive release: close to 5 thousand commits, 6+ months of work, a team of close to 70 people working on it.

It also taught me a very good lesson about cutoffs. I don’t like them. In particular, someone who wanted to run on the stable version was stuck for 6 months with no new features, and only hot fixes as needed.

Instead, I would like to go to a more streamlined model of a stable release every 6 – 8 weeks. That gives us a proper time frame for doing changes, but doesn’t leave people “in the cold” in regards to stale versions.

So that is what we are planning in terms of process. But what about the actual features? Surprisingly enough, we have plenty more lined up for you.

  • Robust SQL Replication (already implemented, and waiting for additional testing & goodies). To better support reporting.
  • Indexing snapshots and unlimited result set server side processing.
  • Multiple index batches and independent indexing. Prevent a slow index from slowing down all indexing.
  • Restructuring of temporary / auto indexes.
  • Auto index merging. (Even smarter behavior with regards to optimizing indexes, removing unused indexes, etc)
  • Write index output to documents. Allows you to do things like multi-stage reduces.
  • WinRT & Mono Touch & Mono for Android client support.
  • More comprehensive client side caching.

And those are just the things we have as high level items.


RavenDB 2.0 RTM

After just a bit over 6 months of work, RavenDB 2.0 stable is finally out!


The new build contains 4,975 commits made by 68 contributors (from the Hibernating Rhinos’ RavenDB team and externals) and includes improvements on just about every level.

We have better performance, faster response times, better operational support, a lot more features and so much more. You can read the entire list here.

And with that, I am going to go off into the sunset and go offline for a while. A Memory of Light comes out tomorrow, and I am going to be busy. Besides, we earned it.


Single Responsibility Principle, Object Orientation & Active Code

Jason Folkens had a comment on my previous post:

When people combine methods and data into a class in a way such that you are recommending, I wonder if they truly value the single responsibility principle. In my mind, storing both schema and behavior in the same class qualifies as a violation of the SRP. Do you disagree with me that this is a 'violation', or do you just not think the SRP is important?

I can’t disagree enough. From Wikipedia:

An object contains encapsulated data and procedures grouped together to represent an entity.

The whole point of OOP is to encapsulate both data & behavior. To assume otherwise leads us to stateless functions and isolated DTOs.

Or, in other words, procedures and structures. And I think I’ll leave that to C.
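
To make the contrast concrete, here is a tiny example of my own (not something from Jason’s comment or the original post):

// “Procedures and structures”: the data lives here...
public class AccountData
{
    public decimal Balance { get; set; }
}

// ...and the behavior that understands it lives somewhere else entirely.
public static class AccountProcedures
{
    public static void Withdraw(AccountData account, decimal amount)
    {
        if (amount > account.Balance)
            throw new InvalidOperationException("Insufficient funds");
        account.Balance -= amount;
    }
}

// An object: data and behavior together, with the invariant enforced in one place.
public class Account
{
    public decimal Balance { get; private set; }

    public void Withdraw(decimal amount)
    {
        if (amount > Balance)
            throw new InvalidOperationException("Insufficient funds");
        Balance -= amount;
    }
}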


Active vs. Passive code bases

I was reviewing code at a customer site, and he had a lot of classes that looked something like this:

public class ValidationData
{
    public string Type { get; set; }
    public string Value { get; set; }
}

In the database, he would have the data like this:

image

This is obviously a very simple example, but it gets the job done, I think.

In his code base, the customer had several instances of this pattern: for validation of certain parts of the system, for handling business rules, for checking how to handle various events, and I think you get the picture.

I seriously dislike such codebases. You take an innocent piece of code and make it so passive it… well, you can see:

image

Here is why this is bad. The code is passive; it is just a data holder. And that means that in order to process it, you are going to have some other code that handles it for you. That likely means a switch statement or the equivalent, and it also means that any sort of change now has to happen in multiple locations. Puke.

For fun, using this anti-pattern all over your codebase results in you having to do this over and over again, for any new interesting thing that you are doing. It is a lot of work, and a lot of places that you have to change.
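
Here is roughly what that other code ends up looking like; this is an invented example following the shape of the customer’s code, not something copied from it:

public static class ValidationRunner
{
    public static void Validate(ValidationData data, string input)
    {
        // Every consumer of ValidationData needs a switch like this one.
        switch (data.Type)
        {
            case "MaxLen":
                if (input.Length > int.Parse(data.Value))
                    throw new InvalidOperationException("Value is too long");
                break;
            case "InvalidChars":
                if (input.IndexOfAny(data.Value.ToCharArray()) >= 0)
                    throw new InvalidOperationException("Value contains invalid characters");
                break;
            // Adding a new validation type means finding and updating every
            // switch like this one, all over the codebase.
            default:
                throw new NotSupportedException(data.Type);
        }
    }
}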

But you can be a hero and set the code free:

You do that by making a very simple change. Instead of having passive data containers that other pieces of the code need to react to, make them active.

public class AvoidCurseWordsValidator : IValidator
{
    public string[] CurseWords { get; set; }
    public void Validate(...) { }
}

public class MaxLenValidator : IValidator
{
    public int MaxLen { get; set; }
    public void Validate(...) { }
}

public class InvalidCharsValidator : IValidator
{
    public char[] InvalidChars { get; set; }
    public void Validate(...) { }
}

Now, if we want to modify / add something, we can do this in only one spot. Hurray for Single Responsibility and Open Closed principles.
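
The consuming code, in turn, stops caring which validators exist. The Validate signature is elided above, so I am assuming here that it takes the value being validated:

// Somewhere in the processing pipeline; IValidator.Validate is assumed to
// take the value under validation (the exact signature is elided above).
public void RunValidation(IEnumerable<IValidator> validators, string input)
{
    foreach (var validator in validators)
        validator.Validate(input);
}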

So… don’t let your codebase be dominated by switch statements, parallel hierarchies and other nasties. Make it go active, and you’ll like the results.

Implementation details: RavenDB Bulk Inserts

With RavenDB bulk inserts, we significantly improved the time it takes to insert a boat load of documents into RavenDB. By over an order of magnitude, in fact.

How did we do that? By doing a whole bunch of things, but mostly by being smart in utilizing the resources on both client & server.

Here is what a standard request timeline looks like:

image

As you can see, there are several factors that hold us up here. We need to prepare (in memory) all of the data to send. On the server side, we wait until we have the entire dataset before we can start processing the data. Just that aspect costs us a lot.

And because there is a finite amount of memory we can use, we have to split things into batches, and each batch is a separate request, requiring the same song and dance just at the network layer.

That doesn’t count what we have to do on the server end once we actually have the data: process it, flush it to disk, register it for indexing, call fsync, etc. That isn’t too much, unless you are trying to shove in as much data as you can, as fast as you can.

In contrast, this is what the bulk insert looks like on the network:

image

We stream the results to the server directly, so while the client is still sending results, we are already flushing them to disk.

To make things even more interesting, we aren’t using standard GZip compression over the whole request. Instead, each batch is compressed independently, which means we don’t have a dependency on the internals of the compression routine’s buffering system, etc. It also means that we get each batch much faster.

There are, of course, rate limits built in, to protect ourselves from flooding the buffers, but for the most part, you will have a hard time hitting them.

In addition to that, because we are in the context of a single request, we can apply additional optimizations: we can do lazy flushes (not having to wait for a full fsync for each batch), because we do the final fsync at the end of the request.

Finally, we actually created an optimized code path that skips doing a lot of the things that we do in the normal path. For example, by default we assume you are doing an insert only (saves checking the disk, and will throw if not true), we don’t populate the indexing prefetching queue, etc. All in all, it means that we got more than an order of magnitude improvement.
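
From the client side, all of this hides behind the same API that showed up in the FLOSS Mole import earlier. A sketch, where the document class and the URL are just placeholders:

using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
using (var bulk = store.BulkInsert())
{
    for (var i = 0; i < 1000000; i++)
    {
        // Documents are batched up and streamed to the server while the
        // loop is still running; no need to manage batches yourself.
        bulk.Store(new User { Name = "user-" + i });
    }
}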
