Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 377 words

The development workflow refers to how a developer decides what to do next, how tasks are organized, assigned and worked on.

Typically, we dedicate a lot of the Israeli’s team time to doing ongoing support and maintenance tasks. So a lot of the work are things that show up on the mailing lists. We usually triage them to one of four levels:

  • Interesting stuff that is outside of core competencies, or stuff that is nice to have that we don’t have resources for. We would usually handle that by requesting a pull request, or creating a low priority issue.
  • Feature requests / ideas – usually go to the issuer tracker and wait there until assigned / there is time to do them.
  • Bugs in our products – depending on severity, usually they are fixed on the spot, sometimes they are low priority and get to the issue tracker.
  • Priority Bugs – usually get to the top of the list over anything and everything else.

It is obviously a bit more complex, because if we are working on a particular area already, we usually also take the time to cover the easy-to-do stuff from the issue tracker.

Important things:

  • We generally don’t pay attention to releases, unless we have one pending for a product (for example, upcoming stable release for RavenDB).
  • We don’t usually try to prioritize issues. Most of them are just there, and get picked up by whoever gets them first.

We following slightly different workflows for Uber Prof & RavenDB. With Uber Prof, every single push generate a client visible build, and we have auto update to make sure that most people run on the very latest.

With RavenDB, we have the unstable builds, which is what every single push translates to, and the stable builds, which have a much more involved release process.

We tend to emphasize getting things out the door over the Thirteen Steps to Properly Release Software.

An important rule of thumb, if you are still the office by 7 PM, you have better showed up at 11 or so, just because zombies are cool nowadays doesn’t mean you have to be one. I am personally exempted from the rule, though.

Next, I’ll discuss pairing, testing and decision making.

time to read 2 min | 236 words

I was asked to comment a bit on our internal practices in Hibernating Rhinos. Before I can do that, I have to explain about how we are structured.

  • The development team in Israel compose the core of the company.
  • There are additional contractors that do work in Poland, the states and the UK.

We rarely make distinctions between locations for work, although obviously we have specializations. Samuel is our go to guy for “Make things pretty” and “Silverlight hairloss”, for example, Arek is the really good in pointing to the right direction when there is a problem, and so on.

We currently have the following projects in place:

  • RavenDB
  • Uber Profiler
  • RavenFS
  • License / Orders Management
  • RavenDB.Net
  • HibernatingRhinos.com
  • ayende.com

Note that this is probably a partial list. And you might have noticed that I also included internal stuff, because that is also work, and something that we do.

In general, there isn’t a lot of “you work on this, or you work on that”, although again, there are areas of specialization. Fitzchak has been doing a lot of the work on Uber Prof, and Daniel is spending a lot of time on the RavenDB Studio. That doesn’t mean that tomorrow you wouldn’t find Fitzchak hacking on RavenDB indexes or Pawel working on exporting the profiler data to excel, and so on.

Next, I’ll discuss how we deal with the development workflow.

time to read 1 min | 165 words

So here is my near term schedule:

  • A 2 days NHibernate course in London on Feb 25 – those are intense 2 days where I am going to jump you from novice to master in NHibernate, including understanding what is going on and much more importantly, why?!
  • A 3 days RavenDB course in London on Feb 27 – in this course we are going to go over RavenDB 2.x from top to bottom, from how it works, to all the little tricks and hidden details that you wished you knew. We are going to cover architecture, design, implementation and a lot of goodies. Also, there is going to be a lot of fun, and I promise that you go back to work with a lot less load on your shoulders.
  • A 2 days condensed RavenDB course in Stockholm on Mar 4 – Like the previous one, but with a sharper focus, and the guarantee that you will go Wow and Oh! on a regular basis.
time to read 1 min | 100 words

You can register for that here: https://www2.gotomeeting.com/register/582294746

We have enough room for a 100 people Smile.  We are going to cover:

What is new in RavenDB 2.0, operational concerns, new features and things that you can do with RavenDB.
This is an open webinar, which means we will be taking questions from the audience, so come prepared!

Lots of demo, and if we are lucky, some features that are even newer than 2.0.

Yes, it will be recorded and posted to our YouTube channel.

time to read 2 min | 273 words

image

About 3 years ago, RavenDB launched as 1.0 release. A few weeks ago, we had our 2.0 release.

And now… well, it is with great pleasure that I can announce that we now have an additional RavenDB as a Service provider, in addition to RavenHQ. there is also Cloud Bird.

You can get hosted RavenDB (both 1.0 and 2.0), directly or via AppHarbor (if you are already using it).

This is very exciting, and I am very happy to see 3rd parties coming into the RavenDB eco system and offering a easy way for you to get up and running with basically no effort.

And just to clear things up, because I get it a lot. Both RavenHQ and CloudBird are separate companies, they aren’t subsidiaries of Hibernating Rhinos and while we obviously worked closely with both to get this offering out, they are distinct from us. This is mainly an attempt to stop getting “My DB says it is too big, what happened” emails from landing in my mailbox.

Go and try it right now, you can sign up for a free database in just a few seconds…

time to read 7 min | 1285 words

Before anything else, I need to thank Sergey Shumov for this feature. This is one of the features that we got as a pull request, and we were very happy to accept it.

What are highlights? Highlights are important when you want to give the user better search UX.

For example, let us take the Google Code data set and write the following index for it:\

public class Projects_Search : AbstractIndexCreationTask<Project, Projects_Search.Result>
{
    public class Result
    {
        public string Query { get; set; }
    }

    public Projects_Search()
    {
        Map = projects =>
              from p in projects
              select new
              {
                  Query = new[]
                  {
                      p.Name,
                      p.Summary
                  }
              };
        Store(x => x.Query, FieldStorage.Yes);
        Index(x=>x.Query, FieldIndexing.Analyzed);
    }
}

And now, we are going to search it:

using(var session = store.OpenSession())
{
    var prjs = session.Query<Projects_Search.Result, Projects_Search>()
        .Search(x => x.Query, q)
        .Take(5)
        .OfType<Project>()
        .ToList();

    var sb = new StringBuilder().AppendLine("<ul>");

    foreach (var project in prjs)
    {
        sb.AppendFormat("<li>{0} - {1}</li>", project.Name, project.Summary).AppendLine();
    }
    var s = sb
        .AppendLine("</ul>")
        .ToString();
}

The value of q is: source

Using this, we get the following results:

  • hl2sb-src - Source code to Half-Life 2: Sandbox - A free and open-source sandbox Source engine modification.
  • mobilebughunter - BugHunter Platfrom is am open source platform that integrates with BugHunter Platform is am open source platform that integrates with Mantis Open Source Bug Tracking System. The platform allows anyone to take part in the test phase of mobile software proj
  • starship-troopers-source - Starship Troopers: Source is an open source Half-Life 2 Modification.
  • static-source-analyzer - A Java static source analyzer which recursively scans folders to analyze a project's source code
  • source-osa - Open Source Admin - A Source Engine Administration Plugin

And this make sense, and it is pretty easy to work with. Except that it would be much nicer if we could go further than this, and let the user know why we selecting those results. Here is were highlights come into play. We will start with the actual output first, because it is more impressing:

  • hl2sb-src - Source code to Half-Life 2: Sandbox - A free and open-source sandbox Source engine modification.
  • mobilebughunter - open source platform that integrates with BugHunter Platform is am open source platform that integrates with Mantis Open Source
  • volumetrie - code source - Volumetrie est un programme permettant de récupérer des informations sur un code source - Volumetrie is a p
  • acoustic-localization-robot - s the source sound and uses a lego mindstorm NXT and motors to point a laser at the source.
  • bamboo-invoice-ce - The source-controlled Community Edition of Derek Allard's open source "Bamboo Invoice" project

And here is the code to make this happen:

using(var session = store.OpenSession())
{
    var prjs = session.Query<Projects_Search.Result, Projects_Search>()
        .Customize(x=>x.Highlight("Query", 128, 1, "Results"))
        .Search(x => x.Query, q)
        .Take(5)
        .OfType<Project>()
        .Select(x=> new
        {
            x.Name,
            Results = (string[])null
        })
        .ToList();

    var sb = new StringBuilder().AppendLine("<ul>");

    foreach (var project in prjs)
    {
        sb.AppendFormat("<li>{0} - {1}</li>", project.Name, string.Join(" || ", project.Results)).AppendLine();
    }
    var s = sb
        .AppendLine("</ul>")
        .ToString();
}

For that matter, here is me playing with things, searching for: lego mindstorm

  • acoustic-localization-robot - ses a lego mindstorm NXT and motors to point a laser at the source.
  • dpm-group-3-fall-2011 - Lego Mindstorm Final Project
  • hivemind-nxt - k for Lego Mindstorm NXT Robots
  • gsi-lego - with Lego Mindstorm using LeJos
  • lego-xjoysticktutorial - l you Lego Mindstorm NXT robot with a joystick

You can play around with how it highlight the text, but as you can see, I am pretty happy with this new feature.

time to read 7 min | 1292 words

There is the FLOSS Mole data set, which provide a lot of interesting information about open source projects. As I am always interested in testing RavenDB with different data sets, I decided that this would be a great opportunity to do that, and get some additional information about how things are working as well.

The data is provided in a number of formats, but most of them aren’t really easy to access. SQL statements and raw text files that I assume to be tab separated, but I couldn’t really figure out quickly.

I decided that this would be a great example of actually migrating content from a SQL System to a RavenDB System. The first thing to do was to install MySQL, as that seems to be the easiest way to get the data out. (As a note, MySQL Workbench is really not what I would call nice.)

The data looks like this, this is the Google Code projects, and you can also see that a lot of the data is driven from the notion of a project.

image

I explored the data a bit, and I came to the conclusion that this is pretty simple stuff, overall. There are a few many to one associations, but all of them were capped (the max was 20 or so).

That meant, in turn, that we had a really simple work to do for the import process. I started by creating the actual model which we will use to save to RavenDB:

image

The rest was just a matter of reading from MySQL and writing to RavenDB. I chose to use Peta Poco for the SQL access, because it is the easiest. The following code sucks. It is written with the assumption that I know what the data sizes are, that the cost of making so many queries (roughly a 1,500,000 queries) is acceptable, etc.

using (var docStore = new DocumentStore
    {
        ConnectionStringName = "RavenDB"
    }.Initialize())
using (var db = new PetaPoco.Database("MySQL"))
using (var bulk = docStore.BulkInsert())
{
    foreach (var prj in db.Query<dynamic>("select * from gc_projects").ToList())
    {
        string name = prj.proj_name;
        bulk.Store(new Project
            {
                Name = name,
                CodeLicense = prj.code_license,
                CodeUrl = prj.code_url,
                ContentLicense = prj.content_license,
                ContentUrl = prj.content_url,
                Description = prj.project_description,
                Summary = prj.project_summary,
                Labels = db.Query<string>("select label from gc_project_labels where proj_name = @0", name)
                                .ToList(),
                Blogs = db.Query<dynamic>("select * from gc_project_blogs where proj_name = @0", name)
                            .Select(x => new Blog { Link = x.blog_link, Title = x.blog_title })
                            .ToList(),
                Groups = db.Query<dynamic>("select * from gc_project_groups where proj_name = @0", name)
                            .Select(x => new Group { Name = x.group_name, Url = x.group_url })
                            .ToList(),
                Links = db.Query<dynamic>("select * from gc_project_links where proj_name = @0", name)
                            .Select(x => new Link { Url = x.link, Title = x.link_title })
                            .ToList(),
                People = db.Query<dynamic>("select * from gc_project_people where proj_name = @0", name)
                    .Select(x => new Person
                        {
                            Name = x.person_name,
                            Role = x.role,
                            UserId = x.user_id
                        })
                    .ToList(),
            });
    }
}

But, it does the work, and it was simple to write. Using this code, I was able to insert 299,949 projects in just under 13 minutes. Most of the time went to making those 1.5 million queries to the db, by the way.

Everything is cool, and it is quite nice. On the next post, I’ll talk about why I wanted a new dataset. Don’t worry, it is going to be cool.

Elections

time to read 2 min | 325 words

So today we had elections, and by tonight you will have a lot of people doing a lot of electoral math.

I don’t like elections, because of an assumption problem. It isn’t linear. This is how we usually portray the choices in elections. You pick a candidate / party that fit where you are on this line.

image

In reality, this isn’t nearly as simple. Mostly because this one line assumes that there is a central idea that is important being anything else. But let us take a few examples:

  • Tax policy
  • Security policy
  • Gay marriage
  • Religion
  • Social justice
  • Climate change

Now, they don’t fit on a single line. Your position on gay marriage doesn’t impact what you want with regards to tax policy, for example. The real scenario is:

Now, usually there is some concentration of ideas, so it is typical that if you give me your idea about gay marriage, I can guess what your ideas about climate change are.

By the way, I am taking gay marriage and climate change as examples that are common in more than a single country.

But that is guessing. And in many cases, people are a lot more complex than that. We are limited to choosing a candidate, but what happens when we have someone who we support on issue X and oppose on issue Y? We have to make tradeoffs.

So you are limited to one vote, and have to choose something on this line. Yes, as a result of that you get commonalities, a lot of people that like position X also like position Y, but not always, and sometimes I find it abhorrent that someone with whom I share the position on X also have an opposed idea on Y.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}