Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

FLOSS Moling with RavenDB

time to read 7 min | 1292 words

There is the FLOSS Mole data set, which provides a lot of interesting information about open source projects. As I am always interested in testing RavenDB with different data sets, I decided that this would be a great opportunity to do just that, and to learn a bit more about how things are working along the way.

The data is provided in a number of formats, but most of them aren't really easy to access: SQL statements, and raw text files that I assume are tab separated, but couldn't quickly verify.

I decided that this would be a great example of actually migrating content from a SQL system to RavenDB. The first thing to do was to install MySQL, as that seemed to be the easiest way to get the data out. (As a note, MySQL Workbench is really not what I would call nice.)

The data looks like this (these are the Google Code projects), and you can also see that a lot of the data is driven by the notion of a project.


I explored the data a bit, and came to the conclusion that this is pretty simple stuff, overall. There are a few many-to-one associations, but all of them were capped (the max was 20 or so).
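As an aside, checking that cap against the database is a one-liner. A minimal sketch with PetaPoco, using the same table names as the import code below (the query itself is my own illustration, not from the original post):

```csharp
// Sketch: find the largest number of labels any single project has.
// Assumes the same "MySQL" connection string used by the import code.
using (var db = new PetaPoco.Database("MySQL"))
{
    var maxLabels = db.ExecuteScalar<int>(
        @"select max(c) from
            (select count(*) as c
             from gc_project_labels
             group by proj_name) t");
    Console.WriteLine("Largest label collection: {0}", maxLabels);
}
```

The same query, pointed at the other child tables, gives the per-collection caps.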

That meant, in turn, that the import process would be really simple. I started by creating the actual model which we will use to save to RavenDB:


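The model, as used by the import code below, would look something like this (a sketch; the property types are my assumption, inferred from the column names):

```csharp
public class Project
{
    public string Name { get; set; }
    public string CodeLicense { get; set; }
    public string CodeUrl { get; set; }
    public string ContentLicense { get; set; }
    public string ContentUrl { get; set; }
    public string Description { get; set; }
    public string Summary { get; set; }
    public List<string> Labels { get; set; }
    public List<Blog> Blogs { get; set; }
    public List<Group> Groups { get; set; }
    public List<Link> Links { get; set; }
    public List<Person> People { get; set; }
}

public class Blog
{
    public string Link { get; set; }
    public string Title { get; set; }
}

public class Group
{
    public string Name { get; set; }
    public string Url { get; set; }
}

public class Link
{
    public string Url { get; set; }
    public string Title { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public string Role { get; set; }
    public string UserId { get; set; }
}
```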
The rest was just a matter of reading from MySQL and writing to RavenDB. I chose to use PetaPoco for the SQL access, because it is the easiest. The following code sucks. It is written with the assumption that I know what the data sizes are, that the cost of making so many queries (roughly 1,500,000 of them) is acceptable, etc.

using (var docStore = new DocumentStore
{
    ConnectionStringName = "RavenDB"
}.Initialize())
using (var db = new PetaPoco.Database("MySQL"))
using (var bulk = docStore.BulkInsert())
{
    foreach (var prj in db.Query<dynamic>("select * from gc_projects").ToList())
    {
        string name = prj.proj_name;
        bulk.Store(new Project
        {
            Name = name,
            CodeLicense = prj.code_license,
            CodeUrl = prj.code_url,
            ContentLicense = prj.content_license,
            ContentUrl = prj.content_url,
            Description = prj.project_description,
            Summary = prj.project_summary,
            Labels = db.Query<string>("select label from gc_project_labels where proj_name = @0", name)
                       .ToList(),
            Blogs = db.Query<dynamic>("select * from gc_project_blogs where proj_name = @0", name)
                      .Select(x => new Blog { Link = x.blog_link, Title = x.blog_title })
                      .ToList(),
            Groups = db.Query<dynamic>("select * from gc_project_groups where proj_name = @0", name)
                       .Select(x => new Group { Name = x.group_name, Url = x.group_url })
                       .ToList(),
            Links = db.Query<dynamic>("select * from gc_project_links where proj_name = @0", name)
                      .Select(x => new Link { Url = x.link, Title = x.link_title })
                      .ToList(),
            People = db.Query<dynamic>("select * from gc_project_people where proj_name = @0", name)
                       .Select(x => new Person
                       {
                           Name = x.person_name,
                           Role = x.role,
                           UserId = x.user_id
                       })
                       .ToList()
        });
    }
}
But, it does the work, and it was simple to write. Using this code, I was able to insert 299,949 projects in just under 13 minutes. Most of the time went to making those 1.5 million queries to the db, by the way.

Everything is cool, and it is quite nice. In the next post, I'll talk about why I wanted a new dataset. Don't worry, it is going to be cool.



Agreed, the workbench app is bad. I only use the "legacy" admin and query apps.


You mentioned the many to one relationships were capped at 20. Out of curiosity, does raven support fixed length arrays on models?

Ayende Rahien

Flukus, Yes, it does. Although the capping wasn't physical, it was just that the biggest collection they had was 20 items in size.


It would probably run faster if all the queries were run under the same transaction: using (var scope = db.GetTransaction())
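For reference, wrapping the reads in a single PetaPoco transaction as suggested would look roughly like this (a sketch; whether it actually helps depends mostly on per-query connection overhead):

```csharp
using (var db = new PetaPoco.Database("MySQL"))
using (var scope = db.GetTransaction())
{
    // Inside the transaction scope, PetaPoco keeps one shared
    // connection open, so the per-project queries don't each
    // pay the open/close cost.
    foreach (var prj in db.Query<dynamic>("select * from gc_projects").ToList())
    {
        // ... same per-project queries and bulk.Store() as in the import code above ...
    }
    scope.Complete(); // read-only work, but Complete() avoids an explicit rollback
}
```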
