Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 6,481 | Comments: 47,780

filter by tags archive

RavenDB 4.0 nightly builds are now available

time to read 2 min | 245 words

imageWith the RC release out of the way, we are starting on a much faster cadence of fixes and user visible changes as we get ready to the release.

In order to allow users to be able to report issues and have then resolved as soon as possible we now publish our nightly build process.

The nightly release is literally just whatever we have at the top of the branch at the time of the release. A nightly release goes through the following release cycle:

  • It compiles
  • Release it!

In other words, a nightly should be used only on development environment where you are fine with the database deciding that names must be “Green Jane” and it is fine to burp all over your data or investigate how hot we can make your CPU.

More seriously, nightlies are a way to keep up with what we are doing, and its stability is directly related to what we are currently doing. As we come closer to the release, the nightly builds stability is going to improve, but there are no safeguards there.

It means that the typical turnaround for most issues can be as low as 24 hours (and it give me back the ability, “thanks for the bug report, fixed and will be available tonight”). All other release remains with the same level of testing and preparedness.

RavenDB 4.0Managing encrypted databases

time to read 6 min | 1012 words

imageOn the right you can see how the new database creation dialog looks like, when you want to create a new encrypted database. I talked about the actual implementation of full database encryption before, but todays post’s focus is different.

I want to talk a out managing encrypted databases. As an admin tasked working with encrypted data, I need to not only manage the data in the database itself, but I also need to handle a lot more failure points when using encryption. The most obvious of them is that if you have an encrypted database in the first place, then the data you are protecting is very likely to be sensitive in nature.

That raise the immediate question of who can see that information. For that matter, are you allowed to see that information? RavenDB 4.0 has support for time limited credentials, so you register to get credentials in the system, and using whatever workflow you have the system generate a temporary API key for you that will be valid for a short time and then expire.

What about all the other things that an admin needs to do? The most obvious example is how do you handle backups, either routine or emergency ones. It is pretty obvious that if the database is encrypted, we also want the backups to be encrypted, but are they going to use the same key? How do you restore? What about moving the database from one machine to the other?

In the end, it all hangs on the notion of keys. When you create a new encrypted database, we’ll generate a key for you, and require that you confirm for us that you have persisted that information in some manner. You can print it, download it, etc. And you can see the key right there in plain text during the db creation. However, this is the last time that the database key will ever reside in plain text.

So what about this QR code, what is it doing there? Put simply, it is there to capture attention. It replicates the same information that you have in the key itself, obviously. But what for?

The idea is that users are often hurrying through the process, (the “Yes, dear!” mode) and we want to encourage them to stop of a second and think. The use of the QR code make it also much more likely that the admin will print and save the key in an offline manner, which is likely to be safer than most methods.

So this is how we encourage administrators to safely remember the encryption key. This is useful because that give the admin the ability to take a snapshot on one machine, and then recover it on another, where the encryption key is not available, or just move the hard disk between machines if the old one failed. It is actually quite common in cloud scenarios to have a machine that has an attached cloud storage, and if the machine fails, you just spin up a new machine and attach the storage to the new one.

We keep the encryption keys secret by utilizing system specific keys (either DPAPI or decryption key that only the specific user can access), so moving machines like that will require the admin to provide the encryption key so we can continue working.

The issue of backups is different. It is very common to have to store long term backups, and needing to restore them in a separate location / situation. At that point, we need the backup to be encrypted, but we don’t want it it use the same encryption key as the database itself. This is mostly about managing keys. If I’m managing multiple databases, I don’t want to record the encryption key for each as part of the restore process. That is opening us to a missed key and a useless backup that we can do nothing about.

Instead, when you setup backups (for encrypted databases it is mandatory, for regular databases, optional) we’ll give you the option to provide a public key that we’ll then use to encrypted the backup. That means that you can more safely store it in cloud scenarios, and regardless of how many databases you have, as long as you have the private key, you’ll be able to restore the backup.

Finally, we have one last topic to cover with regards to encryption, the overall system configuration. Each database can be encrypted, sure, but the management of the database (such as connection strings that it replicates to, API keys that it uses to store backups and a lot of other potentially sensitive information) is still stored in plain text. For that matter, even if the database shouldn’t be encrypted, you might still want to encrypted the full system configuration. That lead to somewhat of a chicken and egg problem.

On the one hand, we can’t encrypt the server configuration from the get go, because then the admin will not know what the key is, and they might need that if they need to move machines, etc. But once we started, we are using the server configuration, so we can’t just encrypt that on the fly. What we ended up using is a set of command line parameters, so if the admins wants to run encrypted server configuration, they can stop the server, run a command to encrypt the server configuration and setup the appropriate encryption key retrieval process (DPAPI, for example, under the right user).

That gives us the chance to make the user aware of the key and allow to save it in a secure location. All members in a cluster with an encrypted server configuration must also have encrypted server configuration, which prevents accidental leaks.

I think that this is about it, with regards to the operations details of managing encryption, Smile. Pretty sure that I missed something, but this post is getting long as it is.

Elemar Junior is joining our Latin America RavenDB team

time to read 2 min | 321 words

elemarjrToday is a great plus one news day to RavenDB Latin America. Elemar Junior is joining our team as official RavenDB consultant, on January 1st.

Elemar will work helping our customers, writing a lot of code, producing demos, videos and tutorials and in general focus on the getting started process easier as well as faster and smoother process for developers and operations people to get familiar and accustomed to getting the best out of RavenDB.

Elemar is well known for writing and speaking about advanced topics on development, design and software architecture. If you aren’t familiar with him, feel free to check his blog (pt-br only), or linkedin but the short gist of it is that he started to write computer code when he was nine years old and still love it. He has over seventeen years of professional experience developing software (used in over thirty countries) for manufacturing furniture and to design and planning of residential and commercial spaces.

With a strong focus on the Microsoft ecosystem, he has been awarded as Microsoft MVP since 2011. Elemar is the author of FluentIL, an emitting library for .NET platform, and leading figure of CodeCracker, a popular analyzer library for C# and VB that uses Roslyn to produce refactoring, code analysis, and other niceties.

Elemar can be reached via elemarjr@ravendb.net and is currently looking for someone that would Photoshop this image with a cat or ask him tough questions about RavenDB.

On a more serious note, this represent a bigger focus on having dedicated people to handle just working with customers, providing guidance and support and writing tutorials, sample applications and in general just making everything that much easier.

Code quality gateways

time to read 2 min | 319 words

I just merged a Pull Request from one of our guys. This is a pretty important piece of code, so it went through two rounds of code reviews before it actually hit the PR stage.

That was the point where the tests run (our test suite takes over an hour to run, so we run a limited test frequently, and leave the rest for later), and we got three failing tests. The funny thing was, it wasn’t a functional test that failed, it was the code quality gateways  tests.

The RavenDB team has grown quite a lot, and we are hiring again, and it is easy to lose knowledge of the “unimportant” things. Stuff that doesn’t bite you in the ass right now, for example. But will most assuredly will bite you in the ass (hard) at a later point in time.

In this case, an exception class was introduced, but it didn’t have the proper serialization. Now, we don’t actually use AppDomains all that often (at all, really), but our users sometimes do, and getting a “your exception could not be serialized” makes for an annoying debug session. Another couple of tests related to missing asserts on new configuration values. We want to make sure that we have a new configuration value is actually connected, parsed and working. And we actually have a test that would fail if you aren’t checking all the configuration properties.

There are quite a few such tests, from making sure that we don’t have void async methods to ensuring that the tests don’t leak connections or resources (which could harm other tests and complicate root cause analysis).

We also started to make use of code analyzers, for things like keeping all complex log statements in conditionals (to save allocations) to validating all async calls are using ConfigureAwait(false).

Those aren’t big things, but getting them right, all the time, give us a really cool product, and a maintainable codebase.

Looking at the bottom line

time to read 3 min | 457 words

In 2009, I decided to make a pretty big move. I decided to take RavenDB and turn that from a side project into a real product. That was something that I actually had a lot of trouble with. Unlike the profiler suite, which is a developer tool, and has a relatively short time to purchase, building a database was something that I knew was going to be a lot more complex in terms of just getting sales.

Unlike a developer tool, which is a pretty low risk investment, a database is something that is pretty significant, and that means that it would take time to settle into the market, and even if a user starts developing with RavenDB, it is usually 3 – 6 months minimum just to get to the part where they order the license. Add that to the cost of brining a new product to market, and…

Anyway, it wasn’t an easy decision. Today I was looking at some reports when I noticed something interesting. The following is the breakdown of our product based revenue since the first sale of NHibernate Profiler. Note that there is no doubt that NH Prof is a really good product for us. But it is actually pretty awesome that RavenDB is at second place.

image

This is especially significant in that the profilers has several years of lead time in the market over RavenDB. In fact, running the numbers, until 2011, we sold precious few licenses of RavenDB. In fact, here are the sales numbers for the past few years:

image

Obviously, the numbers for 2013 are still not complete, but we have already more than surpassed 2012, and we still have a full quarter to go.

For that matter, looking at the number just for 2013, we see:

image

So NH Prof is still a very major product, but RavenDB is now our top performing product for 2013, which makes me a whole lot better.

Of course, it also means that we probably need to get rid of a few other products, in particular, LLBLGen, Linq to Sql and Hibernate profilers don’t look like they are worth the trouble to keep them going. But that is a matter for another time.

Hibernating Rhinos PracticesA Sample Project

time to read 2 min | 320 words

I have previously stated that one of the things that I am looking for in a candidate is the actual candidate code. Now, I won’t accept “this is a project that I did for a client / employee”, and while it is nice to be pointed at a URL from the last project the candidate took part of, it is not a really good way to evaluate someone’s abilities.

Ideally, I would like to have someone that has an OSS portfolio that we can look at, but that isn’t always relevant. Instead, I decided to sent potential candidates the following:

Hi,

I would like to give you a small project, and see how you handle that.

The task at hand is to build a website for Webinars questions. We run bi-weekly webinars for our users, and we want to do the following:

  • Show the users a list of our webinars (The data is here: http://www.youtube.com/user/hibernatingrhinos)
  • Show a list of the next few scheduled webinar (in the user’s own time zone)
  • Allow the users to submit questions, comment on questions and vote on questions for the next webinar.
  • Allow the admin to mark specific questions as answered in a specific webinar (after it was uploaded to YouTube).
  • Manage Spam for questions & comments.

The project should be written in C#, beyond that, feel free to use whatever technologies that you are most comfortable with.

Things that we will be looking at:

  • Code quality
  • Architecture
  • Ease of modification
  • Efficiency of implementation
  • Ease of setup & deployment

Please send us the link to a Git repository containing the project, as well as any instructions that might be necessary.

Thanks in advance,

     Oren Eini

This post will go live about two weeks after I started sending this to candidates, so I am not sure yet what the response would be.

Hibernating Rhinos PracticesDesign

time to read 2 min | 286 words

One of the things that I routinely get asked is how we design things. And the answer is that we usually do not. Most things does not require complex design. The requirements we set pretty much dictate how things are going to work. Sometimes, users make suggestions that turn into a light bulb moment, and things shift very rapidly.

But sometimes, usually with the big things, we actually do need to do some design upfront. This is usually true in complex / user facing part of our projects. The Map/Reduce system, for example, was mostly re-written  in RavenDB 2.0, and that only happened after multiple design sessions internally, a full stand alone spike implementation and a lot of coffee, curses and sweat.

In many cases, when we can, we will post a suggested design on the mailing list and ask for feedback. Here is an example of such a scenario:

In this case, we didn’t get to this feature in time for the 2.0 release, but we kept thinking and refining the approach for that.

The interesting things that in those cases, we usually “design” things by doing the high level user visible API and then just let it percolate. There are a lot of additional things that we would need to change to make this work (backward compatibility being a major one), so there is a lot of additional work to be done, but that can be done during the work. Right now we can let it sit, get users’ feedback on the proposed design and get the current minor release out of the door.

Hibernating Rhinos PracticesPairing, testing and decision making

time to read 4 min | 727 words

We actually pair quite a lot, either physically (most of our stations have two keyboards & mice for that exact purpose) or remotely (Skype / Team Viewer).

2013-01-27 14.05.24 HDR

And yet, I would say that for the vast majority of cases, we don’t pair. Pairing is usually called for when we need two pairs of eyes to look at a problem, for non trivial debugging and that is about it.

Testing is something that I deeply believe in, at the same time that I distrust unit testing. Most of our tests are actually system tests. That test the system end to end. Here is an example of such a test:

[Fact]
public void CanProjectAndSort()
{
    using(var store = NewDocumentStore())
    {
        using(var session = store.OpenSession())
        {
            session.Store(new Account
            {
                Profile = new Profile
                {
                    FavoriteColor = "Red",
                    Name = "Yo"
                }
            });
            session.SaveChanges();
        }
        using(var session = store.OpenSession())
        {
            var results = (from a in session.Query<Account>()
                           .Customize(x => x.WaitForNonStaleResults())
                           orderby a.Profile.Name
                           select new {a.Id, a.Profile.Name, a.Profile.FavoriteColor}).ToArray();


            Assert.Equal("Red", results[0].FavoriteColor);
        }
    }
}

Most of our new features are usually built first, then get tests for them. Mostly because it is more efficient to get things done by experimenting a lot without having tests to tie you down.

Decision making is something that I am trying to work on. For the most part, I have things that I feel very strongly about. Production worthiness is one such scenario, and I get annoyed if something is obviously stupid, but a lot of the time decisions can fall into the either or category, or are truly preferences issues. I still think that too much goes through me, including things that probably should not.  I am trying to encourage things so I wouldn’t be in the loop so much. We are making progress, but we aren’t there yet.

Note that this post is mostly here to serve as a point of discussion. I am not really sure what to put in here, the practices we do are pretty natural, from my point of view. And I would appreciate any comments asking for clarifications.

Hibernating Rhinos PracticesWe are hiring again

time to read 1 min | 172 words

As part of this series, I wanted to take the time and let you know that we are hiring full time developers again.

This is applicable solely for developers in Israel.

We are working with C# (although I’ll admit that sometime we make it scream a little bit:

image

Candidate should be able to provide a project (and preferably more than one) that we can look at to see their code. It has got to be your code. It is ain’t yours (if it is code that you wrote for an employer, or if it is a university code project) I don’t wanna see it.

We are talking about a full time employee position, working on RavenDB, Uber Profiler, RavenFS and a bunch of other stuff that I don’t want to talk about yet.

Ping me with your CV if you are interested.

Hibernating Rhinos PracticesDevelopment Workflow

time to read 2 min | 377 words

The development workflow refers to how a developer decides what to do next, how tasks are organized, assigned and worked on.

Typically, we dedicate a lot of the Israeli’s team time to doing ongoing support and maintenance tasks. So a lot of the work are things that show up on the mailing lists. We usually triage them to one of four levels:

  • Interesting stuff that is outside of core competencies, or stuff that is nice to have that we don’t have resources for. We would usually handle that by requesting a pull request, or creating a low priority issue.
  • Feature requests / ideas – usually go to the issuer tracker and wait there until assigned / there is time to do them.
  • Bugs in our products – depending on severity, usually they are fixed on the spot, sometimes they are low priority and get to the issue tracker.
  • Priority Bugs – usually get to the top of the list over anything and everything else.

It is obviously a bit more complex, because if we are working on a particular area already, we usually also take the time to cover the easy-to-do stuff from the issue tracker.

Important things:

  • We generally don’t pay attention to releases, unless we have one pending for a product (for example, upcoming stable release for RavenDB).
  • We don’t usually try to prioritize issues. Most of them are just there, and get picked up by whoever gets them first.

We following slightly different workflows for Uber Prof & RavenDB. With Uber Prof, every single push generate a client visible build, and we have auto update to make sure that most people run on the very latest.

With RavenDB, we have the unstable builds, which is what every single push translates to, and the stable builds, which have a much more involved release process.

We tend to emphasize getting things out the door over the Thirteen Steps to Properly Release Software.

An important rule of thumb, if you are still the office by 7 PM, you have better showed up at 11 or so, just because zombies are cool nowadays doesn’t mean you have to be one. I am personally exempted from the rule, though.

Next, I’ll discuss pairing, testing and decision making.

FUTURE POSTS

  1. PR Review: Code has cost, justify it - 15 hours from now
  2. PR Review: Beware the things you can’t see - 4 days from now
  3. The right thing and what the user expect to happen are completely unrelated - 5 days from now
  4. PR Review: It’s the error handling, again - 6 days from now

There are posts all the way to Oct 25, 2017

RECENT SERIES

  1. PR Review (7):
    10 Aug 2017 - Errors, errors and more errors
  2. RavenDB 4.0 (15):
    13 Oct 2017 - Interlocked distributed operations
  3. re (21):
    10 Oct 2017 - Entity Framework Core performance tuning–Part III
  4. RavenDB 4.0 Unsung Heroes (5):
    05 Oct 2017 - The design of the security error flow
  5. Writing SSL Proxy (2):
    27 Sep 2017 - Part II, delegating authentication
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats