Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

time to read 3 min | 483 words

Over the weekend, I learned that Joe Armstrong has passed away. I have been thinking about it all of yesterday, because I have met Joe and had a few discussions with him, but I never had the chance to actually know him. Which is a shame because, in a way, he changed my life.

One of the advantages of having a blog is that I can go back in time and trace things. In Sep 2007, I ran into Joe for the first time. It was at the JAOO conference in Aarhus. I sat in his talk and was quite impressed. This is what I had to say at the time:

I was at the Erlang talk, which is quite probably the best one that will be here. Joe has created the language and wrote the book about it, so he certainly knows his stuff, and he is a Character with a capital C. I am not sure if it is a show or not, but it was amazingly amusing.

Bought the Erlang book, it is a weird language compare to those I know, but I really need to learn a new language this year, and Erlang gets me both functional and concurrent aspects for the "price" of one.

A couple of years later I was at the same conference and wrote:

I remember sitting at a session with Joe Armstrong talking about Erlang and finally getting things that were annoying just beyond my grasp.

Ever since, whenever we were at the same conference, I made sure to sit in his talk. He was an amazing speaker and I still carry with me his advice on system design and distributed architecture. I never really liked the Erlang syntax, but the concepts were very attractive to me. It took a while for this to percolate, but after reading some more about Erlang, I looked for an OSS project in Erlang that I could read, to actually grok what it is like to write in Erlang. I chose to read the CouchDB source code.

This was the first time that I really dove down into NoSQL and I remember running into all sorts of things inside the CouchDB source code and thinking: “That isn’t how I would do it.” That code review ended up giving me so many ideas that I had to put them on paper (on keyboard, actually, I guess) and I wrote a whole series of blog posts on how to design a document database.

Just writing about it didn’t help, so I sat down and wrote some code. Some code turned into a lot of code, and that ended up being RavenDB.

And I can trace it all back to sitting in a conference room at JAOO, listening to Joe speak and being blown away.

Thank you, Joe.

time to read 1 min | 72 words

I’m going to be in London at the beginning of June. I’ll be giving a keynote at Skills Matter as well as visiting some customers.

I have a half-day and a full-day slot available for consulting (RavenDB, databases and overall architecture). Drop me a line if you are interested.

I should also have an evening or two free if there is anyone who wants to sit over a beer and chat.

time to read 1 min | 90 words

I’m going to be talking at CodeNode in London on June 3rd.

The topic of this talk is a few bugs that we found in the CoreCLR framework, the JIT and even the Operating System, how we tracked them down and put them down.

I have blogged about some of these in the past, and I’m looking forward to giving this talk. You can expect a talk that ranges between WinDBG walkthroughs and rants about the memory model assumptions made on a different platform six years ago.

time to read 2 min | 330 words

I spent the last couple of days at the O’Reilly Architecture Conference and the HIMSS (Healthcare Information and Management Systems Society) Conference. During that time, I had the chance to listen to quite a few technical marketing spiels.

Some of them were technically very impressive, but missed the target by a planet or two. I came up with a really nice analogy for how such presentations do a great disservice to their purpose.

Consider the following:

This non-steroidal drug, which has been clinically tested and FDA approved, will cease the production of prostaglandins and has a significant antiplatelet effect. It’s available in tablet and syrup forms and is suitable for IVs. May cause diarrhea and/or vomiting.

This is factual (at least as much as I could make it). I assume that if you are a medical professional you might be able to work out possible uses for this drug. But the most important thing that is missing from this description? What does this do?

This is Ibuprofen and you take it to ease your headache (among many other uses). It can also help you avoid blood clots.

I intentionally chose this example, because it is a very obvious one (and I just came back from hearing way too much medical stuff). You begin by telling me how this will ease the pain. In many ways, I consider technical marketing to be composed of the following steps:

  • Whether this product can actually ease the pain.
  • Whether this customer actually experiences the pain.

For example, if you are promising to have a faster than light bullet-train to Mars, that is going to cast some… doubt on your claims. On the other hand, it doesn’t matter to me if you can cut my commute time in half if I can get to work without leaving my house.

If the customer experiences the pain and believes that you can actually help there, you are most of the way there. All that is left is just negotiating, barrier removal, etc.

time to read 7 min | 1259 words

After my previous two posts on the topic, it is now time to discuss how we make money from Open Source Software. Before I start, I want to clarify what I’m talking about:

  • RavenDB is a document database.
  • It is about a decade old.
  • The server is released under the AGPL / commercial license.
    • We offer free community / developer licenses without any AGPL hindrance.
  • The RavenDB client APIs are licensed under the MIT license.
  • RavenDB (the product) is created by Hibernating Rhinos (the company).

I created RavenDB because I couldn’t not do it. It was an idea that had to get out of my head. I looked up the details, and toward the end of 2008 I started to work on it as a side project. At the time I was involved in five or six active open source projects, had just gotten my NHibernate Profiler product onto stable ground and had been turning the idea of getting deeper into databases over in my head for a while. So I sat down and wrote some code.

I was just doing some code doodling and it turned into deep design discussions, and at some point I was actively looking for help building the user interface for a “done” project. That was in late Feb 2010. Somehow, throwing some code at the compiler became a journey that lasted over a year, in which I worked 16+ hour days on this project.

Around Mar 2010 I knew that I had a problem. Continuing as I did before, just writing a lot of code and trying to create an OSS project out of it, would eat up all my time (and money). The alternatives were actually making money from RavenDB or stopping work on it completely. And I didn’t want to stop working on it.

I decided that I had to make an effort to actually make a product out of this project. And that meant that I had to sit down and plan how I would actually make money from it. I firmly believe that “build it, and they will come” is a nice slogan, but it doesn’t replace planning, strategy and (at least some) luck.

  • I already knew that I couldn’t sustain the project as a labor of love, and donations are not a sustainable way (or indeed, a way) to make money.
  • Sponsorship seemed like it would be unlikely unless I got one of my clients to start using RavenDB and then have them pay me to maintain it. That seemed… unethical, so wasn’t an option.
  • Services / consulting was something that I was already doing quite heavily, and was quite successful at it. But this is a labor intensive way of making money and it would compete directly with the time that it would take to build RavenDB itself.
  • Support is a model I really don’t like, because it puts me in a conflict of interest. I take pride in what I do, and I wanted to make something that would be easy to use and not require support.
  • Open Core / N versions back – are models that I don’t like. The open core model often leaves out critical functionality (such as security) and the N versions back model means that you give the users you most want to have the best experience (since that would encourage them to give you money) the worst experience (here are all our bugs that we fixed but won’t give to you yet).

That left us with dual licensing as a way to make money. I chose the AGPL because it was an OSI approved license that isn’t friendly for commercial use, leading most users who want to use it to purchase a commercial license.

So far, this is fairly standard, I believe.

I decided that RavenDB is going to be OSS, but in most other respects, I’m going to treat it as a commercial product. It had a paid team working on it from the moment it stopped being a proof of concept. It meant that we intentionally set out to make our money on the license. This, in turn, had a lot of implications. Support is defined as a Cost Center in Hibernating Rhinos. In other words, one of the things that we routinely do in Hibernating Rhinos is look at how we can reduce support.

One way of doing that, of course, is to not have support, or to staff the support team with students or the cheapest offshore option available. Instead, our support staff consists of dedicated support engineers and the core team that builds RavenDB. This serves several goals. First, it means that when you raise a support issue with us, you get someone who knows what they are doing. Second, it means that the core team is directly exposed to (and affected by) the support issues that are raised. I have structured things in this manner explicitly because having an insight into actual deployment and customer behavior means that the team is directly aware of the impact of their work. For example, writing an error message that will explain some issue to the user matters, because it would reduce the time an engineer spends on the phone troubleshooting (not fun) and increase the amount of time they can sling code around (fun).

We had a major update between versions 3.5 and 4.0, taking almost 3 years to finish. The end result was vastly improved performance, the ability to run on multiple platforms and a whole host of other cool stuff. But the driving force behind it all? We had to make a significant change to our architecture in order for us to reduce the support burden. It worked, and the need for support went down by over 80%.

Treating RavenDB as a commercial product from the get-go, even though it had an OSS license, meant that we focused on a lot of the stuff that is mostly boring: docs, setup, smoothing out all the bumps in the road, etc. The AGPL was there as a way to have your cake and eat it too. Be an OSS project with all the benefits that this entails: confidence from our users about what we do, entry to the marketplace, getting patches from users and many more. Just having the ability to directly talk to our community with the code in front of all of us has been invaluable.

At the same time, we sell licenses to RavenDB, which is how we make money. The idea is that we provide value above and beyond whatever our license costs, and we can do that because we are very upfront and obvious in how we get paid.

We have a few users who have chosen to go with the AGPL version and skip paying us. I would obviously rather get paid, but I laid out the rules of the game when I started playing and that is certainly within the rules. I believe that we’ll meet these users as customers in the future. It isn’t really that different from the community edition, which we offer freely. In both cases, we aren’t getting paid, but it expands our reach, which will usually get us more customers in the long run.

We have been doing this for a decade and Hibernating Rhinos currently has about 30 people working full time on it, so it is certainly working so far!

time to read 6 min | 1149 words

Richard Stallman is, without a doubt, one of the most influential people in the Open Source movement. I think it is fitting in a post like this to look for a bit at some of his reasoning around what Open Source is.

When we call software “free,” we mean that it respects the users' essential freedoms: the freedom to run it, to study and change it, and to redistribute copies with or without changes. This is a matter of freedom, not price, so think of “free speech,” not “free beer.”

The essential freedoms he talks about are “users have the freedom to run, copy, distribute, study, change and improve the software”. That was the intent behind GNU, the GPL and much of the initial drive for Open Source. Or rather, to be exact, the lack of these freedoms drove a lot of the proponents of Open Source.

I find that the philosophy behind Open Source is a good match to my own view in many respects. But it is really important to understand that the point behind all of this is the user’s freedom, not the developer’s. In many cases (such as developer tools), the distinction isn’t obvious, but it is an important one. The developer in this case is the entity that developed the software and released it. The user is whoever got hold of the software, however that was done.

As you might have noticed above, Stallman’s reasoning explicitly calls out free speech and not free beer. In other words, nothing is constructed so as to prevent or forbid paying for Open Source software. So far, this is great, but the problem with selling Open Source Software is that one of the essential freedoms is the ability to redistribute the software to 3rd parties. Let’s assume that party A is selling licenses for an OSS project at $1,000. The OSS license is explicitly okay with the first buyer of the software (party B) immediately turning around and selling the software for $500. And the first buyer from party B can sell it onward for $250, etc.

In practical terms, this means that you would expect the price of OSS projects to be near the distribution cost. When the GPL came about in 1989, that meant floppy disks as the primary mode of data transfer. There was a very real cost to distributing software, especially en masse. With today’s network, the cost of distributing software is essentially nil for most projects.

In my previous post on the topic, I mentioned that this causes a real problem for OSS projects. Building software projects costs a lot of time and money. Sometimes you get such projects that are funded for the “common good”. The Linux Kernel is one such project, but while other examples exist (jQuery, I believe), they are rare. If you want to make money and work on OSS projects, this isn’t really a good way to go about doing this.

If you want to make money and not do OSS, you are likely to run into a lot of pressure from the market. In many environments, being an OSS project gives you a huge leg up in marketing, users and mindshare. Conversely, not being OSS is a major obstacle for your intended users. This pretty much forces you toward an OSS business model, as described in my previous post.

A really interesting aspect of OSS business models is the use of the core principles of Open Source as a monetization strategy. Very rarely will you find that there is something that interesting / novel in a particular project. It is the sum of the individual pieces that makes it valuable. Sometimes you do have projects with secret sauce that they want to protect, but for the most part, keeping the source closed isn’t done to hide something. It is done so you’ll be able to sell the software. Dual licensing with a viral license takes a very different approach to the same problem.

Instead of keeping the source secret and selling licenses to that, you release your software under an OSS license, but one that requires your potential customers to release their source code in turn. Remember how I said that most projects don’t have anything interesting / novel in them? That was from a technical point of view. From a business perspective, that is a wholly different matter. And if you aren’t in the business of selling software, you probably don’t want to release your code (which includes many sensitive details about your organization and its behavior).

An example that would hopefully make it clear is the Google ranking algorithm. Knowing exactly how Google ranks pages would be a huge boon to any SEO effort. Or, if you consider the fact that the actual algorithm probably wouldn’t make sense without the data that drives it, consider the case of a credit rating agency. Knowing exactly how your credit score is computed can allow you to manipulate it, and the exact details matter. So you can take it for granted that businesses would typically want to avoid Open Sourcing their internal systems.

Dual licensing with a viral license utilizes this desire in order to charge money for OSS projects. Instead of using the software under a viral OSS license, customers pay to purchase a commercial license, which typically has none of the freedoms associated with Open Source projects.

Here is the dichotomy at the heart of this business model. In order to make money from OSS projects, companies choose viral licenses so their users will pay to have less freedom (and its obligations). There Is No Such Thing As a Free Lunch still applies.

Recent moves by Redis and MongoDB, for example, show how this applies in practice. Redis’ Commons Clause prevents selling the software (directly or via SaaS) and MongoDB’s SSPL is used to prevent hosting of MongoDB by cloud providers without paying a license fee. The problem that both of them (and others) have run into is that new deployment models (SaaS in particular) have rendered the previous “protections” from viral licenses obsolete.

I find it refreshingly honest that Redis’ license change has explicitly acknowledged that this isn’t an Open Source license any longer. And the SSPL was almost immediately classified as a non-OSI license. MongoDB seems to think it is meeting the criteria for an Open Source license, but the OSI seems to disagree with that.

I wrote this post (and had an interesting time researching OSS history and license discussions) to point out this dissonance between a license that has more freedom (as the GPL / AGPL are usually described) and being more limited in how you can use it in practice. This is long enough, so I’ll have a separate post talking about how we approach both licensing and making money from Open Source.

time to read 7 min | 1304 words

Open Source is a funny business model: first you give away the crown jewels, then you try to get some money back. I have been working on OSS projects for close to twenty years now. I have been making my living off of OSS projects for most of that time. It is a very interesting experience, because of a simple problem. After you gave away everything, what do you charge for? I wrote this post because of this article and this tweet. The article talks about the Open Core model and how it usually ends up. The tweet talks about the reaction of (some) people in the marketplace when they are faced with the unconscionable request to pay for software.

The root problem is that there are two very different forces at play here.

  1. Building software is expensive. And that is only the coding part*.
  2. There is a very strong expectation that software should be freely available.

* If you also need to do documentation, double that. If you need to do deployment (multi platform, docker, k8s), do that again. If you need to support multiple versions, you guessed it. There is also the website, graphics, GDPR compliance and a dozen other things that you have to do if you want to go beyond the “some code in a GitHub repository” stage. There is so much more to a software project than just slinging code, and most of these things are not fun to do and take a whole lot of time. Which means that you have to pay people to do so.

When I say very strong expectation, I mean just that. To the point where if the code isn’t available, it is a huge barrier to entry. So in practice, you usually have to open source the project, or at least enough of it to satisfy people.

Reading the last statement again, it sounds very negative, but that isn’t meant to be the case. A major advantage of being an Open Source project is that you get a lot of credibility from potential users. To start with, people go in and go through your code. They do strange things like talk to you about it, offer advice, patches and pull requests. They make the software better. They also take your software and do some really amazing things with it. For the past decade and a half, my default position has been that software I write is opened by default. I have yet to regret that decision.

An OSS project can typically get a lot more traction than a closed source one, these days. This creates a lot of pressure to open source things. And that, in turn, leads us to a simple problem. How can you make money from OSS projects?

There are a few ways to do so:

Labor of love – in some cases, you have people who simply enjoy coding and sharing their work. The problem here is that eventually you’ll run out of time to donate to the project and have to find some means to pay for it.

Donations – this is how people typically imagine OSS projects are paid for. I have tried that a few times in the past, and I don’t believe that I made enough money to go hit the movie theater midday.

Sponsorship (small) – sometimes a project is important enough for a company that they are willing to pay for it. That means either hiring the major contributors or paying them in some manner. This is a great way to get paid while working on what you are passionate about, especially because you can usually complete all the taxes that a project requires (from a website to the documentation).

Sponsorship (large) – I’m thinking about something like Apache, the Linux Foundation, etc. These are typically reserved for stuff that is core infrastructure, and trying to build something like that from scratch seems… hard.

Services / Consulting – I did that actively for several years. Going to customers, helping them integrate / customize various projects that I was involved in. It was a lot of fun, but also exhausting. It’s basically being a consultant, but you are focusing on a few projects. Here, OSS work is basically awesome for building your reputation and showing off your skills. You can build a business around that, but that requires having a large number of users and it is subject to the usual constraints of consulting companies. The most limiting of which is that the company is charging some % of the costs of employees, and the % can’t be too high (otherwise the employees will just do that directly).

The common thread among all the options above? None of them are viable options if you have VC money. The problem with all of these options is that (even in the case of something like the Linux Kernel), the ROI just isn’t worth it.

So what can you do, if you believe that your project should be OSS (for marketing, political or strongly held belief reasons) and you want a business model that can show significant returns?

Support – Offer the project itself for free, but charge for support. For certain industries and environments, that works great. But it does suffer from a problem: if you don’t have to buy support, why would you? In such cases, there is usually a conflict of interest. Making the software simpler and easier to use will cannibalize the support that the company relies on. Red Hat is a good example of that. Note that a large part of what Red Hat does is the grunt work. Backporting patches, ensuring compatibility, etc. The kind of things that need to be done, but that you won’t get people doing for fun. To my knowledge, however, there are very few, if any, examples of other companies that successfully monetize this approach.

Open Core – in this model, you offer the core pieces for all, but keep the features that matter most to the customers with the most money closed in some fashion. In a sense, this is basically what the Support model is doing, for customers who need support. GitLab, MySQL, Redis and Neo4J are common examples of open core models. The idea is that development use and small fries (people who would typically not pay much / at all) will get you the customers that will pay for the high end features. The goal here is to get people to purchase licenses, similar to how commercial software works.

N versions back – A more aggressive example of the open core model is simply having two editions. An open source one and a commercial one. The major difference is that the open source one is delayed behind the commercial one. Couchbase, for example, is licensed under such a model.

Dual licensing with viral license – in this model, the idea is that the code is offered under a license which isn’t going to be acceptable for the target customers. Leading them to purchase the commercial edition. This model also mandates that the company is able to dual license the code, so any outside contributions require a copyright assignment to the company.

Cloud hosting – in this model, the software itself is offered under OSS license, but the most common use case is to use the cloud offering (and thus pay for the software). WordPress is a good example of that. The idea is that while people can install your software on their own machines, paying the company to do that is the more cost effective solution.

I’m sure that I have skipped many other options for making money out of OSS projects, but the ones I mentioned seem to be the most popular ones right now. I’ve got a lot more to say about this topic, so there will be more posts coming.

time to read 1 min | 140 words

I’ll be in New York for all of next week, and I’ve got a busy schedule.

I’m going to be at the O’Reilly Architecture Conference, at the RavenDB booth, showing off our new features and how RavenDB can figure into your next application’s architecture.

If you are around, I would be delighted to meet you at any of these events.

time to read 5 min | 866 words

A large portion of my day to day tasks is to review code. I’m writing this post barely two weeks into the new year, and we have already had over 150 PRs going into RavenDB alone.

As a result, I’ve gotten sensitive to certain issues. For example, the following is a suggestion made for fixing an issue in this method declaration:

[Image: the code review suggestion on the method declaration]

This is a piece of code (in C) that is meant to handle some low level details for RavenDB. We use the CLR coding conventions for C#, but for C, we have chosen to use a different convention, using snake_case for methods, arguments and variables and SHOUTING_CASE for constants / defines. When reading through the code, I marked this violation of the naming convention for a fix.
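As a rough illustration of that convention, here is a minimal sketch. The names below are made up for this example and are not taken from the RavenDB code base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants / defines use SHOUTING_CASE (hypothetical constant). */
#define MAX_PATH_LENGTH 4096

/* Methods, arguments and variables use snake_case (hypothetical function). */
int32_t count_path_separators(const char *file_path, size_t path_length)
{
    assert(file_path != NULL);

    int32_t separator_count = 0;
    for (size_t i = 0; i < path_length && i < MAX_PATH_LENGTH; i++) {
        if (file_path[i] == '/')
            separator_count++;
    }
    return separator_count;
}
```

Under this convention, a parameter spelled filePath or FilePath in the declaration above is the kind of thing that would get flagged in review.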

This may seem minor, but it is probably annoying for the author of the code. They are interested in comments about the code functionality and utility. Why spend any time on something that doesn’t really have any impact? Both forms of the parameter name are just as readable to me, after all.

Before I get to this part, I want to show another piece of code. Or, to be rather more exact, two pieces of code. One of the reasons that we are using C code is that we can abstract large parts of the underlying platform inside the native code. That means that we have certain parts of the code that are written twice. Once for Windows and once for Linux.

Here is some code for Windows:

[Image: the Windows implementation]

And here is the same code for Linux:

[Image: the Linux implementation]

You can see that this is pretty much the same thing, just calling the different APIs for each platform. One thing to notice here is that part of this method’s task is to ensure that the file that we open is at least as big as the initially requested size.

In Windows, to increase or decrease the file size you call SetFilePointer() followed by SetEndOfFile(). On Linux, you have fallocate() and ftruncate()*. This is reflected in the code. The Windows code has a single method to do this work and the Linux code has two: rvn_allocate_file_space() and rvn_truncate_file(), which isn’t shown here.

* Actually, you might have fallocate(). Some file systems do not support it, and you need to use another workaround.

One of my code review comments has been that this needs to be fixed, that we should have a _resize_file() method for Linux that would simply call the appropriate method based on the file size. But the question is, why?
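To make the idea concrete, here is a minimal sketch of what such a passthrough could look like on Linux. This is not the actual RavenDB code: the function name resize_file_sketch() is hypothetical, it calls fallocate() and ftruncate() directly rather than the rvn_ helpers (whose signatures aren't shown here), and the fallback to a plain ftruncate() when fallocate() is unsupported is an assumption, one possible workaround among several.

```c
#define _GNU_SOURCE /* needed for fallocate() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow or shrink the file behind fd to new_size.
 * Returns 0 on success, or an errno value on failure. */
int resize_file_sketch(int fd, off_t new_size)
{
    assert(new_size >= 0);

    struct stat st;
    if (fstat(fd, &st) == -1)
        return errno;

    if (new_size == st.st_size)
        return 0; /* nothing to do */

    if (new_size < st.st_size) /* shrinking: truncate */
        return ftruncate(fd, new_size) == -1 ? errno : 0;

    /* Growing: ask the file system to reserve the space up front. */
    if (fallocate(fd, 0, 0, new_size) == 0)
        return 0;

    if (errno == EOPNOTSUPP || errno == ENOSYS) {
        /* Assumed workaround: extend the file without reserving blocks.
         * The result may be sparse, so later writes can still fail
         * with ENOSPC. */
        return ftruncate(fd, new_size) == -1 ? errno : 0;
    }
    return errno;
}
```

With something like this in place, the Linux caller has the same shape as the Windows one: a single resize call, with the allocate vs. truncate decision hidden behind it.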

These are two separate implementations for two separate operating systems. We are already creating the higher level abstraction level with operations that hide many system details. Why do I want to have a passthrough method just because the Windows code has this method?

The answer here, as in the case above with the parameter name, is the same. Consistency.

This is most obvious in the naming convention, but it is the same reasoning I had when I wanted to have the same method structure for both Linux and Windows.

Consistency is key for being able to slog through a lot of code. It is how I (and the rest of the team) can go through thousands of lines of code per week and understand what is going on. Because when we look at a piece of code, it follows certain conventions and structure. Reading the code is easy because we can ignore a lot of cruft around it and focus on what is going on.

In the case of the Windows / Linux methods, I first read one method and then the next, making sure that we are doing the same thing on all platforms. The different behavior (resize vs. allocate) was very obvious to me, which meant that I had to stop, go and look at each method’s implementation to figure out whether there is any meaningful difference between them. That was a chore, and it would only become worse over time as we add additional functionality, so anything that isn’t different because it has to be different should match.

In general, I like code reviews where I can scan through the changes and not see the code, but its purpose. That happens when there isn’t anything there that I really have to think about and the intent is clear.

When writing in C#, we have decades (literally) of experience and organizational inertia that push us in the right direction. When we started to push C code into the repository, I started to actually pay attention to those details explicitly, because I suddenly needed to.

This is apparent in code reviews, but it isn’t just the case of me trying to make my own tasks easier. Code is read a lot more often than it is written, and focusing on making the code itself boring will pay off, because what the code is doing is what should be interesting in the long run.
