Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Roadmap for RavenDB 4.1

time to read 2 min | 227 words

We are gearing up to start work on the next release of RavenDB, following the 4.0 release. I thought this would be a great time to talk about the kind of things we want to do there. This is going to be a minor point release, so we aren't going to shake things up.

The current plan is to release 4.1 about 6 months after the 4.0 release, in the July 2018 timeframe.

Instead, we are planning to focus on the following areas:

  • Performance
    • Moving to .NET Core 2.1 for the performance advantages this gives us.
    • Starting to take advantage of new features such as Span<T> in .NET Core 2.1 (see the sketch after this list).
    • Updating the JavaScript engine for better query / patch performance.
  • Wild card certificates via Let’s Encrypt, which can simplify cluster management when RavenDB generates the certificates.
  • Restoring highlighting support
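
Since Span<T> is mentioned above, here is a minimal sketch of the kind of allocation-free slicing it enables. This is generic C#, not RavenDB code; the buffer content is just an example:

    using System;

    class SpanSketch
    {
        static void Main()
        {
            // Slice "key=value" style input without allocating substrings:
            ReadOnlySpan<char> line = "compression=lz4".AsSpan();
            int idx = line.IndexOf('=');
            ReadOnlySpan<char> key = line.Slice(0, idx);     // a view over the buffer
            ReadOnlySpan<char> value = line.Slice(idx + 1);  // no copy, no new string
            Console.WriteLine($"{key.ToString()} -> {value.ToString()}");
        }
    }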

We are also going to introduce the notion of experimental features. That is, features that are ready from our perspective but still need some time out in the sun getting experience in production. For 4.1, we have the following features slated for experimental inclusion:

  • JavaScript indexes
  • Distributed counters
  • SQL Migration wizard

I'll have a dedicated post about each of these topics, because I cannot do them justice in just a few words.

Open sourcing code is a BAD default policy

time to read 11 min | 2011 words

I ran into this Medium post that asks: why is this code open-sourced? Let's flip the question. The premise of the post is interesting, given that the author argues that the default mode for code should be open source. I find myself in the strange position of being a strong open source adherent who very strongly disagrees with pretty much every point in this article. Please sit tight, this may take a while; this article really annoyed me.

Just to lay my cards on the table: I have been working on open source software for the past 15 years. The flagship product that we make is open source and available on GitHub, and we practice a very open development process. I was also very active in a number of high profile open source projects for many years and have built and released quite a few open source projects of my own. I feel that I'm quite qualified to talk from experience on this subject.

The quick answer for why the default for a codebase shouldn’t be open source is that it costs. In fact, there are several very different costs around that.

The most obvious one is the reputation cost for the individual developer. If you push bad stuff out there (like this 100+ lines method) that can have a real impact on people's perception of you. There is a very different model for internal interaction inside the team and stuff that is shown externally, without the relevant context. A lot of people don't like this exposure to external scrutiny. That leads to things like: "clean up the code before we can open source it". You can argue that this is something that should have been done in the first place, but that doesn't change the fact that this is a real concern and adds more work to the process.

Speaking of work, just throwing code over the wall is easy. I'm going to assume that the purpose isn't to just do that. The idea is to make something useful, and that means that aside from the code itself, there are also a lot of other aspects that need to be handled. For example, engaging the community, writing documentation, ensuring that the build process can run on a wide variety of machines. Even if the project is only deployed on Ubuntu 16.04, we still need to update the build script for that MacOS guy. Oh, this is open source and they sent us a PR to fix that. Great, truly. But who is going to maintain that over time?

Open source is not an idyllic landscape that you dump your code in and someone else is going to come and garden it for you.

And now, let me see if I can answer the points from the article in detail:

  • Open-source code is more accessible - Maintainers can get code reviews … consumers from anywhere in the world … can benefit from something I was lucky enough to be paid for building.

First, drive-by code reviews are rare. As in, they happen extremely infrequently. I know that because I do them for interesting projects and I explicitly invited people to do the same for my projects and had very little response. People who are actually using the software will go in and look at the code (or some parts of it) and that can be very helpful, but expecting that just because your code is open source you'll get reviews and help is setting yourself up for failure.

There is also the interesting tidbit there about consumers benefiting from something that the maintainers were paid to build. That part is a pretty important one, because there is a side to this discussion that hasn't been introduced. We have maintainers and consumers, but what about the guy who ends up paying the bills? I mean, given that this is paid work, this isn't the property of the maintainer, it belongs to the people who actually paid for the work. So any discussion on the benefits of open sourcing the code should start from the benefits for these people.

Now, I'm perfectly willing to agree (in fact, I do agree, since my projects are in the open) that there are good and valid reasons to want to open source a project, and community feedback is certainly a part of that. But any such discussion should start with the interests of the people paying for the code and how it helps them. And part of that discussion should involve the real and non-trivial costs of actually open sourcing a project.

  • Open-source code keeps us healthy - Serotonin and Oxytocin are chemicals in the brain that make you feel happy and love. Open source gives you that.

I did a bad job summarizing this part, quite intentionally. Mostly because I couldn’t quite believe what I was reading. The basic premise seems to be that by putting your code out there you open yourself to the possibility of someone seeing your code and sending you a “Great Job” email and making your day.

I… guess that can happen. I certainly enjoy it when it happens, sure. Why would I say no to something like that?

Well, to start with, it happens, sure, but it isn’t a major factor in the decision making process. I’ll argue that if you think that compliments from random strangers are so valuable, just get in and out of Walmart in a loop. There are perfect strangers there that will greet you every single time. Why wouldn’t you want to do that?

More to the point, even assuming that you have a very popular project and lots of people write you about how awesome you are, this gets tiring fast. What is worse is throwing code over the wall and expecting a pat on the back. No one cares by default; actually getting them to care takes a whole lot of additional work.

And we haven't even mentioned the other side of open source projects: the users who believe that just because your code is open source they are entitled to all your time and effort (for free), expect you to fix any issues they find (immediately, of course), and are quite rude and obnoxious about it. There aren't a lot of them, but literally any open source project with anything but the smallest of followings will have to handle them at some point. And often dealing with such a disappointed user means dealing with abuse. That can be exhausting and painful.

Above I pointed out a piece of code in the open that is open to critique. This is a piece of code that I wrote, so I feel comfortable telling you that it isn't so good. But imagine if I did that with your code. It is very easy to get offended by this, even when there was no intent to offend.

  • Open-source code is more maintainable – Lots of tools are free for OSS projects

So? This is only ever valuable if you assume that tools are expensive (they aren't). The article mentions tools such as Travis-CI, Snyk, Codecov and Dependencies.io that offer a free tier for open source projects. I went ahead and priced these services for a year on the default plans for each. The total yearly cost of all of them was around $8,000. That is a lot of money. But that is only assuming that you are an individual working for free. Assuming that you are actually getting paid, the cost of such tools and services is minuscule compared to other costs (such as developer salaries).

So admittedly, this is a very nice property of open source projects, but it isn't as important as you might imagine it would be. For a team of five people, even if the effort to open source the project is small, taking only a couple of weeks, it will take a few years of those free tools to recoup that investment in time (and I'm ignoring any additional effort to run the open source portion of the project).

  • Open-source code is a good fit for a great engineering culture

Well, no. Not really. You can have a great engineering culture without open source and you can have a really crappy engineering culture with open source. They sometimes go in tandem, but they aren't really related. Investing in the engineering culture is probably going to be much more rewarding for a company than just open sourcing projects. Of particular interest to me is this quote:

Engineers are winning because they can autonomously create great projects that will have the company’s name on it: good or bad…

No, engineers do not spontaneously create great projects. That comes from hard work, guidance and a lot of surrounding infrastructure. Working in open source doesn't mean that you don't need coordination, a high level vision and good attention to detail. This isn't a magic sauce.

What is more, and this is really hammering the point home: good or bad. Why would a company want to attach its name to something that can be good or bad? That seems like a very unnecessary gamble. So in order to avoid publicly embarrassing the company, there will be the need to do the work to make sure that the result is good. And the alternative to that is not to have a bad result; the alternative is to not open source the code at all.

Now, you might argue that such a thing is not required if the codebase is good to begin with, and I’ll agree. But then again, you have things like this that you’ll need to deal with. Also, be sure that you cleaned up both the code and the commit history.

  • Just why not

The author goes on to gush about the fact that there are practically no reasons why not to go open source, that we know that projects such as frameworks, languages, operating systems and databases are all open source and are very successful.

I think that this gets to the heart of the matter. There is the implicit belief that the important thing about an open source project is the code. That couldn't be further from the truth. Oh, of course, the code is the foundation of the project, but foundations can be replaced (see: Firefox, OpenSSL → BoringSSL, React, etc).

The most valuable thing about an open source project is the community. The contributors and users are the thing that make a project unique and valuable. In other words, to misquote Clinton, it’s the community, stupid. 

And a community doesn't just spring up from nowhere; it takes effort, work and a whole lot of time to build. And only when you have a community of sufficient size will you start to see an actual return on investment for your efforts. Until that point, all of that is basically a sunk cost.

I’m an open source developer, pretty much all the code I have written in the past decade or so is under one open source license or another and is publicly available. And with all that experience behind me I can tell you what really annoyed me the most about this article. It isn’t an article about promoting open source. It is an article that, I feel, promotes just throwing code over the wall and expecting flowers to grow. That isn’t the right way to do things. And it really bugged me that in all of this article there wasn’t a single word about the people who actually paid for this code to be developed.

Note that I'm not arguing for closed source solutions for things like IP, trade secrets, secret sauce and the like. These are valid concerns and need to be addressed, but that isn't the issue. The issue is that open sourcing a project (vs. throwing the code on GitHub) is something that should be done in a forthright manner, with a clear understanding of the costs, risks and ongoing investment involved. This isn't a decision you make because you don't want to pay for a private repository on GitHub.

Self contained deployments and embedded RavenDB

time to read 3 min | 496 words

In previous versions of RavenDB, we offered a way to run RavenDB inside your process. Basically, you would reference a NuGet package and be able to create a RavenDB instance that runs in your own process. That can simplify deployment concerns immensely, and we have a bunch of customers who rely on this feature to just take their database engine with their application.

In 4.0, we don’t provide this ability OOTB. It didn’t make the cut for the release, even though we consider this a very important utility feature. We are now adding this in for the next release, but in a somewhat different mode.

Like before, you'll be able to add a NuGet reference, get a document store, and just start working. In other words, there is no installation required and you can create a database and start using it immediately with zero hassle.
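
To make this concrete, here is a hypothetical sketch of what that could look like from C#. The type and member names (EmbeddedServer, ServerOptions, GetDocumentStore) and the database name are my own illustrative assumptions, not the final API:

    using Raven.Client.Documents;

    class EmbeddedSketch
    {
        static void Main()
        {
            // Hypothetical embedded API; names are assumptions.
            EmbeddedServer.Instance.StartServer(new ServerOptions
            {
                DataDirectory = "RavenData" // everything lives next to the application
            });

            using (IDocumentStore store = EmbeddedServer.Instance.GetDocumentStore("Orders"))
            using (var session = store.OpenSession())
            {
                session.Store(new { Name = "First order" }, "orders/1-A");
                session.SaveChanges();
            }
        }
    }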

The difference is that you’ll not be running this inside your own process, instead, we’ll create a separate process to run RavenDB. This separate process is actually slaved to the parent process, so if the parent process exits, so will the RavenDB process (no hanging around and locking files).
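
The "slaved to the parent process" part is the interesting bit. This is not RavenDB's actual mechanism (I'm not describing the final protocol here), but as a generic illustration, here is one common way a child process can tie its lifetime to its parent, assuming the parent passes its PID on the command line when spawning the child:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class SlavedProcessSketch
    {
        static void Main(string[] args)
        {
            int parentPid = int.Parse(args[0]); // assumption: parent passes its PID

            new Thread(() =>
            {
                try
                {
                    // Block until the parent process exits...
                    Process.GetProcessById(parentPid).WaitForExit();
                }
                catch (ArgumentException)
                {
                    // ...or discover it is already gone.
                }
                // Parent died: shut down and release any file locks.
                Environment.Exit(0);
            }) { IsBackground = true }.Start();

            // Normal server work would continue here.
        }
    }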

But why create a separate process? Well, the answer to that is quite simple, we don’t want to force any dependency on the client. This is actually a bit more complex, though. It isn’t that we don’t want to force a dependency as much as we want the ability to change our own dependencies.

For example, as I'm writing this, the machine is currently testing whether .NET Core 2.1 pops up any issues with RavenDB, and we are pretty good with keeping up with .NET Core releases as they go. However, in order to do that, we need a wall between the client and the server code, which we want to freely modify and play with (including changing what frameworks we are running on and using). For the client code, we have a well defined process and a set of versions we support, but for the server, we explicitly treat the framework as an implementation detail. One of the nice things about .NET Core is that it allows the deployment of self contained applications, meaning that we can carry the framework with us and not have to depend on whatever is installed on the machine. This makes servicing and deployment a lot easier.

There is also the issue of other clients. We have clients for .NET, JVM, Go, Ruby, Node.JS and Python, and we want to give users in all languages the same experience of just bundling RavenDB and running it with zero hassle. All of that leads us to spawning a separate process and creating a small protocol for the host application to control the slaved RavenDB instance. This will be part of the 4.1 release, which should be out in about three months.

Looking for Go / Node.JS / Ruby developers for a short stint

time to read 1 min | 107 words

We are looking to finish work on our client API for RavenDB in Go, Node.JS and Ruby. The current state is that the code is mostly there, and we need help to give it the final push (and spit & polish) and drive it to release status.

For each of these, I estimate that there is about 6 – 8 weeks of work, after which we’ll be managing the rest of this internally. You can see the current state of each client API here:

If you are interested, please send an email to jobs@ravendb.net, this is applicable to both local (Hadera, Israel) or remote work.

Inside RavenDB 4.0: Chapter 17 is done

time to read 1 min | 156 words

You might have noticed that I’ve slowed down writing blog posts. This is because pretty much every word I write these days goes into the book.

I just completed chapter 17, and we are now standing at around 550 pages in length, and there is only one chapter left.

Chapter 17 talks about backups and restores, how they work in RavenDB and how to properly manage your backup strategies in RavenDB. It sounds deathly dull, but it can actually be quite interesting, since backing up a distributed database (and restoring one, which is harder) are non-trivial problems. I hope that I did justice to the topic.

Next, maybe even as soon as early next week, is Chapter 18, operational recipes, which will cover all sorts of single-use-case responses from the operations team for dealing with various scenarios inside RavenDB.

You can read the draft here and your feedback is always appreciated.

Times are hard

time to read 2 min | 277 words

One of the things RavenDB does is allow you to define a backup task that will be executed on a given schedule (such as every Saturday at midnight). However, as it turns out, specifying the right time is actually a pretty hard thing to do. The problem is what to do when you have multiple time zones involved:

  • UTC
  • The server local time
  • The operator’s local time
  • The business hours of the application using the database

In some cases, you might have a server in Germany being managed from Japan with users primarily from South Africa. There are at least four different options for when Saturday’s midnight is, and the one sure thing is that it will happen when you least want it to.

Because of that, RavenDB takes the simple position that the time it cares about is the server's own time. An operator is free to define it as they wish, but only the server local time is relevant. But we still need to make the operator's job easier, and we do it using the following method:

[Screenshot: the backup schedule configuration, showing the CRON expression and the computed next backup times]

The operator can specify the time specification using CRON syntax (which should be familiar to most admins). We translate the CRON syntax to a human readable string, but we also provide the next backup date in the server's time (when it will actually run), the operator's local time (which, as you can see, is a bit different from the server's) and the duration until then. The latter is actually really important, because it gives the operator an intuitive understanding of when the backup is going to run next.
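
As a rough illustration of the computation involved (this is not RavenDB's code; I'm assuming the NCrontab package here, and the time zone id is just an example), computing the next occurrence against the server's clock and projecting it into the operator's time zone looks roughly like this:

    using System;
    using NCrontab; // assumption: any CRON parsing library would do

    class NextBackupSketch
    {
        static void Main()
        {
            // "Every Saturday at midnight", as a CRON expression
            var schedule = CrontabSchedule.Parse("0 0 * * 6");

            // Next occurrence, computed against the server's local clock
            DateTime serverNext = schedule.GetNextOccurrence(DateTime.Now);

            // The same instant shown in the operator's time zone
            // (the id is an example and is platform dependent)
            var operatorZone = TimeZoneInfo.FindSystemTimeZoneById("Tokyo Standard Time");
            DateTime operatorNext = TimeZoneInfo.ConvertTime(
                DateTime.SpecifyKind(serverNext, DateTimeKind.Local), operatorZone);

            Console.WriteLine($"Server:   {serverNext}");
            Console.WriteLine($"Operator: {operatorNext}");
            Console.WriteLine($"Duration: {serverNext - DateTime.Now}");
        }
    }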

It was your idea, change the diaper

time to read 3 min | 470 words

You learn a lot of things when talking to clients. Some of them are really fascinating, some of them are quite horrifying. But one of the most important things that I have learned to say to a client is: "This is out of scope."

This can be an incredibly frustrating thing to say, both for me and the client, but it is sometimes really necessary. There are times when you see a problem, and you know how to resolve it, but it is simply too big an issue to take upon yourself.

Let me give a concrete example. A customer was facing a coordination problem with their system: they needed to deal with multiple systems and orchestrate actions among them. Let's imagine that this is an online shop (because that is the default example) and you need to process an order and ship it to the user.

The problem at this point is that the ordering process needs to coordinate the payment service, the fulfillment service, the shipping service, deal with backorders, etc. Given that this is a B2B system, the customer wasn't concerned with the speed of the system but was really focused on the correctness of the result.

Their desire was to have a single transaction encompass all of these operations, and they were quite willing to pay the price in performance in order to achieve that goal. So they turned to us for help in this matter. They wanted the ability to persistently and transactionally store data inside RavenDB and only "commit" it at a given point.

We suggested a few options (draft documents, a flag in the document, etc), but we didn’t answer their core question. How could they actually get the transactional behavior across multiple operations that they wanted?

The reason we didn’t answer that question is that it is… out of scope. RavenDB doesn’t have this feature (for really good reasons) and that is clearly documented. There is no expectation for us to have this feature, and we don’t.  That is where we stop.

But what is the reason that we take this stance? We have a lot of experience in such systems and we can certainly help find a proper solution, why not do so?

Ignoring other reasons (such as this isn't what we do), there is a primary problem with this approach. I think that the whole idea is badly broken, and any suggestion that I make will be used against us later. "This was your idea, it broke (never mind that you told us it would), now fix it." It is a bit more awkward to have to say "sorry, out of scope" ahead of time, but much better than having to deal with the dirty diapers at the end.

NHibernate Profiler and Entity Framework Profiler 5.0 RTM

time to read 1 min | 164 words

I'm really happy to announce that we have just released a brand new version of NHibernate Profiler and Entity Framework Profiler.

What is new for NHibernate?

  • Support for NHibernate 5.1 and 5.1.1
  • Support for .NET Core
    • supported on the following platforms: netstandard2.0, net46, netcoreapp2.0
  • Fixed various minor issues regarding showing duplicate errors and warnings from NHibernate.
  • Better support for DateTime precision issues in NHibernate 5.0

What is new for Entity Framework:

  • Support for EF Core
    • supported on the following platforms: netstandard2.0, net46, netcoreapp2.0
    • netstandard1.6 is also supported via a separate dll.
  • Support for DataTable data type in custom reporting
  • Support for ReadCount and RecordsAffected in EF Core 2.0
  • Fixed issue for EF 6 on .NET 4.7
  • Can report using thread name, not just application name
  • Integration hooks for ASP.NET Core to provide contextual information for queries.

New stuff for both of them:

  • Improved column type mismatch warning
  • Support for the UniqueIdentifier parameter type
  • Support for integration with VS 2017.

Avoid a standalone DbService process

time to read 3 min | 520 words

The trigger for this post is the following question in the RavenDB mailing list. Basically, given a system that is composed of multiple services (running as separate processes), the question is whether to have each service use its own DocumentStore or to have a separate service (DbService) process that will encapsulate all access to RavenDB. The idea, as I understand it, is to avoid creating the DocumentStore because it is expensive.

The quick answer here is simple: <blink*>Don’t ever do that!</blink>

* Yes, I’m old.

That is all, you don’t need to read the rest of this post.

Oh, you are still here? Well, as long as you are here, let me explain my reasoning for such a reaction.

DocumentStore isn’t actually expensive to create. In fact, for most purposes, it is actually quite cheap. It holds no network resources on its own (connection pooling is handled by a global pool, anyway). All it does is manage the http cache on the client, cache things like serialization information, etc.

The reason we recommend that you don't create document stores all the time is that we saw people creating a document store for the purpose of using a single session and then disposing of it. That is quite wasteful: it forces us to allocate more memory and prevents the use of caching entirely. But creating a few document stores, one for each service that you have? That is cheap to do.
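
To make the recommended shape concrete, here is a minimal sketch: one DocumentStore per service process, created once and reused, with cheap short-lived sessions opened per operation. The URL, database name and Order class are placeholders:

    using System;
    using Raven.Client.Documents;

    public class Order
    {
        public string Id { get; set; }
        public string Status { get; set; }
    }

    public static class Db
    {
        // One store per service process, created once and shared.
        public static readonly IDocumentStore Store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" },
            Database = "Orders"
        }.Initialize();
    }

    public class OrderHandler
    {
        public void MarkShipped(string orderId)
        {
            // Sessions are cheap: open one per operation and dispose it.
            using (var session = Db.Store.OpenSession())
            {
                var order = session.Load<Order>(orderId);
                order.Status = "Shipped";
                session.SaveChanges();
            }
        }
    }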

What really triggered this post is the idea of having a separate process just to host the DocumentStore, the DbService process. This is a bad idea. Let me count the ways.

Your service process needs some data, so it will go to the DbService (over HTTP, probably) and ask for it. Your DbService will then call to RavenDB to get the data using the normal session and return the data to the original service. That service will process the data, maybe mutate it and save it back. It will have to do that by sending the data back to the DbService process, which will create a new session and save it to RavenDB.

This adds another round trip to every database query, and it means that you can't natively express queries inside your service (since you need to send them to the DbService). It creates strong ties between all the services you have and the DbService, as well as a single point of failure. Even if you have multiple copies of DbService, you now need to write the code to do automatic failover between them. Updating a field in a class for one service means that you have to deploy the DbService to recognize the new field, for example.

In terms of client code, aside from having to write awkward queries, you also need to deal with serialization costs, and you have to write your own logic for change tracking, unit of work, etc.

In other words, this has all the disadvantages of a repository pattern with the added benefit of making many remote calls and seriously complicating deployment.
