Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 5 min | 860 words

RavenDB’s core philosophy is that It Just Works, and that means that we try very hard to get things right. Conversely, that means that we are also trying to make it hard to do the wrong thing. Basically, we want to push you hard into the pit of success.

Part of that approach is what we call the governors. They are a set of features that detect and abort known bad behavioral patterns. I have already talked about Unbounded Result Sets, and I recently ran into this post, which shows how nasty a problem that can be, and how invisible.

Another governor we have in place is the session’s maximum request limit. A session is meant to be a scope. It has a very short duration and is typically used for a single request, processing a single message, etc. It is supposed to live as long as the business transaction. Because the session is scoped, we can reason that a single session that is making a lot of database operations is probably doing something pretty bad.

For example, it might be calling the database in a loop. Those kinds of issues can be truly insidious. Let us look at the following code (taken from here):

[Images: code samples from the linked post]

This kind of thing is a silent performance killer. No one is likely to see that this is happening, and it will silently increase the number of database operations that your application makes, leading to increased DB load, higher page load times and all sorts of problems associated with them.
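As a minimal sketch of the call-in-a-loop pattern (this is not the code from the linked post, and `CountingDatabase` is a made-up in-memory stand-in rather than any real client API), here is the same work done one request at a time and then in two batched requests:

```python
class CountingDatabase:
    """Toy in-memory database that counts how many requests it receives."""

    def __init__(self, documents):
        self.documents = documents
        self.requests = 0

    def load(self, doc_id):
        # One call, one round trip.
        self.requests += 1
        return self.documents[doc_id]

    def load_many(self, doc_ids):
        # Many documents, still a single round trip.
        self.requests += 1
        return [self.documents[doc_id] for doc_id in doc_ids]


def customer_names_in_loop(db, order_ids):
    names = []
    for order_id in order_ids:
        order = db.load(order_id)              # 1 request per order
        customer = db.load(order["customer"])  # 1 more request per order
        names.append(customer["name"])
    return names


def customer_names_batched(db, order_ids):
    orders = db.load_many(order_ids)                                   # 1 request
    customers = db.load_many([order["customer"] for order in orders])  # 1 request
    return [customer["name"] for customer in customers]


def make_db(order_count=100):
    """Build a toy data set: order_count orders spread over 10 customers."""
    documents = {}
    for i in range(order_count):
        documents[f"orders/{i}"] = {"customer": f"customers/{i % 10}"}
    for i in range(10):
        documents[f"customers/{i}"] = {"name": f"Customer {i}"}
    return CountingDatabase(documents)
```

Both functions return the same names. The loop version makes two requests per order, so the request count grows with the data, while the batched version stays at two requests no matter how many orders there are.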

In one particular case, I saw a single page load generate 17,000 queries to the database. The software in question grew over time, and people assumed that this was just what it took to run the software. Their database server was a true monster (this was about a decade ago), with dedicated RAM disks, high CPU count and a truly ridiculous amount of memory. Just to explain, we are talking about something like this:

[Image: a high-end storage drive]

Except that this was a decade ago, and it had quite a bit of space. Now, this kind of beasty can do 500K IOPS (I’m drooling just thinking about it), but it is damn expensive. Just to put things in perspective, I spent several weeks at that company working on this particular problem, and the cost of those weeks of work didn’t even cover the cost of the drive on that machine.

And on that monster, we were seeing page load times in the tens of seconds, and extremely high system load. I was able to bring it down to about 70 queries per page load, and their database server has pretty much idled ever since (IIRC, they turned that machine into a VM host for all the rest of their software, actually).

This is something that can bite.

To avoid that, we have the maximum number of requests per session, which will abort an excessive amount of database chatter. This has two important effects:

  • It follows the “better let one bad request die rather than take down the entire application” principle.
  • It puts a budget on the number of calls that you can make.
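A governor like this can be sketched in a few lines. This is a toy model under stated assumptions, not RavenDB's implementation: `BudgetedSession` and `RequestBudgetExceeded` are hypothetical names, and the limit of 30 is only an illustrative value.

```python
class RequestBudgetExceeded(RuntimeError):
    """Raised when a session goes over its request budget."""


class BudgetedSession:
    """Toy session that refuses to make more than max_requests calls."""

    def __init__(self, documents, max_requests=30):  # 30 is illustrative
        self._documents = documents
        self.max_requests = max_requests
        self.requests_made = 0

    def load(self, doc_id):
        # Check the budget before touching the database, so the request
        # that would go over the limit is never sent.
        if self.requests_made >= self.max_requests:
            raise RequestBudgetExceeded(
                f"session exceeded its budget of {self.max_requests} requests"
            )
        self.requests_made += 1
        return self._documents.get(doc_id)
```

The point of throwing, rather than logging a warning, is that the offending request fails loudly during development, long before the pattern gets a chance to grow into thousands of calls per page.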

Now, that budget is actually really interesting. Because we have it, we need to think about how we can reduce the number of database calls that we need to make to process the request. That led to a whole bunch of features around that: Lazy requests, includes and transformers, to name just a few.

That had a positive unintended consequence. RavenDB is fast, really fast, but it is also typically deployed as a network database. That means that each database call actually goes over the network, and we all remember our fallacies, right?

[Image: the fallacies of distributed computing]

In our profiling, we found that most often, the real cost in a RavenDB application was the back & forth chatter with the database. Reducing the number of requests we make to the server has an immediate benefit. And RavenDB allows you to do that by pipelining requests with Lazy, predicting requests with Includes or running the whole thing on the server side with Transformers.
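To put rough numbers on that chatter, here is a back-of-the-envelope sketch. The request counts are the 17,000 and 70 from the story above; the 0.5 ms round trip is purely an assumption for a fast local network, and the model ignores everything except sequential network waits.

```python
def network_time_seconds(request_count, round_trip_ms):
    """Time spent purely waiting on the network for sequential requests."""
    return request_count * round_trip_ms / 1000.0


ASSUMED_ROUND_TRIP_MS = 0.5  # assumption: a fast local network

before = network_time_seconds(17_000, ASSUMED_ROUND_TRIP_MS)  # 8.5 seconds
after = network_time_seconds(70, ASSUMED_ROUND_TRIP_MS)       # 0.035 seconds
```

Even with a generous round trip estimate, the chatty version spends seconds doing nothing but waiting, before the database has done any real work at all.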

And, like all governors, you can control it. RavenDB allows you to decide what the limit should be (on that particular session or globally), based on your actual needs and environment.

time to read 3 min | 492 words

RavenDB doesn’t provide any way for queries to do table scans*.

* That isn’t actually true. We have Data Exploration, which does just that, but we don’t provide an explicit API for it, and it is more of a DBA-driven feature (I wanna get this report with a minimum of fuss, without regard to how much it is going to cost me) than an API that is exposed.

What this means is that the cost of query operations in RavenDB is always going to be O(logN), instead of O(N). How does this relate to the topic of RavenDB retrospectives?
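The gap between O(logN) and O(N) is easy to underestimate, so here is a small sketch that counts steps. It models an index lookup as a binary search over sorted keys, which is a simplification and not how RavenDB's indexes are actually implemented.

```python
def scan_steps(sorted_keys, target):
    """Table scan: look at every entry until we hit the target."""
    steps = 0
    for key in sorted_keys:
        steps += 1
        if key == target:
            break
    return steps


def index_steps(sorted_keys, target):
    """Index lookup, modeled as a binary search over sorted keys."""
    steps = 0
    low, high = 0, len(sorted_keys) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            break
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps
```

With a million keys, finding the last one takes a million steps with the scan and no more than about twenty with the binary search. Grow the data a thousandfold and the scan gets a thousand times slower, while the lookup needs roughly ten more steps.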

One of the things that I kept seeing over and over as a database consultant was that databases are complex, and that it is easy to write a query that works perfectly fine for a period of time, then falls over completely as the size of the data goes over a certain threshold. In particular, queries that use table scans are especially vulnerable to this issue.

One of the design goals for RavenDB was to avoid that, completely. We did it by simply forbidding any query that doesn’t have an index. Initially, that was a pretty annoying requirement, because every time that you needed a new query, you needed to go ahead and create an index. But early on we got the Auto Indexes feature.

Basically, it means that you can query RavenDB without specifying which index you want to use, at which point the query optimizer will inspect the query and decide which index can serve it. The most interesting point here is that if there isn’t an index that can serve this query, the query optimizer is going to create one on the fly. See the previous post about BASE indexes and how we can afford to do that.

The fun part here is that the query optimizer is actually learning over time, and it will shape its indexes to best fit the kind of queries you are doing. It also makes RavenDB much more robust against New Version Degradation effects. NVD is what happens when you push a new version out, which has slightly different queries, which makes previously used indexes ineffective, forcing all your queries to become full table scans. Here is an example of the kind of subtle issues that this can cause. With RavenDB, when you use auto indexes (in other words, when you don’t explicitly state which index to use), the query optimizer will take care of that, and it will create all the appropriate indexes (and retire the unused ones) for you.
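The core decision the optimizer makes (reuse an index that can serve the query, otherwise create one) can be sketched as a toy model. Everything here is hypothetical: `ToyQueryOptimizer`, its naming scheme, and the "covers all queried fields" rule are illustrations only, and the sketch leaves out the learning, merging and retiring of indexes described above.

```python
class ToyQueryOptimizer:
    """Toy model: pick an index that covers the query, or create one."""

    def __init__(self):
        # Maps index name -> (collection, frozenset of indexed fields).
        self.indexes = {}

    def index_for(self, collection, fields):
        wanted = frozenset(fields)
        # Reuse any existing index on this collection that covers the query.
        for name, (indexed_collection, covered) in self.indexes.items():
            if indexed_collection == collection and wanted <= covered:
                return name
        # Nothing fits: create an auto index on the fly.
        name = f"Auto/{collection}/By" + "And".join(sorted(wanted))
        self.indexes[name] = (collection, wanted)
        return name
```

Asking twice for the same fields returns the same index, and a query on a subset of an existing index's fields reuses that index, so a new version with slightly different queries gets an index created for it instead of silently falling back to a scan.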

This in particular is a feature that I’m really proud of. It requires very little from the user to work with, and it gets the Right Thing Done.

time to read 7 min | 1257 words

RavenDB was designed from the get go with an ACID document store and BASE indexes. ACID stands for Atomic, Consistent, Isolated, Durable, and BASE stands for Basically Available, Soft state, Eventually consistent.

That design was driven by twin competing needs. First, and most obvious, a database should never lose data. Second, we want to ensure that the system remains responsive even under load. It is quite common to have spikes in production traffic, and we wanted to be able to handle them with better aplomb.

In particular, the kind of promises that are made by RavenDB queries allow us to perform quite a few performance optimizations. In databases that require that all indexes be up to date on transaction commit, you’ll find that there is a very high cost to adding indexes to the system, because each additional index means additional work is needed at write time. It also makes things such as aggregating indexes (map/reduce, in RavenDB terms) a lot harder to build.

By having BASE indexes, we gain the ability to batch multiple writes into a single index update operation. It also allows us to defer writing the indexes to the disk, avoiding costly I/O operations. But most importantly, by changing the kind of promise that we give to users, we are able to avoid a lot of locks, complexity and hardship inside RavenDB. This may seem like a small thing, but it is actually quite important. Take a look at this study:
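The batching idea can be sketched with a toy index. This is an illustration under stated assumptions, not RavenDB's indexing code: `BatchingIndex` is a made-up class, and the background task is modeled as a method that someone else calls later.

```python
class BatchingIndex:
    """Toy index that is updated in batches, outside the write path."""

    def __init__(self):
        self.pending = []           # writes accepted but not yet indexed
        self.entries = {}           # field value -> set of document ids
        self.update_operations = 0  # how many index updates we performed

    def on_write(self, doc_id, value):
        # The write path only records the change; no index work happens here.
        self.pending.append((doc_id, value))

    def run_indexing_batch(self):
        # Background task: fold everything that accumulated into one update.
        if not self.pending:
            return 0
        batch, self.pending = self.pending, []
        for doc_id, value in batch:
            self.entries.setdefault(value, set()).add(doc_id)
        self.update_operations += 1
        return len(batch)
```

A thousand writes followed by one call to `run_indexing_batch` cost a single index update instead of a thousand. Until that call runs, a query against `entries` sees nothing new, which is exactly the stale window that the rest of this post deals with.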

[Image: figure from the study]

In fact, there are a lot of studies on the overhead of locking in database systems, and that has been a hot research topic for many years. By choosing a different architecture, we can avoid a lot of those costs and complexities.

So far, that was the explanation from the point of view of the database creator. What about the users?

Here the tradeoff is more nuanced. On the one hand, there is a certain level of complexity that people have to deal with: the notion that queries on just-inserted data might not include it (stale queries). On the other hand, it means that queries are consistently faster and we can handle spikes in traffic and load much more easily and consistently.

But it is a mental model that can be hard to follow, even when you are familiar with it. Probably the most common issue with RavenDB’s BASE indexes is the case of Post / Redirect / Get. Let us look at how this may play out:

Here, we actually have two requests, one that adds a new order to the system, and another that fetches the details. If you have redirected to the new order page, everything is going to work as expected, and you won’t notice anything even if the indexes are stale at the time of the request. But a pretty common scenario is to add the new order, and then go and look at the list of orders for this customer. If the index didn’t have the chance to update between those two requests (which typically happen very quickly), then the customer will not see the new order.

That particular scenario is responsible for the vast majority of the pain we have seen from our users around BASE indexes.

Now, one of the great things about BASE indexes is that the user gets to choose whether they want to wait for up-to-date results or whether they want whatever is there right now. And we have had mechanisms to control this at a very granular level (including options for personal consistency control, so different customers will have different waits depending on their own previous behavior). But we have found that this puts a lot of responsibility on the developer to control the flow of their users through their applications.

So in RavenDB 3.5 we have changed things a bit. Now, instead of completing the write request as soon as possible, you can ask the server to wait until the relevant indexes have processed it:

[Image: code sample]

In other words, when you call SaveChanges, it will wait until the indexes have been updated, so when you return from the call, you can be certain that the results of any future queries will include all the changes in that transaction. This moves the responsibility to the write side and makes such scenarios much easier to handle.
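The shape of that "wait on save" behavior can be sketched with a toy store and a background indexing thread. This is an illustration only: `ToyStore`, `save_order` and the `wait_for_indexes` flag are hypothetical names, not the RavenDB client API.

```python
import queue
import threading


class ToyStore:
    """Toy document store with an index maintained by a background thread."""

    def __init__(self):
        self.documents = {}
        self.index = {}  # customer id -> list of order ids
        self._work = queue.Queue()
        worker = threading.Thread(target=self._index_loop, daemon=True)
        worker.start()

    def _index_loop(self):
        # Background indexing: runs independently of the write path.
        while True:
            order_id, customer, done = self._work.get()
            self.index.setdefault(customer, []).append(order_id)
            done.set()  # signal that this write is now visible to queries

    def save_order(self, order_id, customer, wait_for_indexes=False):
        # The document itself is stored immediately.
        self.documents[order_id] = {"customer": customer}
        done = threading.Event()
        self._work.put((order_id, customer, done))
        if wait_for_indexes:
            # Block the writer until the index has caught up with this write.
            done.wait()

    def orders_for(self, customer):
        return list(self.index.get(customer, []))
```

With `wait_for_indexes=True`, a query for the customer's orders right after `save_order` returns is guaranteed to include the new order. Without it, the same query may or may not see the order, depending on whether the background thread got there first, which is the Post / Redirect / Get problem in miniature.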

Given all of that, and our experience with RavenDB for the past 8 years or so, we spiked what it would look like with ACID indexes, at least for certain things. The problem is that this pretty much takes out of the equation a lot of the power and flexibility that we get from Lucene (more on why you can’t do that in Lucene in a bit) and forces us to offer what are essentially B+Tree indexes. Those are so limited that we would have to offer:

  • B+Tree indexes – ACID (simple property / range queries). With different indexes needed for different queries and ordering options.
  • Lucene indexes – BASE (full text, spatial, facets, etc. queries). Much more flexible and easy to use.
  • Map/reduce indexes – BASE (because you aren’t going to run the full map/reduce during the original transaction).

The problem is that we would then have a continuous burden of explaining when to use which index type, and how to deal with the different limitations. It would also make things much more complex if you have a query that can use multiple indexes, and there are problems associated with creating new ACID indexes on live systems. So it would generate a lot of confusion and complexity for users, for a fairly small benefit that we can already address with the “wait on save” option.

As for why we can’t do it all via Lucene anyway, the problem is that this wouldn’t be sustainable. Lucene isn’t really meant for individual operations; it shines when you push large amounts of data through it. It also doesn’t really have the facilities to be transactional. We have actually solved that particular problem in RavenDB 4.0, but it was neither pretty nor easy, and it doesn’t alleviate the issue of “we do best in large batches”. RavenDB’s BASE indexes are actually designed to take advantage of that particular aspect, because under load, we’ll process bigger batches and reap the performance benefits that they bring.

BASE indexes also make for much simpler operations. You can define a new index without fearing locking the database, and it enables scenarios such as side by side indexing to update index definitions without impacting the running system.

Finally, a truly massive benefit of BASE indexes is that they allow us to change the following statement: more indexes means faster reads, slower writes. Fewer indexes means slower reads, faster writes. By moving the actual indexing work to a background task, we let the writes go through as fast as they possibly can.

Indexes still have a cost, and the more indexes you have, the higher the cost (we still have to do some work here). But in the vast majority of cases, we can squeeze this kind of work between writes, in times when the database would otherwise be idling.

What that means is that you can have more indexes at the same cost, and that your queries are going to be using those indexes and are going to be fast.

 

time to read 4 min | 750 words

We spent some time recently looking into a lot of our old design decisions. Some of them make very little sense today (JSON vs. blittable is a good example), but made perfect sense at the time, and were essential to actually getting the product out.

Some of those design decisions, however, are still something that I very firmly believe in.  This series of posts is going to explore those decisions, their background and how they played out in the real world. So, without further ado, let us talk about unbounded result sets.

The design of RavenDB was heavily influenced by my experience as That NHibernate Guy (I got started with NHibernate over a decade ago, if you can believe that), where I saw the same types of errors, repeated over and over again. I then read Release It!, and I suddenly discovered that I wasn’t alone in fighting those kinds of demons. When I designed RavenDB, I set out explicitly to prevent as many of those as I possibly could.

One of the major issues that I wanted to address was Unbounded Result Sets. Simply put, this is when you have:

SELECT * FROM OrderLines WHERE OrderID = 1555

And you don’t realize that this order has three million line items (or, which is worse, that most of your orders have a few thousand line items, so you are generating a lot of load on the database, only to throw most of them away).

In order to prevent this type of issue, RavenDB has the notion of mandatory page sizes.

  • On the client side, if you don’t specify a limit, we’ll implicitly add one (by default set to 128).
  • On the server side, there is a database wide maximum page size (by default set to 1024). The server will trim all page sizes to the max if they are larger.
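The two rules above combine into a simple clamp. A minimal sketch, using the 128 and 1024 defaults from the text; the function names are made up for illustration and the "query" is just a list slice.

```python
CLIENT_DEFAULT_PAGE_SIZE = 128  # default from the text
SERVER_MAX_PAGE_SIZE = 1024     # default from the text


def client_page_size(requested=None):
    # Client side: if no limit was specified, implicitly add one.
    return CLIENT_DEFAULT_PAGE_SIZE if requested is None else requested


def server_page_size(requested):
    # Server side: trim anything larger than the database-wide maximum.
    return min(requested, SERVER_MAX_PAGE_SIZE)


def run_query(results, requested=None):
    """Return at most one page of results, after both limits are applied."""
    size = server_page_size(client_page_size(requested))
    return results[:size]
```

So a query with no limit over 5,000 matching items returns 128 of them, an explicit request for 500 returns 500, and a request for three million is quietly trimmed to 1,024.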

I think that this is one of the more controversial decisions in RavenDB’s design, and one that got a lot of heated discussion. But I still think that this is a good idea, because I have seen what happens when you don’t do that. The arguments are mostly about “RavenDB should trust developers to know what they are doing”, and a particularly irate guy called me while I was out shopping to complain about how I broke the sacred contract of Linq with regards to “queries should return all by default, even if this is ten billion results”. I pointed out that this is actually configurable, and if he wanted to set the default to any size he wanted, he could do that, but apparently it is supposed to be a “shoot my own foot first, then think” kind of deal.

Even though I still think that this is a really good idea, we have added some features over the years to make it easy for people to access the entire dataset when they need it. Streaming has been around since 2.5 or so, giving you a dedicated API to stream unbounded results. Streams were built to make it efficient to process large sets of data, and they allow both client & server to process the data in parallel, instead of batching huge responses on the server, then consuming ridiculous amounts of memory on the client before giving you the full result set. Instead, you can get each result as soon as it arrives from the server, and you can process it and send it further.
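The difference between the two approaches can be sketched with a Python generator standing in for the server. This is only a model of the idea, not the streaming API itself; `produce` and the two helper functions are made-up names.

```python
def produce(count, log):
    """Stand-in for the server: records each document as it is produced."""
    for i in range(count):
        log.append(i)
        yield {"id": f"orders/{i}"}


def first_streamed(count):
    """Streaming: take the first result as soon as it is available."""
    log = []
    stream = produce(count, log)
    first = next(stream)
    return first, len(log)  # how much the server produced so far


def first_batched(count):
    """Batching: materialize the full result set before using any of it."""
    log = []
    everything = list(produce(count, log))
    return everything[0], len(log)
```

In the streamed case the client is already working on the first document when the server has produced exactly one. In the batched case all of the documents had to be produced, and held in memory, before the client saw anything.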

In 4.0, we are going to change the behavior of the paging limits so:

  • If you don’t specify a limit, we’ll supply a limit clause of 25 items. If there are more than 25 items, we’ll throw an exception (unless you asked otherwise in the conventions).
  • If you supply a limit explicitly, it will work as expected and page through the data.
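Here is a minimal sketch of those planned rules, as described in the two points above. The names (`query`, `TooManyResults`, `throw_on_implicit_overflow`) are hypothetical; only the limit of 25 and the throw-unless-asked-otherwise behavior come from the text.

```python
IMPLICIT_LIMIT = 25  # from the text


class TooManyResults(Exception):
    """Raised when a query without an explicit limit matches too much."""


def query(results, limit=None, throw_on_implicit_overflow=True):
    if limit is not None:
        # Explicit limit: works as expected and returns one page.
        return results[:limit]
    if len(results) > IMPLICIT_LIMIT and throw_on_implicit_overflow:
        # No limit given and more matches than the implicit limit: fail loudly.
        raise TooManyResults(
            f"query matched {len(results)} items but no limit was specified"
        )
    return results[:IMPLICIT_LIMIT]
```

The flag plays the role of the convention mentioned above: with it turned off, a query without a limit quietly gets the first 25 items instead of an exception.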

The idea is that we want to reduce the surprise for users, and give them that experience to draw upon early on. Another thing that we’ll do is make sure that the operations guys can also change that, likely with an environment variable or something like that. If you need to modify the conventions on the fly, it is usually at a time when deploying a new version is hard and immediate action is needed.

In this manner, we can help users avoid expensive requests to the server, and they can be explicit with what they need to do.
