Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 422 words

We are currently investigating the usage of LevelDB as a storage engine in RavenDB. Two of the things that we feel very strongly about are transactions (LevelDB doesn’t have them) and performance (for a different definition than the one usually bandied about).

LevelDB does have atomicity, and the rest of CID can be built on top of that without too much complexity (already done, in fact). But we ran into an issue when looking at read performance. I am not sure whether that is unique to us, but in our scenario we typically deal with relatively large values. Documents of several MB are quite common. That means that we are pretty sensitive to memory allocations. It doesn’t help that we have very little control over the Large Object Heap, so it was with great interest that we looked at how LevelDB did things.

Reading the actual code makes a lot of sense (more on that later, I will probably go through a big review of that). But there was one story that really didn’t make any sense to us: reading a value by key.

We started out using LevelDB Sharp:

Database.Get("users/1");

This in turn results in the following getting called:

[image: the LevelDB Sharp code behind Database.Get]

A few things to note here, all from the point of view of someone who deals with very large values:

  • valuePtr is not released, even though it was allocated by us.
  • We copy the value from valuePtr into a string, resulting in two copies of the data and twice the memory usage.
  • There is no way to get just partial data.
  • There is no way to get binary data (for example, encrypted values).
  • This is going to be putting a lot of pressure on the Large Object Heap.

But wait, it actually gets better. Let us look at the LevelDB method that gets called:

[image: the LevelDB C API method that gets called]

So we are actually copying the data multiple times now. For fun, the db->rep->Get() call also copies the data. And that is pretty much where we stopped looking.

We are actually going to need to write a new C API and export that to be able to make use of that in our C# code. Fun, or not.
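To make the managed side of the complaint concrete, here is a minimal sketch of what reading a value could look like if the native buffer is copied exactly once, into a byte[] rather than a string, and always released afterwards. This is not the LevelDB Sharp code and not the new C API. NativeValueReader, CopyAndRelease and the free callback are hypothetical names, and the callback stands in for whatever release function the native library exposes.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class NativeValueReader
{
    // Copies `length` bytes out of a native buffer into a managed byte[]
    // (one copy, binary safe), then releases the native buffer via `free`.
    public static byte[] CopyAndRelease(IntPtr valuePtr, int length, Action<IntPtr> free)
    {
        if (valuePtr == IntPtr.Zero)
            return null; // assumption: a null pointer means the key was not found

        try
        {
            var buffer = new byte[length];
            Marshal.Copy(valuePtr, buffer, 0, length);
            return buffer;
        }
        finally
        {
            // Release the native memory even if the copy throws.
            free(valuePtr);
        }
    }
}
```

Note that this only addresses the leak, the extra string copy, and the lack of binary support. A multi-MB byte[] still lands on the Large Object Heap, and there is still no partial read, which is why a different native API is needed.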

time to read 5 min | 850 words

By far the most confusing feature in RavenDB has been the index’s Transform Result. We introduced this feature to give the user the ability to do server side projections, including getting data from other documents.

Unfortunately, when we introduced this feature, we naturally added it to the index, and that caused a whole lot of confusion. In particular, people seemed to have a very hard time distinguishing between what gets indexed and is searchable and the output of the index. To make matters worse, we also had major issues with how to determine the input of the TransformResults function. In short, the entire thing works, but from the point of view of an external user, it is very finicky and hard to get.

Instead, during Rob’s sprint, we have introduced a totally new concept: stand-alone Result Transformers.

Here is what they look like:

public class OrdersStatsTransfromer : AbstractTransformerCreationTask<Order>
{
    public OrdersStatsTransfromer()
    {
        TransformResults = orders =>
                           from order in orders
                           select new
                           {
                               order.OrderedAt,
                               order.Status,
                               order.CustomerId,
                               CustomerName = LoadDocument<Customer>(order.CustomerId).Name,
                               LinesCount = order.Lines.Count
                           };
    }
}

And yes, they are quite intentionally modeled to be very similar to the way you would define them up to now, but outside of the index.

Now, why is that important? Because now you can apply a Transform Results on the server side without being tied to a particular index.

For example, let us see how we can make use of this new feature:

var customerOrders = session.Query<Order>()
    .Where(x => x.CustomerId == "customers/123")
    .TransformWith<OrdersStatsTransfromer, OrderViewModel>()
    .ToList();

This separation between the result transformer and the index means that we can apply it to things like automatic indexes as well.

In fact, we can apply it during load:

var ovm = session.Load<OrdersStatsTransfromer, OrderViewModel>("orders/1");

There are a whole bunch of other goodies in there, as well. We made sure that now you don’t have to worry about the inputs to the transform. We will automatically use the right values when you access them, based on whether you stored the field in the index or it is accessible on the document.

All in all, this is a very major step forward, and it makes it drastically easier to use Result Transformers in various ways.

time to read 2 min | 355 words

RavenDB’s query optimizer is pretty smart. It knows how to find the appropriate index for your queries, and even how to create a new index to match your query if one doesn’t exist. But that was the limit of its abilities. A human could still go into the database and say, look at those:

[image: a list of indexes that all operate on Posts]

Those all operate on Posts, and you should be able to merge them all into a single index. Reducing the number of indexes is a good thing, as it reduces the amount of IO on the system, which is typically our limiting factor.

Now, there was no real reason why we couldn’t actually tell the query optimizer that it should be smart enough that when it creates a new index, it will use all of the properties that have been previously indexed.

However, doing so would actually make no difference to us. Because until now, we didn’t have a way to stop an index. With the new index idling feature, we can now have the query optimizer create a new merged index, and then the database will just mark the extra index as idle after a while.

Almost. There is still another issue that we have to resolve. What happens when we have a big database, and we introduce a new (and wider) index? By default, all matching queries would actually hit that index, and not the previously existing index. That is great, except… the new index is stale, and might remain stale for a few minutes. During that time, we have a perfectly serviceable index that is just sitting there.

The query optimizer can now take into account the staleness level of an index as well when selecting it, meaning that there should be no interruption from the point of view of other queries. The new index will be introduced, go through all the documents, and then take over as the serving index for all queries. The existing index will wither away and die.
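The selection rule described above can be sketched roughly as follows. This is only an illustration, not RavenDB's query optimizer: IndexInfo and IndexSelector are hypothetical types, and staleness is reduced to a single boolean instead of a staleness level.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of an index, for illustration only.
public sealed class IndexInfo
{
    public string Name { get; set; }
    public HashSet<string> Fields { get; set; }
    public bool IsStale { get; set; } // simplification of a "staleness level"
}

public static class IndexSelector
{
    // Returns the index that should serve a query over `queryFields`,
    // or null when no existing index covers all of those fields.
    public static IndexInfo Select(IEnumerable<IndexInfo> indexes, IEnumerable<string> queryFields)
    {
        var needed = queryFields.ToList();

        return indexes
            .Where(ix => needed.All(f => ix.Fields.Contains(f))) // must cover the query
            .OrderBy(ix => ix.IsStale)                           // up-to-date indexes sort first
            .ThenByDescending(ix => ix.Fields.Count)             // then prefer the widest index
            .FirstOrDefault();
    }
}
```

With this ordering, a freshly created merged index is skipped while it is stale as long as an older, up-to-date index can answer the query, and it takes over once it has caught up.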

time to read 1 min | 174 words

RavenDB previously had a really nice feature for temporary indexes. Since we expected most of them to be temporary, we indexed them directly into memory, greatly saving in IO costs. With the removal of temporary indexes, that left us the option of just removing the entire code path and moving on to other things.

But we sat down and thought about this for a while. Typically, the busiest part of an index’s life is its creation, because the database needs to go through all the documents in the db and index them. We have changed things so that during this creation period, we will actually index to memory, without hitting the disk. Only when we reach a configurable size or finish indexing everything will we spill everything to disk.

This, in turn, gives us the best of both worlds. We get a really nice optimization for new indexes, and we don’t have to hit the disk for indexes that would soon go away. And, of course, we get the perf boost for all indexes now.
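As a rough sketch of the idea, assuming index entries can be treated as plain strings and that writing to disk is just a callback: SpillingIndexWriter, its constructor parameters and the size accounting are all hypothetical, and the real implementation will certainly differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical writer that buffers a new index in memory and spills once.
public sealed class SpillingIndexWriter
{
    private readonly long _maxBytesInMemory;                     // the configurable size
    private readonly Action<IReadOnlyList<string>> _writeToDisk; // stand-in for real disk IO
    private readonly List<string> _buffer = new List<string>();
    private long _bufferedBytes;

    public bool OnDisk { get; private set; }

    public SpillingIndexWriter(long maxBytesInMemory, Action<IReadOnlyList<string>> writeToDisk)
    {
        _maxBytesInMemory = maxBytesInMemory;
        _writeToDisk = writeToDisk;
    }

    public void Add(string entry)
    {
        if (OnDisk)
        {
            // Already spilled: behave like a regular on-disk index.
            _writeToDisk(new[] { entry });
            return;
        }

        _buffer.Add(entry);
        _bufferedBytes += entry.Length * sizeof(char); // crude size estimate
        if (_bufferedBytes >= _maxBytesInMemory)
            Spill();
    }

    // Called when the initial pass over all the documents has finished.
    public void Complete()
    {
        if (!OnDisk)
            Spill();
    }

    private void Spill()
    {
        _writeToDisk(_buffer.ToArray());
        _buffer.Clear();
        _bufferedBytes = 0;
        OnDisk = true;
    }
}
```

An index that is dropped before either condition is met never calls the disk callback at all, which is the saving for indexes that soon go away.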

time to read 2 min | 357 words

RavenDB’s ability to analyze your queries and generate the required indexes on the fly has always been a great boon. Rob Ashton was involved in the original implementation and during his visits to Hibernating Rhinos’ secret lair, he got to whack that thing on the head a few more times.

We need to separate two important things:

  • Automatically generating the indexes based on your queries.
  • The temporary indexes model itself.

The first part is a really important feature. The second is just an implementation detail. In particular, temporary indexes had a few problems.

Most importantly, they were temporary, and there was an explicit step for promoting those indexes from one stage to the other. That caused some confusion, and at exactly the point when we decided that the index was important enough to keep, the promotion caused the index to effectively reset itself. The other problem was that the moment that an index was upgraded to an auto index, it was there forever.

What Rob has done was to remove the concept of temporary indexes altogether, which got rid of a whole bunch of code. Instead, we have just standard auto indexes. And now we have a drastically simplified story. We don’t have the drastic jump from temp to auto, with irrecoverable implications.

Of course, this leads to a lot of interesting questions. Temporary indexes had the benefit of being indexed directly to memory, and they would go away after a database restart, as well as a whole lot of stuff. Not having special code for that made things a lot simpler for us, actually.

Automatic indexes have their age, and that is tracked internally by RavenDB. If an automatic index isn’t being used, it will become idle and eventually abandoned. If it is a very young index, we will decide it was a temporary index after all, and remove it from the system completely.

This feature, along with idling indexes, opened up the door for the next important feature, index merging. But before that, we need to upgrade the smarts for the query optimizer… which happens to be our next topic.

time to read 2 min | 359 words

During Rob Ashton’s visit to our secret lair, we did some work on hard problems. One of those problems was the issue of index prioritization. As I have discussed before, this is something that isn’t really easy to do, because of the associated IO costs with not indexing properly.

With Rob’s help, we have defined the following:

  • An auto index can be set to idle if it hasn’t been queried for a time.
  • An index can be forced to be idle by the user.
  • An index that was automatically set to idle will be set to normal on its first query.

What are the implications of that? An idle index will not be indexed by RavenDB during the normal course of things. Only when the database is idle for a period of time (by default, about 10 minutes with no writes) will we actually get it indexing.

Idle indexes will continue indexing as long as there is no other activity that requires their resources. When there is, they will complete their current run and go back to waiting for the database to become idle again.

But wait, there is more. In addition to introducing the notion of idle indexes, we have also created another two types of indexes. The first is pretty obvious: the disabled index will use no system resources and will never take part in indexing. This is mostly there so you can manually shut down a single index. For example, maybe it is a very expensive one and you want to stop it while you are doing an import.

More interesting, however, is the concept of an abandoned index. Even idle indexes can take some system resources, so we have added another level beyond that. An abandoned index is one that hasn’t been queried in 72 hours. At that point, RavenDB is going to avoid indexing it even during idle periods. It will still get indexed, but only if enough time has passed since the last time it was indexed.
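The rules above can be summarized as a small state machine. The sketch below is only one way to read them, not RavenDB's code: IndexPriority, IndexState and IndexAging are hypothetical names, the idle threshold is a parameter because the post only says "for a time", and two points are assumptions on my part: only auto indexes age, and an abandoned index returns to normal when it is queried.

```csharp
using System;

public enum IndexPriority { Normal, Idle, Abandoned, Disabled }

// Hypothetical bookkeeping for a single index.
public sealed class IndexState
{
    public bool IsAuto { get; set; }
    public bool ForcedByUser { get; set; }        // the user pinned the current priority
    public IndexPriority Priority { get; set; }   // defaults to Normal
    public DateTime LastQueried { get; set; }
}

public static class IndexAging
{
    // From the post: not queried in 72 hours means abandoned.
    public static readonly TimeSpan AbandonAfter = TimeSpan.FromHours(72);

    // Demotes an unused auto index. `idleAfter` is a hypothetical setting.
    public static void Age(IndexState index, DateTime now, TimeSpan idleAfter)
    {
        // Assumption: only auto indexes age, and user decisions are never overridden.
        if (!index.IsAuto || index.ForcedByUser || index.Priority == IndexPriority.Disabled)
            return;

        var unused = now - index.LastQueried;
        if (unused >= AbandonAfter)
            index.Priority = IndexPriority.Abandoned;
        else if (unused >= idleAfter)
            index.Priority = IndexPriority.Idle;
    }

    // An index that was demoted automatically goes back to normal on its first query.
    public static void OnQuery(IndexState index, DateTime now)
    {
        index.LastQueried = now;
        if (index.ForcedByUser)
            return; // a forced-idle index stays idle

        // Assumption: abandoned indexes are revived the same way idle ones are.
        if (index.Priority == IndexPriority.Idle || index.Priority == IndexPriority.Abandoned)
            index.Priority = IndexPriority.Normal;
    }
}
```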

Next, we will discuss why this feature was a crucial step on the way to killing temporary indexes.
