Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 2 min | 271 words

After a long journey, I have an actual data structure implemented. I only lightly tested it, and didn’t really do too much with it. In fact, as it currently stands, I didn’t even implement a way to delete the table. I relied on closing the process to release the memory.

It sounds like a silly omission, right? Something that is easily fixed. But I ran into a tricky problem with implementing this. Let’s write the simplest free method we can:
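
The snippet isn’t reproduced here, so as a stand-in, here is a minimal sketch of such a naive free method, assuming a made-up hash_directory layout (a single allocation that ends with an array of bucket pointers):

    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical layout, for illustration only: the directory is a single
     * allocation ending with an array of bucket pointers, and several
     * entries may point to the same bucket. */
    typedef struct hash_bucket hash_bucket;
    typedef struct hash_directory {
        uint32_t number_of_buckets;
        hash_bucket *buckets[];     /* flexible array member */
    } hash_directory;

    /* The naive version: free every bucket the directory points to, then
     * the directory itself. */
    void hash_table_free(hash_directory *dir) {
        for (uint32_t i = 0; i < dir->number_of_buckets; i++) {
            free(dir->buckets[i]);
        }
        free(dir);
    }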

Simple enough, no? But let’s look at one setup of the table, shall we?

As you can see, I have a list of buckets, each of them pointing to a page. However, multiple buckets may point to the same page. The code above is going to double free address 0x00748000!

I need some way to handle this properly, but I can’t actually keep track of whether I already deleted a bucket. That would require a hash table, and I’m trying to delete one :-). I also can’t track it in the memory that I’m going to free, because I can’t access it after free() was called. So what to do?

I thought about this for a while, and I came up with the following solution.
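
In sketch form (using the same made-up layout as above), the idea looks like this: sort the directory entries by pointer value with qsort(), then skip runs of duplicates while freeing:

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct hash_bucket hash_bucket;
    typedef struct hash_directory {
        uint32_t number_of_buckets;
        hash_bucket *buckets[];     /* flexible array member */
    } hash_directory;

    /* qsort comparator: order entries by the numeric value of the pointer
     * so that duplicate entries end up next to one another. */
    static int compare_bucket_ptrs(const void *a, const void *b) {
        uintptr_t x = (uintptr_t)*(hash_bucket *const *)a;
        uintptr_t y = (uintptr_t)*(hash_bucket *const *)b;
        if (x == y) return 0;
        return x < y ? -1 : 1;
    }

    void hash_table_free(hash_directory *dir) {
        qsort(dir->buckets, dir->number_of_buckets,
              sizeof(hash_bucket *), compare_bucket_ptrs);
        hash_bucket *prev = NULL;
        for (uint32_t i = 0; i < dir->number_of_buckets; i++) {
            if (dir->buckets[i] == prev)
                continue;          /* same page as the previous entry, already freed */
            prev = dir->buckets[i];
            free(prev);
        }
        free(dir);
    }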

What is going on here? Because we may have duplicates, we first sort the buckets. We want to sort them by the value of the pointer. Then we simply scan through the list and ignore the duplicates, freeing each bucket only once.

There is a certain elegance to it, even if the qsort() usage is really bad, in terms of ergonomics (and performance).

time to read 3 min | 582 words

The naïve overflow handling I wrote previously kept me up at night. I really don’t like it. I finally figured out what I could do to handle this in an elegant fashion.

The idea is to:

  • Find the furthest non overflow piece from the current one.
  • Read its keys and try to assign them to its natural location.
  • If we successfully moved all non native keys, mark the previous piece as non overflowing.
  • Go back to the previous piece and do it all over again.

Maybe it will be better to look at it in code?
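
As a rough sketch of the shape of that method (the helper names and the exact overflow flag semantics here are made up for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define PIECES_PER_PAGE 127   /* assumption: 8KB page of 64-byte pieces, minus a header */

    /* Hypothetical helpers:
     *  - piece_has_overflow():             is the overflow flag set on this piece?
     *  - piece_clear_overflow():           clear that flag.
     *  - piece_try_move_non_native_keys(): try to move every key that doesn't
     *    natively belong to this piece back to its natural piece; returns
     *    true only if all of them found room. */
    typedef struct hash_page hash_page;
    bool piece_has_overflow(hash_page *page, uint32_t piece);
    void piece_clear_overflow(hash_page *page, uint32_t piece);
    bool piece_try_move_non_native_keys(hash_page *page, uint32_t piece);

    /* Called after a delete left 'emptied' with no data at all. */
    void try_collapse_overflow_chain(hash_page *page, uint32_t emptied) {
        /* Scan forward to find how far the overflow chain extends. */
        uint32_t end = emptied;
        while (end + 1 < PIECES_PER_PAGE && piece_has_overflow(page, end + 1))
            end++;

        /* Work from the end of the chain backward: every piece that manages
         * to send its non-native keys home lets us clear the flag on the
         * piece before it. Stop as soon as some keys are stuck. */
        for (uint32_t cur = end; cur > emptied; cur--) {
            if (!piece_try_move_non_native_keys(page, cur))
                break;
            piece_clear_overflow(page, cur - 1);
        }
    }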

There is quite a lot that is going on here, to be frank. We call this method after we deleted a value and a piece became completely empty. At this point, we scan the next pieces to see how far we have to go to find the overflow chain. We then proceed from the end of the chain backward. We try to move all the keys in the piece that aren’t native to the piece to their proper place. If we are successful, we mark the previous piece as non overflowing, and then go back one piece and continue working.

I intentionally scan more pieces than the usual 16 limit we use for put, because I want to reduce overflows as much as possible (to improve lookup times). To reduce the search costs, we only search within the current chain, and I know that the worst case scenario for that is 29 in truly random cases.

This should amortize the cost of fixing the overflows on deletes to a high degree, I hope.

Next, we need to figure out what to do about compaction. Given that we are already doing some deletion bookkeeping when we clear a piece, I’m going to also do compaction only when a piece is emptied. For that matter, I think it makes sense to only do a page level compaction attempt when the piece we just cleared is still empty after an overflow merge attempt. Here is the logic:
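
In sketch form (helper names are illustrative), the trigger amounts to:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helpers, for illustration only. */
    typedef struct hash_dir  hash_dir;
    typedef struct hash_page hash_page;
    bool piece_is_empty(hash_page *page, uint32_t piece);
    void try_collapse_overflow_chain(hash_page *page, uint32_t piece);
    bool try_merge_with_sibling_page(hash_dir *dir, hash_page *page);

    /* Invoked from delete, once a piece has just become completely empty. */
    void after_piece_emptied(hash_dir *dir, hash_page *page, uint32_t piece) {
        /* First, try to pull overflowed keys back and shrink the chain. */
        try_collapse_overflow_chain(page, piece);

        /* Only if the piece is *still* empty after the merge attempt do we
         * pay for the more expensive page level compaction check. */
        if (piece_is_empty(page, piece))
            try_merge_with_sibling_page(dir, page);
    }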

Page compaction is done by finding a page’s sibling and seeing if we can merge them together. A sibling page is the page that shares the same key prefix with the current page except for a single bit. We need to check that we can actually do the compaction, which means that there are enough leaf pages, that the sizes of the two pages are small enough, etc. There are a lot of scenarios we are handling in this code. Even if we theoretically have enough space, the key distribution may cause us to avoid doing this merge.

Finally, we need to handle the most complex parts. We re-assign the buckets in the hash, then we see if we can reduce the number of buckets and eventually the amount of memory that the directory takes. The code isn’t trivial, but it isn’t really complex, just doing a lot of things:
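
Here is a sketch of the directory side of that work; the layout and names are made up, and it assumes the directory is indexed by the low bits of the hash:

    #include <stdint.h>

    /* Hypothetical directory layout, for illustration only. */
    typedef struct hash_page hash_page;
    typedef struct hash_dir {
        uint32_t number_of_buckets;   /* always a power of two */
        hash_page **buckets;
    } hash_dir;

    /* After merging 'removed' into 'survivor', point every bucket that used
     * to reference the removed page at the surviving one. */
    void redirect_buckets(hash_dir *dir, hash_page *removed, hash_page *survivor) {
        for (uint32_t i = 0; i < dir->number_of_buckets; i++) {
            if (dir->buckets[i] == removed)
                dir->buckets[i] = survivor;
        }
    }

    /* Classic extendible hashing shrink: as long as each bucket and its
     * mirror in the upper half point to the same page, the directory can be
     * halved. The memory itself can be released (or reallocated) lazily. */
    void try_shrink_directory(hash_dir *dir) {
        while (dir->number_of_buckets > 1) {
            uint32_t half = dir->number_of_buckets / 2;
            for (uint32_t i = 0; i < half; i++) {
                if (dir->buckets[i] != dir->buckets[i + half])
                    return;   /* still need the full depth */
            }
            dir->number_of_buckets = half;
        }
    }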

With this, I think that I tackled the most complex pieces of this data structure. I wrote the code in C because it is fun to get out and do things in another environment. I’m pretty sure that there are bugs galore in the implementation, but that is a good enough proof of concept to do everything that I wanted it to do.

However, writing this in C, there is one thing that I didn’t handle: actually destroying the hash table. As it turns out, this is actually tricky. I’ll handle that in my next post.

time to read 3 min | 579 words

Building data structures is fun, until you need to actually implement all the non core stuff. In the previous post, we covered iteration, but now we have to deal with the most annoying of features, deletions. In some data structures, implementing deletions can take significantly more time and effort than all other work combined. Let’s see what it takes to handle deletions in the hash table as it stands.

I started things out by just scanning for the right value and removing it verbatim. Here is what this looked like:
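
As a stand-in for the snippet, here is a sketch of that kind of delete, assuming a made-up piece layout of back to back varint encoded key/value pairs:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Assumed helper: decode one varint starting at 'p' into '*value' and
     * return the number of bytes consumed. */
    uint32_t varint_decode(const uint8_t *p, uint64_t *value);

    /* Naive delete from a single 64-byte piece: 'piece_data' points at the
     * piece inside the page, '*used' says how many of its bytes hold
     * varint-encoded (key, value) pairs. When the key is found, the rest of
     * the data is slid over the removed entry. (The layout is illustrative.) */
    bool hash_piece_delete(uint8_t *piece_data, uint8_t *used, uint64_t key) {
        uint32_t pos = 0;
        while (pos < *used) {
            uint64_t k, v;
            uint32_t start = pos;
            pos += varint_decode(piece_data + pos, &k);
            pos += varint_decode(piece_data + pos, &v);
            if (k != key)
                continue;
            memmove(piece_data + start, piece_data + pos, *used - pos);
            *used -= (uint8_t)(pos - start);
            return true;
        }
        return false;   /* the key wasn't in this piece */
    }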

This works. The value is removed, future modifications or queries can run, and everything Just Works. Even overflow operations will just work: if we deleted all the data from a piece, it will still be marked as overflow, and queries / modifications will proceed to get the right value from the right place.

However, there are things that we are not handling here. In particular, we are missing handling for overflows and compaction. Overflows inside a page happen when we can’t fit a key value pair in its natural piece (a 64 bytes boundary inside the page), so we place it on a nearby piece. Compaction happens when we removed enough data that we can merge sibling pages and free a page from the system.

Let’s handle the overflow case first, because it is easier. One option we have for handling overflows is to check whether there is any overflow for a page and, after freeing some memory, check the next pieces for keys that we can move to our piece. That is actually quite complex, because there are two types of such keys. The first type is keys that belong directly to the piece we removed from; the second type is keys that overflow past this piece.

In other words, let’s say that we deleted a value from piece #17. We need to check pieces 18 – 33 for keys that belong on piece #17. That is the first type. The second type is to check the next pieces for keys whose native location is earlier than piece #17. The idea is that we’ll place that data nearer its ideal location.

The problem here is that we now have to do a lot of work on deletion, and that isn’t something that I’m a fan of. One of the common use cases for deletes is massive deletes, so we’ll spend time re-arranging the keys, only to have them deleted immediately afterward. Instead, I think that I’ll take advantage of the organization of pieces in the hash table. Instead of handling overflows whenever a delete is issued, we’ll handle them only when a piece is emptied. That also means that we can be sure that we’ll have space for the keys we want to move.

Here is what I came up with at 2:30 AM:
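
The gist of it, with made-up helper names, is roughly:

    #include <stdbool.h>
    #include <stdint.h>

    #define PIECES_PER_PAGE 127   /* assumption, as in the earlier sketches */

    /* Hypothetical helpers, for illustration only. */
    typedef struct hash_page hash_page;
    bool piece_has_overflow(hash_page *page, uint32_t piece);
    void piece_move_keys_native_to(hash_page *page, uint32_t from_piece,
                                   uint32_t native_piece);

    /* Once 'emptied' has no data left, walk the pieces after it and pull
     * back any key whose natural home is 'emptied'. We know there is room,
     * because the piece has just been cleared. Note that this deliberately
     * does NOT clear the overflow flag. */
    void reclaim_overflowed_keys(hash_page *page, uint32_t emptied) {
        /* As long as the chain keeps going (each piece in it carries the
         * overflow flag), the following pieces may hold keys that really
         * belong to the emptied piece. */
        uint32_t prev = emptied;
        while (piece_has_overflow(page, prev) && prev + 1 < PIECES_PER_PAGE) {
            piece_move_keys_native_to(page, prev + 1, emptied);
            prev++;
        }
    }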

I’m not happy about this, though. It does the job, but you’ll note one thing it does not do: it doesn’t clear the overflow flag. This is because overflow is a transitive property. That is, I may have moved all the keys that belong to a piece to that piece, and no other piece has keys that belong to it. But keys that belong to previous pieces may be located on pieces after it. If we want to clear the overflow flag, we need to be ready to do a whole lot more.

But that is something that I’ll do at a more reasonable hour.

time to read 3 min | 563 words

I ran perf tests and memory utilization tests on my implementation and finally got it to the right place. But the API I have is pretty poor. I can put a key and value, or get the value by key. But we probably want a few more features.

I changed the put implementation to be:
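
The actual signature isn’t shown here; as a sketch, the change amounts to something along these lines (names are illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct hash_dir hash_dir;   /* opaque */

    /* Put 'value' under 'key'. If the key was already there, the previous
     * value is written to *old_value (when the pointer isn't NULL) and the
     * function returns true. The directory pointer is passed by address
     * because a put may grow, and therefore reallocate, the directory. */
    bool hash_table_put(hash_dir **dir, uint64_t key, uint64_t value,
                        uint64_t *old_value);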

This allows me to do an atomic replace and get the old value from the table. That is a nice low hanging fruit. But the key feature that I want to talk about today is iteration, as you might have figured out from the post title :-).

I’m writing this code in C, because I find it interesting to practice in different environments, and C doesn’t really have an iteration API. So here is what I came up with:
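
The shape of it is roughly as follows (the field and function names here are a sketch, not necessarily the real ones):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct hash_dir hash_dir;   /* opaque */

    /* Caller-allocated iteration state: no hidden allocations and no
     * failure modes, since the caller supplies the memory. */
    typedef struct hash_iteration_state {
        hash_dir *dir;
        uint32_t  current_bucket;
        uint32_t  current_piece;
        uint32_t  current_piece_byte_pos;
    } hash_iteration_state;

    /* Prepare the state for a fresh scan of the table. */
    void hash_table_iterate_begin(hash_dir *dir, hash_iteration_state *state);

    /* Copy the next key/value pair into *key / *value and advance the
     * state. Returns false once the table has been exhausted. */
    bool hash_table_get_next(hash_iteration_state *state,
                             uint64_t *key, uint64_t *value);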

If this was a public API I was building, I would probably want to hide the implementation details of the hash_iteration_state. Right now, I get an allocation and failure free API, because the caller is responsible for supplying the space for the state.

Here is how we can iterate using this API:
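
For example, with the sketched API above, a full scan would look something like this:

    #include <inttypes.h>
    #include <stdio.h>

    /* Assumes the declarations from the previous sketch are visible. */
    void print_all(hash_dir *dir) {
        hash_iteration_state state;
        uint64_t key, value;

        hash_table_iterate_begin(dir, &state);
        while (hash_table_get_next(&state, &key, &value)) {
            printf("%" PRIu64 " = %" PRIu64 "\n", key, value);
        }
    }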

Not too bad, right? And this is basically what you’ll get when you use yield and such in languages that support native iterations. In C, you need to manage this yourself, but I don’t think that I got too lost here.

We store the current state in the state variable, and simply traverse the data in the buckets / pieces as they come.

Looking at this code, what is missing? Error handling…

But wait, I can hear you say, how can there be errors here? There are no moving pieces that can break, surely.

Well, the caller of our API may provide some moving pieces for us. For example, consider this code:
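
Something along these lines (again using the sketched API):

    /* Assumes the declarations from the earlier sketches are visible. */
    void double_all_values(hash_dir **dir) {
        hash_iteration_state state;
        uint64_t key, value;

        hash_table_iterate_begin(*dir, &state);
        while (hash_table_get_next(&state, &key, &value)) {
            /* Mutating the table in the middle of iterating over it: the
             * put may move entries around (or grow the directory), pulling
             * the rug out from under the iterator's current position. */
            hash_table_put(dir, key, value * 2, NULL);
        }
    }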

In other words, if we iterate and modify the data, what is going to happen? Well, we may change the position of values, which will lead us to skipping some values, iterating over some values twice, etc. What is worse, this may violate invariants in the code. In particular, the invariant in question is that current_piece_byte_pos always points to the start of a new key. If the data moved because of the put, this doesn’t hold true any longer.

I added protection against that by adding a version field to the directory, which is incremented whenever we call a put / replace on the directory. Then we can check if the value has changed. The issue is how to report this. Right now, I wrote:

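The snippet isn’t reproduced here; in sketch form, with a made-up version field on both the directory and the iteration state, the guard is simply:

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed additions for this sketch: a version counter on the
     * directory, bumped by every put / replace, and a copy of it captured
     * when the iteration starts. */
    typedef struct hash_dir { uint32_t version; /* ... */ } hash_dir;
    typedef struct hash_iteration_state { hash_dir *dir; uint32_t version; /* ... */ } hash_iteration_state;

    /* hash_table_get_next() calls this first and bails out with 'false' if
     * it fails: debug builds trip the assert, release builds just end the
     * iteration early. */
    bool hash_iteration_valid(const hash_iteration_state *state) {
        bool valid = state->version == state->dir->version;
        assert(valid && "hash table modified during iteration");
        return valid;
    }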

I guess I could have done better by changing the return value to an int and returning a better error code directly. This is a perfect case for an exception, I think, since this is an edge case that should never be hit in real code. The fact that modifying the hash table will invalidate the iterator and cause it to stop working, on the other hand, might not be immediately obvious to the caller. More likely than not, though, anyone trying to write mutating code such as the one above will quickly figure out that this isn’t working and check exactly why.

Because of that, I decided to keep the bool return value, to simplify the life of our callers.

The full code is here.

time to read 7 min | 1260 words

In the previous post, I wrote about how I changed the structure of the hash leaf page to increase data density. I managed to get it down to 32MB range when I’m using random keys. That is a pretty great number, for memory usage, but what is the cost in terms of performance?

Well, let’s figure it out, shall we?

I added some tracing code and got the first result:

3.124000 us/op with 32.007813 MB

That is not too shabby, right? Let’s see where we are spending most of our time, shall we? I opened the profiler and got:

[profiler output]

Okay, that is a good point, isn’t it? Changing to release mode gives us:

1.471000 us/op with 32.007813 MB

that is much nicer, but still, profiler please…

As a side note, it actually takes less time to run the profiler than for it to analyze its output. I was looking at this for a while.

[profiler output]

The result was… stunning:

[profiler output]

What is this thing? And why did it take almost 50% of my runtime?

As it turns out, I was compiling for x86, and I’m using a lot of shifts on 64 bits numbers. This _allshl seems to be part of the x86 runtime. That means that what I expected to be a cheap instruction on a register was actually a method call.

That is interesting, but easy to fix. When running in Release/x64, we get the following results:

0.723 us/op with 32.007813 MB

Okay, so we are under a microsecond per op, and very reasonable memory, good to go, right?

Well, remember that I did absolutely zero optimizations so far? What does the profiler tell us now? Here is an interesting hotspot:

[profiler output]

That is reasonable, we are benching this method, after all. But inside that method, we see:

[profiler output]

This is the part where we scan an existing piece to see if the value is inside it or not. This tells us if we need to add a new value or update an existing one. It makes sense that this will be hot; we have to do it on each put, on the data related to the piece where we want to put the new key.

There are a few ways to deal with this. We can try to move from the simple varint mode to a more complex (and performant) system. StreamVByte would probably be a good solution in terms of raw performance. But it is meant for 32 bit numbers and doesn’t play nice with being able to remove and add values from the stream easily.

I could also try to play games: instead of calling this function twice, call it once and pass both k and v. However, that is almost assuredly a false play. The varint method is small enough that it doesn’t really matter; the compiler can inline it and play its own optimizations. Also, I tried it and there was no noticeable performance change, so that’s out.

Another way to deal with it is to reduce the number of times we call this function. And here is where things get interesting. Why is this called so much? Because during the put process, we find a page to put a value, then in that page, we find a piece (a 64 byte range) that we will put the key and value in. When we get to the piece, we need to check the existing data to see if the key is there or not. So far, so good, but there is another factor to consider: overflows.

A piece may overflow and spill into consecutive pieces. After all, that is what allowed us to reduce the memory usage from 147MB to just 32MB in the random integers scenario. However, that also means that we may need to scan a much larger portion of the page. That explains why we are seeing so much usage of the decoding function.

Let’s look at the previous behavior, where we have no overflow at all:

0.551000 us/op with 147.320313 MB

That is a much cheaper cost, but much higher memory. It looks like the typical compute vs. memory trade-off, but let’s look at the actual costs, shall we?

[profiler output]

You’ll notice that we spend most of our time on increasing the hash table size, allocating and moving memory, etc. So even though we are faster, that isn’t a good option for us.

One thing to note: we are looking for the same key, and decoding all the data to find it. But we don’t actually need to do that; we already have the key, and we have encoded it to its varint form. We can do a search on the raw encoded data to find it. It won’t be good enough for the positive case (we may have a value that was encoded to the same form), but it should help for the common case of inserting a new value. If we find something with memmem(), we still need to decode the data itself and see if the pattern we found is a key or a value, but that should help.
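
Here is a sketch of that prefilter idea; varint_encode() is an assumed helper, and memmem() is the GNU extension mentioned above:

    #define _GNU_SOURCE            /* memmem() is a GNU extension */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Assumed helper: encode 'value' as a varint into 'buf' (10 bytes are
     * always enough for a 64 bit value), returning the encoded length. */
    uint32_t varint_encode(uint64_t value, uint8_t *buf);

    /* Prefilter: search the raw bytes of the piece for the already-encoded
     * key. If the pattern isn't there at all, the key definitely isn't in
     * the piece and the decode can be skipped. A hit still requires a
     * proper decode, since the bytes may belong to a value or straddle two
     * entries. */
    bool piece_may_contain(const uint8_t *piece_data, size_t used, uint64_t key) {
        uint8_t encoded[10];
        uint32_t len = varint_encode(key, encoded);
        return memmem(piece_data, used, encoded, len) != NULL;
    }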

I tested it using GCC’s implementation, and the performance dropped by almost 50%; it took 1.3 us/op! Maybe if I was using a SIMD optimized implementation that would help, but given the kind of data we are looking for, it didn’t pan out.

Another option is to reduce the number of times we’ll try to overflow a value. Right now, if we can’t put a value in its proper place, we’ll try putting it in any of the other locations. That means that we may probe as many as 127 pieces. It also means that during put, we have to scan overflow chains. As we saw in the previous post, that can add up to scanning up to 1.8 KB of data for a single put. What happens if we limit the overflow amount?

Let’s see what happens if we limit the overflow to 32 probes. Now it only takes 0.403 us/op, which is a huge improvement. But what about the memory size? It’s easier to look things up as a table:

Max chain   Overall Time (sec)   us/op      Size (MB)
1           0.545000             0.545000   147.320313
2           0.359000             0.359000   75.156250
4           0.372000             0.372000   55.523438
8           0.322000             0.322000   36.882813
16          0.336000             0.336000   32.226563
32          0.448000             0.448000   32.007813
64          0.596000             0.596000   32.007813
128         0.770000             0.770000   32.007813

These numbers are interesting, but let’s look at them as a graph, shall we?

[graph of the results above]

We can see that the size drops sharply, that the performance is best between 8 and 16 probe attempts, and that all we are left choosing is the memory cost.

If we go with 8 probe attempts, we’ll pay an additional 4.875 MB, but with 16 probe attempts, we’ll use just 224KB more, with a cost of 0.044 us/op more than the optimal value.

We could go to 32, of course, which gives us optimal size, with about 60% of the cost of doing the full scan. However, by paying just 224KB more, we get down to 43% of the initial cost. And that certainly seems like it is worth it.

You can find the full source code (a little bit cleaned up) here.

time to read 5 min | 804 words

I wrote before about an idea for designing the leaf page of an extendible hash table. The idea was to arrange things on a cache line size, which would reduce how much we’ll need to search inside the page. On the other hand, it means that we might need to split whenever a particular 64 bytes range is full.

I went ahead and implemented that, just to see how it would work in practice. You can read the code here. Please note, this is testing code, so the point is to see how things work, not so much to do things properly. One really useful piece you’ll find there is that I’m outputting the structure of the hash to a graph. Being able to visually inspect the state of the system at various points has been invaluable to me.

I have tested the implementation with two types of inputs. First, we have sequential keys and values. That is what I expect will be the most common use case that I have for this. It worked, but storing one million items results in just over 8 MB of memory being used. Given that my key & value are 8 bytes each, that isn’t too bad. In fact, that is pretty great. If we were storing this as an array, we would need about 15.2MB, so there are space savings there as well.

Looking at the internal structure of the hash table, we see that all 64 bytes pieces hold some values (all between 42 – 48 bytes). That is pretty great all around, open the champagne.

I decided to see what kind of behavior we’ll see when we have a random distribution. The situation was… worse. In fact, storing one million random keys and values takes 147MB. Yes, that is for when you can store the whole data in just 15.2 MB in its raw form.

To be fair, one of the advantages that we have for sequential values is that we use varints to store the values, so they compress well. With random values, we don’t really benefit from that. I tested this again with different ranges of values. With random values up to 2^32, we get a hash table that is over 44MB in size, and when we limit the range to ten million, we use “merely” 27MB.

The issue is fairly simple: the moment we have a 64 bytes range full, we are forced to do a page split. That leads to high fragmentation, which is not desirable at all. With random distribution, it is likely that we’ll hit the 64 bytes capacity easily on each page, leading to very high fragmentation. Amusingly enough, the sequential mode will not have this problem, because it spreads the values neatly along the different pieces in a page.

The answer to that is to move to an open addressing mode inside the page. In other words, when we have a 64 bytes piece that is full, we’ll mark it as having overflowed and put the value on the subsequent piece. That would allow us to spill the values across the entire page much more efficiently. It does complicate deletions, but we’ll manage.
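
In sketch form, the put path inside a page becomes something like this (helper names are made up, and the real code also has to deal with encoding, sizes and so on):

    #include <stdbool.h>
    #include <stdint.h>

    #define PIECES_PER_PAGE 127   /* assumption: 8KB page of 64-byte pieces, minus a header */

    /* Hypothetical helpers, for illustration only. */
    typedef struct hash_page hash_page;
    bool piece_try_append(hash_page *page, uint32_t piece, uint64_t key, uint64_t value);
    void piece_mark_overflow(hash_page *page, uint32_t piece);

    /* Open addressing inside the page: if the natural piece has no room,
     * mark it as overflowing and spill into the following pieces. Returns
     * false when nothing fit, which is the caller's cue to split the page. */
    bool page_put(hash_page *page, uint32_t natural_piece, uint64_t key, uint64_t value) {
        for (uint32_t piece = natural_piece; piece < PIECES_PER_PAGE; piece++) {
            if (piece_try_append(page, piece, key, value))
                return true;
            piece_mark_overflow(page, piece);   /* this piece's chain continues onward */
        }
        return false;   /* the page itself is full enough to warrant a split */
    }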

Implementing that feature took a little bit, but here is the result. For the case of random keys and values, we go to a total memory usage of 32 MB, a very significant improvement. There has been no change for the sequential mode, which was already optimal for our scenario.

When values are in the 2^32 range, we use 16 MB and for ten million range, we get 8MB. These numbers are pretty awesome, even if I say so myself.

For comparison, I wrote this C# program, which merely stores 1 million 16 byte key/value pairs. That is basically what I’m doing in my code, and it was interesting to compare the memory consumption between the two. The C# version took 37.2 MB, compared to the bad result I had of 32MB in my code.

Looking at the actual overflow count, in the random data case, it looks like we have the following stats:

  • Max overflow chain (overflow pieces that are next to one another): 29
  • Total number of chains: 107,830
  • Average chain length: 2.35

In other words, we have a 10% chance in this case of hitting an overflow chain (meaning that we need to check more than 64 bytes of data to see if the key exists). When this happens, we’ll usually scan 128 – 192 bytes, but may scan up to 1.8 KB of sequential memory. There are a total of 1,388 cases where we have more than 10 consecutive overflow pieces (1% chance of hitting that) and only 64 cases where the chain is greater than 20.

Overall, these numbers look good to me.

As usual, comments are welcome. I would appreciate any feedback you may have.

time to read 4 min | 672 words

Trevor asked a really interesting question in the mailing list. Assume that we have the following model:

[model diagram]

And what we want to do is to be able to add a note to a book. The API looks like so:

public void SubmitNote(string isbn, string note);

The requirements are simple:

  • There is one book per ISBN
  • Notes on the book shouldn’t be duplicated
  • The Book document may be large, and we don’t actually care about it, we just want to add the note
  • If there is a note on a non existent book, we need to create it

The key observation here is that Trevor doesn’t want to load the document, modify it and save it back. What he is looking for is a way to send the change to the database server and have it happen there.

Luckily, RavenDB has the right set of features for this, the Patching API. Here is how you’ll write the code to update the document without having to load it:

We can send the update to the server, have the change happen there, with no need to load and save the document. And we get a strongly typed API and compiler validation, joy all around.

This is great, but it misses something. If we run this code twice, we’ll have a duplicated note, which is something that we don’t want to do.

Luckily, we have more options. The strongly typed API we just wrote is sitting on top of the actual patch API, which is much more powerful. Let’s see how we can tackle this requirement, shall we?

Now, we send a script to the server, which will execute it. If the note already exists, that means that we will not modify the document. So we got that out of the way. But we are still missing a piece of the puzzle, as you can probably see from the title of the post. What happens if the Book document does not exist?

Luckily, we thought about this issue and RavenDB is ready to help here as well. When you send a patch request to RavenDB, you can send a single script, or two. The second one will be executed if the document does not exist, like so:

If the document exists, we will run the patch script and modify the document. If the document does not exist, we will create an empty document and then run the patchIfMissing script to populate it. The code above handles all of the stated requirements, and we can call it a day.

But as long as we are here, let’s add another requirement. The only information we have from the SubmitNote(isbn, note) call is the ISBN of the book. Presumably we want to do things to this book, for example, figure out what the title is. Using this mechanism, how do we do that? When we return from the SaveChanges call, there is no way to tell whether the document was newly created or was already there.

The answer here is to ask RavenDB to do so, and we can modify our patchIfMissing a bit for that. Note the changes in the code:

If we need to execute the missing patch code, we’ll create the new document and in the same transaction, we’ll also create a task document to fetch additional information about this book. Your code will usually subscribe to the Tasks collection and execute additional work as a result of that.

And this is it, everything we need, in a single operation. Note that in many cases, you don’t actually have a single such call, though. Let’s say that you are getting a batch of work all at once. Your API will actually look like:

public void SubmitNotes(Dictionary<string, string> isbnToNotes);

In this case, we are going to execute the code above for all the items that we got in the call, but call SaveChanges once. All of them would operate as a single transaction and a single round trip to the server.

time to read 5 min | 843 words

In the previous posts in this series, I explored a bit how to generate a full text index on top of the Enron data set. In particular, we looked at (rudimentary) analysis of text in the first post and looked into posting lists (list of matching documents for specific terms) in the second one. It occurred to me that we need to actually have a much better understanding of the kind of requirements that we have from posting lists in general, so let’s look at them, shall we?

  • Add to the list (increasing numbers only).
  • Iterate the list (all, or from starting point).
  • Reduce disk space and memory utilization as much as possible.

The fact that I want to be able to add to the list is interesting. The typical use case in full text search is to generate the full blown posting list from scratch every time. The typical model is to use LSM (Log Structured Merge) and take advantage of the fact that we are dealing with sorted lists to merge them cheaply.

Iterating the list is something you’ll frequently do, to find all the matches or to merge two separate lists. Here is the kind of API that I initially had in mind:
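
Roughly speaking, and sketched here as a C style signature set purely to illustrate the surface area (the actual exploration is not tied to this shape):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct posting_list          posting_list;
    typedef struct posting_list_iterator posting_list_iterator;

    /* Entries may only be appended in increasing order. */
    void posting_list_add(posting_list *list, uint64_t entry);

    /* Start iterating, either from the beginning (from == 0) or from the
     * first entry >= 'from'. */
    void posting_list_iterate(posting_list *list, uint64_t from,
                              posting_list_iterator *it);

    /* Copy the next entry into *entry; returns false when the list is done. */
    bool posting_list_next(posting_list_iterator *it, uint64_t *entry);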

As you can see, there isn’t much there, which is intentional. I initially thought about using this as the baseline of a couple of test implementations using StreamVByte and FastPFor, as well as Gorilla compression. The problem is that there is the need to balance compression ratio with the cost of actually going over the list. Given that my test cases showed a big benefit for using Roaring Bitmaps, I decided to look at them first and see what I can get out of it.

Roaring Bitmaps are a way to store (efficiently) a set of bits, and they are very widely used in the industry. The default implementation, however, is entirely unsuitable for my purposes, mostly because it makes use of managed memory, and a hard requirement that I have placed on this series is that I want to be able to use persistent memory. In other words, I want to be able to write the data out, then be able to do everything on top of memory mapped data, without having to parse it.

Roaring Bitmaps work in the following manner. Each 64K range of integers has its own segment, which takes a maximum of 8KB. Given that I’m using Voron as a persistence library, these numbers don’t work for my needs. Voron uses an 8KB page size, so we’ll drop these numbers by half. Each range will be 32K of integers and take a maximum of 4KB of disk space. This allows me to store it much more efficiently inside of Voron. Each segment, in turn, has a type. The types can be either:

  • Array – if the number of set bits in the segment is less than 2048, the data will use a simple sorted array implementation, with each value taking 2 bytes.
  • Bitmap – if the number of set bits in the segment is between 2048 and 30,720, the segment will use a total of 4096 bytes and be a standard bitmap.
  • Reversed array – if the number of set bits in the segment is higher than 30,720, we’ll store in the segment the unset bits as a sorted array.

This gives us quite a few advantages:

  • It is straightforward to build this incrementally (remember that we only ever add items in the end).
  • It is quite efficient in terms of space saving in the case of sparse / busy usage.
  • It is cheap (computationally) to work with and process.
  • It is very simple to use from memory mapped file without having to parse / create managed objects.

The one thing that we still need to take into account is how to deal with the segment metadata. How do we know which segment belongs to which range? In order to handle that, we’ll define the following:
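
Here is a sketch of how such a packed metadata value could look (the names are made up for illustration):

    #include <stdint.h>

    /* Pack the segment metadata into a single 64 bit value: the start of
     * the range in the high bits (it is always a multiple of 32K) and the
     * number of set bits (at most 32K, so it fits in 16 bits) in the low
     * bits. The struct is just there for convenience. */
    typedef struct segment_metadata {
        uint64_t packed;
    } segment_metadata;

    static inline segment_metadata segment_metadata_pack(uint64_t start, uint32_t set_bits) {
        segment_metadata m = { (start << 16) | set_bits };
        return m;
    }

    static inline uint64_t segment_metadata_start(segment_metadata m) {
        return m.packed >> 16;
    }

    static inline uint32_t segment_metadata_set_bits(segment_metadata m) {
        return (uint32_t)(m.packed & 0xFFFF);
    }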

The idea is that we need to store two important pieces of information: the start location (which is always going to be a multiple of 32K) and the number of set bits (which has a maximum of 32K). Therefore, we can pack both of them into a single int64. The struct is merely there for convenience.

In other words, in addition to the segments with the actual set bits, we are also going to have an array of all the segment’s metadata. In practice, we’ll also need another value here, the actual location of the segment’s data, but that is merely another int64, so that is still very reasonable.

As this is currently a mere exercise, I’m going to skip actually building the implementation, but it seems like it should be a fairly straightforward approach. I might do another post about how to actually implement this feature on Voron, because it is interesting. But I think that this is already long enough.

We still have another aspect to consider. So far, we talked only about the posting lists, but we also need to discuss the terms. But that is a topic for the next post in the series.

time to read 5 min | 837 words

My previous post got an interesting comment:

I smell a Lucene.NET replacement for Raven 5 in the future :-)

I wanted to deal with this topic, but before we get to it, please note. I’m not committing to any features / plans for RavenDB in here, merely outlining my thinking.

I really want to drop Lucene. There are many reasons for this.

  • The current release version of Lucene.NET was released over 7 years ago(!) and it matches a Lucene (JVM) version that was out in 2010.
  • The active development of Lucene.NET is currently focused on releasing 4.8 (dated 2014) and has been stalled for several years.
  • Lucene’s approach to resource utilization greatly differs from how I would like to approach things in modern .NET Core applications.

There is a problem with replacing Lucene, however. Lucene is, quite simply, an amazing library. It is literally the benchmark that you would compare yourself against in its field. This isn’t the first time that I have looked into this field, mind. Leaving aside the fact that I believe that I reviewed most of the open source full text search libraries, I also wrote quite a few spikes around that.

Lucene is big, so replacing it isn’t something that you can do in an afternoon or a weekend. You can get to the 80% functionality in a couple of weeks, for sure, but it is the remaining 20% that would take 300% of the time :-). For example, inverted indexes are simple, but then you need to be able to handle phrase queries, and that is a whole other ball game.

We looked into supporting the porting of Lucene.NET directly as well as developing our own option, and so far neither option has gotten enough viability to take action.

2015 – 2018 were very busy years for us. We have rebuilt RavenDB from scratch, paying attention to many details that hurt us in the past. Some of the primary goals were to improve performance (10x) and to reduce support overhead. As a consequence of that, RavenDB now uses a lot less managed memory. Most of our memory utilization is handled by RavenDB directly, without relying on the CLR runtime or the GC.

Currently, our Lucene usage is responsible for most of the allocations in RavenDB. I can’t think of the last time that we had high managed memory usage that wasn’t directly related to Lucene. And yes, that means that when you do a memory dump of a RavenDB process, the top spot isn’t taken by System.String, which is quite exceptional, in my experience.

Make no mistake, we are still pretty happy with what Lucene can do, but we have a lot of experience with that particular version and are very familiar with how it works. Replacing it means a substantial amount of work and introduces risks that we’ll need to mitigate. Given these facts, we have the following alternatives:

  1. Write our own system to replace Lucene.
  2. Port Lucene 8 from JVM to .NET Core, adapting it to our architecture.
  3. Upgrade to Lucene.NET 4.8 when it is out.

The first option gives us the opportunity to build something that will fit our needs exactly. This is most important because we also have to deal with both feature fit and architectural fit. For example, we care a lot about memory usage and allocations. That means that we can build it from the ground up to reduce these costs. The risk is that we would be developing from scratch and it is hard to scope something this big.

The second option means that we would benefit from modern Lucene. But it means having to port a non trivial amount of code and adapt it not just from JVM to CLR but also to our own architectural needs. The benefit here is that we can likely skip porting stuff that we don’t need, but there is a lot of risk involved in such a big undertaking. We will also effectively be either forking the .NET project or maintaining just our own version. That means that the maintenance burden would be on us.

Upgrading to Lucene.NET 4.8 is the third option, but we don’t know when that would be out (it has been stalled for a long time) and it does move us very far along in Lucene’s versions. It is also likely going to require a major change in how we behave. Not so much code wise (I don’t expect that to be too hard), but in terms of the different operational properties than what we are used to.

The latter two options are going to keep Lucene as the primary source of allocations in our system, while the first option means that we can probably reduce that significantly for both indexing and querying, but we’ll have to push our own path.

We have made no decisions yet, and we probably won’t for a while. We are playing around with all the options, as you can see on this blog, but that is also mostly because it is fun to explore and learn.

time to read 1 min | 112 words

Consider the following C code snippet:

[C code snippet]

This code cannot be written in C#. Why? Because you can’t use ‘+’ on bool, and you can’t cast bools. So I wrote this code, instead:

And then I changed it to be this code:

Can you tell why I did that? And what is the original code trying to do?

For that matter (and I’m honestly asking here), how would you write this code in C# to get the best performance?

Hint:

[image]
