Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL open source document database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969


time to read 5 min | 843 words

In the previous posts in this series, I explored a bit how to generate a full text index on top of the Enron data set. In particular, we looked at (rudimentary) analysis of text in the first post and looked into posting lists (list of matching documents for specific terms) in the second one. It occurred to me that we need to actually have a much better understanding of the kind of requirements that we have from posting lists in general, so let’s look at them, shall we?

  • Add to the list (increasing numbers only).
  • Iterate the list (all, or from starting point).
  • Reduce disk space and memory utilization as much as possible.

The fact that I want to be able to add to the list is interesting. The typical use case in full text search is to generate the full blown posting list from scratch every time. The typical model is to use LSM (Log Structured Merge) and take advantage of the fact that we are dealing with sorted lists, which can be merged cheaply.
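
To make that concrete, here is a quick sketch (plain C#, not tied to any particular storage format) of why sorted lists matter: a union of two sorted posting lists is a single linear pass.

```csharp
using System.Collections.Generic;

public static class PostingListMerge
{
    // Union of two sorted posting lists (document ids) in a single linear pass.
    public static List<long> Union(IReadOnlyList<long> a, IReadOnlyList<long> b)
    {
        var result = new List<long>(a.Count + b.Count);
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
        {
            if (a[i] < b[j]) result.Add(a[i++]);
            else if (a[i] > b[j]) result.Add(b[j++]);
            else { result.Add(a[i]); i++; j++; } // same document in both lists
        }
        while (i < a.Count) result.Add(a[i++]);
        while (j < b.Count) result.Add(b[j++]);
        return result;
    }
}
```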

Iterating the list is something you’ll frequently do, to find all the matches or to merge two separate lists. Here is the kind of API that I initially had in mind:
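
Roughly along these lines (a sketch only; the names here are placeholders, not an actual API):

```csharp
using System.Collections.Generic;

// A sketch of the shape I have in mind; names are placeholders, not a real API.
public interface IPostingList
{
    // Values are added in increasing order only.
    void Add(long value);

    // Iterate everything, or only the values from a given starting point.
    IEnumerable<long> Iterate(long from = 0);
}
```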

As you can see, there isn’t much there, which is intentional. I initially thought about using this as the baseline for a couple of test implementations using StreamVByte and FastPFor, as well as Gorilla compression. The problem is the need to balance compression ratio with the cost of actually going over the list. Given that my test cases showed a big benefit for using Roaring Bitmaps, I decided to look at them first and see what I can get out of it.

Roaring Bitmaps are a way to store a set of bits efficiently, and they are very widely used in the industry. The default implementations, however, aren’t suitable for my purposes, mostly because they make use of managed memory, while a hard requirement that I have placed on this series is that I want to be able to use persistent memory. In other words, I want to be able to write the data out, then be able to do everything on top of memory mapped data, without having to parse it.

Roaring Bitmaps work in the following manner: each 64K range of integers gets its own segment, which takes a maximum of 8KB. Given that I’m using Voron as a persistence library, these numbers don’t work for my needs. Voron uses an 8KB page size, so we’ll cut these numbers in half: each range will be 32K integers and take a maximum of 4KB of disk space. This allows me to store it much more efficiently inside of Voron. Each segment, in turn, has a type (the sketch after the list shows how the type is picked). The types can be either:

  • Array – if the number of set bits in the segment is less than 2048, the data will use a simple sorted array implementation, with each value taking 2 bytes.
  • Bitmap – if the number of set bits in the segment is between 2048 and 30,720, the segment will use a total of 4096 bytes and be a standard bitmap.
  • Reversed array – if the number of set bits in the segment is higher than 30,720, we’ll store the unset bits in the segment as a sorted array.
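
Here is a sketch of how the type might be picked, using the thresholds from the list above (illustration only, not the actual Voron code):

```csharp
public enum SegmentType { Array, Bitmap, ReversedArray }

public static class SegmentTypes
{
    // Pick a segment type based on how many of the 32K bits in the range are set.
    public static SegmentType For(int numberOfSetBits)
    {
        if (numberOfSetBits < 2048)
            return SegmentType.Array;        // sorted values, 2 bytes each
        if (numberOfSetBits <= 30_720)
            return SegmentType.Bitmap;       // fixed 4,096 bytes (32K bits)
        return SegmentType.ReversedArray;    // store the unset bits as a sorted array
    }
}
```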

This gives us quite a few advantages:

  • It is straightforward to build this incrementally (remember that we only ever add items at the end).
  • It is quite efficient in terms of space saving in the case of sparse / busy usage.
  • It is cheap (computationally) to work with and process.
  • It is very simple to use from memory mapped file without having to parse / create managed objects.

The one thing that we still need to take into account is how to deal with the segment metadata. How do we know which segment belongs to which range? In order to handle that, we’ll define the following:

The idea is that we need to store two important pieces of information: the start location (which is always going to be a multiple of 32K) and the number of set bits (which has a maximum of 32K). Therefore, we can pack both of them into a single int64. The struct is merely there for convenience.
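
Here is a sketch of that packing (an illustrative layout, not necessarily the actual one): the count fits in 16 bits, so it can sit in the low bits while the start occupies everything above it.

```csharp
// Sketch only: pack the segment start and the number of set bits into one long.
public readonly struct SegmentSummary
{
    private readonly long _value;

    public SegmentSummary(long start, int numberOfSetBits)
    {
        // start is a multiple of 32K, numberOfSetBits is at most 32K (fits in 16 bits)
        _value = (start << 16) | (uint)numberOfSetBits;
    }

    public long Start => _value >> 16;
    public int NumberOfSetBits => (int)(_value & 0xFFFF);
}
```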

In other words, in addition to the segments with the actual set bits, we are also going to have an array of all the segment’s metadata. In practice, we’ll also need another value here, the actual location of the segment’s data, but that is merely another int64, so that is still very reasonable.

As this is currently a mere exercise, I’m going to skip actually building the implementation, but it seems like it should be a fairly straightforward approach. I might do another post about how to actually implement this feature on Voron, because it is interesting. But I think that this is already long enough.

We still have another aspect to consider. So far, we talked only about the posting lists, but we also need to discuss the terms. But that is a topic for the next post in the series.

time to read 5 min | 837 words

My previous post got an interesting comment:

I smell a Lucene.NET replacement for Raven 5 in the future :-)

I wanted to deal with this topic, but before we get to it, please note: I’m not committing to any features / plans for RavenDB here, merely outlining my thinking.

I really want to drop Lucene. There are many reasons for this.

  • The current release version of Lucene.NET was released over 7 years ago(!) and it matches a Lucene (JVM) version that was out in 2010.
  • The active development of Lucene.NET is currently focused on releasing 4.8 (dated 2014) and has been stalled for several years.
  • Lucene’s approach to resource utilization greatly differs from how I would like to approach things in modern .NET Core applications.

There is a problem with replacing Lucene, however. Lucene is, quite simply, an amazing library. It is literally the benchmark that you would compare yourself against in its field. This isn’t the first time that I have looked into this field, mind. Leaving aside the fact that I believe that I reviewed most of the open source full text search libraries, I also wrote quite a few spikes around that.

Lucene is big, so replacing it isn’t something that you can do in an afternoon or a weekend. You can get to 80% of the functionality in a couple of weeks, for sure, but it is the remaining 20% that would take 300% of the time. For example, inverted indexes are simple, but then you need to be able to handle phrase queries, and that is a whole other ball game.
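
To illustrate the gap (toy code, not RavenDB or Lucene): a basic inverted index just maps a term to the documents that contain it, but phrase queries also need the positions of each term inside each document, and that positional data is where a lot of the extra work lives.

```csharp
using System.Collections.Generic;

public class TinyInvertedIndex
{
    // term -> (document id -> positions of the term within that document)
    private readonly Dictionary<string, Dictionary<long, List<int>>> _index =
        new Dictionary<string, Dictionary<long, List<int>>>();

    public void Index(long docId, string[] tokens)
    {
        for (var position = 0; position < tokens.Length; position++)
        {
            if (_index.TryGetValue(tokens[position], out var postings) == false)
                _index[tokens[position]] = postings = new Dictionary<long, List<int>>();

            if (postings.TryGetValue(docId, out var positions) == false)
                postings[docId] = positions = new List<int>();

            positions.Add(position); // needed to answer "term A immediately before term B"
        }
    }
}
```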

We looked into supporting the porting of Lucene.NET directly, as well as developing our own option, and so far neither has proven viable enough to take action.

2015–2018 were very busy years for us. We rebuilt RavenDB from scratch, paying attention to many details that hurt us in the past. Some of the primary goals were to improve performance (10x) and to reduce support overhead. As a consequence, RavenDB now uses a lot less managed memory. Most of our memory utilization is handled by RavenDB directly, without relying on the CLR runtime or the GC.

Currently, our Lucene usage is responsible for most of the allocations in RavenDB. I can’t think of the last time that we had high managed memory usage that wasn’t directly related to Lucene. And yes, that means that when you take a memory dump of a RavenDB process, the top spot isn’t taken by System.String, which is quite exceptional, in my experience.

Make no mistake, we are still pretty happy with what Lucene can do, but we have a lot of experience with that particular version and are very familiar with how it works. Replacing it means a substantial amount of work and introduces risks that we’ll need to mitigate. Given these facts, we have the following alternatives:

  1. Write our own system to replace Lucene.
  2. Port Lucene 8 from JVM to .NET Core, adapting it to our architecture.
  3. Upgrade to Lucene.NET 4.8 when it is out.

The first option gives us the opportunity to build something that will fit our needs exactly. This is most important because we also have to deal with both feature fit and architectural fit. For example, we care a lot about memory usage and allocations. That means that we can build it from the ground up to reduce these costs. The risk is that we would be developing from scratch and it is hard to scope something this big.

The second option means that we would benefit from modern Lucene. But it means having to port a non-trivial amount of code and adapt it both from the JVM to the CLR and to our own architectural needs. The benefit here is that we can likely skip porting the stuff that we don’t need, but there is a lot of risk involved in such a big undertaking. We will also effectively be either forking the .NET project or maintaining our own version. That means that the maintenance burden would be on us.

Upgrading to Lucene.NET 4.8 is the third option, but we don’t know when that would be out (it has been stalled for a long time), and it does move us a long way forward in Lucene versions. It is also likely going to require a major change in how we behave. Not so much on the coding side (I don’t expect that to be too hard), but in terms of operational properties that differ from what we are used to.

The latter two options would keep Lucene as the primary source of allocations in our system, while the first option means that we can probably reduce that significantly for both indexing and querying, but we’ll have to forge our own path.

We have made no decisions yet, and we probably won’t for a while. We are playing around with all the options, as you can see on this blog, but that is also mostly because it is fun to explore and learn.

time to read 1 min | 112 words

Consider the following C code snippet:

image

This code cannot be written in C#. Why? Because you can’t use ‘+’ on bool, and you can’t cast bools. So I wrote this code, instead:

And then I changed it to be this code:

Can you tell why I did that? And what is the original code trying to do?

For that matter (and I’m honestly asking here), how would you write this code in C# to get the best performance?

Hint:

image

time to read 1 min | 107 words

Webinar banner

Tomorrow I’m going to be giving a webinar about RavenDB Cloud. Among the topics I’m going to cover are:

  • The type of work you can hand over to us while you put more time into your application
  • The different types of instances you can use and the resources you can provision
  • Setting up a distributed database instance and securing it in minutes
  • Provisioning a free instance to try it out
  • Ways to save money on the cloud
  • Getting BANG for your BUCK! How RavenDB performs fast on less expensive machines
  • Monitoring costs, performance, events

You can register for the webinar using the following link.

time to read 7 min | 1333 words

The first part is here. As a reminder, Sled is an embedded database engine written in Rust. It takes a very different approach to how it stores data, which I’m really excited to see. And with that, let’s be about it. I stopped in my last post when getting to the flusher, which simply sleeps and calls flush on the iobufs.

The next file is iobuf.rs.

Note that there are actually two separate things here. We have the IoBuf struct:

image

And we have the IoBufs struct, which contains the single buffer.

image

Personally, I would use a different name, because it is going to be very easy to confuse them. At any rate, this looks like an important piece, so let’s see what is going on in there.

You can see that the last three fields shown compose what looks like a ring buffer implementation. Next, we have these guys:

image

Both of them look interesting / scary. I can’t tell yet what they are doing, but the “interesting thread interleaving” stuff is an indication that this is likely to be complex.

The fun starts at the start() function:

image

This looks like it is used in the recovery process, so whenever you are restarting the instance. The code inside is dense; take a look at this:

image

This tests whether the snapshot provided is big enough to be considered standalone (I think). You can see that we set the next_lsn (logical sequence number) and next_lid (last id? logical id? not sure here). So far, so good. But all of that is a single expression, and it goes on for 25 lines.

The end of the else clause also returns a value, which is computed from the result of reading the file. It works and it is elegant, but it takes a bit of time to decompose.

There is something called SegmentAccountant that I’m ignoring for now because we finally got to the good parts, where actual file I/O is being performed. Focusing on the new system startup, which is often much simpler, we have:

image

You can see that we make some writes (not sure what is being written yet) and then sync it. The maybe_fail! stuff is manual injection of faults, which is really nice to see. The else portion is also interesting.

image

This is when we can still write to the existing snapshot, I guess. And this limits the current buffer to the limits of the buffer size. From context, lid looks like the global value across files, but I’m not sure yet.

I run into this function next:

image

On its own, it isn’t really that interesting. I guess that I don’t like the unwrap call, but I get why it is there. It would significantly complicate the code if you had to handle failures from both the passed function and from with_sa being unable to take the lock (and that should only be the case if a thread panicked while holding the mutex, which presumably kills the whole process).

Sled calls itself alpha software, but given the amount of scaffolding that I have seen so far (the event log, debug_delay, and the measurements for lock durations), a lot of work has gone into making sure that you have the facilities required to support it. Looking at the code history, it is a two year old project, so there has been time for it to grow.

Next we have this method:

image

What is actually going on in here is that, based on the size of the in_buf, it will write it to a separate blob (using the lsn as the file name). If the value is stored externally, then it creates a message header that points to that location. Otherwise, it creates a header that encodes the length of the in_buf and adds that to the out_buf. I don’t like the fact that this method gets passed the two bool variables. The threshold decision should probably be made inside the method and not outside. The term encapsulate is also not the best choice here; write_msg_to_output would probably be more accurate.

image

The next interesting bit is write_to_log(), which goes on for about 200 lines. This takes the specified buffer, pads it if needed, and writes it to the file. It looks like it all goes to the same file, so I wonder how it deals with space recovery, but I’ll find out later. I’m also wondering why this is called a log, because it doesn’t look like a WAL system to me at this point. It does call fsync on the file after the write, though. There is also some segment behavior here, but I’m used to seeing that on different files, not all in the same one.

It seems that this method can be called concurrently on different buffers. There is some explicit handling for that, but I have to wonder about the overall efficiency of doing that. Random I/O, and especially the issuance of parallel fsyncs, is likely not going to lead to the best system performance.

image

This is where the concurrent and possibly out-of-order write notifications go. It remembers all the previous notifications and, if there is a gap, will wait until the gap has been filled before it notifies interested parties that the data has been persisted to disk.
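
The idea itself is simple enough to sketch (in C#, purely as an illustration of the concept, not sled’s actual code): track completions as they arrive, and only advance the publicly visible “stable” position once there is no gap behind it.

```csharp
using System.Collections.Generic;

public class StableWatermark
{
    private readonly HashSet<long> _completed = new HashSet<long>();
    private long _stable = -1;

    // Called when segment `position` has been durably written (possibly out of order).
    // Returns the highest position such that it and everything before it is durable.
    public long MarkCompleted(long position)
    {
        _completed.Add(position);
        while (_completed.Remove(_stable + 1))
            _stable++;
        return _stable;
    }
}
```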

Because I’m reading the code in a lexical order, rather than in a more reasonable layered or functional approach, this gives me a pretty skewed understanding of the code until I actually finish it all. I actually do that on purpose, because it forces me to consider a lot more possibilities along the way. There are a lot of questions that might be answered by looking at how the code is actually being used in parts that I haven’t read yet.

At any rate, one thing to note here is that I think that there is a distinct possibility that a crash after segment N+1 was written (and synced) but before segment N was done writing might undo the successful write for N+1. I’m not sure yet how this is used, so that is just something to keep in mind for now.

I managed to find this guy:

image

If you’ll recall, this is called on a timer to write the current in memory state.

The only other interesting thing in this file is:

image

This looks like it is racing with other calls on trying to write the current buffer. I’m not so sure what sealed means. I think that this is to prevent other threads from writing to the same buffer while you might be trying to write it out.

That is enough for now, it has gotten late again, so I’ll continue this in another post.

time to read 11 min | 2171 words

The Sled project is an embedded database written in Rust. I ran into it a few times recently and, given my day job, I decided to take a peek and understand how it works. The project talks about being Log Structured Merge (and also exposing this to the client) with B+Tree read performance. The last time I read an LSM codebase was quite some time ago, so this is going to be quite interesting, I hope.

As usual, I’m doing a stream of consciousness as I’m going through the code. I’m looking at commit 6ce3e1264d3bb46de768e8fe338bade02d0b30dc.

The first thing that I found is the actual source code for this:

image

The fact that there is a page cache component already tells me quite a lot about the project: that it uses pages (LevelDB does not, for example) and that it is cached (so likely not using mmap). That will likely be important later, but when starting out I enjoy jumping to conclusions and then seeing whether I was right or not.

The following is taken from the readme for the pagecache:

image

This is quite interesting. Mostly because this shows a very different approach to building a page cache. It isn’t your usual B-Tree page, it seems. The docs reference LLAMA a lot, but I would rather go through the code first and read scholarly articles later.

As usual, I’m going over the code in lexical order, which means that the first real code file that I run into is a Rust implementation of a doubly linked list. Nothing really interesting there, except massive amounts of unsafe code, so I’m going to skip to its usage, which is to implement a least recently used cache.

image

There are N shards, determined by the configuration of the system. This is controlled by cache_bits, where 2^cache_bits is the actual size. I’m not sure why the code uses this method instead of just a number.

The codebase contains the values 0, 2 and 6 for cache_bits, giving us 1 shard, 4 shards and 64 shards, respectively. I assume that they use this to reduce contention on the same Shard instance for concurrent operations.
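
In other words, the power-of-two shard count makes picking a shard a cheap bit mask. Here is my reading of the intent, illustrated in C# rather than the actual Rust code (the exact mapping from page id to shard is my assumption):

```csharp
public static class LruSharding
{
    // With cacheBits = 6 there are 2^6 = 64 shards; a page id is mapped to a
    // shard by masking off its low bits (assumed mapping, for illustration).
    public static int ShardFor(ulong pageId, int cacheBits)
    {
        var shardCount = 1UL << cacheBits;
        return (int)(pageId & (shardCount - 1));
    }
}
```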

The LRU’s main method is accessed, which simply forwards the call to one of the shards, so I’m going to skip that and see how it actually works when I look at the shard code. Here is what the Shard looks like:

image image

Note that capacity is actually the max capacity, not reserved space in the entries vector. Each entry contains the actual size that it is using, as well as the Node itself (which resides in the LRU). The actual value that is kept is a PageId, which is the native integer type for the platform (32 bits – 4 bytes / 64 bits – 8 bytes). I’m not sure yet if this is a persisted value, but for something that is meant to be used in a persisted store, I find that choice very strange. It means that on a 32 bits machine you can only use 4 billion pages, while on 64 bits there is a much higher limit. That means that a database created on 64 bits might not be valid on 32 bits.

When building Voron, we were very careful to ensure that the data would be the same on 32 bits and 64 bits, Windows / MacOS / Linux, etc.

When you call accessed on the Shard, it will add it to the list, and if the size of all entries is higher than the capacity, will remove entries until the size in the cache is smaller than the maximum capacity.

Interestingly enough, Sled is using an Epoch based GC system in Rust, implemented by Crossbeam. I wrote about Epoch systems when I reviewed FASTER, and I understand that Epoch GC is pretty common in Rust concurrent programming, because otherwise you run the risk of memory corruption.

The ds (I assume data structure) folder of the page cache also provides a concurrent Stack implementation, which is pretty elegant to read, but you need to be well versed in the Epoch handling to understand how it all works correctly under the covers.

The interesting stuff in the ds folder is the pagetable, which starts with the following structs:

image

Each of those is initialized to an array of pointers that is 2 MB in size (262,144 pointers, basically). That is a pretty big structure, and the PageTable struct ties it all together:

image

The top of the file mentions that this assumes the caller is going to have a dense key space, which makes sense. Theoretically, this will use a maximum of 512GB if you store a sparse set of data in the page table (262,144 pointers × 8 bytes = 2MB per array, and 262,144 fully populated second-level arrays at 2MB each comes to 512GB).

All the operations in the page table are some variant of the traverse operation, so let’s focus on that.

image

First, the key is split into the first 18 bits (0 .. 256K) and the rest of the value. The first part is used to find the level 1 value.

The debug_delay() calls are debug only calls that add random waits, which help catch race conditions. Given that the code uses explicit Atomic instructions, I’m not surprised to see that there is really no help from the Rust safety behavior to ensure that there aren’t any logical race conditions.

This is the first time I’m reading real Rust code that explicitly deals with concurrency (without using the hammer that is Mutex, at least), and it is… reassuring seeing the same kind of behaviors that I’m injecting into my code to validate things.

This next bit handles adding a new value (safely) at the next level. Given the dense assumption, we can safely assume that this is a rare operation.

image

Look at the comment when we fail to win the compare_and_set call. Who is actually going to drop the unused created value? That is on the Guard and Epoch, I think. We associated the value with them when we called into_shared, and since we didn’t take it out from the guard, it will be removed when the Epoch is over.

The final act of traverse() is to just return the value from the second level, like so:

image

Note that this may be an empty value, so let’s see how this actually gets used. I’m using the simplest caller, which is get().

image

You can see that we have to do explicit null checking, translating that to an Option to make the calling code idiomatic. The rest of the code there is just memory management, so I’m going to skip it and look at the rest of the pagecache code, now that I understand the underlying data structures.

Note that the code uses Lsn as a logical type name; this maps to a 64 bits signed integer.

The first file in the pagecache crate is blob_io, which handles reading blobs. Here are the (crate public only) functions that are implemented there.

image

Remember, LSN is the Logical Sequence Number and is a signed 64 bits integer. Config I’m not sure about at this point. read_blob and write_blob are fairly simple. They just read / write (with CRC32 checks) data from disk. This is done on a full file each time, so that is quite interesting. The Config is responsible for mapping between an LSN and a file path.

The call to remove_blob will simply delete the mapped file from the disk, but it is the call to gc_blobs that is really interesting. What this does is to basically delete all the files whose name is greater than the stable_lsn provided. This is interesting, because it tells us a lot about the actual on disk persistent format.

Based on just the blob_io.rs file, I can tell that Sled is writing fairly large files to disk and reading them to memory. It is probably going to call gc_blobs() as part of its recovery and I’m willing to assume at this point that each blob is actually a segment. As I said, I like assuming things upfront when doing this kind of code reviews. I’m going to continue my reading and see if I was right.

The config file starts with usage instructions, which looks like:

image

So that is really interesting. First, we can see that Sled is probably only calling fsync once in a while, depending on its configuration. I looked into a real world project, and it uses the default values, which are to flush every 500 ms and snapshot after 1 million. The reason this is interesting is that my experience tells me that whenever I see a configuration option like that, it means that fsync is going to be pretty expensive and the database engine has chosen to let users trade safety for performance.

Voron, in contrast, doesn’t have any such feature. We don’t allow users to make that choice; instead, we accepted the fsync costs and dealt with them (using async commit and transaction merging, for example).

Anyway, we were talking about the config and how it works, here is how the Config gives us the path for a blob:

image

So this is pretty much just as expected. The data for each blob is a separate file under the blobs directory. Now we are only left with figuring out what that blob was.

The next file of interest is the DiskPtr file, which contains this struct:

image

And the only interesting method is the read():

image

So now we have more interesting data. There is stuff that is kept inline (presumably small items) and larger stuff that is read from the blobs directory. We already saw what the read_blob behavior is, but we haven’t checked the meaning of read_message. This is implemented later in the code, so I’m going to ignore it and just look at its return value:

image

So that allows us to make sense of what is going on above. Basically, this is a “gimme some bytes” call, regardless of where the data is actually stored (which I don’t know yet).

The next file is event_log, and it is a debug tool to help verify the behavior of the system. I wholeheartedly approve and it is a really good sign for the maturity of the codebase.

The next is flusher, which is a dedicated thread that calls this continuously:

image

There is nothing of interest in there, but it does indicate that there is no coordination between flushing and other activities. Not sure whether this is a good or a bad idea.

Now, let’s see what this iobuf is all about, shall we?

The first thing I run into in this file is this:

image

And I can’t really figure this one out. I mean, why? Just setting InnerLsn = i64 should be good enough for all platforms, I think. I guess there might be some optimization stuff that I’m missing, but that seems unlikely. On 64 bits, I would expect isize and i64 to have the exact same behavior.

This is now getting pretty late, so I’m going to close this for today. Tomorrow, I’m going to try to figure out how the rest of it works.

time to read 6 min | 1018 words

I got a great comment on my previous post about using Map/Reduce indexes in RavenDB for event sourcing. The question was how to handle time sensitive events or ordered events in this manner. The simple answer is that you can’t; RavenDB intentionally doesn’t expose anything about the ordering of the documents to the index. In fact, given the distributed nature of RavenDB, even the notion of ordering documents by time becomes really hard.

But before we close the question as “cannot do that by design”, let’s see why we would want to do something like that. Sometimes, this really is just the developer wanting to do things the way they are used to, and there is no need for actually enforcing the ordering of documents. But in other cases, you want to do this because there is a business meaning behind these events. In those cases, however, you need to handle several things that are a lot more complex than they appear, because you may be informed of an event long after it actually happened, and you need to handle that.

Our example for this post is going to be mortgage payments. This is a good example of a system where time matters. If you don’t make your payments on time, that matters. So let’s see how we can model this as an event based system, shall we?

A mortgage goes through several stages, but the only two that are of interest for us right now are:

  • Approval – when the terms of the loan are set (how much money, what is the collateral, the APR, etc).
  • Withdrawal – when money is actually withdrawn, which may happen in installments.

Depending on the terms of the mortgage, we need to compute how much money should be paid on a monthly basis. This depends on a lot of factors. For example, if the principal is tied to some baseline, changes to the baseline will change the amount of the principal. It also matters if only some of the amount was withdrawn, whether there are late fees, balloon payments, etc. Because of that, on a monthly basis, we are going to run a computation for the expected amount due for the next month.

And, obviously, we have the actual payments that are being made.

Here is what the (highly simplified) structure looks like:

image

This includes all the details about the mortgage, how much was approved, the APR, etc.

The following is what the expected amount to be paid looks like:

image

And here we have the actual payment:

image

All pretty much bare bones, but sufficient to explain what is going on here.

With that in place, let’s see how we can actually make use of it, shall we?

Here are the expected payments:

image

Here are the mortgage payments:

image

The first thing we want to do is to aggregate the relevant operations on a monthly basis, since this is how mortgages usually work. I’m going to use a map/reduce index to do so, and as usual in this series of posts, we’ll use JavaScript indexes to do the deed.

Unlike previous examples, now we have real business logic in the index. Specifically, funds allocation for partial payments: if the amount of money paid is less than the expected amount, we first apply it to the interest, and only then to the principal.
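
Just to make that rule concrete, here is the allocation logic on its own (in C#, purely as an illustration; the actual index is a JavaScript map/reduce and isn’t shown here):

```csharp
using System;

public static class FundsAllocation
{
    // A partial payment is applied to the interest first; whatever remains
    // goes toward the principal.
    public static (decimal interestPaid, decimal principalPaid) Allocate(
        decimal amountPaid, decimal interestDue, decimal principalDue)
    {
        var interestPaid = Math.Min(amountPaid, interestDue);
        var principalPaid = Math.Min(amountPaid - interestPaid, principalDue);
        return (interestPaid, principalPaid);
    }
}
```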

Here are the results of this index:

image

You can clearly see the mistakes that were made in the payments. In March, the amount due for the loan increased (another installment was taken from the mortgage), but the payments were still made based on the old amount.

We aren’t done yet, though. So far we have the status of the mortgage on a monthly basis, but we want to have a global view of the mortgage. In order to do that, we need to take a few steps. First, we need to define an Output Collection for the index, which will allow us to further process the results of this index.

In order to compute the current status of the mortgage, we aggregate both the mortgage status over time and the amount paid by the bank for the mortgage, so we have the following index:

Which gives us the following output:

image

As you can see, we have a PastDue marker on the loan. At this point, we can make another payment on the mortgage, to close the missing amount, like so:

image

This will update the monthly mortgage status and then the overall status. Of course, in a real system (I mentioned this is highly simplified, right?) we’ll need to take into account payments made at one time but applied to a different period (which we can handle with an AppliedTo property), and a lot of the actual core logic isn’t in indexes. Please don’t do mortgage logic in RavenDB indexes; that stuff deserves its own handling, in your own code. And most certainly don’t do that in JavaScript. The idea behind this post is to explore how we can handle non-trivial event projection using RavenDB. The example was chosen because I assume most people will be familiar with it, and it wasn’t immediately obvious how to go about actually solving it.

If you want to play with this, you can import the following file (Settings > Import Data) to get the documents and index definitions.

time to read 7 min | 1321 words

Almost by accident, it turned out that I implemented a pretty simple, but non-trivial, task in both C and Rust and blogged about them.

Now that I’m done with both of them, I thought it would be interesting to talk about the differences in the experiences.

The Rust version clocks at exactly 400 lines of code and uses 12 external crates.

The C version has 911 lines of C code and another 140 lines in headers and depends on libuv and openssl.

Both took about two weeks of evenings of me playing around. If I was working full time on that, I could probably do that in a couple of days (but probably more, to be honest).

The C version was very straightforward. The C language is pretty much not there, and on the one hand, it didn’t get in my way at all. On the other hand, you are left pretty much on your own. I had to write my own error handling code to be sure that I got good errors, for example. I had to copy some string processing routines that aren’t available in the standard library, and I had to always be sure that I’m releasing resources properly. Adding dependencies is something that you do carefully, because it is so painful.

The Rust version, on the other hand, uses the default error handling that Rust has (and much improved since the last time I tried it). I’m pretty sure that I’m getting worse error messages than the C version I used, but that is good enough to get by, so that is fine. I had to do no resource handling. All of that is already handled for me, and that was something that I didn’t even consider until I started doing this comparison.

When writing the C version, I spent a lot of time thinking about the structure of the code, debugging through it (to understand what is going on, since I also learned how OpenSSL work) and seeing if things worked. Writing the code and compiling it were both things that I spent very little time on.

In comparison, the Rust version (although benefiting from the fact that I did it second, so I already knew what I needed to do) made me spend a lot more time on just writing code and getting it to compile.  In both cases, I decided that I wanted this to be a production worthy code, which meant handling all errors, producing good errors, etc. In C, that was simply a tax that needed to be dealt with. With Rust, that was a lot of extra work.

The syntax and language really make it obvious that you want to do that, but in most of the Rust code that I reviewed, there are a lot of unwrap() calls, because trying to handle all errors is too much of a burden. When you don’t take that shortcut, your code size balloons, but the complexity of the code doesn’t, which was a great thing to see.

What was really annoying is that in C, if I got a compiler error, I knew exactly what the problem was, and errors were very localized. In Rust, a compiler error could stymie me for hours, just trying to figure out what I need to do to move forward. Note that the situation is much better than it used to be, because I eventually managed to get there, but it took a lot of time and effort, and I don’t think that I was trying to explore any dark corners of the language.

What really sucked is that Rust, by its nature, does a lot of type inference for you. This is great, but this type inference goes both backward and forward. So if you have a function and you create a variable using HashMap::new(), the actual type of the variable depends on the parameters that you pass at the first usage of this instance. That sounds great, and for the first few times it looked amazing. The problem is that when you have errors, they compound. A mistake in one location means that Rust has no information about other parts of your code, so it generates errors about those as well. It was pretty common to make a change, run cargo check, see three or four screens’ worth of errors pass by, and then go into “let’s fix the next compiler error” mode for a while.

The type inference bit also comes into play when you write the code. Because you don’t have the types in front of you (and because Rust loves composing types), it can be really hard to understand what a particular method will return.

C’s lack of async/await meant that when I wanted to do async operations, I had to decompose that to event loop mode. In Rust, I ended up using tokio, but I think that was a mistake. I should have used the event loop model there as well. It isn’t as nice, in terms of the code readability, but the fact that Rust doesn’t have proper async/await meant that I had a lot more additional complexity to deal with, and that nearly caused me to give up on the whole thing.

I do want to mention that for C, I ran Valgrind a few times to catch memory leaks and invalid memory accesses (it found a few, even though I was extra careful). In Rust, the compiler was very strict and several times complained about stuff that, if allowed, would have caused problems. I did like that, but most of the time it felt like fighting the compiler.

Speaking of which, the compilation times for Rust felt really high. Even with 400 lines of code, it can take a couple of seconds to compile (with cargo check, mind, not full build). I do wonder what it will do with a project of significant size.

I gotta say, though, compiling the C code meant that I would have to test the code. Compiling the Rust code meant that I could run things and they usually worked. That was nice, but at the same time, getting the thing to compile at all was a major chore many times. Even with the C code not working properly, the feedback loop was much shorter with C than with Rust. And some part of that was that I already had a working implementation for most of what I needed, so I had a lot less need to explore when I wrote the Rust code.

I don’t have any hard conclusions from the experience. I like the simplicity of C, and if I had something like Go’s defer to ensure resource disposal, that would probably be enough (I’m aware of libdefer and friends). I find the Rust code elegant (except the async stuff) and the standard library is great. The fact that the crates system is there means that I have very rich access to additional libraries, and that this is easy to do. However, Rust is full of ceremony that sometimes seems really annoying. You have to use cargo.toml and extern crate, for example.

There is a lot more to be done to make the compiler happy. And while it does sometimes catch you doing something you shouldn’t, I found that it usually felt like busy work more than anything else. In some ways, it feels like Rust is trying to do too much. I would have liked to see something less ambitious, just focusing on one or two concepts, instead of trying to be a high and low level language, with type inference turned up to the max, the borrow checker and memory safety, etc. It feels like this is a very high bar to cross, and I haven’t seen that the benefits clearly land on the plus side here.

time to read 2 min | 345 words

I’m pretty much done with my Rust protocol implementation. The last thing that I wanted to try was to see how it would look when I allow messages to be handled out of band.

Right now, my code consuming the protocol library looks like this:

This is pretty simple, but note that the function definition forces us to return a value immediately, and that we don’t have a way to handle a command asynchronously.

What I wanted to do is to change things around so I could do that. I decided to implement the command:

remind 15 Nap

Which should help me remember to nap. In order to handle this scenario, I need to provide a way to do async work and to keep sending messages to the client. Here was the first change I made:

image

Instead of returning a value from the function, we are going to give it the sender (which will render the value to the client) and can return an error if the command is invalid in some form.

That said, it means that the echo implementation is a bit more complex.

There is… a lot of ceremony here, even for something so small. Let’s see what happens when we do something bigger, shall we? Here is the implementation of the reminder handler:

Admittedly, a lot of that is error handling, but there is a lot of code here to do something that simple.  Compare that to something like C#, where the same thing could be written as:

I’m not sure that the amount of complexity that is brought about by the tokio model, even with the async/await macros is worth it at this point. I believe that it needs at least a few more iterations before it is going to be usable for the general public.

There is way too much ceremony and work to be done, and a single miss and you are faced with a very pissed off compiler.

time to read 3 min | 410 words

After a lot of trouble, I’m really happy that I was able to build an async I/O implementation of my protocol. However, for real code, I think that I would probably recommend using the sync API instead, since at least that is straightforward and doesn’t incur so much overhead at development time. The async stuff is still very much a “use at your own risk” kind of deal from my perspective. And I can’t imagine trying to use it in a large project and not suffering from the complexity.

As a good example, take a look at the following bit of code:

image

It doesn’t seem to be doing much, right? And it is clear what the intent of the code is.

However, if you try to compile this code you’ll get:

image

Now, it took me a long while to figure out what is going on.  The issue is that the code I’m seeing isn’t the actual code, because of macro expansions.

So let’s resolve this and see what the expanded code looks like:

This is after formatting, of course, but it certainly looks scary. Glancing at this code doesn’t tell me what the problem was, so I tried replacing the method with the expanded result, and I got the same error, but this time I got it on a line that helped me figure it out. Here is the issue:

image

We use the ? to return early from the poll method, and the Receiver I’m using in this case is defined to have a Result<String, ()>, so this is the cause of the problem.

I returned my own error type as a result, giving me the ability to convert from (), but that was a really hard thing to resolve.

It might be better to have Rust also offer to show the error on the expanded code by default, because it was somewhat of a chore to actually get to this.

What made this oh so confusing is that I had the exact same code, but using a Stream<String, io::Error>, and that worked, obviously. But it was decidedly non-obvious to see what the difference was between two seemingly identical pieces of code.
