Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

, @ Q j

Posts: 6,856 | Comments: 49,160

filter by tags archive
time to read 7 min | 1333 words

The first part is here. As a reminder, Sled is an embedded database engine written in Rust. It takes a very different approach for how to store data, which I’m really excited to see. And with that, let’s be about it. In stopped in my last post when getting to the flusher, which simply sleep and call flush on the iobufs.

The next file is iobuf.rs.

Note that there are actually two separate things here, we have the IoBuf struct:

image

And we have IoBufs struct, which contains the single buffer.

image

Personally, I would use a different name, because is is going to be very easy to confuse them. At any rate, this looks like this is an important piece, so let’s see what is going on in there.

You can see that the last three fields shown compose of what looks like a ring buffer implementation. Next, we have this guys:

image

Both of them looks interesting / scary. I can’t tell yet what they are doing, but the “interesting thread interleaving” stuff is indication that this is likely to be complex.

The fun starts at the start() function:

image

This looks like it is used in the recovery process, so whenever you are restarting the instance.  The code inside is dense, take a look at this:

image

This test to see if the snapshot provided is big enough to be considered stand alone (I think). You can see that we set the next_lsn (logical sequence number) and next_lid (last id? logical id?, not sure here). So far, so good. But all of that is a single expression, and it goes for 25 lines.

The end of the else clause also return a value, which is computed from the result of reading the file. It works, it is elegant but it takes a bit of time to decompose.

There is something called SegmentAccountant that I’m ignoring for now because we finally got to the good parts, where actual file I/O is being performed. Focusing on the new system startup, which is often much simpler, we have:

image

You can see that we make some writes (not sure what is being written yet) and then sync it. the maybe_fail! stuff is manual injection of faults, which is really nice to see. The else portion is also interesting.

image

This is when we can still write to the existing snapshot, I guess. And this limits the current buffer to the limits of the buffer size. From context, lid looks like the global value across files, but I’m not sure yet.

I run into this function next:

image

On its own, it isn’t really that interesting. I guess that I don’t like the unwrap call, but I get why it is there. It would significantly complicate the code if you had to handle failures from both the passed function and with_sa unable to take the lock (and that should only be the case if a thread panicked while holding the mutex, which presumably kill the whole process).

Sled calls itself alpha software, but given the number of scaffolding that I have seen so far (event log, debug_delay and not the measurements for lock durations) it has a lot of work done around actually making sure that you have the required facilities to support it. Looking at the code history, it is a two years old project, so there has been time for it to grow.

Next we have this method:

image

What this method actually going on in here is that based on the size of the in_buf, it will write it to a separate blob (using the lsn as the file name). If the value is stored externally, then it creates a message header that points to that location. Otherwise, it creates a header that encodes the length of the in_buf and returns adds that to the out_buf. I don’t like the fact that this method get passed the two bool variables. The threshold decision should probably be done inside the method and not outside. The term encapsulate is also not the best choice here. write_msg_to_output would probably more accurate.

image

The next interesting bit is write_to_log(), which goes on for about 200 lines. This takes the specified buffer, pad it if needed and write it to the file. It looks like it all goes to the same file, so I wonder how it deals with space recovery, but I’ll find it later. I’m also wondering why this is called a log, because it doesn’t look like a WAL system to me at this point. It does call fsync on the file after the write, though. There is also some segment behavior here, but I’m used to seeing that on different files, not all in the same one.

It seems that this method can be called concurrently on different buffers. There is some explicit handling for that, but I have to wonder about the overall efficiency of doing that. Random I/O and especially the issuance of parallel fsyncs, are likely not going to lead to the best system perf.

image

Is where the concurrent and possibly out of order write notifications are going. This remembers all the previous notifications and if there is a gap, will wait until the gap has been filled before it will notify interested parties that the data has been persisted to disk.

Because I’m reading the code in a lexical order, rather than in a more reasonable layered or functional approach, this gives me a pretty skewed understanding of the code until I actually finish it all. I actually do that on purpose, because it forces me to consider a lot more possibilities along the way. There are a lot of questions that might be answered by looking at how the code is actually being used in parts that I haven’t read yet.

At any rate, one thing to note here is that I think that there is a distinct possibility that a crash after segment N +1 was written (and synced) but before segment N was done writing might undo the successful write for N+1. I’m not sure yet how this is used, so that is just something to keep in mind for now.

I managed to find this guy:

image

If you’ll recall, this is called on a timer to write the current in memory state.

The only other interesting thing in this file is:

image

This looks like it is racing with other calls on trying to write the current buffer. I’m not so sure what sealed means. I think that this is to prevent other threads from writing to the same buffer while you might be trying to write it out.

That is enough for now, it has gotten late again, so I’ll continue this in another post.

time to read 11 min | 2171 words

imageThe Sled project is an embedded database written in Rust. I run into it a few times recently and given my day job, I decided to take a peek and understand how it works. The project talks about being Log Structure Merge (and also exposing this to the client) with B+Tree read performance. The last time I read an LSM codebase was quite some time ago, so this is going to be quite interesting, I hope.

As usual, I’m doing a stream of consciousness as I’m going through the code. I’m looking at commit 6ce3e1264d3bb46de768e8fe338bade02d0b30dc.

The first thing that I found is the actual source code for this:

image

The fact that there is a page cache component already tell me quite a lot about the project. That is uses pages (LevelDB does not, for example) and that it is cached (so likely not using mmap). That will likely be important later, but when starting out I enjoy jumping to conclusions and then seeing if I was right or not.

The following is taken from the readme for the pagecache:

image

This is quite interesting. Mostly because this shows a very different approach to building a page cache. It isn’t your usual B-Tree page, it seems. The docs reference LLAMA a lot, but I would rather go through the code first and read scholarly articles later.

As usual, I’m going over the code in lexical order, which means that the first real code file that I run into is a Rust implementation of doubly linked list. Nothing really interesting there, except massive amounts of unsafe code, so I’m going to skip to it’s usage, which is to implement a least recently used cache.

image

There are N shards, determined by the configuration of the system. This is controlled by cache_bits, where 2^CacheBits is the actual size. I’m not sure why the code uses this methods instead of just a number.

The codebase contains the values 0, 2 and 6 for cache_bits, giving us 1 shard, 4 shards and 64 shards, respectively. I assume that they use this to reduce contention on the same Shard instance for concurrent operations.

The LRU’s main method is accessed, which simply forward the call to one of the shards, so I’m going to skip that and see how that actually works when I’ll look at the shard code. Here is what the Shard looks like:

image image

Note that capacity is actually the max capacity, not reserved space in the entries vector. Each entry contains the actual size that it is using, as well and the Node itself (that reside in the LRU). The actual value that is kept is a PageId, which is the native integer type for the platform (32 bits – 4 bytes / 64 bits – 8 bytes). I’m not sure yet if this is a persisted value, but for something that is meant to be used in a persisted store, I find that choice very strange. It means that on 32 bits machine, you can only use 4 billion pages, while 64 bits there is much higher limit. That means that a database created on 64 bits might not be valid for 32 bits.

When building Voron, we were very careful that data will be the same in 32 bits and 64 bits, Windows / MacOS / Linux, etc.

When you call accessed on the Shard, it will add it to the list, and if the size of all entries is higher than the capacity, will remove entries until the size in the cache is smaller than the maximum capacity.

Interestingly enough, Sled is using an Epoch based GC system in Rust, implemented by Crossbeam. I wrote about Epoch systems when I reviewed FASTER, and I understand that Epoch GC is pretty common in Rust concurrent programming, because otherwise you run the risk of memory corruption.

The ds (I assume data structure) folder of the page cache also provide a concurrent Stack implementation, which it pretty elegant to read, but you need to be well versed in the Epoch handling to understand how it all works correctly under the covers.

The interesting stuff in the ds folder is the pagetable, which start with the following structs:

image

Each of those is initialized to array of pointers that is 2 MB in size (262,144 pointers, basically). That is a pretty big structure, and the PageTable struct ties it all together:

image

The top of the file mention that this assume that the caller is going to have a dense key space, and this make sense. Theoretically, this will use a maximum of 512GB if you store a sparse set of data in the page table.

All the operations in the page table are some variant of the traverse operation, so let’s focus on that.

image

First, the key is split to the first 18 bits (0 .. 256K) and the rest of the value. The first part is used to find the level 1 value.

The debug_delay() calls are debug only calls that add random waits, which help catch race conditions. Given that the code uses explicit Atomic instructions, I’m not surprised to see that there is really no help from the Rust safety behavior to ensure that there aren’t any logical race conditions.

This is the first time I’m reading real Rust code that explicitly deal with concurrency (without using the hammer that is Mutex, at least), and it is… reassuring seeing the same kind of behaviors that I’m injecting into my code to validate things.

This next bit handles adding a new value (safely) at the next level. Given the dense assumption, we can safely assume that this is a rare operation

image

Look at the comment when we fail to win the compare_and_set call. Who is actually going to drop the unused created value? That is on the Guard and Epoch, I think. We associated the value with them when we called into_shared, and since we didn’t take it out from the guard, it will be removed when the Epoch is over.

The final act of traverse() is to just return the value from the second level, like so:

image

Note that this may be an empty value, so let’s see how this actually get used. I’m using the simplest caller, which is get().

image

You can see that we have to do explicit null checking, translating that to an Option call to make calling code idiomatic.  The rest of the code there is just memory management, so I’m going to skip that and look at the rest of the pagecache code, now that I understand the underlying data structures.

Note, the code uses the Lsn as a logical type name, this maps to a 64 bits signed integer.

The first file in the pagecache crate is the blob_io one, which handles reading blob. Here is the (crate public only) functions that are implemented there.

image

Remember, LSN is the (Logical Sequence Number) and is a signed 64 bits integer. Config I’m not sure about at this point. The read_blob and write_blob are fairly simple. They just read / write (with CRC32 checks) data from disk. This is done on a full file each time. So that is quite interesting. The Config is responsible for mapping between an LSN and a file path.

The call to remove_blob will simply delete the mapped file from the disk, but it is the call to gc_blobs that is really interesting. What this does is to basically delete all the files whose name is greater than the stable_lsn provided. This is interesting, because it tells us a lot about the actual on disk persistent format.

Based on just the blob_io.rs file, I can tell that Sled is writing fairly large files to disk and reading them to memory. It is probably going to call gc_blobs() as part of its recovery and I’m willing to assume at this point that each blob is actually a segment. As I said, I like assuming things upfront when doing this kind of code reviews. I’m going to continue my reading and see if I was right.

The config file starts with usage instructions, which looks like:

image

So that is really interesting. First, we can see that Sled is probably only calling fsync once in a while, depending on its configuration. I looked into a real world project, and it uses the default value, which is flush every 500 ms and snapshot after 1 million. The reason this is interesting is that my experience tells me is that whenever I see a configuration option like that, it means that fsync is going to be pretty expensive and the database engine has chosen to allow users to trade performance for safety.

Voron, in contrast, actually don’t have any such feature. We don’t allow users to make that choice, instead, we accepted the fsync costs and dealt with them (using async commit and transaction merging, for example).

Anyway, we were talking about the config and how it works, here is how the Config gives us the path for a blob:

image

So this is pretty much just as expected. The data for each blob is a separate file under the blobs directory. Now we are only left with figuring out what that blob was.

The next file of interest is the DiskPtr file, which contains this struct:

image

And the only interesting method is the read():

image

So now we have more interesting data. There is stuff that is kept inline (presumably small items) and larger stuff that are read from the blobs directory. We already saw what the read_blob behavior is, but we haven’t checked what the meaning of read_message is. This is implemented later in the code, so I’m going to ignore it and just look at its return value:

image

So that allow us to make sense of what is going above. Basically, this is “gimme some bytes” call, regardless of where this is actually stored (which I don’t know yet).

The next file is event_log, and it is a debug tool to help verify the behavior of the system. I wholeheartedly approve and it is a really good sign for the maturity of the codebase.

The next is flusher, which is a dedicate thread that call this continuously:

image

There is nothing of interest in there, but it does indicate no coordination between flushing and other activities. Not sure whatever this is a good or bad idea.

Now, let’s see what this iobuf is all about, shall we?

The first thing I run into in this file is this:

image

And I can’t really figure this one out. I mean, why? Just setting InnerLsn = i64 should be good enough for all platforms, I think. I guess there might be some optimizations stuff that I’m missing, but that seems unlikely. In 64 bits, I would expect isize and i64 to have the exact same behavior.

This is now getting pretty late, so I’m going to close this for today. Tomorrow, I’m going to try to figure out how the rest of it works.

time to read 6 min | 1018 words

I got a great comment on my previous post about using Map/Reduce indexes in RavenDB for event sourcing. The question was how to handle time sensitive events or ordered events in this manner. The simple answer is that you can’t, RavenDB intentionally don’t expose anything about the ordering of the documents to the index. In fact, given the distributed nature of RavenDB, even the notion of ordering documents by time become really hard.

But before we close the question as “cannot do that by design", let’s see why we want to do something like that. Sometimes, this really is just the developer wanting to do things in the way they are used to and there is no need for actually enforcing the ordering of documents. But in other cases, you want to do this because there is a business meaning behind these events. In those cases, however, you need to handle several things that are a lot more complex than they appear. Because you may be informed of an event long after that actually happened, and you need to handle that.

Our example for this post is going to be mortgage payments. This is a good example of a system where time matters. If you don’t pay your payments on time, that matters. So let’s see how we can model this as an event based system, shall we?

A mortgage goes through several stages, but the only two that are of interest for us right now are:

  • Approval – when the terms of the loan are set (how much money, what is the collateral, the APR, etc).
  • Withdrawal – when money is actually withdrawn, which may happen in installments.

Depending on the terms of the mortgage, we need to compute how much money should be paid on a monthly basis. This depend on a lot of factors, for example, if the principle is tied to some base line, changes to the base line will change the amount of the principle. If only some of the amount was withdrawn, if there are late fees, balloon payment, etc. Because of that, on a monthly basis, we are going to run a computation for the expected amount due for the next month.

And, obviously, we have the actual payments that are being made.

Here is what the (highly simplified) structure looks like:

image

This includes all the details about the mortgage, how much was approved, the APR, etc.

The following is what the expected amount to be paid looks like:

image

And here we have the actual payment:

image

All pretty much bare bones, but sufficient to explain what is going on here.

With that in place, let’s see how we can actually make use of it, shall we?

Here are the expected payments:

image

Here are the mortgage payments:

image

The first thing we want to do is to aggregate the relevant operations on a monthly basis, since this is how mortgages usually work. I’m going to use a map reduce index to do so, and as usual in this series of post, we’ll use JavaScript indexes to do the deed.

Unlike previous examples, now we have real business logic in the index. Most specifically, funds allocations for partial payments. If the amount of money paid is less than the expected amount, we first apply it to the interest, and only then to the principle.

Here are the results of this index:

image

You can clearly see that mistake that were made in the payments. On March, the amount due for the loan increased (took another installment from the mortgage) but the payments were made on the old amount.

We aren’t done yet, though. So far we have the status of the mortgage on a monthly basis, but we want to have a global view of the mortgage. In order to do that, we need to take a few steps. First, we need to define an Output Collection for the index, that will allow us to further process the results on this index.

In order to compute the current status of the mortgage, we aggregate both the mortgage status over time and the amount paid by the bank for the mortgage, so we have the following index:

Which gives us the following output:

image

As you can see, we have a PastDue marker on the loan. At this point, we can make another payment on the mortgage, to close the missing amount, like so:

image

This will update the monthly mortgage status and then the overall status. Of course, in a real system (I mentioned this is highly simplified, right?) we’ll need to take into account payments made in one time but applied to different times (which we can handle by an AppliedTo property) and a lot of the actual core logic isn’t in indexes. Please don’t do mortgage logic in RavenDB indexes, that stuff deserve its own handling, in your own code. And most certainly don’t do that in JavaScript. The idea behind this post is to explore how we can handle non trivial event projection using RavenDB. The example was chosen because I assume most people will be familiar with it and it wasn’t immediately obvious how to go about actually solving it.

If you want to play with this, you can import the following file (Settings > Import Data) to get the documents and index definitions.

time to read 7 min | 1321 words

imageAlmost by accident, it turned out that I implemented a pretty simple, but non trivial task in both C and Rust and blogged about them.

Now that I’m done with both of them, I thought it would be interesting to talk about the differences in the experiences.

The Rust version clocks at exactly 400 lines of code and uses 12 external crates.

The C version has 911 lines of C code and another 140 lines in headers and depends on libuv and openssl.

Both took about two weeks of evenings of me playing around. If I was working full time on that, I could probably do that in a couple of days (but probably more, to be honest).

The C version was very straightforward. The C language is pretty much not there, and on the one hand, it didn’t get in my way at all. On the other hand, you are left pretty much on your own. I had to write my own error handling code to be sure that I got good errors, for example. I had to copy some string processing routines that aren’t available in the standard library, and I had to always be sure that I’m releasing resources properly. Adding dependencies is something that you do carefully, because it is so painful.

The Rust version, on the other hand, uses the default error handling that Rust has (and much improved since the last time I tried it). I’m pretty sure that I’m getting worse error messages than the C version I used, but that is good enough to get by, so that is fine. I had to do no resource handling. All of that is already handled for me, and that was something that I didn’t even consider until I started doing this comparison.

When writing the C version, I spent a lot of time thinking about the structure of the code, debugging through it (to understand what is going on, since I also learned how OpenSSL work) and seeing if things worked. Writing the code and compiling it were both things that I spent very little time on.

In comparison, the Rust version (although benefiting from the fact that I did it second, so I already knew what I needed to do) made me spend a lot more time on just writing code and getting it to compile.  In both cases, I decided that I wanted this to be a production worthy code, which meant handling all errors, producing good errors, etc. In C, that was simply a tax that needed to be dealt with. With Rust, that was a lot of extra work.

The syntax and language really make it obvious that you want to do that, but in most of the Rust code that I reviewed, there are a lot of unwrap() calls, because trying to handle all errors is too much of a burden. When you aren’t doing that, your code size balloons, but the complexity of the code didn’t, which was a great thing to see.

What was really annoying is that in C, if I got a compiler error, I knew exactly what the problem was, and errors were very localized. In Rust, a compiler error could stymie me for hours, just trying to figure out what I need to do to move forward. Note that the situation is much better than it used to be, because I eventually managed to get there, but it took a lot of time and effort, and I don’t think that I was trying to explore any dark corners of the language.

What really sucked is that Rust, by its nature, does a lot of type inferencing for you. This is great, but this type inferencing goes both backward and forward. So if you have a function and you create a variable using: HashMap::new(), the actual type of the variable depends on the parameters that you pass to the first usage of this instance. That sounds great, and for the first few times, it looked amazing. The problem is that when you have errors, they compound. A mistake in one location means that Rust has no information about other parts of your code, so it generates errors about that. It was pretty common to make a change, run cargo check and see three of four screen’s worth of errors pass by, and then go into a “let’s fix the next compiler error” for a while.

The type inferencing bit also come into play when you write the code, because you don’t have the types in front of you (and because Rust love composing types) it can be really hard to understand what a particular method will return.

C’s lack of async/await meant that when I wanted to do async operations, I had to decompose that to event loop mode. In Rust, I ended up using tokio, but I think that was a mistake. I should have used the event loop model there as well. It isn’t as nice, in terms of the code readability, but the fact that Rust doesn’t have proper async/await meant that I had a lot more additional complexity to deal with, and that nearly caused me to give up on the whole thing.

I do want to mention that for C, I had run Valgrind a few times to get memory leaks and invalid memory accesses (it found a few, even when I was extra careful). In Rust, the compiler was very strict and several times complained about stuff that if allowed, would have caused problems. I did liked that, but most of the time, it felt like fighting the compiler.

Speaking of which, the compilation times for Rust felt really high. Even with 400 lines of code, it can take a couple of seconds to compile (with cargo check, mind, not full build). I do wonder what it will do with a project of significant size.

I gotta say, though, compiling the C code meant that I would have to test the code. Compiling the Rust code meant that I could run things and they usually worked. That was nice, but at the same time, getting the thing to compile at all was a major chore many times. Even with the C code not working properly, the feedback loop was much shorter with C than with Rust. And some part of that was that I already had a working implementation for most of what I needed, so I had a lot less need to explore when I wrote the Rust code.

I don’t have any hard conclusions from the experience, I like the simplicity of C, and if I had something like Go’s defer to ensure resource disposal, that would probably be enough (I’m aware of libdefer and friends). I find the Rust code elegant (except the async stuff) and the standard library is great. The fact that the crates system is there means that I have very rich access to additional libraries and that this is easy to do. However, Rust is full of ceremony that sometimes seems really annoying. You have to use cargo.toml and extern crate for example.

There is a lot more to be done to make the compiler happy. And while it does catch you sometimes doing something your shouldn’t, I found that it usually felt like busy work more than anything else. In some ways, it feels like Rust is trying to do too much. I would have like to see something less ambitious. Just focusing on one or two concepts, instead of trying to be high and low level language, type inference set to the make, borrow checker and memory safety, etc. It feels like this is a very high bar to cross, and I haven’t seen that the benefits are clearly on the plus side here.

time to read 2 min | 345 words

I’m pretty much done with my Rust protocol impl. The last thing that I wanted to try was to see how it would look like when I allow for messages to be handled out of band.

Right now, my code consuming the protocol library looks like this:

This is pretty simple, but note that the function definition forces us to return a value immediately, and that we don’t have a way to handle a command asynchronously.

What I wanted to do is to change things around so I could do that. I decided to implemented the command:

remind 15 Nap

Which should help me remember to nap. In order to handle this scenario, I need to provide a way to do async work and to keep sending messages to the client. Here was the first change I made:

image

Instead of returning a value from the function, we are going to give it the sender (which will render the value to the client) and can return an error if the command is invalid in some form.

That said, it means that the echo implementation is a bit more complex.

There is… a lot of ceremony here, even for something so small. Let’s see what happens when we do something bigger, shall we? Here is the implementation of the reminder handler:

Admittedly, a lot of that is error handling, but there is a lot of code here to do something that simple.  Compare that to something like C#, where the same thing could be written as:

I’m not sure that the amount of complexity that is brought about by the tokio model, even with the async/await macros is worth it at this point. I believe that it needs at least a few more iterations before it is going to be usable for the general public.

There is way too much ceremony and work to be done, and a single miss and you are faced with a very pissed off compiler.

time to read 3 min | 410 words

After a lot of trouble, I’m really happy that I was able to build an async I/O implementation of my protocol. However, for real code, I think that I would probably recommend using with the sync API instead, since at least that is straightforward and doesn’t incur so much overhead at development time. The async stuff is still very much a “use at your own risk” kind of deal from my perspective. And I can’t imagine trying to use it in a large project and no suffering from the complexity.

As a good example, take a look at the following bit of code:

image

It doesn’t seem to be doing much, right? And it is clear what the intent of the code is.

However, if you try to compile this code you’ll get:

image

Now, it took me a long while to figure out what is going on.  The issue is that the code I’m seeing isn’t the actual code, because of macro expansions.

So let’s resolve this and see what the expanded code looks like:

This is after formatting, of course, but it certainly looks scary. Glancing at this code doesn’t tell me what the problem was, so I tried replacing the method with the expanded result, and I got the same error, but this time I got it on a line that helped me figure it out. Here is the issue:

image

We use the ? to return early from the poll method, and the Receiver I’m using in this case is defined to have a Result<String, ()>, so this is the cause of the problem.

I returned my own error type as a result, giving me the ability to convert from (), but that was a really hard thing to resolve.

It might be better to have Rust also offer to show the error on the expanded code by default, because it was somewhat of a chore to actually get to this.

What made this oh so confusing is that I had the exact same code, but using a Stream<String, io:Error> that worked, obviously. But it was decidedly non obvious to see what was the difference between two identical pieces of code.

time to read 3 min | 588 words

On my last post, I got really frustrated with tokio’s complexity and wanted to move to use mio directly. The advantages are that the programming model is pretty simple, even if actually working with is is hard. Event loops can cause your logic to spread over many different locations and make it hard to follow. I started to go that path until I figure out just how much work it would take. I decided to give tokio a second change, and at this point, I looked into attempts to provide async/await functionality to Rust.

It seems that at least some work is already available for this, using futures + some Rust macros. That let me write code that is much more natural looking, and I actually managed to make it work.

Before I get to the code, I want to point out some concerns that I have right now. The futures-await crate (and indeed, all of tokio) seems to be in a state of flux. There is an await in tokio, and I think that there is some merging around of all of those libraries into a single whole. What I don’t know, and can’t find any information about, is what I should actually be using, and how all the pieces come together. I have to note that even with async/await, the programming model is still somewhat awkward, but it is at a level that I can live with. Here is how I built it.

First, we need to accept connections, which is done like so:

Note that I have two #[async[ annotations. One for the method as a whole and one for the for loop. This just accept the connection and spawn a task to handle that, the most interesting tidbits are in the actual processing of the connection:

You can see that this is fairly straightforward code. We first do the TLS handshake, then we validate the certificate. If there is an auth error, we send it to the user and back off. If we are successful, however, things get interesting.

I create a channel, which allow me to  split off the read and write portions of the task. This means that I can send results out of order, if I wanted to, which is great for the actual protocol handling. The first thing to do is to send the OK string to the client, so they know that we successfully connected, then we spawn the read/write tasks. The write task is pretty simple, overall:

You can see the funny .0 references, which is an artifact of the fact that the write_all() function consumes the writer we pass to it and return (a potentially different) writer in the result.  This is pretty common for functional languages.

I’m pretty sure that I can avoid the two calls to write_all for the postfix, but that is easier for now.

Processing the commands is simple as well:

For each command we support, we have an entry on the server configuration and we fetch and invoke it. The result of the command will be written to the client by the write task. Right now we have a 1:1 association between them, but this is now easily broken.

And finally, having an actually command run and running the server itself:

This is pretty simple now, and it give us a nice model to program commands and responses.

I pushed the whole code to this branch, if you care to look at it.

I have some more comments about this code, but I’ll reserve them for another post.

time to read 2 min | 346 words

I kept going with tokio for a while, I even got something that I think would eventually work. The whole concept is around streams, so I create a way to generate them. This is basically taking this code and making it async.

I gave up well into the second hour. Here is where I stopped:

image

I gave up when I realized that the reader I’m using (which is SslStream) didn’t seem to have poll_read. The way I’m reading the code, it is supposed to, but I just threw up my hands at disgust at this time. If it this hard, it ain’t going to happen.

I wrote significant amount of async code in C# at the time when events and callbacks were the only option and then when the TPL and ContinueWith was the way to go. That was hard, and async/await is a welcome relief, but the level of frustration and “is this wrong, or am I really this stupid?” that I got midway through is far too much.

Note that this isn’t even about Rust. Some number of issues that I run into were because of Rust, but the major issue that I have here is that I’m trying to write a function that can be expressed in a sync manner in less than 15 lines of code and took me about 10 minutes to write the first time. And after spending more hours than I’m comfortable admitting, I couldn’t get it to work. The programming model you have here, even if everything did work, means that you have to either decompose your behavior to streams and interact with them in this manner or you put everything as nested lambdas.

Either option doesn’t make for a nice model to work with. I believe that there is another async I/O model for Rust, the MIO crate, which is based on the event loop model. I’ve already implemented that in C, so that might be a more natural way to do things.

time to read 5 min | 941 words

Now that we have a secured and authentication connection, the next stage in making a proper library is to make it run more than a single connection at time. I could have use a thread per connection, of course, or even use a thread pool, but neither of those options is valid for the kind of work that I want to see, so I’m going to jump directly into async I/O in Rust and see how that goes.

The sad thing about this is that I expect that this will make me lose some / all of the nice API that I get for OpenSSL in the sync mode.

Async in Rust is handled by a crate called tokio, and there seems to be active work to bring async/await to the language itself. In the meantime, we have to make do with the usual facilities, which ought to make this interesting.

It actually looks like there is a crate that gives pretty nice handling of tokio async I/O and OpenSSL so that is encouraging. However, as part of trying to re-write everything in tokio style, I got the compiler very upset with me. Here is (partial) error message:

image

Last time I had to parse such errors, I was working in C++ templated code and the year was 1999.

And here is the piece of code it so dislikes:

image

I googled around and there is this detailed answer on a similar topic that frankly, frightened me. I shouldn’t have to dig this deeply and have to start drawing diagrams on so many disparate pieces of the code just to figure out a compiler error.

Let’s try to break it to its component parts and see if that make sense, I reduce the code in question to just:

image

Got another big scary error message. Okay, let’s try it without the OpenSSL stuff?

image

This produce the same error, but in a much less scary tone:

image

Okay, now this looks about as simple as it can be. And now the fix is pretty obvious:

image

The key to understand here, I believe (I haven’t tested it yet) that the write_all call will either perform its work or schedule it, so any future work based on it should go in a nested and_then call. So the result of the single for_each invocation is not the direct continuation of the previous call.

That is fine, I’ll deal with that, I guess.

Cue here about six hours of programming montage.

I have been programming over 20 years, I like to think that I have been around the block a few times. And the simple task of reading a message from TCP using async I/O took me far too long. Here is what I eventually ended up with:

image

This is after fighting with the borrow checker (a lot, it ended up winning), trying to grok my head around the model that tokio has. It is like they took the worst parts of async programming, married it to stream programming’s ugly second cousin and then decided to see if any of the wedding guests is open for adoption.

And if the last sentence doesn’t make sense to you, you are welcome, that is how I felt at certain points. Here is one of the errors that I run into:

image

What is this string, where did it come from and why do we have a unit “()” there? Let me see if I can explain what is going on here. Here is a very simple bit of code that would explain things.

image

And here is the error it generates:

image

The problem is that spawn is expecting a future that results a result that has no meaning, something like: Future<Result<(), ()>>. This make sense, since there isn’t really anything that it can do with whatever the result is. But the error can be really confusing. I spent a lot of time trying to actually parse this, then I had to go and check the signatures of the method involved, and then I had to reconstruct what are the generic parameters that are required, etc.

The fix, btw, is this:

image

Ask yourself how long it would take you to figure what the changes between these versions of the code are without the marker.

Anyway, although I’m happy that I got something done, this approach is really not sustainable. I’m pretty sure that I’m either doing something wrong or missing something. It shouldn’t be this hard. I got some ideas that I want to try, which I’ll talk about in the next post.

time to read 3 min | 478 words

After running into a few hurdles, I managed to get rust openssl bindings to work, which means that this is now the time to actually wire things properly in my network protocol, let’s see how that works, shall we?

First, we have the OpenSSL setup:

As you can see, this is pretty easy and there isn’t really anything there that is of actual interest. It does feel a whole lot easier than dealing with OpenSSL directly in C, though.

That said, when I started actually dealing with the client certificate, things got a lot more complicated. The first thing that I wanted to do is to do my authentication, which is defined as:

  • Client present a client certificate (can be any client certificate).
  • If a client doesn’t give a certificate, we accept the connection, send a message (using the encrypted tunnel) and abort.
  • If the client provide an certificate, it must be one that was previously registered in the server. That is what allowed_certs_thumbprints is for. If it isn’t, we accept the connection, write an error and abort.
  • If the client certificate has expired or is not yet valid, accept, write error & abort.

You get the gist. Here is what I had to do to implement the first part:

Most of the code, actually, is about generating proper and clear error messages, more than anything else. I’m not sure how to get the friendly name from the certificate, but this seems to be a good enough stand-in for now.

We validate that we have a certificate, or send an error back. We validate that the certificate presented is known to us, or we send an error back.

The next part I wanted to implement was… really far too hard than it should be. I just wanted to verify that the certificate not before/not after dates are valid. And the problem is that the rust bindings for OpenSSL do not expose that information. Luckily, because it is using OpenSSL, I can just call to OpenSSL directly. That led me to some interesting search into how Rust calls out to C, how foreign types work and a lot of “fun” like that. Given that I’m doing this to learn, I suppose that this is a good thing, though.

Here is what I ended up with (take a deep breath):

Notice that I’m doing all of this (defining external function, defining helper functions) inside the authenticate_certificate function. Coming up with that was harder than expected, but I really liked the fact that it was possible, and that I can just shove this into a corner of my code and not have to make a Big Thing out of it.

And with that, I the authentication portion of my network protocol in Rust done.

The next stage is going to be implementing a server that can handle more than a single connection at a time Smile.

FUTURE POSTS

  1. NServiceBus 7.x & RavenDB 4.2 - 15 hours from now

There are posts all the way to Jun 19, 2019

RECENT SERIES

  1. Production postmortem (26):
    07 Jun 2019 - Printer out of paper and the RavenDB hang
  2. Reviewing Sled (3):
    23 Apr 2019 - Part III
  3. RavenDB 4.2 Features (5):
    21 Mar 2019 - Diffing revisions
  4. Workflow design (4):
    06 Mar 2019 - Making the business people happy
  5. Data modeling with indexes (6):
    22 Feb 2019 - Event sourcing–Part III–time sensitive data
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats