Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 2 min | 301 words

Reddit’s front page contains a list of recent posts from all communities. In most cases, you want to show posts from communities that the user is subscribed to, but at the same time, you want to avoid flooding the front page with posts from any single community. You also need this to be really fast.

It turns out that doing this in RavenDB is actually very easy. We are going to create a map/reduce index that will aggregate the few most recent posts per community, like so:

[image: the map/reduce index definition]

What this index will do is provide us with the five most recent posts in each community, as well as their date. This is an interesting example of a map/reduce index, because we are using both aggregation and fanout in the index.

The nice thing about this index is that we can project the results directly from it to the user. Let’s see what the queries look like:

[image: the query]

This is a simple query that does quite a lot. It gives us the most recent 15 posts across all the communities that the user cares about, with no single community able to contribute more than 5 posts. It sorts them by posted date and fetches the actual posts in the same query. This is going to give you consistent performance regardless of how much data you have and how many updates you experience. The actual Reddit front page is a lot more complex, I’m sure, but this serves as a nice example of how you can do non-trivial stuff in RavenDB’s indexes that simplifies your life by a lot.

time to read 4 min | 760 words

I talked about some of the requirements for proper workflow design in my previous post. As a reminder, the top ones are:

  • Cater for developers, not the business analysts. (More on this later).
  • Source control isn’t optional, meaning:
    • Multiple branches
    • Can diff & review changes
    • Merging
    • Multiple people can work at the same time
  • Encapsulate complexity

This may seem like a pretty poor list, because if you are a developer, you might be taking all of these for granted. Because of that, I wanted to show a small taste of what used to be Microsoft’s primary workflow engine.

[image: Microsoft workflow engine screenshot]

A small hint… this kind of system is not going to be useful for anything relating to source control, change management, collaborative work, understanding what is going on, etc.

A better solution for this would be to use a tool that can work with source control, that developers are familiar with and can handle the required complexity.

That tool is called… code.

It checks all the required boxes, naturally. But it does have a distinct disadvantage. One of the primary reasons you want to use a workflow engine of some kind is to decouple the implementation of your business from the policies of the business. Coming back to the mortgage example, how you calculate the late fee payment is fixed (in the contract itself, but usually also by law and many regulations), but figuring out whether late fees should be waived, on the other hand, is subject to the whims of the business.

That is a pretty simple example, but in most businesses, these kinds of workflows add up. You can easily end up with dozens to hundreds of different workflows without the business being too big or complex.

There is another issue, though. Code is pretty good when you need to handle straightforward tasks. A set of if statements (which is pretty much all most workflows are) is trivial to handle. But workflows have another property: they tend to be long. Not long on a computer scale (seconds), but long on a people scale (months and years).

The typical process of getting a loan may involve an initial submission, review by a doctor, asking for follow up documentation (rinse and repeat a few times), getting the doctor’s appraisal and only then being able to generate a quote for the customer. Then we have a period of time in which the customer can accept, a qualifying period, etc. That can last for a good long while.

Trying to code long running processes like that requires a very unnatural approach to coding, especially since you are likely to need to handle software updates while the workflows are running.

In short, we are in a strange position: we want to use code, because it is clear, supports software development practices that are essential and can scale up in complexity as needed. On the other hand, we don’t want to use our usual codebase for that, because we’ll have very different deployment strategies, the manner of working is very different and there is a lot more involvement of the business in what is going on there.

The way to handle that is to create a proper boundary between parts of the system. We’ll have the workflow behavior, defined in scripts, that describes the policy of the system. These tend to be fairly high level concepts and are designed explicitly for the role of describing business policy behaviors. The infrastructure for that, on the other hand, is just a standard application using normal software practices, that is driven by the workflow scripts.

And by a script, I mean literally a script. As in, JavaScript.

I want to give you a sneak peek into how I envision this kind of system, but I’ll defer the full discussion of what is involved to my next post.



The idea is that we use the script to define our policy, and then we use that to make decisions and invoke the next stage in the process. You might notice that we have the state variable, which is persisted between invocations. That allows us to use a programming model that is fairly common and obvious to developers. We can usually also show this, as is, to a business analyst and get them to understand what is going on easily enough. All the actual actions are abstracted. For example, life insurance setup is a completely different workflow that we invoke.

In my next post, I’m going to drill down a bit into the details of this approach and what kind of features we need there.

time to read 5 min | 815 words

One of the most common themes I run into when talking to customers, users and sundry people in tech is the repeated desire to fire developers.

Actually, that is probably too loaded a statement. It actually comes in two parts:

  • Developers usually want to focus on the interesting bits, and the business logic portions aren’t that much fun.
  • The business analysts usually want to get things done and having to get developers to do that is considered inefficient.

If only there was a tool, or a pattern, or a framework, or something that would allow the business analysts to directly specify the behavior of the system… Why, we could cut the developers from the process entirely! And speaking as a developer, that would be a huge relief.

I think the original name for that was CASE tools, and that flopped. In fact, literally every single one of the attempts to replace developers with a tool has flopped. They got such a bad rap that people keep trying to implement them under different names. Some stuff can be done fairly easily, though. WYSIWYG for GUI is well established, and Wordpress and WIX, to name the two examples that come to mind immediately, show that you can have a non-techie build a proper website. In fact, you can even plug in some pretty sophisticated functionality without burdening the user with too much.

But all that only takes you so far. And past that point, the drop off is harsh. Let’s take another common tool that is used to reduce the dependency on developers: SharePoint.

You pay close to double for actual developer time on SharePoint, mostly because it is so painful to work with it.

At a recent conference, I got into a conversation about business workflows and how best to implement them. You can look at the image on the right to get a good idea about what kind of process they were talking about.

To make things real, I want to take a “simple” example of accepting a life insurance policy. Here is what the (extremely simplified) workflow looks like for issuing a life insurance policy:

[image: life insurance policy workflow diagram]

This looks good, and it certainly should make sense to a business analyst. However, even after I pretty much reduced the process to its bare bones, and even those have been filed down, this is still pretty complex. The process of actually getting a policy is a lot more complex. Some questions don’t require doctor evaluation (for example, smoking) and some require supplemental documentation (oh, you were hospitalized? Gimme all those records). The doctor may recommend different rates, rejecting the application entirely, some exceptions in the policy, etc. All of which need to be in the workflow. Actuarial tables need to be consulted for each of those cases, etc, etc, etc.

But something like the diagram above isn’t going to be able to handle this level of complexity. You are going to get lost very quickly if you try to put so many boxes on the screen.

So you need encapsulation.

And you’ll probably want to have a way to develop these business workflows, which means that they aren’t static.

So you need source control.

And if you have a complex business process, you likely have different people working on it.

So you need to be able to review changes, and merge them.

Note that this is explicitly distinct from being able to store the data in source control. Being able to actually diff two versions of such a process in a meaningful fashion is anything but trivial. Usually you are left with diffing the raw XML / JSON that stores the structure. Good luck with that.

If the workflow is complex, you need to be able to understand what is going on under various conditions.

So you need a debugger.

In fact, pretty soon you’ll realize that you’ll need quite a lot of the things that developers do. Except that your tool of choice doesn’t do that, or if it does, it does it poorly.

Another issue is that even if you somehow managed to bypass all of those details, you are going to be facing the same drop-off that you see elsewhere with tools that attempt to get rid of developers. At some point, the complexity grows too large, and you’ll call your development team and hand the system off to them. At which point they will be stuck with a very clunky tool that attempts to be quite clever and easy to use. It is also horribly limiting for a developer. Mostly because all of the “complexity” involved is in the business process itself, not in the actual complexity of what is going on.

There are better ways of handling that, and the easier among them is to just use code. That can be… surprisingly versatile.

time to read 3 min | 468 words

I was reminiscing about some old code that I wrote a long while ago, in the heyday of ASP, when the dotcom bubble was just starting to inflate. At the time, I was either still in high school or had just graduated, and I was fascinated by the ability to write web applications. I wrote quite a few of them, as I recall. Thankfully, none of them ever made it to this day and age. I remember one project in particular that I was quite proud of. I wrote a bunch of BBS / forum systems. One version used an Access file as the database. IIRC, that is literally how I learned SQL for the first time.

The other BBS system is what I’m here to talk about today. You couldn’t always get Access, and having it installed on the server was a PITA. Especially given that I was pretty much limited to hosts that offered free hosting only. So I decided to write a BBS system that had no dependencies whatsoever and could be deployed on any host that could handle ASP. Note that this is ASP classic; .NET was still 2 years away from alpha status at the time and Java was for applets.

I decided that I would write everything through file I/O, but that was quite complex. I needed something that would help me out. Then I realized that I could use ASP itself to help me. Instead of having to pull data at runtime from a file, parse it, process it and so on, I could lean on ASP itself for that.

Trigger warning: This code is newly written, but I still remember the shape of it quite well. This may cause you seizures. For the full beauty of this piece of code, you need to consider that this is a very small piece of a much larger codebase (all in a single file, of course), but it is very much a representative example.

I’ll give you a moment to study the code. It deserves that much of your attention.

What you see here is a beautiful example of using code as data and data as code, self modifying code and some really impressive (even if I say so myself) capabilities of my past self to dig himself deep into a hole.

Bonus points if you can point out all the myriad issues that this code has. You can safely leave aside maintainability, I never had to maintain it, but over twenty years have passed, and I still remember the complexity involved in keeping all the states in my head.

And that was the first time that I actually wrote my own dedicated database.

time to read 7 min | 1321 words

Almost by accident, it turned out that I implemented a pretty simple, but non-trivial, task in both C and Rust and blogged about them.

Now that I’m done with both of them, I thought it would be interesting to talk about the differences in the experiences.

The Rust version clocks at exactly 400 lines of code and uses 12 external crates.

The C version has 911 lines of C code and another 140 lines in headers and depends on libuv and openssl.

Both took about two weeks of evenings of me playing around. If I were working full time on that, I could probably have done it in a couple of days (but probably more, to be honest).

The C version was very straightforward. The C language is pretty much not there, and on the one hand, it didn’t get in my way at all. On the other hand, you are left pretty much on your own. I had to write my own error handling code to be sure that I got good errors, for example. I had to copy some string processing routines that aren’t available in the standard library, and I had to always be sure that I’m releasing resources properly. Adding dependencies is something that you do carefully, because it is so painful.

The Rust version, on the other hand, uses the default error handling that Rust has (and much improved since the last time I tried it). I’m pretty sure that I’m getting worse error messages than the C version I used, but that is good enough to get by, so that is fine. I had to do no resource handling. All of that is already handled for me, and that was something that I didn’t even consider until I started doing this comparison.
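
To make that last point concrete, here is a minimal sketch (my own illustration, not code from either project; the file name is made up) of how Rust ties cleanup to scope via Drop, which is what removes the manual resource release work that the C version needed:

```rust
use std::fs::File;
use std::io::{self, Read};

// Stand-in for a socket / SSL handle; the point is the Drop impl below.
struct Connection {
    file: File,
}

impl Drop for Connection {
    fn drop(&mut self) {
        // Runs automatically on every exit path, including early returns on errors.
        println!("connection closed");
    }
}

fn read_greeting(path: &str) -> io::Result<String> {
    let mut conn = Connection { file: File::open(path)? };
    let mut buf = String::new();
    // Any error here still triggers Drop on `conn` before the function returns.
    conn.file.read_to_string(&mut buf)?;
    Ok(buf)
}

fn main() {
    match read_greeting("greeting.txt") {
        Ok(text) => println!("got: {}", text),
        Err(e) => eprintln!("error: {}", e),
    }
}
```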

When writing the C version, I spent a lot of time thinking about the structure of the code, debugging through it (to understand what is going on, since I also learned how OpenSSL works) and seeing if things worked. Writing the code and compiling it were both things that I spent very little time on.

In comparison, the Rust version (although benefiting from the fact that I did it second, so I already knew what I needed to do) made me spend a lot more time on just writing code and getting it to compile.  In both cases, I decided that I wanted this to be a production worthy code, which meant handling all errors, producing good errors, etc. In C, that was simply a tax that needed to be dealt with. With Rust, that was a lot of extra work.

The syntax and language really make it obvious that you want to do that, but in most of the Rust code that I reviewed, there are a lot of unwrap() calls, because trying to handle all errors is too much of a burden. When you don’t take that shortcut, your code size balloons, but the complexity of the code doesn’t, which was a great thing to see.
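
To illustrate the trade-off, here is a small sketch (not taken from either codebase; the config file name is made up) of the same lookup written with unwrap() versus with the errors actually handled:

```rust
use std::fs;
use std::num::ParseIntError;

// The shortcut version: fine for a throwaway sample, but any failure just panics.
#[allow(dead_code)]
fn read_port_quick() -> u16 {
    fs::read_to_string("port.txt").unwrap().trim().parse().unwrap()
}

// Handling the errors properly: more code, but every failure is reported, not swallowed.
#[derive(Debug)]
enum ConfigError {
    Io(std::io::Error),
    BadNumber(ParseIntError),
}

fn read_port() -> Result<u16, ConfigError> {
    let text = fs::read_to_string("port.txt").map_err(ConfigError::Io)?;
    text.trim().parse().map_err(ConfigError::BadNumber)
}

fn main() {
    match read_port() {
        Ok(port) => println!("listening on {}", port),
        Err(e) => eprintln!("bad config: {:?}", e),
    }
}
```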

What was really annoying is that in C, if I got a compiler error, I knew exactly what the problem was, and errors were very localized. In Rust, a compiler error could stymie me for hours, just trying to figure out what I need to do to move forward. Note that the situation is much better than it used to be, because I eventually managed to get there, but it took a lot of time and effort, and I don’t think that I was trying to explore any dark corners of the language.

What really sucked is that Rust, by its nature, does a lot of type inference for you. This is great, but this type inference goes both backward and forward. So if you have a function and you create a variable using HashMap::new(), the actual type of the variable depends on the parameters that you pass to the first usage of this instance. That sounds great, and for the first few times, it looked amazing. The problem is that when you have errors, they compound. A mistake in one location means that Rust has no information about other parts of your code, so it generates errors about those as well. It was pretty common to make a change, run cargo check and see three or four screens’ worth of errors pass by, and then go into a “let’s fix the next compiler error” loop for a while.

The type inference bit also comes into play when you write the code, because you don’t have the types in front of you (and because Rust loves composing types), so it can be really hard to understand what a particular method will return.
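
Here is a tiny example of what I mean (my own illustration, not code from the project):

```rust
use std::collections::HashMap;

fn main() {
    // At this point the compiler has no idea what the key and value types are.
    let mut recent_posts = HashMap::new();

    // Only this later usage pins it down to HashMap<&str, i32>.
    recent_posts.insert("rust", 5);

    // If a later line disagrees with that guess, the error is reported far away
    // from the line that actually caused it:
    // recent_posts.insert(String::from("c"), 3); // error: mismatched types
    println!("{:?}", recent_posts);
}
```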

C’s lack of async/await meant that when I wanted to do async operations, I had to decompose that to event loop mode. In Rust, I ended up using tokio, but I think that was a mistake. I should have used the event loop model there as well. It isn’t as nice, in terms of the code readability, but the fact that Rust doesn’t have proper async/await meant that I had a lot more additional complexity to deal with, and that nearly caused me to give up on the whole thing.

I do want to mention that for C, I ran Valgrind a few times to catch memory leaks and invalid memory accesses (it found a few, even when I was extra careful). In Rust, the compiler was very strict and several times complained about stuff that, if allowed, would have caused problems. I did like that, but most of the time, it felt like fighting the compiler.

Speaking of which, the compilation times for Rust felt really high. Even with 400 lines of code, it can take a couple of seconds to compile (with cargo check, mind, not full build). I do wonder what it will do with a project of significant size.

I gotta say, though, compiling the C code meant that I would still have to test the code. Compiling the Rust code meant that I could run things and they usually worked. That was nice, but at the same time, getting the thing to compile at all was a major chore many times. Even with the C code not working properly, the feedback loop was much shorter with C than with Rust. And some part of that was that I already had a working implementation for most of what I needed, so I had a lot less need to explore when I wrote the Rust code.

I don’t have any hard conclusions from the experience. I like the simplicity of C, and if I had something like Go’s defer to ensure resource disposal, that would probably be enough (I’m aware of libdefer and friends). I find the Rust code elegant (except the async stuff) and the standard library is great. The fact that the crates system is there means that I have very rich access to additional libraries and that this is easy to do. However, Rust is full of ceremony that sometimes seems really annoying. You have to use cargo.toml and extern crate, for example.

There is a lot more to be done to make the compiler happy. And while it does sometimes catch you doing something you shouldn’t, I found that it usually felt like busy work more than anything else. In some ways, it feels like Rust is trying to do too much. I would have liked to see something less ambitious. Just focusing on one or two concepts, instead of trying to be both a high and low level language, with type inference set to the max, the borrow checker and memory safety, etc. It feels like this is a very high bar to cross, and I haven’t seen that the benefits are clearly on the plus side here.

time to read 2 min | 345 words

I’m pretty much done with my Rust protocol impl. The last thing that I wanted to try was to see what it would look like if I allowed messages to be handled out of band.

Right now, my code consuming the protocol library looks like this:

This is pretty simple, but note that the function definition forces us to return a value immediately, and that we don’t have a way to handle a command asynchronously.
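
The screenshot is no longer here, but the shape of it is roughly this (a hypothetical sketch, the names are mine and not the actual API):

```rust
// The synchronous shape: the handler must produce its reply right away,
// so there is no room for doing work out of band.
fn echo(args: &[&str]) -> Result<String, String> {
    Ok(args.join(" "))
}

fn main() {
    match echo(&["hello", "world"]) {
        Ok(reply) => println!("{}", reply), // hello world
        Err(e) => eprintln!("error: {}", e),
    }
}
```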

What I wanted to do is to change things around so I could do that. I decided to implement the command:

remind 15 Nap

Which should help me remember to nap. In order to handle this scenario, I need to provide a way to do async work and to keep sending messages to the client. Here was the first change I made:

[image: the new handler signature]

Instead of returning a value from the function, we are going to give it the sender (which will render the value to the client), and it can return an error if the command is invalid in some form.
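
As a rough sketch of that shape (using a std channel to stand in for the futures channel, and with names of my own choosing):

```rust
use std::sync::mpsc::{channel, Sender};

// The handler now receives a sender and can push any number of messages,
// immediately or later, instead of returning a single reply.
fn echo(args: Vec<String>, out: Sender<String>) -> Result<(), String> {
    out.send(args.join(" ")).map_err(|e| e.to_string())
}

fn main() {
    let (tx, rx) = channel();
    echo(vec!["hello".into(), "world".into()], tx).unwrap();
    println!("{}", rx.recv().unwrap()); // hello world
}
```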

That said, it means that the echo implementation is a bit more complex.

There is… a lot of ceremony here, even for something so small. Let’s see what happens when we do something bigger, shall we? Here is the implementation of the reminder handler:
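
The actual implementation used tokio tasks and timers; as a rough approximation of what it needs to do, here is a plain-threads sketch (my own names, not the real code):

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::Duration;

// Parse "remind 15 Nap": confirm right away, then keep sending to the same
// client later, out of band with the request that started it.
fn remind(args: Vec<String>, out: Sender<String>) -> Result<(), String> {
    let minutes: u64 = args
        .get(0)
        .ok_or("missing minutes")?
        .parse()
        .map_err(|_| "minutes must be a number".to_string())?;
    let what = args.get(1).map(|s| s.clone()).ok_or("missing reminder text")?;

    out.send(format!("will remind you to {} in {} min", what, minutes))
        .map_err(|e| e.to_string())?;

    // The delayed part runs out of band and keeps sending to the same client.
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(minutes * 60));
        let _ = out.send(format!("reminder: {}", what));
    });
    Ok(())
}

fn main() {
    let (tx, rx) = channel();
    remind(vec!["15".into(), "Nap".into()], tx).unwrap();
    println!("{}", rx.recv().unwrap());
}
```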

Admittedly, a lot of that is error handling, but there is a lot of code here to do something that simple.  Compare that to something like C#, where the same thing could be written as:

I’m not sure that the amount of complexity that is brought about by the tokio model, even with the async/await macros is worth it at this point. I believe that it needs at least a few more iterations before it is going to be usable for the general public.

There is way too much ceremony and work to be done, and a single miss and you are faced with a very pissed off compiler.

time to read 3 min | 410 words

After a lot of trouble, I’m really happy that I was able to build an async I/O implementation of my protocol. However, for real code, I think that I would probably recommend using the sync API instead, since at least that is straightforward and doesn’t incur so much overhead at development time. The async stuff is still very much a “use at your own risk” kind of deal from my perspective. And I can’t imagine trying to use it in a large project and not suffering from the complexity.

As a good example, take a look at the following bit of code:

[image: code sample]

It doesn’t seem to be doing much, right? And it is clear what the intent of the code is.

However, if you try to compile this code you’ll get:

[image: compiler error]

Now, it took me a long while to figure out what is going on.  The issue is that the code I’m seeing isn’t the actual code, because of macro expansions.

So let’s resolve this and see what the expanded code looks like:

This is after formatting, of course, but it certainly looks scary. Glancing at this code doesn’t tell me what the problem was, so I tried replacing the method with the expanded result, and I got the same error, but this time I got it on a line that helped me figure it out. Here is the issue:

[image: the problematic code]

We use the ? to return early from the poll method, and the Receiver I’m using in this case is defined to have a Result<String, ()>, so this is the cause of the problem.

I returned my own error type as a result, giving me the ability to convert from (), but that was a really hard thing to resolve.
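
To show what I mean, here is a minimal, self-contained sketch of that kind of error type (the names are mine): a From<()> implementation is what lets the ? inside poll() convert the Receiver’s unit error into something meaningful:

```rust
use std::io;

// Our own error type, with conversions for every error that ? needs to absorb.
#[derive(Debug)]
enum ConnectionError {
    Io(io::Error),
    ChannelClosed,
}

impl From<io::Error> for ConnectionError {
    fn from(e: io::Error) -> Self {
        ConnectionError::Io(e)
    }
}

// The channel Receiver fails with (), so we teach our error type to convert from it.
impl From<()> for ConnectionError {
    fn from(_: ()) -> Self {
        ConnectionError::ChannelClosed
    }
}

fn main() {
    // With the conversions in place, ? can be used on results carrying
    // either io::Error or the channel's () error.
    let e: ConnectionError = ().into();
    println!("{:?}", e); // ChannelClosed
}
```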

It might be better to have Rust also offer to show the error on the expanded code by default, because it was somewhat of a chore to actually get to this.

What made this oh so confusing is that I had the exact same code, but using a Stream<String, io::Error>, and that worked, obviously. But it was decidedly non-obvious to see what the difference was between two seemingly identical pieces of code.

time to read 3 min | 588 words

In my last post, I got really frustrated with tokio’s complexity and wanted to move to using mio directly. The advantage is that the programming model is pretty simple, even if actually working with it is hard. Event loops can cause your logic to spread over many different locations and make it hard to follow. I started to go down that path until I figured out just how much work it would take. I decided to give tokio a second chance, and at this point, I looked into attempts to bring async/await functionality to Rust.

It seems that at least some work is already available for this, using futures + some Rust macros. That let me write code that is much more natural looking, and I actually managed to make it work.

Before I get to the code, I want to point out some concerns that I have right now. The futures-await crate (and indeed, all of tokio) seems to be in a state of flux. There is an await in tokio, and I think that there is some merging going on of all of those libraries into a single whole. What I don’t know, and can’t find any information about, is what I should actually be using and how all the pieces come together. I have to note that even with async/await, the programming model is still somewhat awkward, but it is at a level that I can live with. Here is how I built it.

First, we need to accept connections, which is done like so:
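
That screenshot used the futures-await crate’s #[async] attribute and await! macro; as a sketch of the same accept loop in the async/await syntax that eventually landed in the language (assuming tokio 1.x with the full feature set, and an illustrative port number):

```rust
use tokio::net::{TcpListener, TcpStream};

async fn handle_connection(_socket: TcpStream) -> std::io::Result<()> {
    // The TLS handshake, certificate validation and command processing would go here.
    Ok(())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8888").await?;
    loop {
        let (socket, peer) = listener.accept().await?;
        // One task per connection; processing happens concurrently with accepting.
        tokio::spawn(async move {
            if let Err(e) = handle_connection(socket).await {
                eprintln!("{}: {}", peer, e);
            }
        });
    }
}
```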

Note that I have two #[async] annotations. One for the method as a whole and one for the for loop. This just accepts the connection and spawns a task to handle it; the most interesting tidbits are in the actual processing of the connection:

You can see that this is fairly straightforward code. We first do the TLS handshake, then we validate the certificate. If there is an auth error, we send it to the user and back off. If we are successful, however, things get interesting.

I create a channel, which allows me to split off the read and write portions of the task. This means that I can send results out of order, if I wanted to, which is great for the actual protocol handling. The first thing to do is to send the OK string to the client, so they know that we successfully connected, then we spawn the read/write tasks. The write task is pretty simple, overall:
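
The write task itself isn’t shown anymore, but the shape that produces those .0 references can be sketched synchronously; this is not the tokio future, just my illustration of its consume-and-return pattern:

```rust
use std::io::Write;

// Mimics the old tokio write_all future's shape: the writer is consumed and
// handed back together with the buffer, so the caller has to fish it back out
// of the tuple (hence the .0 references).
fn write_all<W: Write>(mut writer: W, buf: Vec<u8>) -> std::io::Result<(W, Vec<u8>)> {
    writer.write_all(&buf)?;
    Ok((writer, buf))
}

fn main() -> std::io::Result<()> {
    let out = Vec::new(); // any Write implementation would do here
    let res = write_all(out, b"OK\r\n".to_vec())?;
    let res = write_all(res.0, b"reminder: Nap\r\n".to_vec())?;
    println!("{}", String::from_utf8_lossy(&res.0));
    Ok(())
}
```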

You can see the funny .0 references, which are an artifact of the fact that the write_all() function consumes the writer we pass to it and returns (a potentially different) writer in the result. This is pretty common for functional languages.

I’m pretty sure that I can avoid the two calls to write_all for the postfix, but that is easier for now.

Processing the commands is simple as well:

For each command we support, we have an entry in the server configuration and we fetch and invoke it. The result of the command will be written to the client by the write task. Right now we have a 1:1 association between them, but this can now easily be broken.
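
A rough sketch of that dispatch idea (with my own names, and a std channel standing in for the write task’s channel) looks like this:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};

// The server configuration maps a command name to a handler; the handler
// pushes its output through the channel that the write task drains.
type Handler = fn(Vec<String>, Sender<String>) -> Result<(), String>;

fn echo(args: Vec<String>, out: Sender<String>) -> Result<(), String> {
    out.send(args.join(" ")).map_err(|e| e.to_string())
}

fn dispatch(config: &HashMap<String, Handler>, line: &str, out: Sender<String>) -> Result<(), String> {
    let mut parts = line.split_whitespace().map(String::from);
    let cmd = parts.next().ok_or("empty command")?;
    let handler = *config.get(&cmd).ok_or_else(|| format!("unknown command: {}", cmd))?;
    handler(parts.collect(), out)
}

fn main() {
    let mut config: HashMap<String, Handler> = HashMap::new();
    config.insert("echo".to_string(), echo);

    let (tx, rx) = channel();
    dispatch(&config, "echo hello world", tx).unwrap();
    println!("{}", rx.recv().unwrap()); // hello world
}
```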

And finally, having an actual command run and running the server itself:

This is pretty simple now, and it gives us a nice model to program commands and responses.

I pushed the whole code to this branch, if you care to look at it.

I have some more comments about this code, but I’ll reserve them for another post.

time to read 2 min | 346 words

I kept going with tokio for a while; I even got something that I think would eventually work. The whole concept is built around streams, so I created a way to generate them. This is basically taking this code and making it async.

I gave up well into the second hour. Here is where I stopped:

[image: the code where I stopped]

I gave up when I realized that the reader I’m using (which is SslStream) didn’t seem to have poll_read. The way I’m reading the code, it is supposed to, but I just threw up my hands in disgust at this point. If it is this hard, it ain’t going to happen.

I wrote a significant amount of async code in C# back when events and callbacks were the only option, and then when the TPL and ContinueWith were the way to go. That was hard, and async/await is a welcome relief, but the level of frustration and “is this wrong, or am I really this stupid?” that I got midway through is far too much.

Note that this isn’t even about Rust. Some number of the issues that I ran into were because of Rust, but the major issue that I have here is that I’m trying to write a function that can be expressed in a sync manner in less than 15 lines of code, and that took me about 10 minutes to write the first time. And after spending more hours than I’m comfortable admitting, I couldn’t get it to work. The programming model you have here, even if everything did work, means that you have to either decompose your behavior into streams and interact with them in this manner or put everything in nested lambdas.

Neither option makes for a nice model to work with. I believe that there is another async I/O model for Rust, the mio crate, which is based on the event loop model. I’ve already implemented that in C, so that might be a more natural way to do things.

time to read 5 min | 941 words

Now that we have a secured and authenticated connection, the next stage in making a proper library is to make it handle more than a single connection at a time. I could have used a thread per connection, of course, or even used a thread pool, but neither of those options is valid for the kind of work that I want to see, so I’m going to jump directly into async I/O in Rust and see how that goes.

The sad thing about this is that I expect that this will make me lose some / all of the nice API that I get for OpenSSL in the sync mode.

Async in Rust is handled by a crate called tokio, and there seems to be active work to bring async/await to the language itself. In the meantime, we have to make do with the usual facilities, which ought to make this interesting.

It actually looks like there is a crate that gives pretty nice handling of tokio async I/O and OpenSSL, so that is encouraging. However, as part of trying to re-write everything in tokio style, I got the compiler very upset with me. Here is the (partial) error message:

[image: compiler error message]

Last time I had to parse such errors, I was working in C++ templated code and the year was 1999.

And here is the piece of code it so dislikes:

[image: the offending code]

I googled around and there is this detailed answer on a similar topic that frankly, frightened me. I shouldn’t have to dig this deeply and have to start drawing diagrams on so many disparate pieces of the code just to figure out a compiler error.

Let’s try to break it into its component parts and see if that makes sense. I reduced the code in question to just:

[image: reduced code]

Got another big scary error message. Okay, let’s try it without the OpenSSL stuff?

[image: code without the OpenSSL parts]

This produces the same error, but in a much less scary tone:

[image: compiler error]

Okay, now this looks about as simple as it can be. And now the fix is pretty obvious:

[image: the fixed code]

The key to understanding this, I believe (I haven’t tested it yet), is that the write_all call will either perform its work or schedule it, so any future work based on it should go in a nested and_then call. So the result of the single for_each invocation is not the direct continuation of the previous call.
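
Here is a self-contained illustration of that point, in the futures 0.1 combinator style this code is written in (assumes the futures 0.1 crate; the write_all stand-in below is mine, not tokio’s):

```rust
extern crate futures;

use futures::future::{self, Future};

// Stand-in for tokio's write_all: consumes the buffer, hands it back when "done".
fn fake_write_all(mut buffer: Vec<u8>, data: &str) -> impl Future<Item = Vec<u8>, Error = ()> {
    buffer.extend_from_slice(data.as_bytes());
    future::ok(buffer)
}

fn main() {
    let work = fake_write_all(Vec::new(), "OK\r\n")
        // The follow-up write only runs once the first one has completed,
        // so it has to be nested inside and_then rather than just follow it.
        .and_then(|buf| fake_write_all(buf, "bye\r\n"))
        .map(|buf| println!("{}", String::from_utf8_lossy(&buf)));

    work.wait().unwrap();
}
```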

That is fine, I’ll deal with that, I guess.

Cue here about six hours of programming montage.

I have been programming for over 20 years; I like to think that I have been around the block a few times. And the simple task of reading a message from TCP using async I/O took me far too long. Here is what I eventually ended up with:

[image: the final code]

This is after fighting with the borrow checker (a lot, it ended up winning) and trying to wrap my head around the model that tokio has. It is like they took the worst parts of async programming, married it to stream programming’s ugly second cousin and then decided to see if any of the wedding guests are open for adoption.

And if the last sentence doesn’t make sense to you, you are welcome; that is how I felt at certain points. Here is one of the errors that I ran into:

[image: compiler error]

What is this string, where did it come from and why do we have a unit “()” there? Let me see if I can explain what is going on here. Here is a very simple bit of code that would explain things.

[image: simple example code]

And here is the error it generates:

[image: compiler error]

The problem is that spawn is expecting a future that yields a result that has no meaning, something like: Future<Result<(), ()>>. This makes sense, since there isn’t really anything that it can do with whatever the result is. But the error can be really confusing. I spent a lot of time trying to actually parse this, then I had to go and check the signatures of the methods involved, and then I had to reconstruct what generic parameters are required, etc.
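
A self-contained way to see the same mismatch and the kind of fix involved (assuming tokio 0.1 and futures 0.1; tokio::run has the same requirement as spawn):

```rust
extern crate futures;
extern crate tokio;

use futures::future::{self, Future};

fn main() {
    // A future whose Item is String and whose Error is io::Error.
    let work = future::ok::<String, std::io::Error>("done".to_string());

    // This would not compile: tokio::run / tokio::spawn want Item = () and Error = ().
    // tokio::run(work);

    // The fix: consume the value and the error so both become ().
    tokio::run(
        work.map(|msg| println!("{}", msg))
            .map_err(|e| eprintln!("{}", e)),
    );
}
```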

The fix, btw, is this:

[image: the fixed code]

Ask yourself how long it would take you to figure out what the changes between these versions of the code are without the marker.

Anyway, although I’m happy that I got something done, this approach is really not sustainable. I’m pretty sure that I’m either doing something wrong or missing something. It shouldn’t be this hard. I have some ideas that I want to try, which I’ll talk about in the next post.
