Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 6 min | 1043 words

I had some really interesting discussions while I was in CodeMash, and a few of them touched on modeling concerns with non-trivial architectures. In particular, I was asked for my opinion on the role of OR/M in systems that mostly do CQRS, event processing, etc.

This is a deep question, because on first glance, your requirements from the database are pretty much just:

INSERT INTO Events(EventId, AggregateId, Time, EventJson) VALUES (…)

There isn’t really any need to do anything more interesting than that. The other side of that is a set of processes that operate on top of these event streams and produce read models that are very simple to consume as well. There isn’t any complexity in the data architecture at all, and joy to the world, etc, etc.

This is true, to an extent. But this is only because you have moved a critical component of your system elsewhere: the beating heart of your business. The logic, the rules, the things that make a system more than just a dumb repository of strings and numbers.

But first, let me make sure that we are on roughly the same page. In such a system, we have (a short sketch in code follows this list):

  • Commands – that cannot return a value (but will synchronously fail if invalid). These mutate the state of the system in some manner.
  • Events – represent something that has (already) happened. Cannot be rejected by the system, even if they represent invalid state. The state of the system can be completely rebuilt from replaying these events.
  • Queries – that cannot mutate the state
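To make this concrete, here is a minimal sketch in C# of how these three kinds of messages might be expressed. This is my illustration, not code from the actual system, and all the names are made up:

using System;

public interface ICommand { }                              // mutates state, returns no value, may fail synchronously
public interface IEvent { DateTime OccurredAt { get; } }   // a fact that already happened, cannot be rejected
public interface IQuery<TResult> { }                       // reads state, never mutates it

// Two messages from the domain discussed below:
public class ScheduleAt : ICommand
{
    public string NurseId;
    public string ShiftId;
}

public class ClockedOut : IEvent
{
    public DateTime OccurredAt { get; set; }
    public string NurseId;
    public string ShiftId;
}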

I’m mixing two separate architectures here, Command Query Responsibility Segregation and Event Sourcing. They aren’t the same, but they often go hand in hand, and it makes sense to talk about them together.

And because it is always easier for me to talk in concrete, rather than abstract, terms, I want to discuss a system I worked on over a decade ago. That system was basically a clinic management system, and the part that I want to talk about today is the staff scheduling feature.

Scheduling shifts is a huge deal, even before we get to the part where it directly impacts how much money you get at the end of the month. There are a lot of rules, regulations, union contracts, agreements and a bunch of other stuff that relate to it. So this is a pretty complex area, and when you approach it, you need to do so with the due consideration it deserves. When we want to apply CQRS/ES to it, we can consider the following factors:

The aggregates that we have are:

  • The open schedule, two months from now. This is mutable, being worked on by the head nurse and constantly changing.
  • The proposed schedule for next month. This one is closed and changes only rarely, usually because of something big (someone being fired, etc).
  • The planned schedule for the current month, frozen, cannot be changed.
  • The actual schedule for the current month. This is changed if someone doesn’t show up for their shift, is sick, etc.

You can think of the first three as various stages of a PlannedSchedule, but the ActualSchedule is something different entirely. There are rules around how much divergence you can have between the planned and actual schedules, which impact compensation for the people involved, for example.

Speaking of which, we haven’t yet talked about:

  • Nurses / doctors / staff – which are being assigned to shifts.
  • Clinics – a nurse may work in several different locations at different times.

There is a lot of other stuff that I’m ignoring here, because it would complicate the picture even further, but that is enough for now. For example, beyond the shifts that a person was assigned to and showed up for, they may have worked more hours (had to come to a meeting, drove to a client) and that complicates payroll, but it doesn’t matter for the scheduling.

I want to focus on two actions in this domain. First, the act of the head nurse scheduling a staff member to a particular shift. And second, the ClockedOut event which happens when a staff member completes a shift.

The ScheduleAt command places a nurse at a given shift in the schedule, which seems fairly simple on its face. However, the act of processing the command is actually really complex. Here are some of the things that you have to do:

  • Ensure that this nurse isn’t scheduled for another shift, either concurrently or too close to another shift at a different location.
  • Ensure that the nurse doesn’t work with X (because issues).
  • Ensure that the role the nurse has matches the required parameters for the schedule.
  • Ensure that the number of double shifts in a time period is limited.

The last one, in particular, is a sinkhole of time. Because at the same time, another business rule says that we must give each nurse N number of shifts in a time period, and yet another dictates how to deal with competing preferences, etc.

So at this point, we have: ScheduleAtCommand.Execute() and we need to apply logic, complex, changing, business critical logic.
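Here is a hypothetical sketch of what that can look like when there is a full domain model behind it. Every type and method name below is invented for illustration; the point is that the handler is all business rules, with persistence abstracted away:

public class ScheduleAtCommandHandler
{
    private readonly IScheduleRepository _schedules; // abstracted persistence (OR/M, document store, etc.)

    public ScheduleAtCommandHandler(IScheduleRepository schedules)
    {
        _schedules = schedules;
    }

    public void Execute(ScheduleAt cmd)
    {
        var schedule = _schedules.LoadOpenSchedule();
        var nurse = _schedules.LoadStaffMember(cmd.NurseId);
        var shift = schedule.GetShift(cmd.ShiftId);

        // Each rule from the list above is a domain concern, free of persistence details:
        schedule.EnsureNoConflictingOrAdjacentShift(nurse, shift);
        schedule.EnsureNoForbiddenPairing(nurse, shift);
        schedule.EnsureRoleMatches(nurse, shift);
        schedule.EnsureDoubleShiftLimit(nurse, shift);

        schedule.Assign(nurse, shift);

        _schedules.Save(schedule);
    }
}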

And at this point, for that particular part of the system, I want to have a full domain model, abstracted persistence and the ability to just put my head down and focus on solving the business problem.

The same applies to the ClockedOut event. Part of processing it means that we have to look at the nurse’s employment contract, count the amount of overtime worked, compute the total number of hours worked in a pay period, etc. Apply rules from the clinic to the time worked, apply clauses from the employment contract to the work, etc. Again, this gets very complex very fast. For example, if you have a shift from 10 PM to 6 AM, how do you compute overtime? For that matter, if this is on the last day of the month, when do you compute overtime? And what pay period do you apply it to?
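To make the 10 PM to 6 AM example concrete, here is a small sketch of just the very first step: splitting a shift at midnight so that per-day rules can be applied. This is my own illustration, not code from that system:

using System;
using System.Collections.Generic;

public static class ShiftMath
{
    // A shift from 10 PM to 6 AM spans two calendar days; overtime and pay
    // period rules typically apply per day, so we split the shift first.
    public static IEnumerable<(DateTime Start, DateTime End)> SplitAtMidnight(DateTime start, DateTime end)
    {
        var cursor = start;
        while (cursor.Date < end.Date)
        {
            var midnight = cursor.Date.AddDays(1);
            yield return (cursor, midnight);
            cursor = midnight;
        }
        if (cursor < end)
            yield return (cursor, end);
    }
}

Even this toy version immediately raises the questions above: which pay period does the 12 AM to 6 AM segment belong to when it falls on the 1st of the month?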

Here, too, I want to have a fully fleshed out model, which can operate in the problem space freely.

In other words, a CQRS/ES architecture is going to have the domain model (and some sort of OR/M) in the middle, doing the most interesting things and tackling the heart of the complexity.

time to read 3 min | 478 words

After running into a few hurdles, I managed to get the Rust OpenSSL bindings to work, which means that it is now time to actually wire things properly into my network protocol. Let’s see how that works, shall we?

First, we have the OpenSSL setup:
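The snippet itself was an image in the original post; a sketch of its shape, using the openssl crate (the exact options are assumed), looks roughly like this:

use openssl::ssl::{SslAcceptor, SslFiletype, SslMethod, SslVerifyMode};
use std::net::TcpListener;

fn listen(addr: &str, cert_path: &str, key_path: &str) -> std::io::Result<()> {
    let mut builder = SslAcceptor::mozilla_intermediate(SslMethod::tls()).unwrap();
    builder.set_certificate_chain_file(cert_path).unwrap();
    builder.set_private_key_file(key_path, SslFiletype::PEM).unwrap();
    // Ask for a client certificate, but accept any; we validate thumbprints ourselves later.
    builder.set_verify_callback(SslVerifyMode::PEER, |_preverified, _ctx| true);
    let acceptor = builder.build();

    let listener = TcpListener::bind(addr)?;
    for stream in listener.incoming() {
        let tls_stream = acceptor.accept(stream?).unwrap();
        // ... handle the connection ...
        drop(tls_stream);
    }
    Ok(())
}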

As you can see, this is pretty easy and there isn’t really anything there that is of actual interest. It does feel a whole lot easier than dealing with OpenSSL directly in C, though.

That said, when I started actually dealing with the client certificate, things got a lot more complicated. The first thing that I wanted to do was my authentication, which is defined as:

  • The client presents a client certificate (it can be any client certificate).
  • If the client doesn’t give a certificate, we accept the connection, send an error message (using the encrypted tunnel) and abort.
  • If the client provides a certificate, it must be one that was previously registered in the server. That is what allowed_certs_thumbprints is for. If it isn’t, we accept the connection, write an error and abort.
  • If the client certificate has expired or is not yet valid, accept, write an error & abort.

You get the gist. Here is what I had to do to implement the first part:
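The code was a screenshot in the original; here is a rough sketch of the first part. The allowed_certs_thumbprints name comes from the description above, everything else is assumed:

use openssl::hash::MessageDigest;
use openssl::ssl::SslStream;
use std::collections::HashSet;
use std::io::Write;
use std::net::TcpStream;

fn authenticate(
    stream: &mut SslStream<TcpStream>,
    allowed_certs_thumbprints: &HashSet<String>,
) -> Result<(), String> {
    let cert = match stream.ssl().peer_certificate() {
        Some(cert) => cert,
        None => {
            // We already have an encrypted tunnel, so the error itself stays private.
            let _ = stream.write_all(b"ERR: No certificate was provided\r\n");
            return Err("no client certificate".to_string());
        }
    };

    let thumbprint = cert
        .digest(MessageDigest::sha1())
        .map_err(|e| e.to_string())?
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<String>();

    if !allowed_certs_thumbprints.contains(&thumbprint) {
        let msg = format!("ERR: Certificate {} is not known\r\n", thumbprint);
        let _ = stream.write_all(msg.as_bytes());
        return Err(format!("unknown certificate {}", thumbprint));
    }
    Ok(())
}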

Most of the code, actually, is about generating proper and clear error messages, more than anything else. I’m not sure how to get the friendly name from the certificate, but this seems to be a good enough stand-in for now.

We validate that we have a certificate, or send an error back. We validate that the certificate presented is known to us, or we send an error back.

The next part I wanted to implement was… really far harder than it should be. I just wanted to verify that the certificate’s not-before / not-after dates are valid. The problem is that the Rust bindings for OpenSSL do not expose that information. Luckily, because it is using OpenSSL, I can just call OpenSSL directly. That led me into some interesting research into how Rust calls out to C, how foreign types work and a lot of “fun” like that. Given that I’m doing this to learn, I suppose that this is a good thing, though.

Here is what I ended up with (take a deep breath):
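The actual code was an image; below is a reconstruction of the idea, under loudly stated assumptions: the accessor names are from the OpenSSL 1.1 API (older versions expose these as macros), and the glue is illustrative rather than a copy of the original. It needs the libc and foreign-types crates:

use foreign_types::ForeignTypeRef; // the helper crate rust-openssl uses for raw handles
use libc::{c_int, c_void, time_t};
use openssl::x509::X509Ref;
use std::ptr;

// Only the date-validation portion is sketched here.
fn authenticate_certificate(cert: &X509Ref) -> Result<(), String> {
    // Everything, including the extern block, lives inside this function.
    extern "C" {
        fn X509_getm_notBefore(x: *const c_void) -> *const c_void;
        fn X509_getm_notAfter(x: *const c_void) -> *const c_void;
        // Returns -1 if the ASN1 time is earlier than *t, 1 if later, 0 on error.
        // Passing a null time means "compare against now".
        fn X509_cmp_time(s: *const c_void, t: *const time_t) -> c_int;
    }

    fn cmp_now(asn1_time: *const c_void) -> c_int {
        unsafe { X509_cmp_time(asn1_time, ptr::null()) }
    }

    unsafe {
        let raw = cert.as_ptr() as *const c_void;
        if cmp_now(X509_getm_notBefore(raw)) >= 0 {
            return Err("certificate is not yet valid".to_string());
        }
        if cmp_now(X509_getm_notAfter(raw)) <= 0 {
            return Err("certificate has expired".to_string());
        }
    }
    Ok(())
}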

Notice that I’m doing all of this (defining external function, defining helper functions) inside the authenticate_certificate function. Coming up with that was harder than expected, but I really liked the fact that it was possible, and that I can just shove this into a corner of my code and not have to make a Big Thing out of it.

And with that, I have the authentication portion of my network protocol in Rust done.

The next stage is going to be implementing a server that can handle more than a single connection at a time :-)

time to read 4 min | 654 words

I’m writing this post as I’m sitting in the airport, leaving the CodeMash conference. This has been my first “true” conference in a while, because I was able to not just speak and stand at a sponsor booth but actually participate in a lot of sessions and talk to a lot of people. I had a blast, and both the IRS and my wife consider this a work trip :-)

I have been presenting at international conferences for over a decade now and I wanted to put in a few words for people who are speaking at a technical conference. None of this is new, mind. If you have read any recommendations about how to present at conferences, I’m probably going to bore you. I’m writing this because I saw several sessions that had technical issues in how they were delivered. That is, the content was great, but the way it was delivered could be improved.

Probably the most important factor that I need to mention is: Make your content readable.

When you are presenting, use big fonts everywhere. That means that (ahead of time) you should make sure that your PPT’s content is readable from the end of the room. If you are presenting code in an IDE, make sure that you know how to increase the zoom so the code is readable. For that matter, syntax highlighting is not an optional feature when showing code. And use the default syntax highlighting for the language you are working with. And no dark themes.

I’m assuming that what you care about is the code, so you want to show that, in a way that the people in your talk are familiar with, so no new themes, aqua on pink color schemes, etc. Go with the default.

But code is just one factor of it. If you are showing output, make sure that it is readable. That means that if you are writing to the console, make sure that the console font is big enough, use colors for emphasis, etc.

If you are showing XML / JSON / data formats, make sure that they are pretty printed. If you dump something like this:

[screenshot: a dense, unformatted JSON dump]

The audience is going to be too busy parsing the text to actually pay attention to what you are saying.

And if at all possible, use a console application and have no architecture.

If you are actually talking about a React app, you can’t do that, and if your talk is about architecture, you are going to need to show that. But in most cases, if you are showing off some new language feature, or talking about a particular service or API, you want to use a console app to demo it. This is because it allows you (and your audience) to focus solely on the problem at hand rather than looking at the controller –> service –> repository magic via DI that hides a lot of the backend details. In general, you want to strip away as much as possible that isn’t directly core to your topic.

Another thing that I noticed: when you want to show additional data (files, artifacts, etc), make sure that you have it already opened before you start. If you are doing a demo, have screenshots available for when / if the demo messes up.

Everything I just mentioned might seem obvious, but you need to go over it before you go and present; it was surprising to see several speakers make some of these mistakes.

Now, to be fair, this might sound like I’m piling on people, but that isn’t my intention. I’m aggregating a lot of different small mistakes in order to point them out. They can detract from a presentation, but at least in the sessions that I attended, they didn’t kill the presentation for me.

time to read 2 min | 340 words

After getting really frustrated with the state of Rust & TLS, I decided to sit down and figure out what it would take to make the OpenSSL crate actually build successfully. Even though the crate claims to support vcpkg, it seems that there were issues there. I started from a clean slate, and checked that I have openssl installed via vcpkg:

[screenshot: vcpkg list output showing the installed openssl packages]

I then got into a rabbit hole of errors in the build, first:

[screenshot: the first build error, from the openssl-sys crate]

This seems like it wants to statically link to them by default, but when I set the env variable, I got:

[screenshot: link error complaining about the library’s machine type]

Looking closely at the error (always read the error message), you can see that it is looking for a 64 bits build, but I have an x86 build.

That very likely explains the issues that I previously had. I tried to point it to the SSL build directory, and I’m pretty sure that I used the 32 bits directory. It rejected the attempted link, but didn’t bother to tell me about it.

To be fair, this isn’t Rust’s fault, it is link.exe’s fault for not providing a clear error about this case. Actually, this is the kind of case where you are going to invest some time writing a feature whose only purpose is to give good errors when the user messes up. But that kind of attention to detail makes a world of difference.

Here is what fixed this for me.

[screenshot: the environment variable changes that fixed the build]

And with that, I can build using:

[screenshot: the cargo build command completing successfully]
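In case the screenshots are hard to make out: the gist, assuming the standard knobs of the vcpkg support used by the openssl crate (the exact paths and values here are my guesses, not the original’s), is to install the 64 bits triplet and point the build at vcpkg:

vcpkg install openssl:x64-windows
set VCPKG_ROOT=C:\vcpkg
set VCPKGRS_DYNAMIC=1
cargo build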

Hurray! That is enough for now, I guess. I’ll get things actually working in another post.

time to read 3 min | 505 words

Computation during indexing opens up some nice features when we are talking about data modeling and working with your data. In this post, I want to discuss predicting the future with it. Let’s see how we can do that, shall we?

Consider the following document, representing a (simplified) customer model:

[screenshot: the customer document]
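The document itself was shown as an image; based on the description that follows, it is shaped roughly like this (field names assumed):

{
    "Name": "Acme Inc.",
    "Subscription": "Monthly",
    "Payments": [
        { "Date": "2018-11-01", "Amount": 80.0 },
        { "Date": "2018-12-01", "Amount": 80.0 },
        { "Date": "2019-01-01", "Amount": 80.0 }
    ]
}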

We have a customer that is making monthly payments. This is a pretty straightforward model, right?

We can do a lot with this kind of data. We can obviously compute the lifetime value of a customer, based on how much they paid us. We already did something very similar in a previous post, so that isn’t very interesting.

What is interesting is looking into the future. Let’s start simple, by figuring out the next expected charge for this customer. For now, the logic is about as simple as it can be. Monthly customers pay by month, basically. Here is the index:

[screenshot: the index definition, in Linq]

I’m using Linq instead of JS here because I’m dealing with dates and JS support for dates is… poor.
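The index itself was an image; a sketch of its shape, against the assumed document above, would be something like:

from customer in docs.Customers
let payments = customer.Payments.OrderByDescending(p => p.Date)
select new
{
    customer.Subscription,
    NextPaymentDate = payments.First().Date.AddMonths(1),
    ExpectedAmount = payments.Take(3).Average(p => p.Amount)
}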

As you can see, we are simply looking at the last payment date and the subscription, figuring out how much we were paid the last three times and using that as the expected next payment amount. That allows us to do nice things, obviously. We can now run queries on the future. So finding out how many customers will (probably) pay us more than 100$ on the 1st of Feb is both easy and cheap.

We can actually take this further, though. Instead of using a simple index, we can use a map/reduce one. Here is what this looks like:

[screenshot: the map function]

And the reduce:

[screenshot: the reduce function]
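Both functions were screenshots; here is a sketch that matches the description below (the exact shape is assumed). The map:

from customer in docs.Customers
let ordered = customer.Payments.OrderByDescending(x => x.Date)
let expected = ordered.Take(3).Average(x => x.Amount)
let last = ordered.First()
from payment in customer.Payments
    .Select(x => new { x.Date, x.Amount })
    .Concat(Enumerable.Range(1, 3).Select(i => new { Date = last.Date.AddMonths(i), Amount = expected }))
select new { Year = payment.Date.Year, Month = payment.Date.Month, Total = payment.Amount }

And the reduce:

from result in results
group result by new { result.Year, result.Month } into g
select new { g.Key.Year, g.Key.Month, Total = g.Sum(x => x.Total) }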

This may seem a bit dense at first, so let’s decipher it, shall we?

We take the last payment date and compute the average of the last three payments, just as we did before. The fun part is that now we don’t compute just the single next payment, but the next three. We then output all the payments, both existing (that already happened) and projected (that will happen in the future), from the map function. The reduce function is a lot simpler, and simply sums up the amounts per month.

This allows us to effectively project data into the future, and this map reduce index can be used to calculate expected income. Note that this is aggregated across all customers, so we can get a pretty good picture of what is going to happen.

A real system would probably have some uncertainty factor, but that touches on business strategy more than modeling, so I don’t think we need to go into that here.

time to read 2 min | 274 words

After trying (and failing) to use rustls to handle client authentication, I tried to use the rust-openssl bindings. That crapped out on me with a really scary link error. I spent some time trying to figure out what was going on, but given that I wanted to write Rust code, not deal with link errors, I decided to see if the final alternative in the Rust ecosystem would work: the native-tls package.

And… that is a no go as well. Which is sad, because the actual API was quite nice. The reason it isn’t going to work? The native-tls package just has no support for client certificate authentication when running as a server, so not usable for me.

That leaves me with strike three out of three:

  • rustls – native Rust API, easy to work with, but doesn’t allow accepting arbitrary client certificates, only ones from known issuers.
  • rust-openssl – I have built this kind of thing on top of OpenSSL before, so I know it works. However, trying to build it on Windows resulted in link errors, so that was out.
  • native-tls – doesn’t have support for client certificates, so not usable.

I think that at this point, I have three paths available to me:

  • Give up and maybe try doing something else with Rust.
  • Fork rustls and add support for accepting arbitrary client certificates. I’m not happy with this because it requires changing not just rustls but probably also the webpki package, and I’m unsure whether the changes I have in mind would hurt the security of the system.
  • Try to fix the OpenSSL link issue.

I think that I’ll go with the third option, but this is really annoying.

time to read 4 min | 613 words

In my last post on the topic, I showed how we can define a simple computation during the indexing process. That was easy enough, for sure, but it turns out that there are quite a few use cases for this feature that go quite far beyond what you would expect. For example, we can use this feature as part of defining and working with business rules in our domain.

For example, let’s say that we have some logic that determines whether a product is offered with a warranty (and for how long that warranty is valid). This is an important piece of information, obviously, but it is the kind of thing that changes on a fairly regular basis. For example, consider the following feature description:

As a user, I want to be able to see the offered warranty on the products, as well as to filter searches based on the warranty status.

Warranty rules are:

  • For new products made in house, full warranty for 24 months.
  • For new products from 3rd parties, parts only warranty for 6 months.
  • Refurbished products by us, full warranty, for half of the new warranty duration.
  • Refurbished 3rd parties products, parts only warranty, 3 months.
  • Used products, parts only, 1 month.

Just from reading the description, you can see that this is a business rule, which means that it is subject to many changes over time. We can obviously create a couple of fields on the document to hold the warranty information, but that means that whenever the warranty rules change, we’ll have to go through all the documents again. We’ll also need to ensure that any business logic that touches the document re-runs the logic to apply the warranty computation (to be fair, these sorts of things are usually done as a subscription in RavenDB, which alleviates that need).

Without further ado, here is the index to implement the logic above:
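(The index was embedded in the original and didn’t survive here; what follows is my sketch of it as a JavaScript index, with field names like Condition and MadeInHouse assumed.)

map('Products', function (product) {
    // The business rules from the feature description above:
    function warranty(p) {
        if (p.Condition === 'Used')
            return { Type: 'Parts', DurationMonths: 1 };
        if (p.Condition === 'Refurbished')
            return p.MadeInHouse
                ? { Type: 'Full', DurationMonths: 12 } // half of the new, in-house duration
                : { Type: 'Parts', DurationMonths: 3 };
        return p.MadeInHouse // new products
            ? { Type: 'Full', DurationMonths: 24 }
            : { Type: 'Parts', DurationMonths: 6 };
    }
    var w = warranty(product);
    return { Warranty: w.Type, WarrantyDurationMonths: w.DurationMonths };
});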

You can now query over the warranty type and its duration, project them from the index, etc. Whenever a document is updated, we’ll re-compute the warranty status and update the index.

This saves you from having additional fields in your model and greatly diminishes the cost of queries that need to filter on the warranty or its duration (since you don’t need to do this computation during the query, only once, during indexing).

If the business rule definition changes, you can update the index definition and RavenDB will effectively roll out your change to the entire dataset. That is nice, but even though I’m writing about cool RavenDB features, there are some words of caution that I want to mention.

Putting queryable business rules in the database can greatly ease your life, but be wary of putting too much business logic in there. In general, you want your business logic to reside right next to the rest of your application code, not running on a different server in a mode that is much harder to debug, version and diagnose. And if the level of complexity involved in the business rule exceeds some threshold (hard to define, but easy to know when you hit it), you should probably move from defining the business rules in an index to a subscription.

A RavenDB subscription allows you to get all changes to documents and apply your own logic in response. This is a reliable way to process data in RavenDB; it runs in your own code, under your own terms, so it can enjoy all the usual benefits of… well, being your code, and not mine. You can read more about subscriptions in this post and, of course, the documentation.

time to read 1 min | 182 words

I ran into this very interesting blog post and I decided to see if I could implement it on my own, without requiring any runtime support. This turned out to be surprisingly easy, if you are willing to accept some caveats.

I’m going to assume that you have read the linked blog post. Here is code that implements it:
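(The embedded code didn’t survive here, so below is my own sketch of the same technique, matching the caveats described at the end of the post: a finalizable tracker is tied to the target instance via ConditionalWeakTable, and its finalizer delivers the notification.)

using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;

public class PhantomReferenceQueue
{
    private readonly BlockingCollection<object> _queue = new BlockingCollection<object>();
    private readonly ConditionalWeakTable<object, PhantomReference> _table =
        new ConditionalWeakTable<object, PhantomReference>();

    // Associate an instance with this queue; the token is delivered
    // once the instance has been garbage collected.
    public void Register(object instance, object token)
    {
        _table.Add(instance, new PhantomReference(_queue, token));
    }

    public void Unregister(object instance)
    {
        PhantomReference reference;
        if (_table.TryGetValue(instance, out reference))
        {
            GC.SuppressFinalize(reference); // no notification after unregistering
            _table.Remove(instance);
        }
    }

    // Blocks until some registered instance has been collected.
    public object WaitForNext()
    {
        return _queue.Take();
    }

    private class PhantomReference
    {
        private readonly BlockingCollection<object> _queue;
        private readonly object _token;

        public PhantomReference(BlockingCollection<object> queue, object token)
        {
            _queue = queue;
            _token = token;
        }

        // The table keeps this tracker alive exactly as long as the associated
        // instance, so the finalizer runs only after that instance was collected.
        ~PhantomReference()
        {
            _queue.Add(_token);
        }
    }
}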

Here is what it gives you:

  • You can have any number of queues.
  • You can associate an instance with a queue at any time, including far after it was constructed.
  • Can unregister from the queue at any time.
  • Can wait (or easily change it to do async awaits, of course) for updates about phantom references.
  • Can process such events in parallel.

What about the caveats?

This utilizes the finalizer internally, to inform us when the associated value has been forgotten, so it takes longer than one would wish for. The implementation relies on ConditionalWeakTable to do its work, by creating a weak association between the instances you pass in and the PhantomReference class holding the handle that we’ll send to you once that value has been forgotten.

time to read 4 min | 734 words

The title of this post is pretty strange, I admit. Usually, when we think about modeling, we think about our data. If it is a relational database, this mostly means the structure of your tables and the relations between them. When using a document database, this means the shape of your documents. But in both cases, indexes are there merely to speed things up. Oh, a particularly important query may need an index, and that may impact how you lay out the data, but these are relatively rare cases. In relational databases and most non-relational ones, indexes do not play any major role in data modeling decisions.

This isn’t the case for RavenDB. In RavenDB, an index doesn’t exist merely to organize the data in a way that makes it easier for the database to search for it. An index is actually able to modify and transform the data, on the current document or by pulling data from related documents. A map/reduce index is even able to aggregate data from multiple documents as part of the indexing process. I’ll touch on the last one in more depth later in this series; first, let’s tackle the more obvious parts. Because I want to show off some of the new features, I’m going to use JS for most of the index definitions in these posts, but you can do the same using Linq / C# as well, obviously.

When brainstorming for this post, I got so many good ideas about the kind of non-obvious things that you can do with RavenDB’s indexes that a single post turned into a series, and I have two pages of notes to go through. Almost all of those ideas are basically some form of computation during indexing, applied in novel manners to give you a lot of flexibility and power.

RavenDB prefers to have more work to do during indexing (which is batched and happens in the background) than during query time. This means that we can push a lot more work into the background and just let RavenDB handle it for us. Let’s start with what is probably the most basic example of computation during indexing, the Order’s Total. Consider the following document:

[screenshot: the Order document]
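The document was an image; it is shaped along these lines, an order with its line items (field names assumed):

{
    "Company": "companies/1-A",
    "OrderedAt": "2019-01-10T10:00:00.0000000",
    "Lines": [
        { "Product": "products/11-A", "PricePerUnit": 21.0, "Quantity": 10, "Discount": 0 },
        { "Product": "products/42-A", "PricePerUnit": 14.0, "Quantity": 2, "Discount": 0.15 }
    ]
}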

As you can see, we have the Order document and the list of the line items in this order. What we don’t have here is the total order value.

Now, actually computing how much you are going to pay for an order is complex. You have to take into account taxation, promotions, discounts, shipping costs, etc. That isn’t something that you can do trivially, but it does serve as an excellent simple example, and similar requirements exist in many fields.

We can obviously add a Total field to the order, but then we have to make sure that we update it whenever we update the order. This is possible, but if we have multiple clients consuming the data, this can be fairly hard to do. Instead, we can place the logic to compute the property in the index itself. Here is what that looks like:

[screenshot: the index definition, in Linq]
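A sketch of that index, assuming the order shape above:

from order in docs.Orders
select new
{
    Total = order.Lines.Sum(l => l.Quantity * l.PricePerUnit * (1 - l.Discount))
}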

The same index in JavaScript is almost identical:
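(That snippet is also missing here; a sketch of what it would be:)

map('Orders', order => ({
    Total: order.Lines.reduce((acc, l) => acc + l.Quantity * l.PricePerUnit * (1 - l.Discount), 0)
}));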

In this case, they are very similar, but as the complexity grows, I find it is usually easier to express logic as a JavaScript index rather than as a single (complex) Linq expression.

Such an index gives us a computed field, Total, that has the total value of an order. We can query on this field, sort by it and even project it back (if the field is stored). It allows us to always have the most up to date value and have the database take care of computing it.

This basic technique can be applied in many different ways and affects the way we shape and model our data. Currently I have at least three more posts planned for this series, and I would love to hear your feedback, both on the kind of stuff you would like me to talk about and on the kind of indexes you are using in RavenDB and how they impact your data modeling.

time to read 6 min | 1017 words

The task that I have for now is to add client authentication via X509 client certificates. That is both obvious and non-obvious, unfortunately. I’ll get to that, but before I do so, I want to go back to the previous post and discuss this piece of code:
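The code itself was a screenshot; here is a sketch of its shape against the rustls and pemfile API of that era (details assumed, and current rustls versions look quite different):

use rustls::internal::pemfile;
use std::fs::File;
use std::io::{BufReader, Error, ErrorKind};
use std::net::TcpListener;

fn create_listener(addr: &str, cert_path: &str, key_path: &str) -> Result<(TcpListener, rustls::ServerConfig), Error> {
    // Local helper: open a file and hand it to whatever parser we are given.
    fn open_cert_file<T>(path: &str, f: fn(&mut BufReader<File>) -> Result<T, ()>) -> Result<T, Error> {
        let file = File::open(path)?;
        let mut reader = BufReader::new(file);
        f(&mut reader).map_err(|_| Error::new(ErrorKind::InvalidData, "unable to parse file"))
    }

    let certs = open_cert_file(cert_path, |r| pemfile::certs(r))?;

    // The key may be in either RSA or PKCS8 format; try both.
    let mut keys = open_cert_file(key_path, |r| pemfile::rsa_private_keys(r)).unwrap_or_default();
    if keys.is_empty() {
        keys = open_cert_file(key_path, |r| pemfile::pkcs8_private_keys(r))?;
    }
    let key = keys
        .first()
        .cloned()
        .ok_or_else(|| Error::new(ErrorKind::InvalidData, "no private key found"))?;

    // rustls 0.14-era API; newer versions take a client certificate verifier here.
    let mut config = rustls::ServerConfig::new();
    config.set_single_cert(certs, key);

    Ok((TcpListener::bind(addr)?, config))
}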

I’ll admit that I’m enjoying exploring Rust features, so I don’t know how idiomatic this code is, but it is certainly dense. This basically does the setup for a TCP listener and sets up the TLS details so we can accept a connection.

Rust allows us to define local functions (inside a parent function); this is mostly just a way to define a private function, since the nested function has no access to the parent scope. The open_cert_file function is just a way to avoid code duplication, but it is an interesting one. It is a generic function that accepts an open ended function of its own. Basically, it will open a file, read it and then pass it to the function it was provided. There is some error handling, but that is pretty much it.

The next fun part happens when we want to read the certs and the key file. The certs file is easy; it can only ever appear in a single format. But the key may be either a PKCS8 or an RSA Private Key. And unlike the certs, where we expect to get a collection, we need to get just a single value. To handle that we have:

[screenshot: the key loading code, trying RSA and then PKCS8]

First, we try to open and read the file as an RSA Private Key; if that isn’t successful, we’ll attempt to read it as a PKCS8 file. If either of those attempts was successful, we’ll try to get the first key, clone it and return. However, if there was an error in any part of the process, we abort the whole thing (and exit the function with an error).

From my review of Rust code, it looks like this isn’t non-idiomatic code, although I’m not sure I would call it idiomatic at this point. The problem with this code is that it is pretty fun to write, and when you read it, it is obvious what is going on, but it is really hard to actually debug it. There is too much going on in too little space and it is not easy to follow in a debugger.

The rest of the code is boring, so I’m going to skip that and start talking about why client authentication is going to be interesting. Here is the core of the problem:

[screenshot: the code using rustls’ Stream for reads and writes]

In order to simplify my life, I’m using rustls’ Stream to handle transparent encryption and decryption. This is similar to how I would do it in C#, for example. However, the stream interface doesn’t give me any way to handle the handshake explicitly. Luckily, I was able to dive into the code, and I think that given the architecture present, I can invoke the handshake manually on the ServerSession and then hand off the session as is to the stream.

What I actually had to do was to set up client authentication here:

[screenshot: the client authentication setup]

And then manually complete the handshake first:

[screenshot: the manual handshake code]
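Roughly, driving the handshake by hand with the Session API of that era looks like this (my sketch, not the original code):

use std::io::{Error, ErrorKind};
use std::net::TcpStream;

fn complete_handshake(session: &mut rustls::ServerSession, stream: &mut TcpStream) -> Result<(), Error> {
    use rustls::Session; // the trait with the handshake-driving methods
    while session.is_handshaking() {
        if session.wants_write() {
            session.write_tls(stream)?;
        }
        if session.is_handshaking() && session.wants_read() {
            session.read_tls(stream)?;
            session
                .process_new_packets()
                .map_err(|e| Error::new(ErrorKind::InvalidData, e))?;
        }
    }
    Ok(())
}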

And this is when I ran into a problem: when trying to connect with my client certificate, I got the following error:

[screenshot: the handshake failure, rejecting the client certificate]

I’m assuming that this is because rustls is actually verifying the certificate against the PKI, which is not something that I want. I don’t want to use PKI for this; instead, I want to register the allowed certificate thumbprints. But first I need to figure out how to make rustls accept any kind of client certificate. I’m afraid that this means that I have to break out the debugger again and dive into the code to figure out where we are being rejected and why…

After a short trip through the code, I got to something that looks promising:

[screenshot: the ClientCertVerifier trait]

This looks like something that I should be able to control to see if I like or dislike the certificate. Going inside it, it looks like I was right:

[screenshot: the verifier’s certificate validation code]

I think that I should be able to write an implementation of this that would do the validation without checking the issuer. However, it looks like my plan ran into a snag, see:

[screenshot: the relevant types are private to the crate]

I’m not sure that I’m a good person to talk about the details of X509 certificate validation. In this case, I think that I could have done enough to validate that the cert is valid enough for my needs, but it seems like there isn’t a way to actually provide another implementation of the ClientCertVerifier, because the entire package is private. I guess that this is as far as I can go with rustls. I’m going to move to the OpenSSL bindings, which I’m more familiar with, and see how that works for me.

Okay, I tried using the rust OpenSSL bindings, and here is what I got:

[screenshot: the link error from the OpenSSL bindings]

So this is some sort of link error, and I could spend half a day trying to resolve it, or just give up on this for now. Looking around, it looks like there is also something called native-tls for Rust, so I might take a peek at it tomorrow.
