Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.


time to read 3 min | 405 words

The most common network topology for RavenDB replication is a full mesh. For example, if you have three nodes in your cluster and a database that resides on all three nodes, you’ll have a replication topology that looks like this:

image

This works great when the number of nodes in your cluster is reasonably small. However, we recently got a customer question about a different kind of topology. They have a bunch of nodes, on the order of a few dozen, which cooperate to perform some nontrivial task. A key part of this is that the nodes are transient and identical. So a new node may pop up, live for a while (days, weeks, months) and then go away. At any given time you might have a few dozen nodes. That kind of environment won’t really work with a full mesh topology. If we tried, it would look something like this (a fully connected network with 40 nodes):

image

This has a total of 780 connections(!) in it. You can create a topology like that, but a lot of the processing power in the network is going to be dedicated to just maintaining these connections. And you don’t actually need it. RavenDB’s replication algorithm is actually a gossip algorithm, and as the number of nodes taking part in the replication grows, you need proportionally fewer connections between them. In this case, we can take each of the live nodes and connect each of them to four other (random) nodes. The result would look like so:

image

Remember, each of the nodes is actually connected to a random four other nodes. RavenDB’s replication will ensure that a change to any document in any of the nodes under these conditions will propagate to all the other nodes efficiently.
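To put numbers on this, here is a minimal sketch in Python. It is not RavenDB code: the function names are made up for this illustration, links are treated as bidirectional (an assumption), and a plain breadth-first walk stands in for the actual replication process.

```python
import random
from collections import deque


def full_mesh_links(nodes: int) -> int:
    """Undirected links needed so every node talks to every other node."""
    return nodes * (nodes - 1) // 2


def random_topology(nodes: int, peers: int, rng: random.Random) -> dict:
    """Each node picks `peers` random other nodes; links work in both directions."""
    neighbors = {node: set() for node in range(nodes)}
    for node in range(nodes):
        others = [other for other in range(nodes) if other != node]
        for peer in rng.sample(others, peers):
            neighbors[node].add(peer)
            neighbors[peer].add(node)
    return neighbors


def spread(neighbors: dict, start: int):
    """Walk the links from `start`; return the reached nodes and the hops needed."""
    reached = {start}
    frontier = deque([start])
    hops = 0
    while frontier:
        next_frontier = deque()
        for node in frontier:
            for peer in neighbors[node]:
                if peer not in reached:
                    reached.add(peer)
                    next_frontier.append(peer)
        if next_frontier:
            hops += 1
        frontier = next_frontier
    return reached, hops
```

`full_mesh_links(40)` gives the 780 connections mentioned above, while 40 nodes picking four peers each need at most 160. Calling `spread(random_topology(40, 4, random.Random(7)), 0)` shows how many hops a change needs to reach the nodes that are connected to node 0.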

This approach will also transparently handle any intermediary failures and be robust to nodes joining and leaving on the fly. RavenDB doesn’t implement gossip membership, mostly because that is very heavily dependent on the application and deployment pattern, but once you tell a node who its neighbors are, everything will proceed on its own.

time to read 4 min | 655 words

This post really annoyed me. Feel free to go ahead and go through it, I’ll wait. The gist of the post, titled “WAL usage looks broken in modern Time Series Databases?”, is that time series DBs that use a Write Ahead Log system are broken, and that their system, which isn’t using a WAL (but uses a Log-Structured Merge tree, LSM), is also broken, but no more than the rest of the pack.

This post annoyed me greatly. I build databases for a living, and for over a decade I have been focused primarily on building a distributed, transactional (ACID) database. A key part of that is actually knowing what is going on in the hardware beneath my software and how to best utilize that. This post was annoying because it makes quite a few really bad assumptions and then builds upon them. I particularly disliked the outright dismissal of direct I/O, mostly because they seem to be doing that on very partial information.

I’m not familiar with Prometheus, but doing fsync() every two hours basically means that it isn’t on the same plane of existence as far as ACID and transactions are concerned. Cassandra is usually deployed in cases where you either don’t care about some data loss or, if you do, you use multiple replicas and rely on that. So I’m not going to touch that one either.

InfluxDB is doing the proper thing and doing fsync after each write. Because fsync is slow, they very reasonably recommend batching writes. I consider this to be something that the database should do, but I do see where they are coming from.

Postgres, on the other hand, I’m quite familiar with, and the description in the post is inaccurate. You can configure Postgres to behave in this manner, but you shouldn’t, if you care about your data. Usually, when using Postgres, you’ll not get a confirmation on your writes until the data has been safely stored on the disk (after some variant of fsync was called).

What really got me annoyed was the repeated insistence of “data loss or corruption”, which shows a remarkable lack of understanding of how WAL actually works. Because of the very nature of WAL, the people who build them all have to consider the nature of a partial WAL write, and there are mechanisms in place to handle it (usually by considering this particular transaction as invalid and rolling it back).
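To illustrate how a partial WAL write is typically detected, here is a minimal sketch in Python. It is not the format of any particular database; the record layout (length, CRC32, payload) and the function names are assumptions made for this example.

```python
import struct
import zlib

# Assumed record layout: 4-byte payload length, 4-byte CRC32, then the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame one transaction so that a torn write can be detected later."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> list:
    """Return the payloads of all intact records, stopping at the first bad one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(log):
            break  # the tail was only partially written: treat it as never committed
        payload = log[start:end]
        if zlib.crc32(payload) != crc:
            break  # corrupted record: this transaction and anything after it is invalid
        records.append(payload)
        offset = end
    return records
```

A log that was cut off in the middle of its last record recovers to everything before that record. The half-written transaction is simply treated as if it never happened, which is the rollback behavior described above.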

The solution proposed in the post is to use SSTable (sorted strings table), which is usually a component in LSM systems. Basically, buffer the data in memory (they use 1 second intervals to write it to disk) and then write it in one go. I’ll note that they make no mention of actually writing to disk safely. So no direct I/O or calls to fsync. In other words, a system crash may leave you a lot worse off than merely 1 second of lost data. In fact, it is possible that you’ll have some data there and some not, and not necessarily in the order of arrival.

A proper database engine will:

  • Merge multiple concurrent writes into a single disk operation. In this way, we can handle > 100,000 separate writes per second (document writes, so significantly larger than the typical time series drops) on commodity hardware.
  • Ensure that if any write was confirmed, it actually hit durable storage and can never go away.
  • Properly handle partial writes or corrupted files, in such a way that none of the invariants on the system is violated.
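The first two points can be sketched in a few lines of Python. This is a single-threaded illustration under stated assumptions, not how RavenDB or any other engine implements it: the `GroupCommitLog` class, its method names and the `demo` helper are all made up for this example.

```python
import os
import tempfile


class GroupCommitLog:
    """Collects pending writes and makes them durable with a single fsync."""

    def __init__(self, path: str):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._pending = []
        self.fsync_calls = 0

    def submit(self, payload: bytes) -> None:
        """Queue a write; it is NOT confirmed until commit() returns."""
        self._pending.append(payload)

    def commit(self) -> int:
        """Write every pending payload in one go, fsync once, return how many were confirmed."""
        if not self._pending:
            return 0
        view = memoryview(b"".join(self._pending))
        while view:  # os.write may write less than requested
            written = os.write(self._fd, view)
            view = view[written:]
        os.fsync(self._fd)  # only after this may the writes be acknowledged
        self.fsync_calls += 1
        confirmed = len(self._pending)
        self._pending.clear()
        return confirmed

    def close(self) -> None:
        os.close(self._fd)


def demo(writes: int = 1000):
    """Submit many small writes and report (confirmed, fsync calls, bytes on disk)."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "journal.bin")
        log = GroupCommitLog(path)
        for _ in range(writes):
            log.submit(b"x")
        confirmed = log.commit()
        size = os.path.getsize(path)
        log.close()
        return confirmed, log.fsync_calls, size
```

The point of the design is that the cost of fsync is paid once per batch rather than once per write, while no write is acknowledged before it is on durable storage.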

I’m leaving aside major issues with LSM and SSTables, such as write amplification and the inability to handle sustained high loads (because there is never a break in which you can do bookkeeping). Just the portions on WAL usage (which show broken and inefficient use) used to justify another broken implementation are quite enough for me.

time to read 2 min | 246 words

image

One of the primary reasons why businesses choose to use workflow engines is that they get pretty pictures that explain what is going on and look like they are easy to deal with. The truth is anything but that, but pretty sells.

My recommended solution for workflow has a lot going for it, if you are a developer. But if you try to show a business analyst this code, they are likely to just throw their hands up in the air and give up. Where are the pretty pictures?

One of the main advantages of this kind of approach is that it is very rigid. You are handling things in the event handlers, registering the next step in the workflow, etc. All of which is very regimented. This is so for a reason. First, it makes it very easy to look at the code and understand what is going on. Second, it allows us to process the code in additional ways.

Consider the following AST visitor, which operates over the same code.

This took me about twenty minutes to write, mostly to figure out the Graphviz notation. It takes advantage of the fact that the structure of the code is predictable to generate the actual flow of actions from the code.
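The same idea can be sketched in Python with the standard `ast` module. This is only an analogue for illustration, not the visitor described above: the workflow script, the `on_<event>` naming convention and the `next_step` function are all hypothetical.

```python
import ast

# A hypothetical workflow script: handlers are named on_<step> and
# register the following step by calling next_step("<name>").
WORKFLOW_SOURCE = '''
def on_submitted(state, event):
    if event["smoker"]:
        next_step("doctor_review")
    else:
        next_step("quote")

def on_doctor_review(state, event):
    next_step("quote")

def on_quote(state, event):
    next_step("accepted")
'''


class FlowVisitor(ast.NodeVisitor):
    """Collects (from_step, to_step) edges from next_step() calls inside handlers."""

    def __init__(self):
        self.edges = []
        self._current = None

    def visit_FunctionDef(self, node):
        if node.name.startswith("on_"):
            self._current = node.name[len("on_"):]
            self.generic_visit(node)
            self._current = None

    def visit_Call(self, node):
        is_next_step = isinstance(node.func, ast.Name) and node.func.id == "next_step"
        if self._current and is_next_step and node.args:
            target = node.args[0]
            if isinstance(target, ast.Constant) and isinstance(target.value, str):
                self.edges.append((self._current, target.value))
        self.generic_visit(node)


def to_dot(source: str) -> str:
    """Render the flow found in the script as Graphviz DOT text."""
    visitor = FlowVisitor()
    visitor.visit(ast.parse(source))
    lines = ["digraph workflow {"]
    lines += [f'    "{start}" -> "{end}";' for start, end in visitor.edges]
    lines.append("}")
    return "\n".join(lines)
```

Feeding the output of `to_dot(WORKFLOW_SOURCE)` to Graphviz produces the boxes and arrows. This only works because the handlers follow a predictable structure.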

You get to use readable code and maintainable practices and show pretty pictures to the business people.

time to read 3 min | 407 words

In my previous post, I talked about the driving forces toward a scripting solution to workflow behavior, and I presented the following code as an example of such a solution. In this post, I want to focus on the non-obvious aspects of such a design.

The first thing to note about this code is that it is very structured. You are working on an event based system, and as such, the input / output for the system are highly visible. It also means that we have straightforward ways to deal with complexity. We can break some part of the behavior into a different file or even a different workflow that we’ll call into.

The second thing to note is that workflows tend to be long running processes. In the code above, we have a pretty obvious way to handle state. We get passed a state object, which we can freely modify. Changes to the state object are persisted between event invocations. That is actually a pretty important issue. Because if we store that state inside RavenDB, we also get the ability to do a bunch of other really interesting stuff:

  • You can query ongoing workflows and check their state.
  • You can use the revisions feature inside of RavenDB and be able to track down the state changes between invocations.

The input to the events is also an object, and that means that you can also store that natively, which means that you have full tracing capabilities.
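Here is a minimal sketch, in Python, of what persisting the state and the incoming events around each handler call might look like. The `WorkflowStore` class is an in-memory stand-in for the database, and every name in it is an assumption made for this illustration. It does not use the RavenDB API.

```python
import copy


class WorkflowStore:
    """In-memory stand-in for a document database (an assumption of this sketch)."""

    def __init__(self):
        self.state = {}      # workflow id -> current state
        self.revisions = {}  # workflow id -> earlier state snapshots, oldest first
        self.events = {}     # workflow id -> every event the workflow received

    def load(self, workflow_id):
        return copy.deepcopy(self.state.get(workflow_id, {}))

    def save(self, workflow_id, new_state, event):
        if workflow_id in self.state:
            # Keep the previous state around, similar in spirit to document revisions.
            self.revisions.setdefault(workflow_id, []).append(self.state[workflow_id])
        self.state[workflow_id] = copy.deepcopy(new_state)
        self.events.setdefault(workflow_id, []).append(copy.deepcopy(event))


def dispatch(store, handlers, workflow_id, event):
    """Load the state, let the handler modify it freely, then persist state and event."""
    state = store.load(workflow_id)
    handlers[event["type"]](state, event)
    store.save(workflow_id, state, event)
    return state
```

With the state stored like this, checking on ongoing workflows is just a query over `store.state`, the list in `store.revisions` shows how the state changed between invocations, and `store.events` is the trace of everything that came in.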

The third important thing to note is that the script is just code, and even in complex cases, it is going to be pretty small. That means that you can run version resistant workflows. What do I mean by that?

Once a workflow process has started, you want to keep it on the same workflow script that it started with. This makes versioning decisions much nicer, and it is very easy for you to deal with changes over time. On the other hand, sometimes you need to fix the script itself (there was a bug that allowed negative APR), in which case you can change it for just the ongoing workflows.

Actual storage of the script can be in Git, or as a separate document inside the database. Alternatively, you may actually want to include the script itself in every workflow. That is usually reserved for industries where you have to be able to reproduce exactly what happened and I wouldn’t recommend doing this in general.
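One way to picture version resistant workflows is the following Python sketch. The registry, the workflow record and the function names are all hypothetical. The only idea taken from the text is that each workflow remembers which script version it started with, and that a fix can be applied to the ongoing ones explicitly.

```python
class ScriptRegistry:
    """Keeps every published version of the workflow script (here: a dict of handlers)."""

    def __init__(self):
        self._versions = {}
        self.latest = 0

    def publish(self, handlers) -> int:
        self.latest += 1
        self._versions[self.latest] = handlers
        return self.latest

    def get(self, version: int):
        return self._versions[version]


def start_workflow(registry: ScriptRegistry) -> dict:
    """A new workflow is pinned to whatever script version is current right now."""
    return {"script_version": registry.latest, "state": {}}


def handle(registry: ScriptRegistry, workflow: dict, event: dict) -> None:
    """Events are always processed by the version the workflow is pinned to."""
    handlers = registry.get(workflow["script_version"])
    handlers[event["type"]](workflow["state"], event)


def migrate(workflows, from_version: int, to_version: int) -> int:
    """Explicitly move ongoing workflows to a fixed script; returns how many moved."""
    moved = 0
    for workflow in workflows:
        if workflow["script_version"] == from_version:
            workflow["script_version"] = to_version
            moved += 1
    return moved
```

Publishing a new version does not touch running workflows. Only an explicit `migrate` call does, which is what you would use for something like the negative APR bug.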

time to read 4 min | 760 words

I talked about some of the requirements for proper workflow design in my previous post. As a reminder, the top ones are:

  • Cater for developers, not the business analysts. (More on this later).
  • Source control isn’t optional, meaning:
    • Multiple branches
    • Can diff & review changes
    • Merging
    • Multiple people can work at the same time
  • Encapsulate complexity

This may seem like a pretty poor list, because if you are a developer, you might be taking all of these for granted. Because of that, I wanted to display a small taste of what used to be Microsoft’s primary workflow engine.

image

A small hint… this kind of system is not going to be useful for anything relating to source control, change management, collaborative work, understanding what is going on, etc.

A better solution for this would be to use a tool that can work with source control, that developers are familiar with and can handle the required complexity.

That tool is called… code.

It checks all the boxes required, naturally. But it does have a distinct disadvantage. One of the primary reasons you want to use a workflow engine of some kind is to decouple the implementation of your business from the policies of the business. Coming back to the mortgage example, how you calculate late fee payments is fixed (in the contract itself, but usually also by law and many regulations), but figuring out whether late fees should be waived, on the other hand, is subject to the whims of the business.

That is a pretty simple example, but in most businesses, these kinds of workflows add up. You can easily end up with dozens to hundreds of different workflows without the business being too big or complex.

There is another issue, though. Code is pretty good when you need to handle straightforward tasks. A set of if statements (which is pretty much all most workflows are) is trivial to handle. But workflows have another property: they tend to be long. Not long on computer scale (seconds), but long on people scale (months and years).

The typical process of getting a loan may involve an initial submission, review by a doctor, asking for follow up documentation (rinse – repeat a few times), getting doctor appraisal and only then being able to generate a quote for the customer. Then we have a period of time in which the customer can accept, a qualifying period, etc. That can last for a good long while.

Trying to code long running processes like that requires a very unnatural approach to coding. Especially since you are likely to need to handle software updates while the workflows are running.

In short, we are in a strange position: we want to use code, because it is clear, supports software development practices that are essential and can scale up in complexity as needed. On the other hand, we don’t want to use our usual codebase for that, because we’ll have very different deployment strategies, the manner of working is very different and there is a lot more involvement of the business in what is going on there.

The way to handle that is to create a proper boundary between parts of the system. We’ll have the workflow behavior, defined in scripts, that describe the policy of the system. These tend to be fairly high level concepts and are designed explicitly for the rule of business policy behaviors. The infrastructure for that, on the other hand, is just a standard application using normal software practices, that is driven by the workflow scripts.

And by a script, I meant literally a script. As in, JavaScript.

I want to give you a sneak peek into how I envision this kind of system, but I’ll defer full discussion of what is involved to my next post.



The idea is that we use the script to define our policy, and then we use that to make decisions and invoke the next stage in the process. You might notice that we have the state variable, which is persisted between invocations. That allows us to use a programming model that is fairly common and obvious to developers. We can usually also show this, as is, to a business analyst and get them to understand what is going on easily enough. All the actual actions are abstracted. For example, life insurance setup is a completely different workflow that we invoke.
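To show the shape of that boundary, here is a rough Python sketch (the scripts discussed in this post are JavaScript; Python is used here only for illustration). The handlers hold the policy and only decide what should happen, while the infrastructure would carry out the returned actions. The specific rule (waive the first late fee) and all of the names are invented for this example.

```python
def on_payment_late(state, event, actions):
    """Policy: waive the fee for the first late payment, charge it afterwards."""
    state["late_payments"] = state.get("late_payments", 0) + 1
    if state["late_payments"] <= 1:
        actions.append(("waive_late_fee", event["payment_id"]))
    else:
        actions.append(("charge_late_fee", event["payment_id"]))


def on_loan_approved(state, event, actions):
    """Policy: an approved loan kicks off a separate workflow."""
    state["approved"] = True
    actions.append(("start_workflow", "life_insurance_setup"))


POLICY = {
    "payment_late": on_payment_late,
    "loan_approved": on_loan_approved,
}


def run(policy, state, event):
    """Infrastructure side: call the matching handler and collect the requested actions."""
    actions = []
    policy[event["type"]](state, event, actions)
    return actions
```

The handlers never calculate the fee or talk to another system. They only name the action, so the business can change the policy without touching how the actions are implemented.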

In my next post, I’m going to drill down a bit into the details of this approach and what kind of features we need there.

time to read 5 min | 815 words

One of the most common themes I run into when talking to customers, users and sundry people in tech is the repeated desire to fire developers.

Actually, that is probably too loaded a statement. It actually comes in two parts:

  • Developers usually want to focus on the interesting bits, and the business logic portions aren’t that much fun.
  • The business analysts usually want to get things done and having to get developers to do that is considered inefficient.

If only there was a tool, or a pattern, or a framework, or something that would allow the business analysts to directly specify the behavior of the system… Why, we could cut the developers from the process entirely! And speaking as a developer, that would be a huge relief.

I think the original name for that was CASE tools, and that flopped. In fact, literally every single one of the attempts to replace developers by a tool has flopped. They got such a bad rap that people keep trying to implement them using different names. Some stuff can be done fairly easily, though. WYSIWYG for GUI is well established and WordPress and Wix, to name the two examples that come to mind immediately, show that you can have a non techie build a proper website. In fact, you can even plug in some pretty sophisticated functionality without burdening the user with too much.

But all that takes you to a point. And past that point, the drop off is harsh. Let’s take another common tool that is used to reduce the dependency on developers, SharePoint.

You pay close to double for actual developer time on SharePoint, mostly because it is so painful to work with it.

At a recent conference, I got into a conversation about business workflows and how to best implement them. You can look at the image on the right to get a good idea about what kind of process they were talking about.

To make things real, I want to take a “simple” example, of accepting a life insurance policy. Here is what the (extremely simplified) workflow looks like for issuing a life insurance policy:

image

This looks good, and it certainly should make sense to a business analyst. However, even after I pretty much reduced the process to its bare bones and even those have been filed away, this is still pretty complex. The process of actually getting a policy is a lot more complex. Some questions don’t require doctor evaluation (for example, smoking) and some require supplemental documentation (oh, you were hospitalized? Gimme all these records). The doctor may recommend different rates, rejecting the application entirely, some exceptions in the policy, etc. All of which need to be in the workflow. Actuarial tables need to be consulted for each of those cases, etc, etc, etc.

But something like the diagram above isn’t going to be able to handle this level of complexity. You are going to get lost very quickly if you try to put so many boxes on the screen.

So you need encapsulation.

And you’ll probably want to have a way to develop these business workflows, which means that they aren’t static.

So you need source control.

And if you have a complex business process, you likely have different people working on it.

So you need to be able to review changes, and merge them.

Note that this is explicitly distinct from being able to store the data in source control. Being able to actually diff two versions of such a process in a meaningful fashion is anything but trivial. Usually you are left with diffing the raw XML / JSON that stores the structure. Good luck with that.

If the workflow is complex, you need to be able to understand what is going on under various conditions.

So you need a debugger.

In fact, pretty soon you’ll realize that you’ll need quite a lot of the things that developers do. Except that your tool of choice doesn’t do that, or if it does, it does it poorly.

Another issue is that even if you somehow managed to bypass all of those details, you are going to be facing the same drop off that you see elsewhere with tools that attempt to get rid of developers. At some point, the complexity grows too large, and you’ll call your development team and hand off the system to them. At which point they will be stuck with a very clunky tool that attempts to be quite clever and easy to use. It is also horribly limiting for a developer. Mostly because all of the “complexity” involved is in the business process itself, not in the actual complexity of what is going on.

There are better ways of handling that, and the easier among them is to just use code. That can be… surprisingly versatile.

time to read 1 min | 92 words

Last week we pushed an update to our public demo site. It is intended to walk you through using RavenDB, show code samples and provide detailed guidance on using RavenDB from your application.

Here is an example screen shot:

image

We spent a lot of time and effort on it, and I would appreciate you taking a peek and providing feedback on how useful that is for you to learn RavenDB and how to use it.

time to read 6 min | 1043 words

image

I had some really interesting discussions while I was at CodeMash, and a few of them touched on modeling concerns with nontrivial architectures. In particular, I was asked about my opinion on the role of OR/M in systems that mostly do CQRS, event processing, etc.

This is a deep question, because on first glance, your requirements from the database are pretty much just:

INSERT INTO Events(EventId, AggregateId, Time, EventJson) VALUES (…)

There isn’t really the need to do anything more interesting than that. The other side of that is a set of processes that operate on top of these event streams and produce read models that are very simple to consume as well. There isn’t any complexity in the data architecture at all, and joy to the world, etc, etc.

This is true, to an extent. But this is only because you have moved a critical component of your system elsewhere, the beating heart of your business: the logic, the rules, the thing that makes a system more than just a dumb repository of strings and numbers.

But first, let me make sure that we are on roughly the same page. In such a system, we have:

  • Commands – that cannot return a value (but will synchronously fail if invalid). These mutate the state of the system in some manner.
  • Events – represent something that has (already) happened. Cannot be rejected by the system, even if they represent invalid state. The state of the system can be completely rebuilt from replaying these events.
  • Queries – that cannot mutate the state
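As a minimal sketch of these three roles, consider the following Python example. It is a toy, not a framework: events are plain tuples, the state is rebuilt by replaying them, and the names (`assign_shift`, `shifts_for`, `InvalidCommand`) are made up for this illustration.

```python
class InvalidCommand(Exception):
    """Raised synchronously when a command cannot be accepted."""


def apply(state: set, event: tuple) -> set:
    """Events are facts: applying one never fails, it just updates the state."""
    kind, nurse, shift = event
    if kind == "ShiftAssigned":
        state.add((nurse, shift))
    elif kind == "ShiftUnassigned":
        state.discard((nurse, shift))
    return state


def replay(events: list) -> set:
    """The whole state can be rebuilt from the event stream alone."""
    state = set()
    for event in events:
        apply(state, event)
    return state


def assign_shift(events: list, nurse: str, shift: str) -> None:
    """Command: returns nothing, fails if invalid, otherwise records an event."""
    if (nurse, shift) in replay(events):
        raise InvalidCommand(f"{nurse} is already assigned to {shift}")
    events.append(("ShiftAssigned", nurse, shift))


def shifts_for(events: list, nurse: str) -> list:
    """Query: reads the state and never changes it."""
    return sorted(shift for who, shift in replay(events) if who == nurse)
```

Replaying on every call is wasteful and is done here only to keep the sketch short. A real system would keep read models up to date as events arrive.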

I’m mixing here two separate architectures, Command Query Responsibility Segregation and Event Sourcing. They aren’t the same, but they often go together hand in hand, and it makes sense to talk about them together.

And because it is always easier for me to talk in concrete, rather than abstract, terms, I want to discuss a system I worked on over a decade ago. That system was basically a clinic management system, and the part that I want to talk about today was the staff scheduling option.

Scheduling shifts is a huge deal, even before we get to the part where it directly impacts how much money you get at the end of the month. There are a lot of rules, regulations, union contracts, agreements and a bunch of other stuff that relate to it. So this is a pretty complex area, and when you approach it, you need to do so with the due consideration that it deserves. When we want to apply CQRS/ES to it, we can consider the following factors:

The aggregates that we have are:

  • The open schedule for two months from now. This is mutable, being worked on by the head nurse and constantly changing.
  • The proposed schedule for next month. This one is closed, changes only rarely and usually because of big stuff (someone being fired, etc).
  • The planned schedule for the current month, frozen, cannot be changed.
  • The actual schedule for the current month. This is changed if someone doesn’t show up for their shift, is sick, etc.

You can think of the first three as various stages of a PlannedSchedule, but the ActualSchedule is something different entirely. There are rules around how much divergence you can have between the planned and actual schedules, which impact compensation for the people involved, for example.

Speaking of which, we haven’t yet talked about:

  • Nurses / doctors / staff – which are being assigned to shifts.
  • Clinics – a nurse may work in several different locations at different times.

There is a lot of other stuff that I’m ignoring here, because it would complicate the picture even further, but that is enough for now. For example, regardless of the shifts that a person was assigned to and showed up for, they may have worked more hours (had to come to a meeting, drove to a client) and that complicates payroll, but that doesn’t matter for the scheduling.

I want to focus on two actions in this domain. First, the act of the head nurse scheduling a staff member to a particular shift. And second, the ClockedOut event which happens when a staff member completes a shift.

The ScheduleAt command places a nurse at a given shift in the schedule, which seems fairly simple on its face. However, the act of processing the command is actually really complex. Here are some of the things that you have to do:

  • Ensure that this nurse isn’t scheduled for another shift, either concurrently or too close to another shift at a different address.
  • Ensure that the nurse doesn’t work with X (because issues).
  • Ensure that the role the nurse has matches the required parameters for the schedule.
  • Ensure that the number of double shifts in a time period is limited.

The last one, in particular, is a sinkhole of time. Because at the same time, another business rule says that we must give each nurse N number of shifts in a time period, and yet another dictates how to deal with competing preferences, etc.

So at this point, we have: ScheduleAtCommand.Execute() and we need to apply logic, complex, changing, business critical logic.

And at this point, for that particular part of the system, I want to have a full domain, abstracted persistence and be able to just put my head down and focus on solving the business problem.
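To give a feel for what even a small slice of that logic looks like once it lives in a domain model, here is a sketch in Python. It covers only the overlap, travel time and role rules, and everything in it (the `Shift` type, the one hour minimum gap between clinics, the function name) is an assumption made for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Shift:
    clinic: str
    role: str
    start: datetime
    end: datetime


class ScheduleRejected(Exception):
    """The ScheduleAt command failed one of the business rules."""


# Assumed value: the real number would come from contracts and regulations.
MIN_GAP_BETWEEN_CLINICS = timedelta(hours=1)


def schedule_at(nurse_role: str, existing: list, shift: Shift) -> list:
    """Validate a new shift against the nurse's current shifts; return the updated list."""
    if nurse_role != shift.role:
        raise ScheduleRejected("nurse role does not match the shift")
    for other in existing:
        if other.start < shift.end and shift.start < other.end:
            raise ScheduleRejected("overlaps another shift")
        if other.clinic != shift.clinic:
            # For non-overlapping shifts exactly one of these differences is non-negative.
            gap = max(shift.start - other.end, other.start - shift.end)
            if gap < MIN_GAP_BETWEEN_CLINICS:
                raise ScheduleRejected("too close to a shift at a different clinic")
    return existing + [shift]
```

Rules such as who may not work together, or limits on double shifts per period, need far more context than this, which is exactly why I want a full model behind the command.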

The same applies for the ClockedOut event. Part of processing it means that we have to look at the nurse’s employment contract, count the amount of overtime worked, compute total number of hours worked in a pay period, etc. Apply rules from the clinic to the time worked, apply clauses from the employment contract to the work, etc. Again, this gets very complex very fast. For example, if you have a shift from 10PM – 6 AM, how do you compute overtime? For that matter, if this is on the last day of the month, when do you compute overtime? And what pay period do you apply it to?
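Even the simplest of these questions, which month the hours of a shift belong to, takes some care. The following Python sketch splits a shift at month boundaries. It uses naive datetimes and ignores time zones and daylight saving, and the function name is made up for this example.

```python
from datetime import datetime


def hours_by_month(start: datetime, end: datetime) -> dict:
    """Split a shift at month boundaries; returns {(year, month): hours worked}."""
    totals = {}
    cursor = start
    while cursor < end:
        # First instant of the month following the cursor.
        if cursor.month == 12:
            boundary = datetime(cursor.year + 1, 1, 1)
        else:
            boundary = datetime(cursor.year, cursor.month + 1, 1)
        segment_end = min(end, boundary)
        key = (cursor.year, cursor.month)
        hours = (segment_end - cursor).total_seconds() / 3600
        totals[key] = totals.get(key, 0.0) + hours
        cursor = segment_end
    return totals
```

For a 10PM to 6AM shift that starts on January 31st, this attributes two hours to January and six to February. Whether overtime is then computed per shift, per day or per pay period is a policy question that the code cannot answer on its own.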

Here, too, I want to have a fully fleshed out model, which can operate in the problem space freely.

In other words, a CQRS/ES architecture is going to have the domain model (and some sort of OR/M) in the middle, doing the most interesting things and tackling the heart of complexity.

time to read 3 min | 505 words

Computation during indexing opens up some nice features when we are talking about data modeling and working with your data. In this post, I want to discuss predicting the future with it. Let’s see how we can do that, shall we?

Consider the following document, representing a (simplified) customer model:

image

We have a customer that is making monthly payments. This is a pretty straightforward model, right?

We can do a lot with this kind of data. We can obviously compute the lifetime value of a customer, based on how much they paid us. We already did something very similar in a previous post, so that isn’t very interesting.

What is interesting is looking into the future. Let’s start simple, by figuring out what the next charge is for this customer. For now, the logic is about as simple as it can be. Monthly customers pay by month, basically. Here is the index:

image

I’m using Linq instead of JS here because I’m dealing with dates and JS support for dates is… poor.

As you can see, we are simply looking at the last date and the subscription, figuring out how much was paid the last three times and using that as the expected next payment amount. That can allow us to do nice things, obviously. We can now do queries on the future. So finding out how many customers will (probably) pay us more than $100 on the 1st of Feb is both easy and cheap.
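The computation itself can be sketched in Python as follows. This is not the index definition: the shape of the customer document (a `payments` list with `date` and `amount`) is an assumption, since the document is only shown as an image, and only monthly subscriptions are handled.

```python
import calendar
from datetime import date


def add_months(day: date, months: int) -> date:
    """Move a date forward by whole months, clamping to the end of shorter months."""
    index = day.year * 12 + (day.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def next_charge(customer: dict) -> dict:
    """Expected next payment: one month after the last one, average of the last three."""
    payments = sorted(customer["payments"], key=lambda payment: payment["date"])
    recent = payments[-3:]
    expected = sum(payment["amount"] for payment in recent) / len(recent)
    return {"date": add_months(payments[-1]["date"], 1), "amount": expected}
```

Because the index stores the result per customer, a question like "who is expected to pay more than $100 on the 1st of Feb" becomes a plain filter over precomputed values.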

We can actually take this further, though. Instead of using a simple index, we can use a map/reduce one. Here is what this looks like:

image

And the reduce:

image

This may seem a bit dense at first, so let’s decipher it, shall we?

We take the last payment date and compute the average of the last three payments, just as we did before. The fun part now is that we don’t compute just the single next payment, but the next three. We then output all the payments, both existing (that already happened) and projected (that will happen in the future) from the map function. The reduce function is a lot simpler, and simply sums up the amounts per month.

This allows us to effectively project data into the future, and this map reduce index can be used to calculate expected income. Note that this is aggregated across all customers, so we can get a pretty good picture of what is going to happen.
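The map and reduce steps described above can be imitated in plain Python. Again, this is a sketch rather than the index itself: months are represented as `(year, month)` tuples and the document shape is assumed.

```python
from collections import defaultdict


def next_month(year: int, month: int) -> tuple:
    return (year + 1, 1) if month == 12 else (year, month + 1)


def map_customer(customer: dict):
    """Emit every real payment, then three projected ones based on the last three."""
    payments = sorted(customer["payments"], key=lambda payment: payment["month"])
    for payment in payments:
        yield {"month": payment["month"], "amount": payment["amount"]}
    recent = payments[-3:]
    expected = sum(payment["amount"] for payment in recent) / len(recent)
    month = payments[-1]["month"]
    for _ in range(3):
        month = next_month(*month)
        yield {"month": month, "amount": expected}


def reduce_by_month(entries) -> dict:
    """Sum the amounts per month, across all customers."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry["month"]] += entry["amount"]
    return dict(totals)
```

Running `reduce_by_month` over the mapped entries of all customers gives one total per month, where past months hold actual income and the next three hold the projection.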

A real system would probably have some uncertainty factor, but that touches on business strategy more than modeling, so I don’t think we need to go into that here.

time to read 4 min | 613 words

image

In my last post on the topic, I showed how we can define a simple computation during the indexing process. That was easy enough, for sure, but it turns out that there are quite a few use cases for this feature that go quite far from what you would expect. For example, we can use this feature as part of defining and working with business rules in our domain.

For example, let’s say that we have some logic that determines whether a product is offered with a warranty (and for how long that warranty is valid). This is an important piece of information, obviously, but it is the kind of thing that changes on a fairly regular basis. For example, consider the following feature description:

As a user, I want to be able to see the offered warranty on the products, as well as to filter searches based on the warranty status.

Warranty rules are:

  • For new products made in house, full warranty for 24 months.
  • For new products from 3rd parties, parts only warranty for 6 months.
  • Refurbished products by us, full warranty, for half of new warranty duration.
  • Refurbished 3rd parties products, parts only warranty, 3 months.
  • Used products, parts only, 1 month.

Just from reading the description, you can see that this is a business rule, which means that it is subject to many changes over time. We can obviously create a couple of fields on the document to hold the warranty information, but that means that whenever the warranty rules change, we’ll have to go through all of them again. We’ll also need to ensure that any business logic that touches the document will re-run the logic to apply the warranty computation (to be fair, these sorts of things are usually done as a subscription in RavenDB, which alleviates that need).

Without further ado, here is the index to implement the logic above:

You can now query over the warranty type and its duration, project them from the index, etc. Whenever a document is updated, we’ll re-compute the warranty status and update the index.

This saves you from having additional fields in your model and greatly diminishes the cost of queries that need to filter on warranty or its duration (since you don’t need to do this computation during the query, only once, during indexing).
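For reference, the warranty rules listed above boil down to a small function. The Python sketch below is not the index definition. The field names `condition` and `in_house` are assumptions about the product document, and "half of new warranty duration" is read here as half of the 24 months given to new in-house products.

```python
NEW_IN_HOUSE_MONTHS = 24


def warranty(product: dict) -> tuple:
    """Return (warranty type, duration in months) according to the rules above."""
    condition = product["condition"]  # assumed values: "new", "refurbished", "used"
    in_house = product["in_house"]    # assumed flag: made or refurbished by us
    if condition == "new":
        return ("full", NEW_IN_HOUSE_MONTHS) if in_house else ("parts", 6)
    if condition == "refurbished":
        return ("full", NEW_IN_HOUSE_MONTHS // 2) if in_house else ("parts", 3)
    if condition == "used":
        return ("parts", 1)
    raise ValueError(f"unknown product condition: {condition!r}")


def with_full_warranty(products: list, at_least_months: int) -> list:
    """The kind of filter the index makes cheap: full warranty of a minimum length."""
    matches = []
    for product in products:
        kind, months = warranty(product)
        if kind == "full" and months >= at_least_months:
            matches.append(product)
    return matches
```

In the index, the result of this computation is stored once per document at indexing time, so the filter never has to re-evaluate the rules during a query.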

If the business rule definition changes, you can update the index definition and RavenDB will effectively roll out your change to the entire dataset. That is nice, but even though I’m writing about cool RavenDB features, there are some words of caution that I want to mention.

Putting queryable business rules in the database can greatly ease your life, but be wary of putting too much business logic in there. In general, you want your business logic to reside right next to the rest of your application code, not running on a different server in a mode that is much harder to debug, version and diagnose. And if the level of complexity involved in the business rule exceeds some level (hard to define, but easy to know when you hit it), you should probably move from defining the business rules in an index to a subscription.

A RavenDB subscription allows you to get all changes to documents and apply your own logic in response. This is a reliable way to process data in RavenDB. It runs in your own code, under your own terms, so it can enjoy all the usual benefits of… well, being your code, and not mine. You can read more about them in this post and, of course, the documentation.
