Ayende @ Rahien

Oren Eini, aka Ayende Rahien, CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

time to read 4 min | 786 words

Krzysztof has been working on our RavenDB Go Client for almost a year, and we are at the final stretch (docs, tests, deployment, etc). He has written a blog post detailing the experience of porting over 50,000 lines of code from Java to Go.

I wanted to point out a few additional things about the porting effort and the Go client API that he didn’t get to.

From the perspective of RavenDB, we want to have as many clients as possible, because the more clients we have, the more approachable we are for developers. There are over a million Go developers, so that is certainly something that we want to enable. More importantly, Go is a great language for server-side work and is primarily used for just the kind of applications that can benefit from using RavenDB.

RavenDB currently has clients for:

  1. .NET  / CLR – C#, VB.Net, F#, etc.
  2. JVM – Java, Kotlin, Clojure, etc.
  3. Node.js
  4. Python
  5. Go – finalization stage
  6. C++ – alpha stage

We also have a Ruby client under wraps and wouldn’t object to having a PHP one.

We used to run only on Windows and really only paid attention to the C# client. That changed toward the end of 2015, when we started the work on the 4.0 release of RavenDB. We knew that we were going to be cross platform and we knew that we were going to target additional languages and runtimes. That meant that we had to deal with a pretty tough choice.

Previously, when we had just a single client, we could do quite a lot in it. That meant that a lot of the  functionality and the smarts could reside in the client. But we now have 6+ clients that we need to maintain, which means that we are in a very different position.

For reference, the RavenDB Server alone is 225 KLOC, the .NET client is 62 KLOC and the other clients are about 50 KLOC each (LINQ support is quite costly for .NET, in terms of LOC and overall complexity).

One of the design guidelines for RavenDB 4.0 was that we want to move, as much as possible, responsibility from the client side to the server side. We have done a lot of stuff to make this happen, but the RavenDB client is still a pretty big chunk of code. With 50 KLOC, you can do quite a lot, so what is actually going on in there?

The RavenDB client core responsibilities are:

  • Commands on the server / documents – About 12 KLOC. This provides strongly typed access to commands, including command-specific error handling.
  • Caching, failover & request processing – About 3 KLOC. Handles failover and recovery, topology handling and the client side portion of RavenDB’s High Availability features, implementing transparent failover when a node fails. Also handles request caching as well as aggressive caching.
  • JSON handling – About 3 KLOC. Type converters, serialization helpers and other stuff related to handling JSON that we need client side.
  • Exceptions – 1.5 KLOC. Type safe exceptions for various errors take quite a bit of code, mostly because we try hard to give good errors to the user.

But by far, the most complex part of the RavenDB client is the session. The session is the typical API you have for working with RavenDB, and it is how you’ll usually interact with it. You can see below a sketch of the Go client using the session to store a document and save it to the database.
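A sketch of that interaction (the server URL and database name are placeholders, and the exact API details may differ between client versions):

package main

import (
	"log"

	ravendb "github.com/ravendb/ravendb-go-client"
)

type User struct {
	ID   string
	Name string
}

func main() {
	// Placeholder URL and database name.
	store := ravendb.NewDocumentStore([]string{"http://localhost:8080"}, "Demo")
	if err := store.Initialize(); err != nil {
		log.Fatal(err)
	}
	defer store.Close()

	session, err := store.OpenSession("")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Store registers the document with the session's unit of work;
	// nothing is sent to the server until SaveChanges.
	u := &User{Name: "Krzysztof"}
	if err := session.Store(u); err != nil {
		log.Fatal(err)
	}
	if err := session.SaveChanges(); err != nil {
		log.Fatal(err)
	}
}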

The session is about 20 KLOC or so, by far the biggest single component that we have.

But why is it so big? Especially since I just told you that we spent a lot of time moving responsibilities away from the client.

Because the session implements a lot of really important behaviors for the client. In no particular order, and off the top of my head, we have:

  • Unit of Work
  • Change Tracking
  • Identity Map
  • Queries
  • Patching
  • Lazy operations

The surface area of RavenDB’s client API is very important to me. I think that giving you a high level API is quite important to reduce the complexity that you have to deal with and to make it easy for you to get things done. And that ends up taking quite a lot of code to implement.
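For example, change tracking means that an update is just load, mutate and SaveChanges. A sketch, reusing the store and User type from the snippet above, with the same caveat about exact API details:

// renameUser shows the unit of work, identity map and change tracking in
// action: there is no explicit "update" call, the session notices what
// changed when SaveChanges runs and only sends the modified document.
func renameUser(store *ravendb.DocumentStore, id, newName string) error {
	session, err := store.OpenSession("")
	if err != nil {
		return err
	}
	defer session.Close()

	var u *User
	if err := session.Load(&u, id); err != nil {
		return err
	}
	u.Name = newName
	return session.SaveChanges()
}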

The good news is that once we have a client, keeping it up to date is relatively simple. And having taken the onus of complexity upon ourselves, we free you from having to manage it. The overall experience of building applications using RavenDB is much better, to the point where you can pretty much ignore the database, because it will Just Work.

time to read 1 min | 72 words

I’m going to be in London at the beginning of June. I’ll be giving a keynote at Skills Matters as well as visiting some customers.

I have a half day and a full day slots available for consulting (RavenDB, databases and overall architecture). Drop me a line if you are interested.

I also should have an evening or two free if there is anyone who wants to sit over a beer and chat.

time to read 2 min | 287 words

In a previous post about authorization in a microservice environment, I wrote that one option is to generate an authorization token and have it hold the relevant claims for the application. I was asked how I would handle a scenario in which the security claim is over individual categories of orders and a user may have too many categories to fit the token.

This is a great question, because it showcases a really important part of such a design: an inherent limit to complexity. The fact that having a user with a thousand individual security claims is hard isn’t a bug in the system, it is a feature.

For many such cases, it really doesn’t make sense to set up security in such a manner. How can you ever audit or reason about such a system? It just doesn’t work this way in the real world. An agent may be authorized to a dozen customers, and her manager will be allowed access to them as well. But attaching each individual customer to the manager doesn’t work. Instead, you would create a group and attach the customers to the group, then allow the manager to access the group. Such a system is much easier to work with and review. It also matches a lot more closely how the real world works.
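A minimal sketch of that group-based model (all names here are illustrative): the token carries group memberships rather than individual customers, and each customer record knows which group it was attached to.

// Claims is what ends up in the user's token: group memberships instead of
// a thousand individual customer ids.
type Claims struct {
	UserID string
	Groups []string // e.g. "sales/emea"
}

type Customer struct {
	ID    string
	Group string // the group this customer was attached to
}

// canAccess grants access if the user belongs to the customer's group.
func canAccess(c Claims, cust Customer) bool {
	for _, g := range c.Groups {
		if g == cust.Group {
			return true
		}
	}
	return false
}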

Some of the problems here are derived from the fact that it seems like, when we use a computer, we can build such a system. But in most cases, this is a false premise. Not because of actual technical limitations, but because of management overhead.

Building the system upfront so the things that should be hard are actually hard is going to be a lot better in the long run.

time to read 5 min | 808 words

I talked a bit about microservices architecture in the past few weeks, but I think that there is a common theme to those posts that is missed in the details.

A microservices architecture, just like Domain Driven Design or Event Sourcing and CQRS, is an architectural pattern that is meant to manage complexity. In the realm of operations, Kubernetes is another good example of a tool that is meant to manage complexity.

I feel that this is a part that all too often gets lost. The law of leaky abstractions means that you can’t really reduce complexity, you can only manage it. This means that tools and architectures that are meant to deal with complexity are themselves complex, by necessity. The problem is when you try to take a solution that was successfully applied to a complex problem, and apply it to something that isn’t of equal complexity.

Keep the following formula in mind:

Solution Complexity = Architecture Complexity + ( Problem Complexity / Architecture Factor )

Let’s try to solve this formula for a couple of projects. One would be managing a little league soccer website and the other would be the standard online shop. Here are the results

Cost / Benefit of Architecture    Little League    Online Shop
Architecture Complexity           10               10
Problem Complexity                2                20
Architecture Factor               3                3
Solution Complexity               10.6             16.6

By the way, the numbers are arbitrary; I’m trying to show a point, and showing it with numbers makes it easier to get the point across. The formula is real, though, based on my experience.
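To make the arithmetic explicit, here is the formula as a trivial bit of code, using the same arbitrary numbers as the table (the table truncates to one decimal place):

package main

import "fmt"

// solutionComplexity implements the formula from the post:
// Solution Complexity = Architecture Complexity + (Problem Complexity / Architecture Factor).
func solutionComplexity(architecture, problem, factor float64) float64 {
	return architecture + problem/factor
}

func main() {
	fmt.Printf("Little League: %.2f\n", solutionComplexity(10, 2, 3))  // 10.67
	fmt.Printf("Online Shop:   %.2f\n", solutionComplexity(10, 20, 3)) // 16.67
}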

The idea behind the formula and the table above is simple. Every architecture you choose can be ranked along two axes. One is the architectural complexity and the second is the architecture factor. The architectural complexity is a (usually) fixed number that ranks how complex it is to use the architecture. The architectural factor is how much this architecture helps you deal with the overall problem complexity.

You can see above that applying the same architecture to two different problems can produce very different results. The overall solution complexity for the little league website is less than the online shop’s, as expected. But you can also see that there are huge fixed costs here that drive the overall complexity far higher.

Using a different architecture, which will have a much smaller architectural factor but also much lower fixed complexity, will allow you to deliver a solution that has much lower overall complexity (and get it done faster, with fewer bugs, etc).

Choosing a microservice architecture implies that you are going to have a net benefit here. The additional complexity of using microservices is offset by the fact that the architectural factor is going to reduce your overall complexity. Otherwise, it just doesn’t make sense.

An 18 wheeler is a great thing to have if you need to ship a whole bunch of stuff. It is the Wrong Tool For The Job if you need to commute to work.

In most cases, people select the architecture that sounds right for their project, mostly because they focus on the architectural factor, without taking into account the fixed complexity cost. When they run into that, they either re-evaluate or press forward regardless. Let’s assume that you run into a project where they chose the microservice architecture, and then they realized that some parts of it are complex, so they cut some corners. I’m thinking about something like what is shown here. Let’s analyze what you end up with:

Architecture Complexity = 10, Architecture Factor = 1, Problem Complexity = 8, so the Overall Complexity = 10 + 8 / 1 = 18.

And that is for the good case, where your architectural factor isn’t actually below 1, which I would argue is what actually happens with the kind of architecture that these kinds of solutions end up with. A Distributed Monolith has an architecture complexity of 10 and a factor of 0.75, so trying to solve a problem that has a complexity of 8 here will result in an overall complexity of 10 + 8 / 0.75 ≈ 20.6.

I don’t actually have real numbers to evaluate different architectures and solution complexities. That would probably require a rigorous study, but empirical evidence can give good off-the-cuff numbers for most of the common architectures. I’m going to leave it up to the comments, if someone wants to take up this challenge.

Keep this in mind when you are choosing your architecture, for both greenfield and brownfield projects. That can save you a lot of trouble.

time to read 3 min | 520 words

Distributed transactions just aren’t a good fit for microservices. And I’m talking as someone who has actually implemented multiple distributed transaction systems. People moving to microservices are now discovering a lot of the challenges and hurdles of distributed systems and it is only natural to want to go back to the cozy transactional world, where you can reason about things properly.

This post is in response to this article: Microservices and distributed transactions, which I read with interest, because it isn’t often that a post will refute its own premise with its very first statement.

The two-phase commit protocol was designed in the epoch of the “big iron” systems like mainframes and UNIX servers; the XA specification was defined in 1991 when the typical deployment model consisted of having all the software installed in a single server.

That is a really important observation, because in this case, we remove one big factor from distributed transactions: the distributed part. Note that this was almost 30 years ago; distributed transactions and the two-phase commit protocol aren’t running on a single node any longer, but the architecture is still rooted in the same concept. And it doesn’t work. I wrote a blog post explaining the core issues with two-phase commit about 5 years ago. Nothing has changed so far.

From a technical perspective, the approach that is shown in the article is interesting. It is really nice that you can have a “transaction” that spans multiple services and databases. It is a problem that this isn’t going to result in atomic behavior (you can observe some of the transactions being committed before others), it is a problem that this has really bad failure modes (hanging / timeouts / inconsistencies) under fairly common scenarios and, finally, it is a really bad approach because your microservices shouldn’t be composed using transactions.

Leaving aside all the technical details about why two-phase commit is a bad idea, there is still the core architectural issue: you are tying together the services in your system. If service A is stalled for whatever reason, your service B is now impacted because it is waiting for a transaction to close.

Have fun trying to debug something like that, especially because your actual state is hidden away in some transaction manager and not readily visible. It means adding a tricky layer of complexity that will break, will cause issues, and will create silent dependencies between your services. Silent ones, invisible ones, and they will come back to haunt you.

The whole point of a microservice architecture is separation of concerns into independently managed, deployed and provisioned systems. If you actually need cross-service transactions, you have either modelled things wrong or are doing microservices very badly. Go back to a monolith with a single database backend and use that as the transactional store. You’ll be much happier.

Remember: Microservices. Are. Separated.

That isn’t a bug, that isn’t a hurdle to overcome. That is the point. Tying them closely together is a mistake, but you’ll usually only see it after a few months in production. So take a measure of prevention before you need a metric ton of cures.

time to read 4 min | 722 words

This post was triggered by this post. Mostly because I got people looking strangely at me when I shouted DO NOT DO THAT when I read the post.

We’ll use the usual Users and Orders example, because that is simple to work with. We have the usual concerns about users in our application:

  • Authentication
    • Password reset
    • Two factor auth
    • Unusual activity detection
    • Etc, etc, etc.
  • Authorization
    • Can the user perform this particular operation?
    • Can the user perform this action on this item?
    • Can the user perform this action on this item on behalf of this user?

Authentication itself is a fairly simple process. Don’t build it yourself; go and use a builtin solution. Authentication is complex, but the good side of it is that there is rarely any business-specific logic around it. You need to authenticate a user, and that is such a common concern that you can take an off-the-shelf solution and go with it.

Authorization is a lot more interesting. Note that we have three separate ways to ask the same question. It might be better to give concrete examples of what I mean for each one of them.

Can the user create a new order? Can they check the recent product updates, etc? Note that in this case, we aren’t operating on a particular entity, but performing global actions.

Can the user view this order? Can they change the shipping address?  Note that in this case, we have both authorization rules (you should be able to view your own orders) and business rules (you can change the shipping address on your order if the order didn’t ship and the shipping cost is the same).

Can the helpdesk guy check the status of an order for a particular customer? In this case, we have a user that is explicitly doing an action on behalf of another user. We might allow it (or not), but we almost always want to make a special note of it.

The interesting thing about this kind of system is that there are very different semantics for each of those operations. One of the primary goals for a microservice architecture is the separation of concerns: I don’t want to keep pinging the authorization service on each operation. That is important, and not just for the architectural purity of the system; one of the most common reasons for performance issues in systems is the cost of authorization checks. If you make that go over the network, it is going to kill your system.

Therefore, we need to consider how to enable proper encapsulation of concerns. An easy way to do that is to have the client hold that state. In other words, as part of the authentication process, the client is going to get a token, which it can use for the next calls. That token contains the list of allowed operations / enough state to compute the authorization status for the actual operations. Naturally, that state is not something that the client can modify, and it is protected with cryptography. A good example of that would be JWT. The authorization service generates a token with a key that is trusted by the other services. You can verify most authorization actions without leaving your service boundary.
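As a deliberately simplified sketch of that idea (this is not a real JWT implementation; in practice you would use a proper JWT library): the authorization service signs the claims with a key the other services trust, so each service can check a token locally, without a network call.

package auth

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
)

// Claims is the state the client carries around after authentication.
type Claims struct {
	UserID string   `json:"uid"`
	Ops    []string `json:"ops"` // e.g. "orders/create", "orders/view/self"
}

// Verify checks the signature and decodes the claims. The token format here
// is simply base64(claims) + "." + base64(hmac), signed with a key shared
// between the authorization service and the other services.
func Verify(token string, sharedKey []byte) (Claims, error) {
	var c Claims
	parts := strings.Split(token, ".")
	if len(parts) != 2 {
		return c, errors.New("malformed token")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return c, err
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	mac := hmac.New(sha256.New, sharedKey)
	mac.Write(payload)
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return c, errors.New("invalid signature")
	}
	return c, json.Unmarshal(payload, &c)
}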

This is easy for operations such as creating a new order, but how do you handle authorization on a specific entity? You aren’t going to be able to encode all the allowed entities in the token, at least not in most reasonable systems. Instead, you combine the allowed operations and the semantics of the operation itself. In other words, when loading an order, you check whether the user has the “orders/view/self” operation and that the order is for the same user id.
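Using the Claims type from the sketch above, the entity-level check is just the combination of the allowed operation and the ownership rule (the operation name is illustrative):

// canViewOrder: the token says what kind of access the user has, the order
// itself supplies the ownership information.
func canViewOrder(c Claims, orderOwnerID string) bool {
	return hasOp(c, "orders/view/self") && c.UserID == orderOwnerID
}

func hasOp(c Claims, op string) bool {
	for _, o := range c.Ops {
		if o == op {
			return true
		}
	}
	return false
}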

A more complex process is required when you have operations on behalf of someone else. You don’t want the helpdesk people to start sniffing into what <Insert Famous Person Name Here> ordered last night, for example. Instead of complicating the entire system with “on behalf of” operations, a much better approach is to go back to the authorization service. You can ask that service to generate a special “on behalf of” token, with the user id of the required user. This creates an audit trail of such actions and allows the authorization service to decide whether a particular user should have the authority to act on behalf of another particular user.
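One possible shape for such an “on behalf of” token, purely as an illustration of what the authorization service would record and sign:

// DelegationClaims captures who is acting, for whom, and why, which is what
// gives you the audit trail mentioned above. Field names are made up.
type DelegationClaims struct {
	ActorID   string `json:"act"` // the helpdesk user doing the work
	SubjectID string `json:"sub"` // the customer being acted for
	Reason    string `json:"rsn"` // e.g. a support ticket id
	ExpiresAt int64  `json:"exp"` // keep these tokens short lived
}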

time to read 4 min | 797 words

This post is in reply to this one: Is a Shared Database in Microservices Actually an Anti-pattern?

The author does a great job outlining the actual problem. Given two services that need to share some data, how do you actually manage that in a microservice architecture? The author uses the Users and Orders example, which is great, because it is pretty simple and requires very little domain knowledge.

The first question to ask is: Why?

Why use microservices? Wikipedia says:

The benefit of decomposing an application into different smaller services is that it improves modularity. This makes the application easier to understand, develop, test, and become more resilient to architecture erosion.

I always like an application that is easier to understand, develop and test. Being resilient to architecture erosion is a nice bonus.

The problem is that this kind of architecture isn’t free. A system that is composed of microservices is a system that needs to communicate between these services, and that is usually where most of the complexity in such a system resides.

In the Orders service, if we need to access some details about the User, how do we do that?

We can directly call the Users service, but that creates a strong dependency between the services. If Users is down, then Orders is down. That sort of defeats the purpose of the architecture. It also means that we don’t actually have separate services; we have just exchanged the call assembly instruction for RPC and distributed debugging. All the costs, none of the benefits.

The post above rightly calls this problematic, and asks whether async integration between the services would work, using streams. I’m not quite sure what was meant there. My usual method of integrating different microservices is to not do that directly. Instead, either we need to send a command to a different service (which is async) or we need to publish some data from a service (also async). Both of these options are assumed to be unreliable, and they must be resistant to failure. In other words, if I send a command to another service and I need to handle failure, I set up a timer to let me know to handle not being called back.
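A minimal sketch of that timer-based approach (all names are illustrative; the transport, retry and compensation logic are whatever fits your system):

package commands

import (
	"errors"
	"time"
)

// Reply is whatever acknowledgment the other service eventually sends back.
type Reply struct{ OK bool }

// sendWithDeadline sends a command asynchronously and waits for a reply, but
// only up to the given timeout. If no reply arrives, we return an error so
// the caller can schedule a retry, a compensation or an alert instead of
// blocking forever on the other service.
func sendWithDeadline(send func() error, replies <-chan Reply, timeout time.Duration) (Reply, error) {
	if err := send(); err != nil {
		return Reply{}, err
	}
	select {
	case r := <-replies:
		return r, nil
	case <-time.After(timeout):
		// The other service may still process the command later; treat the
		// silence as a failure and handle it explicitly.
		return Reply{}, errors.New("no reply from service before deadline")
	}
}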

Even if you just need some data published from another service, and you can use a feature such as RavenDB ETL to share that data, you still need to take into account issues such as network failures causing you to have a laggy view of the data.

This is not an accident.

That is not your data; you have a (potentially stale) copy of published data from another service. You can use that for reference, but you cannot count on it. If you need to rely on that data, you need to send a command to the owning service, which can then make the actual decision.

In short, this is not a trivial matter. Even if the actual implementation can be done pretty easily.

The fact that each service owns a particular portion of the system is a core principle of the microservice architecture.

Having a shared database is like having a backstage pass. It’s great, in theory, but it is also open for abuse. And it will be abused. I can guarantee that with 100% confidence.

If you blur the lines between services, they are no longer independent. Have fun trying to debug why your Users service’s login time spiked (Orders is running the monthly report). Enjoy breaking the payment processing system (you added a new type of user that the Orders system can’t process). And these are the good parts. I haven’t even started to talk about what happens when the Orders service actually attempts to write to the Users tables.

The article suggests using DB ACLs to control that, but you already have something better: a different database, because it is a different service.

It might be better to think about the situation like a joint bank account. It’s reasonable to have a joint bank account with your spouse. It is not so reasonable to have a joint bank account with Mary from accounting, just because that makes it easier to direct deposit your payroll. There is separation there for a reason, and yes, that does make things harder.

That’s the point, it is not an accident.

The whole point is that integration between services is going to be hard, so you’ll have less of it, and you’ll have it along very well defined boundaries. That means that we can have proper boundaries and contracts between different areas, which leads us to better modularity, thus allowing easier development, deployment and management.

If that isn’t something you want, that is fine; just don’t go with the microservice architecture. A monolith architecture is just fine, but a Frankenstein creation of a microservice architecture with a shared database is not. Just ask Mary from accounting…

time to read 1 min | 148 words

RavenDB 4.x uses X509 Certificates for authentication. We got a feedback question from a customer about that: they would much rather use API Keys instead.

We actually considered this as part of the design process for 4.x, and we concluded that we can make certificates work in just the same manner as API Keys. Here is how you can make it work.

You have the certificate file (usually PFX) and convert that to a Base64 string, like so:


[System.Convert]::ToBase64String( (gc "cert.pfx" -Encoding byte ) )

You can take the resulting string and store it like an API key, because that is effectively how it is treated. In your application startup, you decode the string back into the certificate and pass it to the client.
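A rough sketch of that in Go (RAVEN_CLIENT_CERT is a hypothetical environment variable name; your client library’s certificate option takes the decoded bytes from there):

package main

import (
	"encoding/base64"
	"log"
	"os"
)

// loadClientCert decodes the Base64 string (stored here in a hypothetical
// RAVEN_CLIENT_CERT environment variable) back into the original cert.pfx
// bytes, ready to be handed to the client's certificate option.
func loadClientCert() []byte {
	raw, err := base64.StdEncoding.DecodeString(os.Getenv("RAVEN_CLIENT_CERT"))
	if err != nil {
		log.Fatalf("decoding client certificate: %v", err)
	}
	return raw
}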

And this is it. For all intents and purposes, you can now use the certificate as an API key.

time to read 1 min | 185 words

Last week we had a couple of interesting milestones. The first of which is that we reached the End Of Life for RavenDB 3.0. If you are still running on RavenDB 3.0 (or any previous version), be aware that this marks the end of the support cycle for that version. You are strongly encouraged to upgrade to RavenDB 3.5 (which still has about 1.5 years of support).

I got an email today from a customer talking about maybe considering an upgrade from the RavenDB version that was released in Dec 2012, so I’m very familiar with slow upgrade cycles.

End of Life for 3.0 means that we no longer offer support for it. If your operations team is dragging their feet on the upgrade, please hammer this point home. We really want to see people running on at least 3.5.

The other side of the news is that the new bits for the RavenDB 4.2 Release Candidate are out. This release moves features such as Cluster Wide Transactions and Counters out of the experimental phase and introduces Graph Queries support. As usual, I would really love your feedback.
