Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Posts: 6,481 | Comments: 47,787


Cost centers, revenue centers and politics in a cross organization world

time to read 2 min | 394 words

Sometimes we get requests from customers to evaluate and help specify the kind of hardware their RavenDB servers are going to run on. One of the more recent requests was to evaluate a couple of options and select the optimal one.

We got the specs of the two variants and had a look. Then I went and took a look at the actual costs. These are physical machines, and the cost of each of the options we were given, even maxed out, was around $2,000 – $3,000.

One of the machines that they wanted was using a 7,200 RPM hard disk and was somewhat cheaper than the other, to the point where we got some pushback from the customer about selecting the other option (the one with an SSD). It took me a while to figure out what was going on.

This organization is using RavenDB (and the new machines in question) to run their core business. One of the primary factors in their business is the speed at which they can serve requests (for that business, SEO is a super critical metric). This business is also primarily focused on getting eyes on the site, which means that their organizational structure looks like this:

image

And the behavior of the organization follows the money. The marketing & sales department is much larger and can steer the entire organization, while the tech department (which the entire organization depends on) is decidedly second string, both for decision making and for budgeting.

I have run into this several times before, but it took me a long while to figure out that in many cases, the result of such arrangements is that the tech department relies on other people (in this case, us) to tell the organization at large what needs to be done. “It isn’t us (the poor relation) that needs this expensive ($300 or so) add-on, but the database guys say that it really matters (and you’ll take the blame if you don’t approve it).”

I don’t have anything to add, I’m afraid, I just wanted to share this observation. I’m hoping that understanding the motivation can help alleviate some of the hair pulling involved in those kinds of interactions (yes, water is wet, and spending a very small amount of money is actually worth it).

RavenDB 4.0 nightly builds are now available

time to read 2 min | 245 words

With the RC release out of the way, we are starting on a much faster cadence of fixes and user visible changes as we get ready for the release.

In order to allow users to report issues and have them resolved as soon as possible, we now publish nightly builds.

The nightly release is literally just whatever we have at the top of the branch at the time of the release. A nightly release goes through the following release cycle:

  • It compiles
  • Release it!

In other words, a nightly should be used only in development environments where you are fine with the database deciding that names must be “Green Jane”, that it is fine to burp all over your data, or to investigate how hot we can make your CPU.

More seriously, nightlies are a way to keep up with what we are doing, and their stability is directly related to what we are currently doing. As we come closer to the release, the nightly builds’ stability is going to improve, but there are no safeguards there.

It means that the typical turnaround for most issues can be as low as 24 hours (and it gives me back the ability to say, “thanks for the bug report, fixed and will be available tonight”). All other releases remain with the same level of testing and preparedness.

Leaning on the compiler

time to read 2 min | 253 words

I ran into this post, claiming that typed languages can reduce bugs by 15%. I don’t have a clue whether this is true, but I wanted to talk about a major feature that I really like about typed languages: they give you compiler errors.

That sounds strange, until you realize that the benefit isn’t when you are writing the code, but when you need to change it. One of the things that I frequently do when I’m modifying code, especially if it is a big change, is to make a change that will intentionally cause a compiler error, and work my way from there. A good example of that from a short time ago was a particular method that returned an object. That object was responsible for a LOT of allocations, and we needed to reduce that.

So I introduced pooling, and the change looked like this:

This change broke all the call sites, allowing me to go and refactor each one in turn, sometimes pushing the change one layer up in the code, until I had recursively fixed all the call sites and then… everything worked.

The idea isn’t to try to use the type system to prove that the system will work. That is what I have all these tests for. The idea is that I can utilize the structure of the code to be able to pivot it. Every time that I tried to do stuff like that in a language that wasn’t strongly typed, I ran into a lot of trouble.

RavenDB 4.0 Python client beta release

time to read 2 min | 201 words

I’m really happy to show off our RavenDB 4.0 Python client, now on beta release. This is the second (after the .NET one, obviously) of the new clients that are upcoming. In the pipeline we have JVM, Node.JS, Go and Ruby.

I fell in love with Python over a decade ago, almost incidentally, mainly because the Boo syntax is based on it. And I loved Boo enough to write a book about it. So I’m really happy that we can now write Python code to talk to RavenDB, with all the usual bells and whistles that accompany a full-fledged RavenDB client.

This is a small example, showing basic CRUD, but a more complex scenario is using Python scripts to drive functionality in our application as batch process scripts. Here is what this looks like:

This gives us the ability to run scripts that will be notified by RavenDB when something happens in the database and react to it. This is a powerful tool in the hands of the system administrator, who can use it to add functionality to the system with ease, using the Python language.

With performance, test, benchmark and be ready to back out

time to read 5 min | 979 words

image

Last week I spoke about our attempt to switch our internal JS engine from Jint to Jurassic. The primary motivation was speed: Jint is an interpreter, while Jurassic is compiled to IL and eventually machine code. That is a good thing from a performance standpoint, and the benchmarks we looked at, both external and internal, told us that we could expect anything between twice as fast and ten times as fast. That was enough to convince me to go for it. I have a lot of plans for doing more with javascript, and if it can be fast enough, that would be just gravy.

So we did that; we took all the places in our code where we were doing something with Jint and moved them to Jurassic. Of course, that isn’t nearly as simple as it sounds. We have a lot of places like that, and a lot of already existing code. But we also took the time to do this properly, making sure that there is a single namespace that is aware of JS execution in RavenDB and hides that functionality from the rest of the code.

Now, one of the things that we do with the scripts we execute is expose to them both functions that they can call and documents to look at and mutate. Consider the following patch script:

this.NumberOfOrders++;

This is on a customer document that may be pretty big, as in, tens of KB or higher. We don’t want to have to serialize the whole document into the JS environment and then serialize it back; that road leads to a lot of allocations and extreme performance costs. Instead, already with Jint, we had implemented a wrapper object that we expose to the JS environment, which does lazy evaluation of just the properties that are needed by the script and tracks all changes so we can reserialize things more easily.
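The wrapper idea can be sketched in plain JavaScript with a Proxy. This is illustrative only — it is not RavenDB’s actual engine interop code, and the `lazyDocument` / `__changes` names are made up for the example:

```javascript
// Sketch of a lazy document wrapper: the JSON is parsed only when the script
// first touches a property, and writes are recorded separately so only the
// changed fields need re-serialization.
function lazyDocument(rawJson) {
  let parsed = null;                 // parse the JSON only on first access
  const changes = new Map();         // track mutations made by the script
  return new Proxy({}, {
    get(_, prop) {
      if (prop === "__changes") return changes;
      if (parsed === null) parsed = JSON.parse(rawJson);
      return changes.has(prop) ? changes.get(prop) : parsed[prop];
    },
    set(_, prop, value) {
      changes.set(prop, value);      // never touch the original buffer
      return true;
    },
  });
}
```

A patch script like `this.NumberOfOrders++` then reads the old value lazily and records the new one in the change set, without round-tripping the whole document.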

Moving to Jurassic had broken all of that, so we had to rewrite it all. The good thing is that we already knew what we wanted and how we wanted to do it; it was just a matter of figuring out how Jurassic allows it. There was an epic coding montage (see image on the right) and we got it all back into working state.

Along the way, we paid down a lot of technical debt around consistency and exposure of operations and made it easier all around to work with JS from inside RavenDB. I hate javascript.

After a lot of time, we had everything stable enough so we could now test RavenDB with the new JS engine. The results were… abysmal. I mean, truly and horribly so.

But that wasn’t right; we specifically sought a fast JS engine, and we did everything right. We cached the generated values, we reused instances, we didn’t serialize / deserialize too much, and we had done our homework. We had benchmarks that showed very clearly that Jurassic was the faster engine. Feeling very stupid, we started looking at the before and after profiling results, and everything became clear. And I hate javascript.

Jurassic is the faster engine if most of your work is actually done inside the script. But most of the time, the whole point of the script is to do very little and direct the rest of the code in what it is meant to do. The boundary between the script and the host code is where we actually pay the most, and in Jurassic, crossing that boundary is really expensive. Also, I hate javascript.

It was really painful, but after considering this for a while, we decided to switch back to Jint. We also upgraded to the latest release of Jint and got some really nice features in the box. One of the things that I’m excited about is that it has an ES6 parser, even if it doesn’t fully support all the ES6 features. In particular, I really want to look at implementing arrow functions for Jint, because it would be very cool for our use case, but this will come later, because I hate javascript.

Instead of reverting the whole of our work, we decided to take this as a practice run of the refactoring to isolate us from the JS engine. The answer is that it was much easier to switch from Jurassic back to Jint than it was the other way around, but it is still not a fun process. There is too much that we depend on. But this was a really great lesson in understanding what we are paying for. We got a great deal of refactoring done, and I’m much happier about both our internal API and the exposed interfaces that we give to users. This is going to be much easier to explain now. Oh, and I hate javascript.

I had to deal with three different javascript engines (two versions of Jint, plus Jurassic) at a pretty deep level. For example, one of the things we expose in our scripts is the notion of null propagation. So you can write code like this:

return user.Address.City.Street.Number.StopAlready;

And even if you have missing properties along the way, the result will be null, and not an error as you would normally get. That requires quite deep integration with the engine in question and led to some tough questions about the meaning of the universe and the role of keyboards in our life versus the utility of punching bugs (not a typo).
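The behavior can be emulated in userland JavaScript with a Proxy, which gives a feel for what the engine integration has to do. This is a sketch only — the real feature lives inside the engine hooks, and the `safe` / `unwrap` names are invented for the example:

```javascript
// Sketch: null propagation via a Proxy. Any missing link in the property
// chain yields another "null" wrapper instead of throwing, and unwrap()
// turns the final wrapper back into a value (or null).
function safe(value) {
  return new Proxy({ value: value }, {
    get(target, prop) {
      if (prop === "unwrap") {
        return () => (target.value == null ? null : target.value);
      }
      const cur = target.value;
      // a missing property anywhere in the chain propagates as null
      return safe(cur == null ? null : cur[prop]);
    },
  });
}
```

With this, `safe(user).Address.City.Street.Number.StopAlready.unwrap()` returns null rather than throwing when any property along the way is missing.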

The actual code we ended up with for this feature is pretty small and quite lovely, but getting there was a job and a half, causing me to hate javascript, and I wasn’t that fond of it in the first place.

I’ll have the performance numbers comparing all three engines sometime next week. In the meantime, I’m going to do something that isn’t javascript.

Breaking the language barrier

time to read 5 min | 916 words

In RavenDB 4.0, we decided to finally bite the bullet and write our own query language. That led to a lot of really complex decisions that we had to make. I already posted about the language and you saw some first drafts. RQL is meant to be instantly familiar to anyone who used SQL before.

At the same time, this led to issues. In particular, the major issue is that SQL as a language is meant to handle rectangular tables, and RavenDB isn’t storing data in tables. For that matter, we also want to give our users the ability to express themselves fully in the query language; that means support for complex queries and complex projections.

For a while, we explored the option of supporting nested selects as the way to express these semantics, but that was pretty horrible, both in terms of the complexity of the implementation and in terms of the complexity of the language. Instead, we decided to take only the good parts out of SQL.

What do I mean by that? Well, here is a pretty simple query:

image

And here is how we can ask for specific fields:

image

You’ll note that we moved the select clause to the end of the query. The idea being that the syntax will hint to the user about the order of operations in the pipeline.

Next we add some filtering, giving us:

image

This isn’t really interesting except for the part where implementing this was a lot of fun. Things become both more interesting and more obvious when we look at a full query, with almost everything there:

image

And again, this is pretty boring, because except for the location of the clauses, you might as well write SQL. So let us see what happens when we start mixing in some of RavenDB’s own operations.

Here is how we can traverse the object graph on the database side:

image

This is an interesting example, because you can see that we have traversed the path of the relation and are able to fetch not just the data from the documents we are looking for, but also the related documents.

It is important to note that this happens after the where, so you can’t filter on that (but you can plug this into the index, but that is a story for another time). I like this syntax, because it is very clear about what is going on, but at the same time, it is also very limited. If I want anything that isn’t rectangular, I need to jump through hoops. Instead, let us try something else…

image

The idea is that we can use a JavaScript object literal in place of the select. Instead of fighting with the language, we have a very natural way to express projections from RavenDB.

The cool part is that you can apply logic as well, so things like string concatenation or logic in the select clause are absolutely permitted. However, take a look at the example, we have code duplication here, in the formatting for customer and contact names. We can do something about it, though:

image


The idea is that we also have a way to define functions to hold common logic and handle the more complex details if we need to. In this manner, instead of having to define and deploy transformers, you can define that directly in your query.

Naturally, if you want to calculate the Mandelbrot set inside the query, that is… suspect, but the idea is that having JavaScript applied to the results of the query gives us a lot more freedom and ease of use.

The actual client side in C# is going to remain unchanged and was already updated to talk to the new backend, but in our dynamic language clients, I think we’ll work to expose this more, since it is a much better fit in such environments.

Finally, here is the full query, with everything that we can do:

image

Don’t focus too much on the actual content of the queries; they are mostly there to show off the syntax. The last query now has the notion of include directly in the query. As an aside, you can now use load and include to handle tertiary includes. I didn’t actually set out to build them, but they came naturally from the syntax, and they are probably going to stay.

I have a lot more to say about this, but I’m so excited about this feature that I just got to get it out there. And yes, as you can see, you can declare several functions, and you can also call them from one another, so there is a lot of power on the table right now.

All I asked was Hello World

time to read 2 min | 262 words

When running a full cluster on a single machine, you end up with a lot of console windows running instances of RavenDB, and it can be a bit hard to identify which is which.

We had the same issue when we had multiple tabs open; it took effort to map URLs to specific nodes. We solved this particular problem by making sure that this is very explicit in the browser.

image

And that turned out to be a great feature.

So I created an issue to also print the node id in the console, so we can quickly match a console window to the node id that we want. The task is literally to write something like:

Console.WriteLine(server.Cluster.NodeId);

And the only reason I created an issue for that is that I didn’t want to be sidetracked by figuring out where to put this line.

Instead of a one line change, I got this:

image

Now, here is the difference between a drive-by line of code and an actual resolution. Here is what this looks like:

image

And this handles nodes that haven’t been assigned an ID yet and colors the nodes differently based on their topology so they can easily be told apart. It also makes the actual important information quite visible at a glance.
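The gist of that resolution can be sketched in a few lines. This is not the actual RavenDB startup code — the `nodeBanner` helper and the color mapping are invented for the illustration:

```javascript
// Sketch: print the node id with an ANSI color derived from it, so each
// console window can be told apart at a glance. A node with no id assigned
// yet gets a distinct gray label instead of crashing on a null id.
function nodeBanner(nodeId) {
  const colors = [31, 32, 33, 34, 35, 36];          // red..cyan ANSI codes
  const color = nodeId == null
    ? 90                                            // gray: id not assigned yet
    : colors[nodeId.charCodeAt(0) % colors.length];
  const label = nodeId == null ? "(not assigned)" : nodeId;
  return `\x1b[${color}mNode ${label}\x1b[0m`;
}
```

Calling `console.log(nodeBanner(server.nodeId))` at startup would then give each window its own colored banner.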

Error handling belongs at Layer 7 (policy)

time to read 3 min | 411 words

When you set up a RavenDB server securely, it doesn’t require a valid X509 certificate to be able to connect to it.

You might want to read that statement a few times to realize what I’m saying. You don’t need a certificate to connect to a RavenDB service. But on the other hand, we use client certificates to authenticate connections. So which is it?

The problem is with the specific OSI layer that we are talking about. When using SSL / TLS you can require a client certificate on connection. That ensures that only connections with a client certificate can be processed by the server. This is almost what we want, but it has the sad effect of having really poor error reporting capabilities.

For example, let us assume that you are trying to connect to a secured RavenDB server without a certificate. If we just require a client certificate, then the error would be raised at the SSL layer, giving you errors that look something like:

  • The connection has been closed
  • Could not establish trust relationship for the SSL/TLS

This can also happen if there is a firewall in the middle, if enough packets were dropped, if there is a solar storm or just because it is Monday.

Or, it can be because you don’t have a certificate, your certificate expired, it is not known or you are trying to access a resource you are not authorized to.

In all of those cases, the error you’ll receive if you require & validate the certificate at OSI Layer 6 is going to, to put it plainly, suck. Just imagine trying to debug something like that in a protocol that by design is intended to be hard / impossible to eavesdrop on. Instead, we can allow access to the server without a certificate and enforce the certificate requirement at a higher level.

That means that when you connect to RavenDB without a certificate, we’ll accept the connection and then either give you an error that you can reason about (if you are using the API, because it will tell you what is wrong) or redirect you to a nice error page (if you are using the browser). Alternatively, if you have a certificate and it is not valid for any number of reasons, RavenDB will tell you all about it.
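In Node.js terms, this is the difference between rejecting the handshake (`rejectUnauthorized: true` on a TLS server) and accepting the connection and deciding in the request handler. A minimal sketch of the application-layer check — the `authorize` function, its messages, and the certificate fields are illustrative, not RavenDB’s actual code:

```javascript
// Sketch: validate the client certificate at layer 7, where we can return a
// specific, actionable error instead of a dropped TLS handshake.
function authorize(clientCert, now = new Date()) {
  if (!clientCert) {
    return { ok: false, error: "This server requires a client certificate. None was provided." };
  }
  if (new Date(clientCert.validTo) < now) {
    return { ok: false, error: `Certificate '${clientCert.subject}' expired on ${clientCert.validTo}.` };
  }
  if (!clientCert.trusted) {
    return { ok: false, error: `Certificate '${clientCert.subject}' is not registered with this server.` };
  }
  return { ok: true };
}
```

Each failure mode gets its own message, which is exactly what the raw SSL layer cannot give you.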

That reduces the amount of hair loss in the case of errors significantly.

RavenDB 4.0: Unbounded result sets

time to read 3 min | 503 words

Unbounded result sets are a pet peeve of mine. I have seen them destroy application performance more than once. With RavenDB, I decided to cut that problem off at the knees and placed a hard limit on the number of results that you can get from the server. Unless you configured it differently, you couldn’t get more than 1,024 results per query. I was very happy with this decision, and there have been numerous cases where it has saved an application from serious issues.

Unfortunately, users hated it. Even though it was configurable, and even though you could effectively turn it off, just the fact that it was there was enough to make people angry.

Don’t get me wrong, I absolutely understand some of the issues raised. In particular, if the data goes over a certain size, we suddenly show wrong results or an error, leaving the app in “we need to fix this NOW” mode. It is an easy mistake to make. In fact, on this blog, I noticed a few months back that I couldn’t get entries from 2014 to show up in the archive. The underlying reason was exactly that: I query for the number of items per month, I’ve been blogging for more than 128 months, so the data got truncated.

In RavenDB 4.0 we removed the limit. If you don’t specify a limit in a query, you’ll get exactly as many results as there are in the database. You can ask RavenDB to raise an error if you didn’t specify a limit clause, which is a way for you to verify that you won’t run into this issue in production, but it is off by default and will probably better match new users’ expectations.
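The two policies described above can be sketched in a few lines. The `applyLimitPolicy` helper and its option name are hypothetical, not an actual RavenDB API:

```javascript
// Sketch: the 4.0 behavior. With no limit, return everything by default, or
// throw if the opt-in "error on missing limit" safety check is enabled.
function applyLimitPolicy(results, requestedLimit, { errorOnMissingLimit = false } = {}) {
  if (requestedLimit == null) {
    if (errorOnMissingLimit) {
      throw new Error("Query has no limit clause; the full result set would be returned.");
    }
    return results;                   // 4.0 default: return everything
  }
  return results.slice(0, requestedLimit);
}
```

Turning the opt-in check on in a test environment is how you catch queries that would fetch unbounded result sets before they reach production.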

The underlying issue of loading too many results is still there, of course. And we still want to do something about it. What we did was raise alerts.

I made a query on a large set (160,000 results, about 400 MB in all) and the following popped up in the RavenDB Studio:

image

This tells the admin that there is some information that they need to look at. It is intentionally non-obtrusive.

When you click on the notifications, you’ll get the following message.

image

And if you click on the details, you’ll see the actual details of the operations that triggered this warning.

image

I actually created an issue so we’ll supply you with more information (such as the index, the query, duration and the total size that it generated over the network).

I think that this gives the admin enough information to act upon, but will not cause hardship to the application. This makes it a “Should Fix” instead of a “Get the On-Call Guy”.

RavenDB 4.0: Managing encrypted databases

time to read 6 min | 1012 words

On the right you can see what the new database creation dialog looks like when you want to create a new encrypted database. I talked about the actual implementation of full database encryption before, but today’s post’s focus is different.

I want to talk about managing encrypted databases. As an admin tasked with working with encrypted data, I need to not only manage the data in the database itself, but also handle a lot more failure points when using encryption. The most obvious of them is that if you have an encrypted database in the first place, then the data you are protecting is very likely to be sensitive in nature.

That raises the immediate question of who can see that information. For that matter, are you allowed to see that information? RavenDB 4.0 has support for time limited credentials, so you register to get credentials in the system, and using whatever workflow you have, the system generates a temporary API key for you that will be valid for a short time and then expire.

What about all the other things that an admin needs to do? The most obvious example is how do you handle backups, either routine or emergency ones. It is pretty obvious that if the database is encrypted, we also want the backups to be encrypted, but are they going to use the same key? How do you restore? What about moving the database from one machine to the other?

In the end, it all hangs on the notion of keys. When you create a new encrypted database, we’ll generate a key for you, and require that you confirm for us that you have persisted that information in some manner. You can print it, download it, etc. And you can see the key right there in plain text during the db creation. However, this is the last time that the database key will ever reside in plain text.

So what about this QR code, what is it doing there? Put simply, it is there to capture attention. It replicates the same information that you have in the key itself, obviously. But what for?

The idea is that users are often hurrying through the process (the “Yes, dear!” mode) and we want to encourage them to stop for a second and think. The use of the QR code makes it much more likely that the admin will print and save the key in an offline manner, which is likely to be safer than most methods.

So this is how we encourage administrators to safely retain the encryption key. This is useful because it gives the admin the ability to take a snapshot on one machine and then recover it on another where the encryption key is not available, or just move the hard disk between machines if the old one failed. It is actually quite common in cloud scenarios to have a machine that has attached cloud storage, and if the machine fails, you just spin up a new machine and attach the storage to the new one.

We keep the encryption keys secret by utilizing system specific keys (either DPAPI or a decryption key that only the specific user can access), so moving machines like that will require the admin to provide the encryption key so we can continue working.

The issue of backups is different. It is very common to have to store long term backups and to need to restore them in a separate location / situation. At that point, we need the backup to be encrypted, but we don’t want it to use the same encryption key as the database itself. This is mostly about managing keys. If I’m managing multiple databases, I don’t want to have to record the encryption key for each as part of the restore process. That opens us up to a missing key and a useless backup that we can do nothing about.

Instead, when you set up backups (for encrypted databases it is mandatory; for regular databases, optional) we’ll give you the option to provide a public key that we’ll then use to encrypt the backup. That means that you can more safely store it in cloud scenarios, and regardless of how many databases you have, as long as you have the private key, you’ll be able to restore the backup.

Finally, we have one last topic to cover with regards to encryption: the overall system configuration. Each database can be encrypted, sure, but the management of the database (such as connection strings that it replicates to, API keys that it uses to store backups and a lot of other potentially sensitive information) is still stored in plain text. For that matter, even if the database shouldn’t be encrypted, you might still want to encrypt the full system configuration. That leads to somewhat of a chicken and egg problem.

On the one hand, we can’t encrypt the server configuration from the get go, because then the admin will not know what the key is, and they might need it if they need to move machines, etc. But once we have started, we are using the server configuration, so we can’t just encrypt it on the fly. What we ended up using is a set of command line parameters, so if the admin wants to run with an encrypted server configuration, they can stop the server, run a command to encrypt the server configuration and set up the appropriate encryption key retrieval process (DPAPI, for example, under the right user).

That gives us the chance to make the user aware of the key and allow them to save it in a secure location. All members in a cluster with an encrypted server configuration must also have an encrypted server configuration, which prevents accidental leaks.

I think that this is about it with regards to the operational details of managing encryption. Pretty sure that I missed something, but this post is getting long as it is.
