Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


RavenDB 4.0 nightly builds are now available

time to read 2 min | 245 words

With the RC release out of the way, we are starting on a much faster cadence of fixes and user-visible changes as we get ready for the release.

In order to allow users to report issues and have them resolved as soon as possible, we now publish our nightly builds.

The nightly release is literally just whatever we have at the top of the branch at the time of the release. A nightly release goes through the following release cycle:

  • It compiles
  • Release it!

In other words, a nightly should be used only in development environments where you are fine with the database deciding that names must be “Green Jane”, burping all over your data, or investigating how hot we can make your CPU.

More seriously, nightlies are a way to keep up with what we are doing, and their stability is directly related to what we are currently working on. As we come closer to the release, the stability of the nightly builds is going to improve, but there are no safeguards there.

It means that the typical turnaround for most issues can be as low as 24 hours (and it gives me back the ability to say, “thanks for the bug report, fixed and will be available tonight”). All other releases retain the same level of testing and preparedness.

RavenDB 4.0 Unsung Heroes: Field compression

time to read 3 min | 498 words

I have been talking a lot about major features, making things visible, and all sorts of really cool things. What I haven’t been talking about is a lot of the work that has gone into the backend, all the stuff that isn’t sexy and bright. You probably don’t really care how the plumbing in your house works, at least until the toilet doesn’t flush. A lot of what we did with RavenDB 4.0 was to look at all the pain points that we have run into and try to resolve them. This series of posts is meant to expose some of these hidden features. If we did our job right, you will never even know that these features exist; they are that good.

In RavenDB 3.x we had a feature called Document Compression. This allowed a user to save a significant amount of space by having documents stored in compressed form on disk. If you had large documents, you could typically see significant space savings from enabling this feature. With RavenDB 4.0, we removed it completely. The reason is that we need to store documents in a way that allows us to load them and work with them in their raw form without any additional work. This is key for many of the optimizations in RavenDB 4.0.

However, that doesn’t mean that we gave up on compression entirely. Instead of compressing the whole document, which would require us to decompress it any time we wanted to do something with it, we selectively compress individual fields. Typically, large documents are large because they have either a few very large fields or a collection that contains many items. The blittable format used by RavenDB handles this in two ways. First, we don’t need to repeat field names every time; we store them once per document. Second, we can compress large field values on the fly.
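To illustrate the first point, here is a minimal Python sketch that stores each property name once per document and refers to it by index everywhere else, so a collection of many similar items doesn’t pay for its field names over and over. The layout (a names table plus index-keyed values) is an assumption for illustration only; it is not the actual blittable format.

```python
from typing import Any


def intern_field_names(doc: dict[str, Any]) -> tuple[list[str], Any]:
    """Replace every property name with an index into a single per-document table."""
    names: list[str] = []
    index: dict[str, int] = {}

    def name_id(name: str) -> int:
        # Each distinct name is stored exactly once, however often it appears.
        if name not in index:
            index[name] = len(names)
            names.append(name)
        return index[name]

    def encode(value: Any) -> Any:
        if isinstance(value, dict):
            out: dict[int, Any] = {}
            for key, item in value.items():
                key_id = name_id(key)
                out[key_id] = encode(item)
            return out
        if isinstance(value, list):
            return [encode(item) for item in value]
        return value

    return names, encode(doc)


post = {
    "Title": "Field compression",
    "Comments": [
        {"Author": "a", "Text": "Nice"},
        {"Author": "b", "Text": "Agreed"},
        {"Author": "c", "Text": "+1"},
    ],
}
names, encoded = intern_field_names(post)
print(names)  # ['Title', 'Comments', 'Author', 'Text']
```

Three comments, but "Author" and "Text" are each stored only once; the records themselves carry small integers.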

Take this blog, for instance: a lot of the data inside it is actually stored in large text fields (blog posts, comments, etc.). That means that when it is stored in RavenDB 4.0, we can take advantage of field compression and reduce the amount of space we use. At the same time, because we are only compressing selected fields, we can still work with the document natively. A trivial example would be pulling the titles of recent blog posts. We can fetch just these values directly (and since they are pretty small already, they wouldn’t be compressed), without having to touch the large text field that holds the actual post contents.
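The second point, together with the titles example, can be sketched as follows. This is a toy model, assuming a hypothetical size threshold and using Python’s zlib; RavenDB’s actual threshold and compression algorithm are not described here. Only large string fields are compressed, and reading a small field never pays for decompression.

```python
import zlib
from typing import Any

# Hypothetical threshold: string fields at least this many bytes get compressed.
COMPRESSION_THRESHOLD = 128


class StoredDocument:
    """Toy stored document that compresses only its large string fields."""

    def __init__(self, fields: dict[str, Any]) -> None:
        self._raw: dict[str, Any] = {}
        self._compressed: dict[str, bytes] = {}
        for name, value in fields.items():
            if isinstance(value, str) and len(value.encode("utf-8")) >= COMPRESSION_THRESHOLD:
                self._compressed[name] = zlib.compress(value.encode("utf-8"))
            else:
                self._raw[name] = value
        self.decompressions = 0  # how many times we paid for decompression

    def get(self, name: str) -> Any:
        if name in self._raw:
            # Small fields are read directly, with no extra work.
            return self._raw[name]
        self.decompressions += 1
        return zlib.decompress(self._compressed[name]).decode("utf-8")

    def is_compressed(self, name: str) -> bool:
        return name in self._compressed


doc = StoredDocument({"Title": "Field compression", "Body": "RavenDB 4.0 " * 200})
print(doc.get("Title"))     # Field compression
print(doc.decompressions)  # 0: the large Body field was never touched
```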

Here is what this looks like in RavenDB 4.0 when I’m looking at the internal storage breakdown for all documents.

[Image: internal storage breakdown for all documents]

Even though I have been writing for over a decade, I don’t yet have enough posts for this to make a statistically meaningful difference: the total database size is 128MB in both cases.

Upcoming conferences

time to read 1 min | 150 words

In the wake of the RavenDB 4.0 Release Candidate, you are going to be seeing quite a lot of us :)

Here is the schedule for the rest of the year. At all of these conferences we are going to have a booth and demo RavenDB 4.0 live. We are going to demonstrate a distributed database running on the conference network, so expect plenty of demos of the failover behavior :)

I’ll be speaking at Build Stuff about Modeling in Non Relation World and Extreme Performance Architecture, as well as giving a full-day workshop about RavenDB 4.0.

RavenDB 4.0 Python client beta release

time to read 2 min | 201 words

I’m really happy to show off our RavenDB 4.0 Python client, now in beta. This is the second of the new clients (after the .NET one, obviously), and more are coming: in the pipeline we have JVM, Node.JS, Go and Ruby.

I fell in love with Python over a decade ago, almost incidentally, mainly because Boo’s syntax is based on it, and I loved Boo enough to write a book about it. So I’m really happy that we can now write Python code to talk to RavenDB, with all the usual bells and whistles that accompany a full-fledged RavenDB client.

This is a small example showing basic CRUD, but a more complex scenario is using Python scripts to drive functionality in our application through batch processing scripts. Here is what this looks like:

This gives us the ability to run scripts that RavenDB notifies when something happens in the database, so they can react to it. This is a powerful tool in the hands of the system administrator, who can use it to add functionality to the system easily, with all the ease of use of the Python language.
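The pattern described above (the database pushes batches of changed documents to a script, and the script reacts) can be sketched without a server. To be clear, this is a hypothetical in-memory stand-in: `FakeSubscription`, `on_batch` and `publish` are made-up names for illustration and are not part of the RavenDB Python client API.

```python
from typing import Callable, Iterable

Document = dict
BatchHandler = Callable[[list[Document]], None]


class FakeSubscription:
    """In-memory stand-in for a server-side subscription (hypothetical, not the client API)."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size
        self._handlers: list[BatchHandler] = []

    def on_batch(self, handler: BatchHandler) -> None:
        self._handlers.append(handler)

    def publish(self, changes: Iterable[Document]) -> None:
        # Group incoming changes into batches and hand each batch to every handler.
        batch: list[Document] = []
        for doc in changes:
            batch.append(doc)
            if len(batch) == self.batch_size:
                self._dispatch(batch)
                batch = []
        if batch:
            self._dispatch(batch)

    def _dispatch(self, batch: list[Document]) -> None:
        for handler in self._handlers:
            handler(list(batch))


# An administrator's script: react to new orders by flagging the large ones.
flagged: list[str] = []


def flag_large_orders(batch: list[Document]) -> None:
    for order in batch:
        if order["Total"] > 1000:
            flagged.append(order["Id"])


subscription = FakeSubscription(batch_size=2)
subscription.on_batch(flag_large_orders)
subscription.publish([
    {"Id": "orders/1", "Total": 50},
    {"Id": "orders/2", "Total": 2500},
    {"Id": "orders/3", "Total": 1200},
])
print(flagged)  # ['orders/2', 'orders/3']
```

Processing in batches rather than one document at a time keeps the per-notification overhead low, which matters when a script is reacting to a busy database.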

JS execution performance and a whole lot of effort…

time to read 2 min | 267 words

I spoke at length about our adventures with JS engines, but I didn’t talk about the actual numbers, because I wanted them to be final.

Here are the results:

Implementation      Put   Set   Push   Query
3.5                  63   125     -       -
Jint - Origin         7     5    12    1784
Jurassic             33    38    42     239
Jint - Optimized      2     6     8     191

I intentionally don’t provide context for those numbers; what matters is how they compare to one another.

Put means that we have a single call to Put in a patch script, Set means that we set a value in the patch, and Push adds an item to an array. Query tests a very simple projection.

You can discard the query value for the original Jint. That was a very trivial implementation that always created a new engine (and paid the full cost for it) while we were validating the feature itself.
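The cost described above (creating a fresh engine for every query) is usually avoided by reusing engines. Here is a minimal Python sketch of that idea, assuming a hypothetical `ExpensiveEngine` whose constructor is the costly part; it illustrates pooling in general and is not RavenDB’s actual C# code.

```python
import queue
from contextlib import contextmanager
from typing import Iterator


class ExpensiveEngine:
    """Hypothetical script engine; constructing one is the step we want to avoid repeating."""

    created = 0

    def __init__(self) -> None:
        ExpensiveEngine.created += 1

    def run(self, script: str, value: int) -> int:
        # Stand-in for actually executing a projection script.
        return value * 2


class EnginePool:
    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[ExpensiveEngine]" = queue.Queue()
        for _ in range(size):
            self._idle.put(ExpensiveEngine())

    @contextmanager
    def engine(self) -> Iterator[ExpensiveEngine]:
        eng = self._idle.get()  # blocks if every engine is busy
        try:
            yield eng
        finally:
            self._idle.put(eng)  # return it for the next caller instead of discarding it


pool = EnginePool(size=2)
results = []
for value in range(100):
    with pool.engine() as eng:
        results.append(eng.run("return this.Value * 2", value))

print(ExpensiveEngine.created)  # 2, not 100
```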

What is interesting here is how these values compare to 3.5: we are so much better. Another interesting point is that after we moved back to Jint, we ran another set of tests, and a lot of the optimizations turned out to be in our own code, primarily in how we send data to and receive data from the JS engine.

And this is without any of the optimizations that we could still write. Identifying common patterns and lifting them would be the obvious answer, and we keep that for later, once we have a few more common scenarios from users to explore.
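As a rough illustration of what “identifying common patterns and lifting them” could mean, here is a hypothetical Python sketch: a patch that is just `this.Field = <literal>;` is applied natively, and anything else falls back to the general script engine. The regex, the JSON-literal rule and the `run_in_engine` callback are all assumptions for illustration, not a description of RavenDB’s plans.

```python
import json
import re
from typing import Any, Callable

# Matches patches of the form: this.Name = <JSON literal>;
SIMPLE_SET = re.compile(r"^\s*this\.([A-Za-z_]\w*)\s*=\s*(.+?);?\s*$")


def apply_patch(doc: dict[str, Any], script: str,
                run_in_engine: Callable[[dict[str, Any], str], dict[str, Any]]) -> dict[str, Any]:
    """Apply a trivial assignment natively; send everything else to the JS engine."""
    match = SIMPLE_SET.match(script)
    if match:
        field, literal = match.groups()
        try:
            value = json.loads(literal)
        except json.JSONDecodeError:
            pass  # not a plain literal (e.g. an expression): use the general path
        else:
            patched = dict(doc)
            patched[field] = value
            return patched
    return run_in_engine(doc, script)


engine_calls: list[str] = []


def fake_engine(doc: dict[str, Any], script: str) -> dict[str, Any]:
    engine_calls.append(script)  # record that the slow path was taken
    return doc


print(apply_patch({"Name": "x"}, 'this.Name = "Oren";', fake_engine))  # {'Name': 'Oren'}
apply_patch({"Count": 1}, "this.Count = this.Count + 1;", fake_engine)
print(engine_calls)  # ['this.Count = this.Count + 1;']
```

The fast path only fires when the script is unambiguous; anything the matcher doesn’t fully understand is handed to the engine unchanged, so correctness never depends on the optimization.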

When disk and hardware fall…

time to read 5 min | 956 words

When your back is against the wall, and your only hope is black magic (and alcohol).

The title of this post is taken from this song. The topic of this post is a pretty sad one, but a mandatory discussion when dealing with data that you don’t want to lose. We are going to discuss hard system failures.

The source can be anything from actual physical disk errors to faulty memory causing corruption. The end result is a database that is corrupted in some manner. RavenDB actually has multiple levels of protection to detect such scenarios. All the data is verified with checksums on first load from the disk, and the transaction journal is verified when it is applied as well. But stuff happens, and thanks to Murphy, that stuff isn’t always pleasant.
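Verify-on-load can be sketched in a few lines. This Python sketch assumes a hypothetical page layout (a 4-byte checksum followed by the page body) and uses CRC32 from zlib purely for illustration; it is not necessarily the hash or layout RavenDB uses.

```python
import struct
import zlib

PAGE_SIZE = 8192  # 8KB pages
CHECKSUM = struct.Struct("<I")  # hypothetical layout: 4-byte CRC32, then the page body


class CorruptPageError(Exception):
    """Raised when a page's contents don't match its stored checksum."""


def write_page(payload: bytes) -> bytes:
    body_size = PAGE_SIZE - CHECKSUM.size
    if len(payload) > body_size:
        raise ValueError("payload does not fit in a single page")
    body = payload.ljust(body_size, b"\0")
    return CHECKSUM.pack(zlib.crc32(body)) + body


def load_page(page: bytes, page_number: int) -> bytes:
    (expected,) = CHECKSUM.unpack_from(page, 0)
    body = page[CHECKSUM.size:]
    if zlib.crc32(body) != expected:
        # Refuse to hand corrupted bytes to the rest of the system.
        raise CorruptPageError(f"page {page_number} failed checksum verification")
    return body


page = bytearray(write_page(b"users/1"))
print(load_page(bytes(page), 0)[:7])  # b'users/1'
page[100] ^= 0xFF  # simulate a flipped byte on disk
try:
    load_page(bytes(page), 0)
except CorruptPageError as err:
    print(err)  # page 0 failed checksum verification
```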

One of the hard criteria for the Release Candidate was a good story around catastrophic data recovery. What do I mean by that? I mean a situation where something has corrupted the data file in such a way that RavenDB cannot load it normally. So sit tight and let me tell you this story.

We first need to define what we are trying to handle. The catastrophic data recovery feature is meant to:

  1. Recover user data (documents, attachments, etc) stored inside a RavenDB file.
  2. Recover as much data as possible, regardless of its state, letting the user verify correctness (i.e., it may recover deleted documents).
  3. Does not include indexed data, configuration, cluster settings, etc. This is because these can be quite easily handled by recreating indexes or setting up a new cluster.
  4. Does not replace high availability, backups or proper preventive maintenance.
  5. Does not attempt to handle malicious corruption of the data.

Basically, the idea is that when you are up shit creek, we can hand you a paddle. That said, you are still up shit creek.

I mentioned previously that RavenDB goes to great lengths to ensure that it knows when the data on disk is messed up. We also put a lot of work into making sure that, when needed, we can actually do some meaningful work to extract your data. This means that the raw file format contains extra data that isn’t used for anything in RavenDB except by the recovery tools. That change to the file format is why this was a Stop-Ship priority issue.

Given that we are already in catastrophic data recovery mode, we can make very few assumptions about the state of the data. A database is a complex beast with a lot of moving parts, and the on-disk format is very complex and subject to a lot of state and behavior. We are already in catastrophic territory, so we can’t just use the data as we normally would. Imagine a tree where following the pointers to a lower level might, in some cases, lead to garbage data or invalid memory. We have to assume that the data has been corrupted.

Some systems handle this by keeping two copies of the master data records. Given that RavenDB is assumed to run on modern file systems, we don’t bother with this. ReFS on Windows and ZFS on Linux handle that task better, and we assume that production usage will use something similar. Instead, we designed the way we store the data on disk so that we can read through the raw bytes and still make sense of what is going on inside them.

In other words, we are going to read one page (8KB) at a time, verify that its checksum matches the expected value, and then look at the content. If it is a document or an attachment, we can detect that and recover it, without having to understand anything else about the way the system works. In fact, the recovery tool is intentionally limited to a basic forward scan of the data, without any understanding of the actual file format.
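Here is a minimal Python sketch of such a forward scan. The page header (checksum, page type, payload length) is a hypothetical layout invented for this example, not RavenDB’s file format. Each page is judged on its own: corrupted pages are skipped, pages of unknown kinds (such as tree internals) are ignored, and no pointers are ever followed.

```python
import struct
import zlib
from typing import Iterator

PAGE_SIZE = 8192
# Hypothetical header: CRC32 of the rest of the page, a 1-byte page type, payload length.
HEADER = struct.Struct("<IBI")
PAGE_DOCUMENT, PAGE_ATTACHMENT = 1, 2


def make_page(page_type: int, payload: bytes) -> bytes:
    if len(payload) > PAGE_SIZE - HEADER.size:
        raise ValueError("payload does not fit in a single page")
    rest = struct.pack("<BI", page_type, len(payload)) + payload
    rest = rest.ljust(PAGE_SIZE - 4, b"\0")
    return struct.pack("<I", zlib.crc32(rest)) + rest


def scan(data: bytes) -> Iterator[tuple[int, int, bytes]]:
    """Forward scan: check every page independently and yield recoverable payloads."""
    for offset in range(0, len(data) - PAGE_SIZE + 1, PAGE_SIZE):
        page = data[offset:offset + PAGE_SIZE]
        checksum, page_type, length = HEADER.unpack_from(page, 0)
        if zlib.crc32(page[4:]) != checksum:
            continue  # corrupted page: skip it and keep going
        if page_type in (PAGE_DOCUMENT, PAGE_ATTACHMENT):
            yield offset // PAGE_SIZE, page_type, page[HEADER.size:HEADER.size + length]


file_bytes = bytearray(
    make_page(PAGE_DOCUMENT, b'{"Name": "Oren"}')
    + make_page(3, b"tree internals the scan deliberately ignores")
    + make_page(PAGE_DOCUMENT, b'{"Name": "Ayende"}')
)
file_bytes[PAGE_SIZE * 2 + 50] ^= 0xFF  # corrupt the third page
for page_no, kind, payload in scan(bytes(file_bytes)):
    print(page_no, kind, payload)  # 0 1 b'{"Name": "Oren"}'
```

Because the scan emits any valid document page it finds, a deleted document whose bytes still sit in an intact page would be emitted too.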

There are some complications: large documents (which can span more than 8 KB) and large attachments (we support attachments larger than 2GB) can require us to jump around a bit, but all of this can be done with very minimal understanding of the file format. The idea was that we can’t rely on any of the complex structures (B+Trees, internal indexes, etc.) but can still recover anything that is recoverable.
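Values that span several pages can be handled while staying a simple forward scan. In this hypothetical Python sketch (the header layout is invented for illustration), the first page of a value records how many contiguous pages it occupies, and the checksum covers the whole span. A span that verifies is consumed in one jump; otherwise the scanner advances a single page and tries again.

```python
import math
import struct
import zlib
from typing import Iterator

PAGE_SIZE = 8192
# Hypothetical header: CRC32 of everything after it, page type, page count, payload length.
HEADER = struct.Struct("<IBII")


def make_value(page_type: int, payload: bytes) -> bytes:
    """Lay out a value over as many contiguous pages as it needs."""
    pages = math.ceil((HEADER.size + len(payload)) / PAGE_SIZE)
    rest = struct.pack("<BII", page_type, pages, len(payload)) + payload
    rest = rest.ljust(pages * PAGE_SIZE - 4, b"\0")
    return struct.pack("<I", zlib.crc32(rest)) + rest


def scan(data: bytes) -> Iterator[tuple[int, bytes]]:
    page_no = 0
    total_pages = len(data) // PAGE_SIZE
    while page_no < total_pages:
        start = page_no * PAGE_SIZE
        checksum, page_type, pages, length = HEADER.unpack_from(data, start)
        end = start + pages * PAGE_SIZE
        # Never trust the page count until the checksum over the whole span agrees.
        if 1 <= pages <= total_pages - page_no and zlib.crc32(data[start + 4:end]) == checksum:
            yield page_type, data[start + HEADER.size:start + HEADER.size + length]
            page_no += pages  # jump over the overflow pages we just consumed
        else:
            page_no += 1  # header not trustworthy: move on one page at a time


big_attachment = bytes(range(256)) * 80  # 20,480 bytes, spans three pages
data = make_value(2, big_attachment) + make_value(1, b'{"Title": "small"}')
for page_type, payload in scan(data):
    print(page_type, len(payload))
# 2 20480
# 1 18
```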

This also led to an interesting feature. Because we are looking at the raw data, whenever we see a document, we are going to write it out. But that document might have actually been deleted. The recovery tool doesn’t have a way of checking (it is intentionally limited), so it just writes it out. This means that we can use the recovery tool to “undelete” documents. Note that this isn’t actually guaranteed; don’t assume that you have an “undelete” feature. Depending on the state of the moon and the stomach contents of the nearest duck, it may work, or it may not.

The recovery tool is important, but it isn’t magic, so some words of caution are in order. If you have to use the catastrophic data recovery tool, you are in trouble. High availability features such as replication and offsite replicas are the things you should be using, and backups are so important that I can’t stress them enough.

The recommended deployment for RavenDB 4.0 is going to be in a High Availability cluster with scheduled backups. The recovery tool is important for us, but you should assume from the get go that if you need to use it, you aren’t in a good place.

RavenDB 4.0 Release Candidate

time to read 3 min | 508 words

Two years ago almost to the day (I just checked) we started the initial design and implementation process for RavenDB 4.0. Today, it is with great pride and joy that I can tell you that we have the Release Candidate ready and available for download.

During the past two years we have improved RavenDB performance by an order of magnitude, drastically simplified operations and monitoring, introduced a new query language and distributed cluster technology, to name just a few highlights of the work that has been done. You’ll find RavenDB easier to use, much faster and packed full of features.

We are now almost done. The next few months are going to see us focus primarily on stabilization, on smoothing the user experience and the behavior of the cluster and the database, and in general on making things nicer. We started with making things work, then making them work fast, and now we are at the stage where we want to make sure that using RavenDB is a true pleasure. We are almost there, but we still need to work on things like the deployment story, writing documentation, providing a migration pathway, verifying that the UI makes sense even if you aren’t aware of what is going on under the hood, etc. In general, this is a lot of detailed work that is going to take time, but everything major has been done.

I’m quite happy about it, if you can’t tell from the text so far.

The current road map calls for spending another 6 – 10 weeks pushing RavenDB as far as we can to see how it behaves and filling in all the remaining touch-ups, and then pushing for release. We’ll probably also have an RC2 before the release, with Go Live capabilities.

In the meantime, if you are working on a new project with RavenDB, I highly recommend switching to RavenDB 4.0. Note that migrating from 3.5 to 4.0 currently involves exporting your database on 3.5 and importing it on 4.0, and that you might have to make some minor changes to your code when you upgrade your applications to RavenDB 4.0. We’ll have a better story around upgrading from RavenDB 3.5 in the next RC, but we also want to hear from you about the kinds of scenarios that need to be supported there. By every metric that I can think of, RavenDB 4.0 is far superior, so I suggest moving to it as soon as possible.

You’ll be able to upgrade seamlessly from the RC release to RC2 or RTM without any trouble.

RavenDB 4.0 is available for Windows (64 & 32 bits), Ubuntu (14 & 16 – 64 bits), Raspberry PI and on Docker (Windows Nano Server and Ubuntu images).

We also released a .NET client and will follow up soon with Python, JVM, Node.JS, Ruby and Go clients.

Your feedback on all aspects of RavenDB 4.0 is desired and welcome, let it pour in.

RavenDB 4.0: Support options

time to read 2 min | 324 words

RavenDB 4.0 is going to have a completely free community edition that you could use to run production systems. We do this with the expectation that users will go with the community edition and either will be happy there or upgrade at some point to the commercial editions.

As part of the restructuring we are doing, we intend to also significantly simplify the support model. Our current support model is per RavenDB instance with professional support costing 2,000$ per instance and production (24/7) support costing 6,000$. We got a lot of feedback on this being complex to work with. In particular, the per instance cost meant that operations would need to talk to us during redeployments in order to maintain coverage of all their RavenDB instances.

As part of the Great Simplification we are doing in 4.0, we also want to tackle the issue of support. As a result, with the rollout of the RavenDB 4.0 RC we are going to move to flat support costs.

  • Professional Support will cost 15% of the license cost and give you access to our support engineers with a guaranteed next business day response time.
  • Production Support will cost 30% of the license cost and give you access to the core team members with 24/7 availability.
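The flat model boils down to simple arithmetic. Here is a small Python sketch; the license cost used in the example is a made-up input, not an actual RavenDB price. The per-instance function models the 3.x production support price described above, to show that under the flat model redeploying onto more instances no longer changes the support bill. Decimal avoids floating-point rounding in money calculations.

```python
from decimal import Decimal

# Flat support rates from the list above, as a fraction of the license cost.
SUPPORT_RATES = {"professional": Decimal("0.15"), "production": Decimal("0.30")}


def support_cost(license_cost: Decimal, support: str) -> Decimal:
    """Support price under the flat model: a fixed share of the license cost."""
    return license_cost * SUPPORT_RATES[support]


def total_cost(license_cost: Decimal, support: str) -> Decimal:
    return license_cost + support_cost(license_cost, support)


def old_production_support(instances: int) -> Decimal:
    # 3.x model: production (24/7) support billed per RavenDB instance at 6,000$.
    return Decimal(6000) * instances


license_cost = Decimal("10000")  # hypothetical license cost, for illustration only
print(total_cost(license_cost, "production"))  # 13000.00
```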

This is a significant reduction in price, because we are trying to encourage more people to get support and our previous approach was unbalanced.

The community support will continue to be offered, obviously, but we have no SLA around issues raised there.

The commercial support options will only be available for the Professional and Enterprise editions.

Here is how the costs change between RavenDB 3.x and RavenDB 4.0 for production support:

                                         RavenDB 3.x   RavenDB 4.0   Savings
Standard + Production Support                 6,698$        5,843$   15% reduction
Enterprise 4 Cores + Production Support       9,152$        6,864$   33% reduction

RavenDB 4.0 Release Candidate Updates

time to read 3 min | 509 words

We are on the verge of releasing the RavenDB 4.0 Release Candidate. The release is currently set for the middle of next week, but we’re close enough that I can smell it. Once the release pressure is off, I can start discussing more of the things that we bring to the table.

For the record, we are now down to fewer than 30 remaining issues before we can ship the RC, most of them relating to licensing. And speaking of that, we made a couple of decisions lately that you should probably know about.

First, we are switching to RavenDB 4.0 pricing starting next week, and for the period of the RC we’ll even offer a 30% discount. In other words, you can do a bit of arbitrage and get a license now at the old pricing; we’ll grandfather in all existing orders when we make the switch.

Second, regarding the free community edition. After a lot of deliberation, I decided that the community edition as we wanted to offer it made no sense. RavenDB 4.0 is a distributed, robust database, and we want to encourage people to use it in real-world settings. Because of that, we decided to scrap the limits on running the community edition as a cluster. This means that you’ll be able to deploy a full-blown RavenDB cluster for production using just the community edition.

Third, we decided to change the pricing model a bit. Instead of purchasing a license per server, which caused a lot of back & forth between our sales people and customers, we decided to move to a flat per-core model. In other words, if you need to deploy a 3-node cluster with 4 cores each, you’ll purchase a cluster license for 12 cores. You could then deploy that cluster on up to 12 machines (with 1 core assigned to each machine in this case). This simplifies things significantly and gives a lot more flexibility to the operations team.
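The bookkeeping behind a per-core cluster license is small. Here is a minimal Python sketch, assuming the rules are only those stated in this post (cores are licensed per cluster, the community edition allows 3 cores and at most 3 nodes) plus one assumption of ours: every node needs at least one core. `validate_assignment` and `LicenseError` are hypothetical names, not a RavenDB API.

```python
from typing import Optional


class LicenseError(ValueError):
    pass


def validate_assignment(licensed_cores: int, assignment: dict[str, int],
                        max_nodes: Optional[int] = None) -> int:
    """Check per-node core assignments against a cluster license; return unassigned cores."""
    if any(cores < 1 for cores in assignment.values()):
        raise LicenseError("every node needs at least one core")
    if max_nodes is not None and len(assignment) > max_nodes:
        raise LicenseError(f"license allows at most {max_nodes} nodes")
    used = sum(assignment.values())
    if used > licensed_cores:
        raise LicenseError(f"assigned {used} cores but the license covers {licensed_cores}")
    return licensed_cores - used


# 3 nodes with 4 cores each use up a 12-core license exactly...
print(validate_assignment(12, {"A": 4, "B": 4, "C": 4}))  # 0
# ...and so do 12 single-core machines.
print(validate_assignment(12, {f"node{i}": 1 for i in range(12)}))  # 0
# A 16-core license with 3 cores assigned to Node A leaves 13 to hand out.
print(validate_assignment(16, {"A": 3}))  # 13
# Community edition limits from this post: 3 cores, at most 3 nodes.
print(validate_assignment(3, {"A": 1, "B": 1, "C": 1}, max_nodes=3))  # 0
```

Since the license tracks a core count rather than specific servers, moving a node to different hardware only means re-running this check, not renegotiating the license.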

Here is a rough draft of what this would look like. You can see a 16-core cluster license, with 3 cores assigned to Node A.

[Image: draft license view showing a 16-core cluster license with 3 cores assigned to Node A]

The community edition we’ll provide will have 3 cores and a maximum of 3 nodes in the cluster. This will allow you to have a single RavenDB node, or a cluster of three nodes with one core each, giving you high availability, automatic failover, etc. There are still things that aren’t in the free edition (ETL, cloud backups, monitoring, support, etc.), but the idea is that you could run real things on the free edition and upgrade when your needs actually require it.

I’m really excited about this because it means that features that are currently Enterprise only are now pushed all the way to the community edition. This gives you the chance to use a world class distributed database that was built with an explicit design goal of being correct, fast and easy to use.


RECENT SERIES

  1. RavenDB 4.0 Unsung Heroes (5):
    22 Sep 2017 - Field compression
  2. re (18):
    26 Apr 2017 - Writing a Time Series Database from Scratch
  3. Writing SSL Proxy (2):
    26 Sep 2017 - Part I, routing
  4. RavenDB 4.0 (13):
    11 Sep 2017 - Support options
  5. Optimizing select projections (5):
    01 Sep 2017 - Part IV–Understand, don’t do
