Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 1 min | 107 words

We are looking to finish work on our client APIs for RavenDB in Go, Node.js and Ruby. The current state is that the code is mostly there, and we need help to give it the final push (and spit & polish) and drive it to release status.

For each of these, I estimate that there are about 6 to 8 weeks of work, after which we’ll be managing the rest of this internally. You can see the current state of each client API here:

If you are interested, please send an email to jobs@ravendb.net. This applies to both local (Hadera, Israel) and remote work.

time to read 1 min | 156 words

You might have noticed that I’ve slowed down writing blog posts. This is because pretty much every word I write these days goes into the book.

I just completed chapter 17, and we are now standing at around 550 pages in length, and there is only one chapter left.

Chapter 17 talks about backups and restores, how they work in RavenDB and how to properly manage your backup strategies in RavenDB. It sounds deathly dull, but it can actually be quite interesting, since backing up a distributed database (and restoring one, which is harder) are non-trivial problems. I hope that I did justice to the topic.

Next, maybe even as soon as early next week, is Chapter 18, operational recipes, which will cover all sorts of single-use-case responses from the operations team on how to deal with various scenarios inside RavenDB.

You can read the draft here and your feedback is always appreciated.

Times are hard

time to read 2 min | 277 words

One of the things RavenDB does is allow you to define a backup task that will be executed on a given schedule (such as every Saturday at midnight). However, as it turns out, specifying the right time is actually a pretty hard thing to do. The problem is what to do when you have multiple time zones involved:

  • UTC
  • The server local time
  • The operator’s local time
  • The business hours of the application using the database

In some cases, you might have a server in Germany being managed from Japan with users primarily from South Africa. There are at least four different options for when Saturday’s midnight is, and the one sure thing is that it will happen when you least want it to.

Because of that, RavenDB takes the simple position that the time that it cares about is the server's own time. An operator is free to define it as they wish, but only the server local time is relevant. But we still need to make the operator’s job easier, and we do it using the following method:

image

The operator can specify the schedule using CRON syntax (which should be familiar to most admins). We translate the CRON expression to a human-readable string, but we also provide the next backup date in the server’s time (when it will actually run), in the operator’s local time (which, as you can see, is a bit different from the server’s) and the duration until then. The latter is actually really important because it gives the operator an intuitive understanding of when the backup is going to run next.
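To make the three values concrete, here is a minimal sketch of computing them for the "every Saturday at midnight" schedule. This is not RavenDB's code: the function names are made up for illustration, the schedule is hard-coded instead of parsed from CRON, and the time zones are fixed UTC offsets (a real system would use named zones so that daylight saving changes are handled).

```python
from datetime import datetime, timedelta, timezone, tzinfo


def next_saturday_midnight(now_server: datetime) -> datetime:
    """Return the first Saturday 00:00 strictly after now_server, in the same zone."""
    # weekday(): Monday is 0, so Saturday is 5.
    days_ahead = (5 - now_server.weekday()) % 7
    candidate = (now_server + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if candidate <= now_server:
        # It is already Saturday and midnight has passed: go to next week.
        candidate += timedelta(days=7)
    return candidate


def describe_next_backup(now_utc: datetime, server_tz: tzinfo, operator_tz: tzinfo) -> dict:
    """Hypothetical helper: the next run as seen by the server and by the operator."""
    next_run = next_saturday_midnight(now_utc.astimezone(server_tz))
    return {
        "server_time": next_run,
        "operator_time": next_run.astimezone(operator_tz),
        # Subtract in UTC so the duration is real elapsed time.
        "time_until": next_run.astimezone(timezone.utc) - now_utc,
    }


# Assumed offsets for the example: a server at UTC+2, an operator at UTC+9.
SERVER_TZ = timezone(timedelta(hours=2))
OPERATOR_TZ = timezone(timedelta(hours=9))

if __name__ == "__main__":
    now = datetime(2018, 5, 16, 8, 30, tzinfo=timezone.utc)
    for label, value in describe_next_backup(now, SERVER_TZ, OPERATOR_TZ).items():
        print(label, value)
```

Only the server time decides when the backup runs. The operator time and the duration are derived from it purely for display.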

time to read 3 min | 470 words

You learn a lot of things when talking to clients. Some of them are really fascinating, some of them are quite horrifying. But one of the most important things that I have learned to say to a client is: “This is out of scope.”

This can be an incredibly frustrating thing to say, both for me and the client, but it is sometimes really necessary. There are times when you see a problem, and you know how to resolve it, but it is simply too big an issue to take upon yourself.

Let me give a concrete example. A customer was facing a coordination problem with their system: they needed to deal with multiple systems and orchestrate actions among them. Let’s imagine that this is an online shop (because that is the default example) and you need to process an order and ship it to the user.

The problem at this point is that the ordering process needs to coordinate the payment service, the fulfillment service, the shipping service, deal with backorders, etc. Given that this is a B2B system, the customer wasn’t concerned with the speed of the system but was really focused on the correctness of the result.

Their desire was to have a single transaction encompass all such operations. They were quite willing to pay the price in performance in order to achieve that goal. And they turned to us for help in this matter. They wanted the ability to persistently and transactionally store data inside RavenDB and only “commit” it at a given point.

We suggested a few options (draft documents, a flag in the document, etc), but we didn’t answer their core question. How could they actually get the transactional behavior across multiple operations that they wanted?

The reason we didn’t answer that question is that it is… out of scope. RavenDB doesn’t have this feature (for really good reasons) and that is clearly documented. There is no expectation for us to have this feature, and we don’t. That is where we stop.

But what is the reason that we take this stance? We have a lot of experience in such systems and we can certainly help find a proper solution, why not do so?

Ignoring other reasons (such as this isn’t what we do), there is a primary problem with this approach. I think that the whole idea is badly broken, and any suggestion that I make will be used against us later: “This was your idea, it broke (it doesn’t matter that you told us it would), now fix it.” It is a bit more awkward to have to say “sorry, out of scope” ahead of time, but much better than having to deal with the dirty diapers at the end.

time to read 1 min | 164 words

I’m really happy to announce that we have just released a brand new version of NHibernate Profiler and Entity Framework Profiler.

What is new for NHibernate:

  • Support for NHibernate 5.1 and 5.1.1
  • Support for .NET Core
    • supported on the following platforms: netstandard2.0, net46, netcoreapp2.0
  • Fixed various minor issues regarding showing duplicate errors and warnings from NHibernate.
  • Better support for DateTime precision issues in NHibernate 5.0

What is new for Entity Framework:

  • Support for EF Core
    • supported on the following platforms: netstandard2.0, net46, netcoreapp2.0
    • netstandard1.6 is also supported via a separate dll.
  • Support for DataTable data type in custom reporting
  • Support for ReadCount and RecordsAffected in EF Core 2.0
  • Fixed issue for EF 6 on .NET 4.7
  • Can report using thread name, not just application name
  • Provide integration hooks for ASP.NET Core to provide contextual information for queries.

New stuff for both of them:

  • Improved column type mismatch warning
  • Support for UniqueIdentifier parameter type
  • Support for integration with VS 2017.

time to read 3 min | 520 words

The trigger for this post is the following question in the RavenDB mailing list. Basically, given a system that is composed of multiple services (running as separate processes), the question is whether to have each service use its own DocumentStore or to have a separate service (DbService) process that will encapsulate all access to RavenDB. The idea, as I understand it, is to avoid the DocumentStore creation because it is expensive.

The quick answer here is simple: <blink*>Don’t ever do that!</blink>

* Yes, I’m old.

That is all, you don’t need to read the rest of this post.

Oh, you are still here? As long as you are here, let me explain my reasoning for such a reaction.

DocumentStore isn’t actually expensive to create. In fact, for most purposes, it is actually quite cheap. It holds no network resources on its own (connection pooling is handled by a global pool, anyway). All it does is manage the HTTP cache on the client, cache things like serialization information, etc.

The reason we recommend that you don’t create document stores all the time is that we saw people creating a document store for the purpose of using a single session and then disposing it. That is quite wasteful: it forces us to allocate more memory and prevents the use of caching entirely. But creating a few document stores for each service that you have? That is cheap to do.

What really triggered this post is the idea of having a separate process just to host the DocumentStore, the DbService process. This is a bad idea. Let me count the ways.

Your service process needs some data, so it will go to the DbService (over HTTP, probably) and ask for it. Your DbService will then call to RavenDB to get the data using the normal session and return the data to the original service. That service will process the data, maybe mutate it and save it back. It will have to do that by sending the data back to the DbService process, which will create a new session and save it to RavenDB.

This adds another round trip to every database query, and it means that you can’t natively express queries inside your service (since you need to send them to the DbService). It creates strong ties between all the services you have and the DbService, as well as a single point of failure. Even if you have multiple copies of DbService, you now need to write the code to do automatic failover between them. Updating a field in a class for one service means that you have to redeploy the DbService so it will recognize the new field, for example.

In terms of client code, aside from having to write awkward queries, you also need to deal with serialization costs, and you have to write your own logic for change tracking, unit of work, etc.

In other words, this has all the disadvantages of a repository pattern with the added benefit of making many remote calls and seriously complicating deployment.

time to read 2 min | 325 words

One of the first steps you’ll have when migrating RavenDB from 3.5 to 4.0 is to actually get your data into 4.0. There are a few ways of doing that.

You can create a new database in 4.0 from a 3.5 database directory. You can click on the chevron on the New database button to access it:

image

This will give you the following screen, where you can point to the existing database directory (the RavenDB 3.5 server must be offline for this) and the Raven.StorageExporter tool that comes with the 3.5 distribution. RavenDB 4.0 will then create your database and import all the data from the existing db to the new one.

image

This works great if you are doing this as a one-time operation, but in many cases, the migration process is a long one. You’ll start by migrating your code, and it will take one or two iterations to complete the full process.

In order to handle that scenario, you’ll create a new database on 4.0 normally, then go to Settings > Import and select importing from another database. In this mode, the 3.5 server is online and running. You’ll provide the details of the server and database and then click on Migrate Database, as you can see in the picture.

image

This will import all the data from the existing database to the new database. This can be an ongoing process. Once this is done, you can migrate your application code to use RavenDB 4.0 and at deployment time, you’ll run this again.

Each time you run this migration, it will get only the updated data from the source server. It doesn’t have to read it all from scratch.
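To illustrate why repeated runs are cheap, here is a generic sketch of incremental import using a high-water mark. This is an assumption-laden illustration and not RavenDB's implementation: `SourceDoc`, `change_number` and `Importer` are hypothetical names, and the "source server" is just an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class SourceDoc:
    doc_id: str
    change_number: int  # hypothetical: grows every time the document is written
    body: dict


@dataclass
class Importer:
    """Remembers how far the previous run got, so the next run can skip old data."""

    last_seen: int = 0
    target: Dict[str, dict] = field(default_factory=dict)

    def run(self, source: Iterable[SourceDoc]) -> int:
        # Only documents written after the previous run need to be copied.
        changed = [d for d in source if d.change_number > self.last_seen]
        for doc in sorted(changed, key=lambda d: d.change_number):
            self.target[doc.doc_id] = doc.body
            self.last_seen = doc.change_number
        return len(changed)


if __name__ == "__main__":
    source = [
        SourceDoc("users/1", 1, {"name": "a"}),
        SourceDoc("users/2", 2, {"name": "b"}),
    ]
    importer = Importer()
    print(importer.run(source))  # first run copies everything
    source[0] = SourceDoc("users/1", 3, {"name": "a2"})
    print(importer.run(source))  # later run copies only the modified document
```

The point of the sketch is only that the cost of each run depends on how much changed since the last one, which is what makes running the migration again at deployment time practical.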

time to read 1 min | 76 words

With RavenDB 4.0 out and about for a few months already, we have been mostly focused on finishing up the release. That meant working on documentation (the book is already past the 500 pages mark!), additional clients, helping clients to go to production with 4.0 and gathering feedback.

In fact, this is the point of this post today. I would really like to know your thoughts about RavenDB 4.0 and what should go into the next version.

time to read 2 min | 209 words

This issue in the RavenDB Security Report is pretty simple: when we generate a certificate, we need to generate a certificate serial number. We were using a random number that is 64 bits in length, but that is too small. The problem is the birthday attack. For a 64-bit number, you only need about 5 billion attempts to generate a collision. In modern cryptography, that is actually a very low security threshold.

So we fixed it and used a random value that is 20 bytes in length. Or so we thought. This single issue is worth the trouble of publicly discussing the security report. As it turned out, I didn’t read the API docs properly and used this construction:

new BigInteger(20, random);

Where random is a cryptographically secure random number generator. The problem here is that this BigInteger constructor takes a length in bits, not in bytes. That resulted in a security “fix” that was actually much worse than the previous situation (you only need a bit over a thousand tries to generate a collision). This has already been fixed, obviously, but I’m very happy that it was caught.
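The numbers above follow from the birthday bound: with N possible values, you need roughly sqrt(2 · ln 2 · N) random draws for even odds of a collision. Here is a small sketch that checks the arithmetic and shows the bits versus bytes mix-up using Python's `secrets` module. The function and variable names are mine, and this is an illustration rather than the code used in RavenDB.

```python
import math
import secrets


def tries_for_even_odds(bits: int) -> float:
    """Approximate number of random draws for a 50% chance of at least one collision."""
    # Birthday bound: n ≈ sqrt(2 * ln(2) * N), where N = 2**bits possible values.
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)


SERIAL_BYTES = 20

# The mistake: passing a byte count where a bit count is expected.
weak_serial = secrets.randbits(SERIAL_BYTES)        # only 20 bits of randomness
# The intent: 20 bytes, which is 160 bits.
strong_serial = secrets.randbits(SERIAL_BYTES * 8)

if __name__ == "__main__":
    for bits in (64, 20, 160):
        print(bits, "bits:", f"{tries_for_even_odds(bits):.3g}", "tries")
```

For 64 bits this gives roughly 5 billion tries, for 20 bits roughly 1,200, and for 160 bits a number on the order of 10^24, which is why the byte and bit confusion matters so much.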
