Ayende @ Rahien

Oren Eini, aka Ayende Rahien, CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

Posts: 6,919 | Comments: 49,399

time to read 1 min | 145 words

I have been talking about memory and RavenDB a lot, and I thought that I would share the following image from one of our test runs:

[image: memory usage from the test run]

This is RavenDB running in a container with 16MB of available memory.  This is when we are under (moderate) load:

[image: RavenDB's memory usage under moderate load in the 16MB container]

Note that the actual working set used by RavenDB is 2.28MB, and while the total allocations are higher than that, it is still quite reasonable in size.
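For context, this kind of hard memory cap is applied when starting the container; the image tag and exact limits below are illustrative assumptions, not the precise configuration from this test run:

```shell
# Illustrative: cap the container at 16 MB of RAM, with no extra swap
docker run --rm -d \
  --memory=16m \
  --memory-swap=16m \
  -p 8080:8080 \
  ravendb/ravendb
```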

In 1995, I got a new computer with a 133MHz CPU and 16 MB of RAM. It ran a full OS and apps (Win95, Netscape, Office, etc.) and was quite impressive.

It is really interesting that we can run RavenDB on that constrained environment.

time to read 3 min | 536 words

After my podcast about RavenDB’s dev ops story, I was asked an interesting question by Remi:

…do you think it can work with non technical product (let's say banking app) where your user and your engineer are not in the same industry.

This is quite an interesting scenario. A line of business application is going to be composed of two separate planes. You have the technical plane, which is fairly standard, and you can get quite a lot of mileage from standard dev ops monitoring tools. For example, you probably don’t need the same level of diagnostics in a web app or a service backend as you need for a database engine. However, the business plane is just as interesting an area, and it can often benefit quite a bit from building business-level diagnostics into the application.

If we take the example of a banking app, you might want to track things such as payment flow across various accounts. You may want to be able to get a view of a single user’s activities over time, or simply have good visibility into various financial instruments.
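As a sketch of what such business-level diagnostics might look like (the names and shape here are my own invention for illustration, not a prescribed design), consider recording each leg of a payment as a business event, so a single account’s activity can be pulled up on demand:

```python
from datetime import datetime, timezone

# In-memory event log; a real system would persist these events
events = []

def record_transfer(source, target, amount, reference):
    """Record one business-level event describing a step in a payment flow."""
    events.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "target": target,
        "amount": amount,
        "reference": reference,
    })

def activity_for(account):
    """All events touching a single account, in the order they happened."""
    return [e for e in events if account in (e["source"], e["target"])]

# A payment flowing across accounts, one event per leg
record_transfer("accounts/users/8-B", "accounts/escrow", 1000, "invoice 1234")
record_transfer("accounts/escrow", "accounts/merchant/3-A", 1000, "invoice 1234")

print(len(activity_for("accounts/escrow")))  # 2 - both legs touch escrow
```

The point isn’t the storage mechanism; it is that the events are expressed in the business’ own terms, so a non-technical expert can follow the flow.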

I have run into several cases where I had to break down how loans work (interest, compounding, collateral, etc.) for college-educated people who were really quite smart, but didn’t pay attention to that part of life. Given that I consider loans to be one of the simplest financial instruments, building visibility into them can be of great help.

Still in the banking field, just the notion of taxation is freakishly complex. I have had a case where a customer in India was supposed to pay us 1,000 USD. They sent 857 USD (a bit of that was eaten by bank fees) and the rest we had to claim as a refund from our tax authorities, because the rest of the money was paid as taxes in India and the two countries do reconciliation. Given the inherent complexity that is involved, just being able to visualize, inspect and explain things is of enormous value.

Things like Know Your Customer and Anti Money Laundering are also quite complex and can put the system into a tailspin. I had a customer send us a payment, but the payment was stopped because the same customer also paid (in a completely different transaction, and to a different destination entirely) with funds that came from cryptocurrencies. Leaving aside the aggravation of such scenarios, I am actually impressed/scared that they are able to track such things so well.

I can’t really be upset with the bank, even. Laws and regulations are in place that have strict limits on how they can behave, including personal criminal liability and Should Have Known clauses. I can understand why they are cautious.

But at the same time, trying to untangle such a system is a lot like trying to debug a software system. And having the tools in place for the business expert to easily obtain and display the data is an absolute competitive advantage.

I have recently closed a bank account specifically because the level of service provided didn’t meet my expectations. Having better systems in place means that you can give better service, and that is worth quite a lot.

time to read 2 min | 311 words

The following is a fix we did to resolve a production crash of RavenDB. Take a look at the code, and consider what kind of error we are trying to handle here.

Hint: The logger class can never throw.

[image: the fixed code]

The underlying issue was simple. We ran out of memory, which is an expected occurrence, and is handled by the very function that we are looking at above.

However, under low memory conditions, allocations can fail. In the code above, the allocation of the log string statement failed, which threw an error. This caused an exception to escape a thread boundary and kill the entire process.

Moving the log statement to the inside of the try statement allows us to recover from it: we attempt to report the error, release any currently held memory, and attempt to reduce our memory utilization.
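The shape of the fix, sketched in Python rather than the original C# (the function and names here are invented for illustration):

```python
def handle_low_memory(log, release_reserves):
    """React to low memory; nothing in here may let an exception escape."""
    try:
        # The log message itself is an allocation, and under low memory
        # that allocation can fail - so it lives INSIDE the try block
        log("Low memory detected, releasing reserved buffers")
        release_reserves()
    except MemoryError:
        # Worst case we lose one log line; the process stays alive
        pass

# A logger whose internal allocation fails no longer kills the process
def failing_log(message):
    raise MemoryError()

handle_low_memory(failing_log, lambda: None)  # returns normally, no crash
```

The key point is that the exception handler covers every allocation on the recovery path, including the one hiding inside the log call.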

This particular error is annoying. A string allocation will always allocate, but even if you have run out of actual memory, such an allocation will often succeed because it can be served out of the already existing GC heap, without the need to trigger an actual allocation from the OS. This is just a reminder that anything that can go wrong will, and with just the right set of circumstances to cause us pain.

I’ll use this opportunity to recommend, once again, reading How Complex Systems Fail. Even in this small example, you can see that it takes multiple separate things to align just right for an error to actually happen. You have to have low memory and a full GC heap, and only then will you get the actual issue. Low memory without the GC heap being full, and the code works as intended. GC heap being full without low memory, no problemo. :-)

time to read 1 min | 142 words

You can now read Inside RavenDB directly in your browser.

I’m really happy about this, not just because you can browse the full book online (or download it as a PDF) completely free. The main point is that now I can link directly to the specific part in the book where I’m discussing (in depth) certain features of RavenDB.

I think that this is going to make answering questions about RavenDB’s internals and behavior a lot easier and more approachable.

It also means, of course, that you can use Google to find information from the book.

I’m also currently working on updating the book for RavenDB 5.0. Although I’ll admit that in some cases I’m writing about features that haven’t yet seen the light of day.

time to read 1 min | 112 words

Consider the following C code snippet:

[image: C code snippet]

This code cannot be written in C#. Why? Because you can’t use ‘+’ on bool, and you can’t cast bools. So I wrote this code instead:

[code snippet]

And then I changed it to this code:

[code snippet]

Can you tell why I did that? And what is the original code trying to do?

For that matter (and I’m honestly asking here), how would you write this code in C# to get the best performance?

Hint:

[image: hint code]

time to read 4 min | 731 words

Product recommendations are a Big Thing. The underlying assumption is that there are patterns in the sales of products, so we can detect and recommend the products that usually go together. That gives us a very nice way to give accurate recommendations to users about products that they might want to purchase.

Here is a great example of what this may look like, from Amazon:

[image: Amazon product recommendations]
As an aside, I’m really happy to see the grouping of my book with the Release It! and Writing High Performance .NET Core books.

An interesting question is: can we get this kind of behavior in RavenDB? If we were using SQL, we could probably write some queries to handle this. I wrote about this a decade ago with NHibernate, and the queries are… complex. They also have a nontrivial runtime cost. With RavenDB, however, we can do things differently. We can use RavenDB’s map/reduce feature to handle this.

The key observation is that we want to gather, for each product, the products that were also purchased with it. We’ll use the sample dataset to test things out. There, we have an Orders collection, and each order has a list of Lines that were purchased in the order. Given that information, we can use the following index definition:

[image: index definition]

Let’s break this index apart into its constituent parts. In the map, we project an entry for each line, which has the Product that is being purchased as well as all the other products that were purchased in the same order. We use this to create a link between the various products that are sold together. In the reduce, we group by the product that was sold, and aggregate the sales of related products to get the final tally.
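To make that logic concrete, here is the same map/reduce computation simulated in plain Python (my own sketch over a toy dataset, not RavenDB’s index syntax):

```python
from collections import defaultdict

# A tiny stand-in for the Orders collection in the sample dataset
orders = [
    {"Lines": [{"Product": "products/1-A"}, {"Product": "products/2-A"}]},
    {"Lines": [{"Product": "products/1-A"}, {"Product": "products/3-A"}]},
    {"Lines": [{"Product": "products/1-A"}, {"Product": "products/2-A"}]},
]

orders_count = defaultdict(int)                   # orders per product
related = defaultdict(lambda: defaultdict(int))   # joint-sale tallies

for order in orders:
    products = {line["Product"] for line in order["Lines"]}
    for product in products:
        # Map: one entry per product, linking it to the rest of its order
        orders_count[product] += 1
        for other in products - {product}:
            # Reduce: aggregate how often the two were sold together
            related[product][other] += 1

print(orders_count["products/1-A"])             # 3
print(related["products/1-A"]["products/2-A"])  # 2
```

In RavenDB this aggregation runs incrementally inside the index; the loop above just shows what is being computed.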

The end result will look like so:

[image: index output for a product]

You can see some interesting design decisions in how I built this index. We keep track of the number of orders for each product, as well as the number of times it was purchased alongside each related product. This means that we can very easily implement related products, but also filter outliers. If someone purchased the “Inside RavenDB” book to learn RavenDB, but at the same time also bought The Very Hungry Caterpillar for their child, you probably don’t want to recommend one alongside the other. The audiences are quite different (even though telling my own four-year-old daughter about RavenDB usually puts her to sleep pretty quickly :-)).

We can use the number of joint sales as a good indication of whether the products are truly related, all the while letting the users tell us what matters. And the best part: you don’t have to go out of your way to get this information. This is based on pretty much just the data that you are already collecting.

Because this is a map/reduce index in RavenDB, the computation happens at indexing time, not at query time. This means that the cost of querying this information is minimal, and RavenDB will make sure that it is always up to date.

In fact, we can go to the Map/Reduce Visualizer page in RavenDB to see how this works. Let’s take a peek, shall we?

[image: Map/Reduce Visualizer]

Here we can see a visual representation of two orders for the same product, as well as a few others. This is exactly the kind of thing we want to explore. Let’s look a bit deeper, just for products/51-A:

[image: reduce tree for products/51-A]

You can see how for the first order (bottom left), we have just one additional product, (products/14-A) while the second has a couple of them. We aggregate that information (Page #593) for all the 490 orders that fit there. There is also the top level (Page #1275) which aggregate the data from all the leaves.

When we query, we will get the data from the top, so even if we have a lot of orders, we don’t actually need to run any costly computation. The data is already pre-chewed for us and immediately (and cheaply) available.

time to read 1 min | 86 words

I’ll be speaking at the Progressive.NET conference later this week. I’ll be speaking about the nastiest bugs that weren’t my fault. This is a very cathartic talk to give, because I get to go in depth into all the ways I tripped and fell.

This is based on a decade of running RavenDB in production and running into the strangest situations that you can think of.

On the menu:

  • Linux and memory management
  • Windows and the printer
  • The mysterious crash on the ARM robot
  • The GC that smacked me

And much more…
