Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

, @ Q j

Posts: 6,887 | Comments: 49,274

filter by tags archive
time to read 4 min | 731 words

Product recommendations is a Big Thing. The underlying assumption is that there are patterns in the sales of products, so we can detect and recommend what products usually go together. That gives us a very nice way to give accurate recommendations to users about products that they might want to purchase.

Here is a great example of how this may look like, from Amazon:

image
As an aside, I’m really happy to see the grouping of my book with Release It~ and Writing High Performance .Net Core books. 

An interesting question is can we get this kind of behavior in RavenDB? If we were using SQL, we could probably write some queries to handle this. I wrote about this a decade ago with NHiberante, and the queries are… complex. They also have non trivial amount of runtime costs. With RavenDB, however, we can do things differently. We can use RavenDB’s map/reduce feature to handle this.

The key observation is that we want to gather, for each product, the products that were also purchased with it. We’ll use the sample dataset to test things out. There, we have an Orders collection and each order has a list of Lines that were purchased in the order. Given that information, we can use the following index definition:

Let’s break this index apart to its constituent parts. In the map, we project an entry for each line, which has the Product that is being purchased as well as all the other products that were purchased in the same order. We use this to create a link between the various products that are sold together. In the reduce, we group by the product that was sold, and aggregate the sales of related products to get the final tally.

The end result will looks like so:

image

You can see some interesting design decisions in how I built this index. We keep track of the number of orders for each product, as well as the number of times it was purchased along side each related product. This means that we can very easily implement related products, but also filter outliers. If someone purchased the “Inside RavenDB” book to learn RavenDB, but at the same time also bought the Hungry Caterpillar for their child, you probably don’t want to put recommend each other. The audiences are quite different (even though telling my own 4 years old daughter about RavenDB usually puts her to sleep pretty quickly Smile).

We can use the number of joint sales as a good indication of whatever the products are truly related, all the while using the users tell us what matter. And the best part, you don’t have to go out of your way to get this information. This is based on pretty much just the data that you are already collecting.

Because this is a map/reduce index in RavenDB, the computation happens at indexing time, not at runtime. This means that the cost of querying this information is minimal, and RavenDB will make sure that it is always up to do.

In fact, we can go to the Map/Reduce Visualizer page in RavenDB to see how this works. Let’s take a peek, shall we?

image

Here we can see a visual representation of two orders for the same product, as well as a few others. This is exactly the kind of thing we want to explore. Let’s look a bit deeper, just for products/51-A:

image

You can see how for the first order (bottom left), we have just one additional product, (products/14-A) while the second has a couple of them. We aggregate that information (Page #593) for all the 490 orders that fit there. There is also the top level (Page #1275) which aggregate the data from all the leaves.

When we query, we will get the data from the top, so even if we have a lot of orders, we don’t actually need to run any costly computation. The data is already pre-chewed for us and immediately (and cheaply) available.

time to read 1 min | 86 words

I’ll be speaking at the Progressive.NET conference later this week. I’ll be speaking about the nastiest bugs that weren’t my fault. This is a very cathartic talk to give, because I get to go in depth into all the ways I tripped and fell.

This is based on a decade of running RavenDB in production and running into the strangest situations that you can think of.

On the menu:

  • Linux and memory management
  • Windows and the printer
  • The mysterious crash on the ARM robot
  • The GC that smacked me

And much more…

time to read 3 min | 474 words

RavenDB, as of 4.0, requires that the document identifier will be a string. In fact, that has always been the requirement, but in previous versions, we allowed you to pretend that this isn’t the case. That has led to… some complexities, because people had a number id in their model, but inside RavenDB that was represented as a string, always.

I just got the following question:

In my entities, can I have the Id property of any type instead string to avoid primitive obsession? I would use a generic Id<tentity> type for ids. This type can be converted into string before saving in DB by calling ToString() and transformed from string into Id<tentity> (when fetching from DB) by invocation of static method like public Id<tentity> FromString(string id).

The short answer for this is that no, there is no way to do this. A document id in your model has to be a string.

The longer answer is that you can absolutely do this, but you have to understand the divergence of your entity model vs. the document model. The key is that RavenDB doesn’t actually require that your model would have an Id property. It is usually defined, because it makes things easier, but that isn’t required. RavenDB is perfectly happy managing the document key internally. Combine that with the ability to modify how documents are converted to entities, and you have a solution. Let’s look at the code…

And here is how it looks like:

image

The idea is that we customize a few things inside of RavenDB.

  • We tell the serializer that it should ignore the UserId property
  • We tell RavenDB that after creating an entity from the server, we should setup the Id property as we want it.
  • We do the same just before we store the entity in the server, just to be sure that we got the complete package.
  • We disable the usual identity generation logic for the documents we care about and tell RavenDB that it should ignore trying to set the identity property on the document on its own.

The end result is that we have an entity with a strongly typed identifier in our model. It took a bit of work, but not overly so.

That said, I would suggest that you should either have a string identifier property in your model or not have one at all (either option takes no code in RavenDB). Having an identifier and jumping through hoops like that tend to make for awkward experience. For example, RavenDB has no idea about this property, so if you need to support queries as well, you’ll need to extend the query support. It’s possible, but shows that there is additional complexity that can be avoided.

time to read 4 min | 749 words

Consider a business that needs to manage leasing apartments to tenants. One of the more important aspects of the business is tracking how much money is due. Because of the highly regulated nature of leasing, there are several interesting requirements that pop up.

The current issue is how do you tackle the baseline for eviction. Let’s say that the region that the business is operating under has the following minimum requirements for eviction:

  • Total unpaid debt (30 days from invoice) that is greater than $2,000.
  • Total overdue debt (30 – 60 days from invoice) that is greater than $1,000.
  • Total overdue debt (greater than 60 days from invoice) that is greater than $500.

I’m using the leasing concept here because it is easy to understand that the date ranges themselves are dynamic. We don’t want to wait for the next first of the month to see the changes.

The idea is that we want to be able to show a grid like this:

image

The property manager can then take action based on this data. And here is the raw data that we are working on:

image

It’s easy to see that this customer still has a balance of $175. Note that this balance is as of the July 9th, because we apply payments to the oldest invoice we have. The question now becomes, how can we turn this raw data into the table above?

This turn out to be a bit hard for RavenDB, because it is optimize to answer your queries fast, which means that having to do recalculation on each query (based on the current dates) is not easy. I already shown how to do this kind of task easily enough when we are looking at a single customer. The problem is that we want to have an overall view of the system, not just on a single customer. And ideally without it costing too much.

The key observation to handle this efficiently is RavenDB is to understand that we don’t need to actually generate the table above directly. We just need to get the data to a point where it is trivial to do so. After some thinking, I came up with the following desired output:

image

There idea here is that we are going to give both overall view on the customer’s account as well as details about its outstanding debts. The important detail that we need to understand is that this customer status is unlikely to grow too big. We aren’t likely to see customers that have debts that spans many years, so the size of this document is naturally bounded. The cost of going from this output to the table above is negligible and the process of doing so is obvious. So the only question now is how do we do this?

We are going to utilize RavenDB’s multi-map/reduce to the fullest here. Let’s first look at the maps:

There isn’t really anything interesting here. We are just outputting the data that we need for the second, more interesting stage, the reduce:

There is a whole bunch of stuff going on here, but leave aside how much JavaScript scares me, let’s dig into this.

The important parameters we have here are:

  • Debt
  • CreditBalance
  • RemainingBalance

We compute the CreditBalance by summing all the outstanding payments for the customer. We then gather up all the debts for the customer and sort them by date ascending. The next stage is to apply the outstanding credits toward each of the debts, erasing them from the list if they have been completely paid off. Along the way, we compute the overall remaining balance as well.

And that is pretty much it. It is important to understand that this code is recursive. In other words, if we have a customer that has a lot of invoices and receipts, we aren’t going to be computing this in one go over everything. Instead, we’ll compute this incrementally, over subsets of the data, applying the reduce function as we go.

Queries on this index are going to be fast, and applying new invoices and receipts is going to require very little effort. You can now also do the usual things you do with indexes. For example, sorting the customers by their outstanding balance or total lifetime value.

time to read 1 min | 107 words

Webinar banner

Tomorrow I’m going to be giving a webinar about RavenDB Cloud, among the topics I’m going to cover are:

  • The type of work you can hand over to us while you put more time into your application
  • The different types of instances you can use and the resources you can provision
  • Setting up a distributed database instance and securing it in minutes
  • Provisioning a free instance to try it out
  • Ways to save money on the cloud
  • Getting BANG for your BUCK! How RavenDB performs fast on less expensive machines
  • Monitoring costs, performance, events

You can register to the webinar using the following link.

time to read 3 min | 441 words

The upgrade process from RavenDB 3.5 and earlier to RavenDB 4.x is not easy. This is because I made a conscious decision to not have backward compatibility between these versions. I made that decision because we had to be able to make massive changes internally in order to get to the targets that we set to ourselves. I actually discussed that decision in detail in a previous blog post and a talk.

Four years later, I still stand by that decision, but I also regret the spanner that it threw into the works. Migrating RavenDB applications to 4.x from previous versions is harder than it should. In retrospect, we probably should have invested the time in a compatibility layer that would make it easier to migrate.

I wanted to take a moment and talk about RavenDB 5.0, expected in 2020, and our plans for that release. We are going to be doing some minor cleanup of the API. Methods and classes that are marked as [Obsolete] will be removed. These tend to be at the very edge of the explored API and have been marked as such for quite some time. Beyond these change (which you’ll a clear and obvious alternative for), you aren’t going to need to do much at all.

Our goal for converting an application from RavenDB 4.x to 5.x is that the process is for 90% of the projects - Update NuGet packages, compile, you are done. For the 10%, it may mean that you need to make some minor changes. For example, change DisableEntitiesTracking to NoTracking if you are using the low level query API.

We also intend to allow at least the vast majority of operations to just work between 4.x client and 5.x server. In other words, even when you upgrade server versions, you aren’t going to have to upgrade the client version unless you want to use the new features.

There are also additional considerations that we have to take into account:

  • RavenDB now have official clients for: .NET, JVM, Go, Python, Node.JS, C++. As well as a unofficial clients.
  • RavenDB Cloud instances are maintained by us, and will be upgraded to newer versions on a regular schedule.

The cost of making a backward incompatible change at this point is too high for us to take lightly, and we are going to try very hard to avoid it. The move from 3.5 to 4.x was a one time thing that we had to do in order to continue evolving the product, not something that we plan again anytime soon.

We are also offering migration services for clients who want to move their applications from 3.x to 4.x.

RavenDB Cloud

time to read 4 min | 763 words

image

TLDR

RavenDB now offers cloud hosting for RavenDB clusters. Manage your data with this awesome solution built by the RavenDB team. Access through cloud.ravendb.net.

A free option is available.

When I wrote the first few lines of code for RavenDB over a decade ago, Amazon Web Services still had the beta label on it and deploying to production meant a server in the basement. The landscape for server-side software has changed considerably. Nowadays you have to justify not running on the cloud. Originally, RavenDB’s features were driven by the kind of systems and setup found in a typical corporate data center. Now a lot of our features are directly impacted by the operating environment in the various cloud platforms.

The same team that develops RavenDB itself now offers a database as a service solution (DBaaS) that can be found at cloud.ravendb.net. Our service offering is available on all Amazon Web Services and Microsoft Azure regions, with Google Cloud Platform soon to follow.

What do you get?

You get a fully managed service. We take care of all backend chores of your databases while you focus on building your applications to deliver even more value to your business.

We have done everything possible to make sure that the only task you’ll need to do is come up with the data to put into RavenDB Cloud. Tasks such as monitoring, updating or managing the system, and even creating a default backup task per database, are the responsibility of our team and are handled without a fuss behind the scenes.

The DBaaS package includes encryption over the wire as well as encryption at rest -- you can also deploy encrypted databases with your own encryption keys.

The idea of dynamic scaling in managed systems has been a core tenant for cloud architecture, and RavenDB Cloud fits right into this model. In just a few clicks you can provision a cluster, deploy it to anywhere in the world, and start working. If you need more capacity, a simple click will provide you with more resources -- without your code or your customers even being aware.

If you have a Black Friday event or a special discount day coming, you can scale up your system ahead of time. If during that day you are pleasantly surprised with more than anticipated activity, you can power through the spike by scaling immediately and then reduce the capacity back to normal levels once the peak is traversed.

But RavenDB Cloud is not just about reducing the overhead of running databases -- we built RavenDB Cloud to save you money. As a highly tuned system, you can manage your load on fewer resources, which also translates into more savings down the line.

Pricing Model

Our cloud offering has an on-demand subscription as well as discounts for yearly contracts. A 10% introductory discount is now available, lasting to the rest of this year.

Some hosted databases solutions charge you per request or per maximum utilized capacity. Such solutions are complex to understand when it comes to billing time. I intensely dislike complexity -- especially when it comes to bills! Price predictability is important to us. With RavenDB Cloud, you pay for your resources at a flat and known rate to make sure there won’t be any surprises at the end of the month.

The RavenDB instances can be configured a-la-carte, according to your needs. A tailored solution with geo distributed clusters, support for on-premise & cloud integration, widely distributed deployments and custom instance types is only an email away.

RavenDB Cloud has several tiers of clusters available. If you are running a small to medium sized application, you can go with our basic instances and enjoy a reduced cost. For more demanding workloads, you can use more performant instances that have full access to the cloud resources to get maximum performance.

RavenDB Cloud also has a completely free option. Go to cloud.ravendb.net and select the free option. You’ll have your own secured, managed and hassle-free instance of RavenDB in moments. Go ahead and try it out right now.

What about RavenHQ?

RavenHQ has been providing managed RavenDB instances since 2012. It will continue offering RavenDB hosting for version 3.5 and earlier. RavenDB Cloud will provide managed clusters for RavenDB 4.2 and up.

Migrating from a RavenDB 4.x instance hosted on RavenHQ to RavenDB Cloud is simple and can be done in just a few clicks.

Not only will migrating to RavenDB Cloud not cost you anything extra, for most cases your new managed database service will be cheaper than what you had before.

FUTURE POSTS

  1. Inside RavenDB now available on RavenDB.Net - 15 hours from now

There are posts all the way to Sep 16, 2019

RECENT SERIES

  1. re (22):
    19 Aug 2019 - The Order of the JSON, AKA–irresponsible assumptions and blind spots
  2. Design exercise (6):
    01 Aug 2019 - Complex data aggregation with RavenDB
  3. Reviewing mimalloc (2):
    22 Jul 2019 - Part II
  4. Production postmortem (26):
    07 Jun 2019 - Printer out of paper and the RavenDB hang
  5. Reviewing Sled (3):
    23 Apr 2019 - Part III
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats