Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 1 min | 103 words

After building an R client for RavenDB, I decided to see what it would take to build a basic RavenDB client for PHP in the same manner. It turned out to be fairly simple, and you can find the relevant code here.

Here are some basic CRUD operations:

As you can see, the API is quite straightforward to use. It isn’t the full blown API that we usually provide, but it is more than enough to get you going.

Incidentally, we also published our REST API documentation recently, so you can see how to expand this code to do even more for you.
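To give a feel for what a thin client wraps, here is a minimal sketch in Python (not the PHP client itself) that only prepares the HTTP requests for basic document operations. The server URL, the database name and the helper names are all assumptions made up for this illustration, and the endpoint shape (`/databases/{name}/docs?id=...`) should be checked against the REST API documentation before you rely on it.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed values for illustration only; point these at your own server.
SERVER_URL = "http://localhost:8080"
DATABASE = "Demo"


def docs_url(doc_id):
    """Build the document endpoint URL for a single document id."""
    return f"{SERVER_URL}/databases/{DATABASE}/docs?id={quote(doc_id, safe='')}"


def put_document(doc_id, document):
    """Prepare a request that creates or replaces a document."""
    body = json.dumps(document).encode("utf-8")
    return Request(docs_url(doc_id), data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def get_document(doc_id):
    """Prepare a request that loads a document by id."""
    return Request(docs_url(doc_id), method="GET")


def delete_document(doc_id):
    """Prepare a request that deletes a document by id."""
    return Request(docs_url(doc_id), method="DELETE")
```

Nothing is sent until you pass one of these requests to `urllib.request.urlopen()` against a running server. A secured server would also need a client certificate, which this sketch leaves out.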

time to read 3 min | 535 words

I was reminded recently that the RavenDB documentation isn’t putting enough emphasis on the fact that RavenDB can run as an in memory database. In fact, topologies that other databases seem to think are fancy are trivial in RavenDB. You can run your cluster in a mixed mode, with a couple of nodes that are persistent and write to disk, and other nodes that use pure in memory storage. Combined with RavenDB’s routing capabilities, you’ll end up with a cluster where the nodes you’ll usually interact with are purely in memory, but with the backend pushing data to other nodes that persist to disk. On the face of it, you get both the speed of an in memory database and the persistence that your data craves.

So why aren’t we making a whole lot of a big deal out of this? This is a nice feature, and surely it can be used as a competitive advantage, no?

The problem is that it isn’t going to play out the way you would expect.

Consider the case where your dataset* is larger than memory. In that case, you are going to have to go to disk anyway. At this point, it doesn’t really matter whether you are swapping to the page file or reading from a data file. On the other hand, if the dataset you work with can fit entirely in memory, you would expect to see significant speedups. Except that with RavenDB, you won’t.

That sounds bad, so let me try to express this better. If your dataset can fit into memory, RavenDB is already going to be serving it completely from memory. You don’t need to do anything to avoid disk I/O; by default, RavenDB is already going to do that for you. And if your dataset is larger than memory, RavenDB is going to ensure that we make only the minimum amount of I/O in order to serve your requests.
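The argument can be illustrated with a toy model. The sketch below is plain Python and has nothing to do with RavenDB’s actual storage engine: it assumes a simple LRU page cache and uniformly random reads (both are assumptions made for this example), and it measures how many reads are served from memory when the dataset fits in the cache versus when it is ten times larger.

```python
import random
from collections import OrderedDict


def hit_rate(dataset_pages, cache_pages, reads=20_000, seed=1):
    """Fraction of reads served from an LRU cache of `cache_pages` pages."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(reads):
        page = rng.randrange(dataset_pages)
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # stands in for a disk read
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / reads


fits_in_memory = hit_rate(dataset_pages=1_000, cache_pages=1_000)
larger_than_memory = hit_rate(dataset_pages=10_000, cache_pages=1_000)
```

When the dataset fits, the only misses are the first touch of each page, so almost every read comes from memory without any special in memory mode. When it doesn’t fit, some reads have to go to disk no matter where the data nominally lives.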

In other words, because of the way RavenDB is architected, you aren’t going to see any major advantages by going the pure in memory route. We are already providing most of them out of the box, while still maintaining ACID guarantees as well as on disk persistence.

There are some advantages to running in memory only mode. Transactions are somewhat faster, but we have spent a lot of time optimizing our transactional hot path, and you can get to hundreds of thousands of individual writes on a single node while maintaining full persistence and ACID compliance. In the vast majority of cases, you simply don’t need the additional boost, and giving up persistence costs too much.

So the answer is that you can run RavenDB purely in memory, and you can also do that in a mixed mode cluster, but for the most part, it just doesn’t give you enough bang for the buck. You are going to be as fast for reads and almost as fast for writes (almost certainly faster than what you actually need) anyway.

* Well, working set, at least, but I’m being fast and loose with the terms here.

time to read 2 min | 319 words

R is a popular environment for working with data, mostly for statistical analysis and exploration. It is widely used by data scientists, statisticians and people who get a pile of data and need to figure out how to get something out of it.

RavenDB stores data and it can be nice to go through it using R. And now you can quite easily, as you can see here:

image

Inside your R environment, load (or save locally) using:

And you are ready to R(ock).

In order to set things up, you’ll need to tell R where to find your server. You can do this using:

Note that you can access both secured and unsecured servers, but you need to be aware of where your R script is running. If it is running on Windows, you’ll need to install the PFX and provide the thumbprint. On Linux, you’ll need to provide the paths to the cert.key and cert.crt files instead. This is because on Windows, R is compiled against schannel and… you probably don’t care, right?

Now that you have everything set up, you can start having fun with R. To issue a query, just call: rvn$query(), as shown above.

Note that you can write any query you’d like here. For example, let’s say that I wanted to analyze the popularity of products. I can do it using:

And the result would be:

image

It doesn’t seem like anything pops out of the data, but I’m not a data scientist.

You can also manipulate data using:

And here is the result in RavenDB:

image

And now, go forth and figure out what this all means.

time to read 2 min | 218 words

RavenDB 5.0 will come out with support for time series. I talked about this briefly in the past, and now we are at the point where we are almost ready for the feature to stand on its own. Before the design is set, I have a few questions. Here is what a typical query is going to look like:

We intend to make the queries as obvious as possible, so I’m not going to explain it. If you can’t figure out what the query above is meant to do, I would like to know, though.

What sort of queries would you like to run on the data? For example, here is something that we expect users to want to do: compare and contrast different time periods, which you’ll be able to do in the following manner:

The output of this query will give you the daily summaries for the last two months, as well as a time based diff between the two (meaning that it will match on the same dates, ignoring missing values, etc).
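To make the intended semantics concrete, here is a small sketch in plain Python. It is not the RavenDB query syntax, and the function names are made up for this illustration. It assumes that “match on the same dates” means matching by day of the month, and that a day missing from either month is simply skipped.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_summary(readings, year, month):
    """Average the (timestamp, value) readings of one month, per day of month."""
    buckets = defaultdict(list)
    for ts, value in readings:
        if ts.year == year and ts.month == month:
            buckets[ts.day].append(value)
    return {day: mean(values) for day, values in buckets.items()}


def diff_by_day(current, previous):
    """Diff two daily summaries, keeping only the days present in both."""
    return {day: current[day] - previous[day]
            for day in sorted(current.keys() & previous.keys())}
```

Feeding it readings from two consecutive months gives one summary per month, plus a diff that only contains the days both months have data for.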

What other methods for the “timeseries.*” would you need?

The other thing that we want feedback on is what sort of visualization you want to see on top of this data in the RavenDB Studio.
