Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 3 min | 440 words

We recently published an article on Getting started with GraphQL and RavenDB, it will walk you through setting up Hot Chocolate to create a RavenDB-based GraphQL endpoint in your system.

Here is what this looks like:


Another new feature is the New Database Wizard, which was completely redesigned and made much simpler. We have a great number of features & options, and the idea is that we want to give you better insight into what you can do with your system.

Here is a quick peek, but take a look at the link. I’m biased, of course, but I think the team did a really great job in exposing a really complex set of options in a very clear manner.


The About Page of RavenDB Studio has undergone major updates. In particular, we have made it easier to see that new versions are available, what changes are included in each version, and check for updates. Additionally, you can now review your license details and compare features available in different editions.


Furthermore, we pushed an update to the About Page of RavenDBitself. Here we try to tell the story of RavenDB and how it came about. Beyond the narrative, our goal is to explain the design philosophy of RavenDB.

The idea is quite simple, we aim to be the database that you don’t have to think about. The one component in your system that you don’t need to worry about, the cog that just keeps on working. If we do our job right, we are a very boring database. Amusingly enough, the actual story behind it is quite interesting, and I would love to get your feedback on it.


We also published our new roadmap, which has a bunch of new goodies for you. I’m quite excited about this, and I hope you’ll too (in a boring manner 🙂). Upcoming features include data governance features, Open Telemetry integration, extremely large clusters, and more.

One of our stated goals in the roadmap is better performance, with a focus on ARM hardware, which is taking the server / cloud world by storm. To that end, we are performing many “crimes against code” to squeeze every last erg of performance from the system.

Initial results are promising, but we still have some way to go before we can publicly disclose numbers.

As usual, I would appreciate your feedback about the roadmap and the new features in general.

time to read 8 min | 1431 words

Today I got in my car to drive to work and realized that Waze suggested “Work” as the primary destination to select. I had noticed that before, and it is a really nice feature. Today, I got to thinking about how I would implement something like that.

That was a nice drive since I kept thinking about algorithms and data flow. When I got to the office, I decided to write about how we can implement something like that. Based on historical information, let’s suggest the likely destinations.

Here is the information we have:

The Lat & Lng coordinates represent the start location, the time is the start time for the trip, and the destination is obvious. In the data set above, we have trips to and from work, to the gym once a week, and to our parents over the weekends.

Based on this data, I would like to build recommendations for destinations. I could try to analyze the data and figure out all sorts of details. The prediction that I want to make is, given a location & time, to find where my likely destination is going to be.

I could try to analyze the data on a deep level, drawings on patterns, etc. Or I can take a very different approach and just throw some computing power at the problem.

Let’s talk in code since this is easier. I have a list of trips that look like this:


public record Trip(double Lat, double Lng, string Destination, DateTime Time);
Trip[] trips = RecentTrips(TimeSpan.FromDays(90));

Given that, I want to be able to write this function:


string[] SuggestDestination((double Lat, double Lng) location, DateTime now)

I’m going to start by processing the trips data, to extract the relevant information:


var historyByDest = new Dictionary<string, List<double[]>>();
foreach (var trip in trips)
{
    if (historyByDest.TryGetValue(trip.Destination, out var list) is false)
    {
        historyByDest[trip.Destination] = list = new();
    }
    list.Add([
        trip.Lat,
        trip.Lng,
        trip.Time.Hour * 100 + trip.Time.Minute, // minutes after midnight
        trip.Time.DayOfYear,
        (int)trip.Time.DayOfWeek
    ]);
}

What this code does is extract details (location, day of the week, time of day, etc.) from the trip information and store them in an array. For each trip, we basically break apart the trip across multiple dimensions.

The next step is to make the actual prediction we want, which will begin by extracting the same dimensions from the inputs we get, like so:


double[] compare = [
    location.Lat, 
    location.Lng, 
    now.Hour * 100 + now.Minute, 
    now.DayOfYear, 
    (int)now.DayOfWeek
];

Now we basically have an array of values from which we want to predict, and for each destination, an array that represents the same dimensions of historical trips. Here is the actual computation:


List<(string Dest, double Score)> scores = new();


foreach (var (dest, items) in historyByDest)
{
    double score = 0;
    foreach (var cur in items)
    {
        for (var i = 0; i < cur.Length; i++)
        {
            score += Math.Abs(cur[i] - compare[i]);
        }
    }
    score /= items.Count;
    scores.Add((dest, score));
}


scores.Sort((x, y) => x.Score.CompareTo(y.Score));

What we do here is compute the difference between the two arrays: the current start location & time compared to the start location & time of historical trips. We do that not only on the raw data but also extract additional features from the information.

For example, one dimension is the day of the week, and the other is the time of day. It is not sufficient to compare just the date itself.

The end result is the distance between the current trip start and previous trips for each of the destinations I have. Then I can return the destinations that most closely match my current location & time.

Running this over a few tests shows that this is remarkably effective. For example, if I’m at home on a Saturday, I’m very likely to visit either set of grandparents. On Sunday morning, I head to the Gym or Work, but on Monday morning, it is more likely to be Work.

All of those were mostly fixed, with the day of the week and the time being different. But If I’m at my parents’ house on a weekday (which is unusual), the location would have a far greater weight on the decision, etc. Note that the code is really trivial (I spent more time generating the actual data), but we can extract some nice information from this.

The entire code is here, admittedly it’s pretty dirty code since I wanted to test how this would actually work. At this point, I’m going to update my Curriculum Vitae and call myself a senior AI developer.

Joking aside, this approach provides a good (although highly simplified) overview of how modern AI systems work. Given a data item (image, text, etc.), you run that through the engine that outputs the embedding (those arrays we saw earlier, with values for each dimension) and then try to find its nearest neighbors across multiple dimensions.

In the example above, I explicitly defined the dimensions to use, whereas LLMs would have their“secret sauce” for this. The concept, at a sufficiently high level, is the same.

time to read 1 min | 129 words

Watch Oren Eini, CEO of RavenDB, as he delves into the intricate process of constructing a database engine using C# and .NET. Uncover the unique features that make C# a robust system language for high-end system development. Learn how C# provides direct memory access and fine-grained control, enabling developers to seamlessly blend high-level concepts with intimate control over system operations within a single project. Embark on the journey of leveraging the power of C# and .NET to craft a potent and efficient database engine, unlocking new possibilities in system development.

I’m going deep into some of the cool stuff that you can do with C# and low level programming.

time to read 6 min | 1090 words

I have a piece of code that has been living, rent-free, in my head for the past 30 years or so.

In middle school (I was 12 - 13 at the time), I was taught Pascal as the entry-level programming language. I found it to be a really fascinating topic, as you can imagine.

One of the things I did was try to read other people’s code and see what sense I could make out of it. That was way before the Internet was a thing, though. Even at that time, Pascal was mostly on its way out, and there wasn’t much code that a kid could access.

To give some context, at the time, if you wanted to move data from one place to the next, the only real option was to physically disassemble your computer, take out the hard disk, wrap it in a towel for protection, and carry it to your destination. I recall doing that and then spending some hours at a friend’s house, trying to figure out the right jumper configuration so the BIOS would recognize two drives at once.

Because of this, I had a great interest in disk parking, a technique that helps ensure that taking the hard drive from one location to the next wouldn’t ruin it. I remember running into a particular piece of code to handle disk parking in Pascal and being quite excited by this. Here I had something that I needed to do, and also in a language that I was familiar with.

I remember staring at the code and trying to make sense of it for a while. I couldn’t figure it out. In fact, I couldn’t even begin to understand what was going on there. I remember feeling both frustrated and stupid because I could understand the syntax but not what it was doing.

In fact, it was so far above my head that I was upset about it for days, to the point that decades later, it still pops into my head. When it came back for a visit this week, I decided to try to figure it out once and for all.

That led to the first problem, which is that I cannot actually recall the code. I remember it was in Pascal, that it dealt with parking the disk needles, and it was full of weird numbers that I couldn’t grasp.  Turning to Google didn’t help much, I found some code samples, but nothing that really jived with what I recalled. I ended up cobbling something that more or less matches what I had in my head.


program DiskPark;


function ParkHead(drive: char): Boolean;
var
  regs: Registers;
begin
  with regs do
  begin
    AH := $13;  
    AL := Ord(drive) - Ord('A') + $80;
    DL := AL;  
    Intr($13, regs); 
    ParkHead := (Flags and FCarry) = 0;
  end;
end;


begin
  if ParkHead('A') then
    WriteLn('Success: parked')
  else
    WriteLn('Failure, no idea why!');
end.

The actual code that I remember was beefier, and I think that it handled a bunch of additional scenarios, but it had been 30 years, I can’t recall it.

Amusingly enough, I actually had to update my publishing software, since I never thought I would be publishing Pascal code snippets 🙂.

What is interesting is that, as I look at this code and try to view it through the glasses of my 12-year-old self, I can understand exactly the frustration.

Look at this code, it is filled with random numbers and letters. What is AH or the Intr() call? I mean, looking at this, there is no place to get started understanding this.

Let’s look at this line:


AH := $13;

There is no definition of AH anywhere, I can’t tell where it is coming from. And $13, what is that about? A bad luck omen because I can’t figure it out, I think…

The problem was that my 12-year-old self wrote programs like “Guess the number”, and really advanced stuff was bubble sort. I approached the code above with the same mindset. There is a logic to this that I can figure out if I can just stare at this long enough.

The problem is that the code above is basically arbitrary, there is nothing intrinsic about it.

The weird variables that come from nowhere (AL, DL, and AH) are registers (at the time, I don’t believe I was even aware that those existed). The weird numbers are just arguments that we pass to the BIOS when you invoke a command.

In the same sense, consider this code:


return syscall(437, ptr, 525376);

This line will tell you absolutely nothing about the reasoning behind the code. While the numbers seem arbitrary,in this case, they correspond to calling SYS_openat with the flags O_APPEND | O_CREAT | O_CLOEXEC.

The whole program above is just the way you have to go to invoke a system call (as we would call it today), and there is no logic or sense to it. Just something that you have to know. And once you do know, all of this can be packaged into a much simpler, higher-level concept: the system call. (Yes, it is an interrupt invocation, not the same thing. You may go stand in the pedantic corner, mind.)

Hopefully, I wouldn’t ever have to think about this program any longer. I managed to not only understand what it does, but actually grok it.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}