Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 2 min | 325 words

Yesterday I asked about liveness detection for nodes running in AWS. The key aspect is that this needs to be simple to build and easy to explain.

Here are a couple of approaches that I came up with. Nothing groundbreaking, but they do the work while letting someone else do all the heavy lifting.

Have a well-known S3 bucket that each of the nodes writes an entry to. The idea is that we’ll have something like (filename – value):

  • i-04e8d25534f59e930 – 2021-06-11T22:01:02
  • i-05714ffce6c1f64ad – 2021-06-11T22:00:49

Each node will scan the bucket and read through each of the files, getting the last seen time for all the nodes. We’ll consider any node whose timestamp is within the last minute to be alive; any other node is dead. Of course, each node also needs to update its own file on S3 every 30 seconds so that the other nodes know it is alive.
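A minimal boto3 sketch of that scheme (the bucket name is a placeholder, and I’m assuming each file’s content is just the ISO timestamp):

    import boto3
    from datetime import datetime, timedelta, timezone

    BUCKET = "cluster-liveness"  # placeholder for the well-known bucket
    s3 = boto3.client("s3")

    def heartbeat(instance_id):
        # Called every 30 seconds: overwrite our own file with the current time.
        s3.put_object(Bucket=BUCKET, Key=instance_id,
                      Body=datetime.now(timezone.utc).isoformat().encode())

    def alive_nodes():
        # Scan the bucket and read each file; anything seen in the last
        # minute is alive, everything else is considered dead.
        cutoff = datetime.now(timezone.utc) - timedelta(minutes=1)
        alive = []
        for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            if datetime.fromisoformat(body.decode()) >= cutoff:
                alive.append(obj["Key"])
        return alive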

The advantage here is that this is trivial to explain and implement, and it can work quite well in practice.

The other option is to piggyback on infrastructure that is dedicated to this sort of scenario. Create an elastic load balancer and set up a target group. On startup, the node registers itself with the target group and sets up the health check endpoint. From that point on, each node can ask the target group for all the healthy nodes.
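A boto3 sketch of that flow (the target group ARN is a placeholder; the health check itself is whatever was configured on the target group):

    import boto3

    TG_ARN = "arn:aws:elasticloadbalancing:…"  # placeholder target group ARN
    elb = boto3.client("elbv2")

    def register(instance_id):
        # On startup, the node adds itself to the target group.
        elb.register_targets(TargetGroupArn=TG_ARN, Targets=[{"Id": instance_id}])

    def healthy_nodes():
        # Let the load balancer tell us which targets pass the health check.
        health = elb.describe_target_health(TargetGroupArn=TG_ARN)
        return [d["Target"]["Id"]
                for d in health["TargetHealthDescriptions"]
                if d["TargetHealth"]["State"] == "healthy"]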

This is pretty simple as well, although it requires significantly more setup. The advantage here is that we can detect more failure modes (a node that is up, but firewalled away, for example).

Other options, such as having the nodes ping each other, are actually quite complex, since the nodes need to find each other first. That leads to some kind of service locator, and then you’ll have to avoid each node pinging all the other nodes, since that can get busy on the network.

time to read 1 min | 93 words

In this talk, Oren Eini, founder of RavenDB, is going to take apart a database engine on stage. We are going to inspect all the different pieces that make for an industrial-grade database engine, from the way the data is laid out on disk to how the database ensures that transactions are durable. We’ll explore structures such as B+Trees and write-ahead logs, discuss concurrency strategies, and see how different features of the database work together to achieve the end goals.

time to read 1 min | 128 words

You can hear me speaking at the Angular Show about using a document database from the point of view of full stack or front end developers.

In this episode, panelists Brian Love, Jennifer Wadella, and Aaron Frost welcome Oren Eini, founder of RavenDB, to the Angular Show. Oren teaches us about some of the key decisions around structured vs unstructured databases (or SQL vs NoSQL in hipster developer parlance). With the boom of document-driven unstructured databases, we wanted to learn why you might choose this technology, the pitfalls and benefits, and what are the options out there. Of course, Oren has a bit of a bias for RavenDB, so we'll learn what RavenDB is all about and why it might be a good solution for your Angular application.

time to read 1 min | 200 words

I ask candidates to answer the following question. Sometimes at home, sometimes at a whiteboard during an interview.

You need to create an executable that would manage a phone book. The commands you need to support are:

  • phone-book.exe /path/to/file add [name] [phone]
  • phone-book.exe /path/to/file list [skip], [limit]

The output of the list operation must be the phone book records in lexical order. You may not sort the data during the list operation, however. All such work must be done in the add operation.

You may keep any state you like in the file system, but each command is a separate invocation of the program. The program needs to support adding 10 million records.

Feel free to constrain the problem in any other way that would make it easier for you to implement. We’ll rate the solution on how much it costs in terms of I/O.
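For calibration, here is a deliberately naive Python sketch (the file layout is my own choice): it keeps the file sorted on every add, so list is a cheap sequential read, but every add rewrites the entire file. That O(n) write cost per add is exactly what the I/O rating probes at 10 million records; a good answer needs something closer to sorted runs or a B-tree.

    import sys
    from bisect import insort

    def add(path, name, phone):
        # Naive: load everything, insert in sorted position, rewrite the file.
        try:
            with open(path) as f:
                records = f.read().splitlines()
        except FileNotFoundError:
            records = []
        insort(records, f"{name} {phone}")
        with open(path, "w") as f:
            f.write("\n".join(records) + "\n")

    def list_records(path, skip, limit):
        # The file is already sorted, so listing is a pure sequential scan.
        with open(path) as f:
            for i, line in enumerate(f):
                if i >= skip + limit:
                    break
                if i >= skip:
                    print(line, end="")

    if __name__ == "__main__":
        path, cmd = sys.argv[1], sys.argv[2]
        if cmd == "add":
            add(path, sys.argv[3], sys.argv[4])
        else:
            # tolerate the comma in "list [skip], [limit]"
            list_records(path, int(sys.argv[3].rstrip(",")), int(sys.argv[4]))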

As a reminder, we are a database company; this sort of question is incredibly relevant to the things that we do daily.

I give this question to candidates with no experience, fresh graduates, etc. How would you rate its difficulty?

time to read 2 min | 295 words

I’m talking a lot about candidates and the hiring process we are going through right now. I thought it would only be fair to share a story about an interview task that I failed.

That was close to 20 years ago, and I was looking for my first job. I had absolutely no professional experience and was painfully aware of that. I did have a few years of working on Open Source projects, so I was confident that I had a good way to show my abilities.

The question was simple: write the code to turn the contents of this table into a hierarchical XML file:

[table image]

In other words, they wanted the flat rows transformed into nested parent/child XML elements.

To answer the question, I was given pen and paper, by the way. That made my implementation choices quite hard, since I had to write it all out longhand.

This was notepad code, and reproducing it from memory with modern APIs spares you going through a mass of really uninteresting details; at the time, I was using ADO.Net and the XmlDocument, but the idea is the same.
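In spirit, the approach was the following (a Python sketch; the column names Id, ParentId, Name and the sample rows are my reconstruction, not the original table):

    import xml.etree.ElementTree as ET

    # Stands in for the query result, ordered so that NULL parents come
    # first and every parent row appears before any of its children.
    rows = [
        (1, None, "root"),
        (2, 1, "child-a"),
        (3, 1, "child-b"),
        (4, 2, "grandchild"),
    ]

    root = ET.Element("data")
    by_id = {}
    for id_, parent_id, name in rows:
        parent = by_id[parent_id] if parent_id is not None else root
        by_id[id_] = ET.SubElement(parent, "item", id=str(id_), name=name)

    print(ET.tostring(root, encoding="unicode"))

One pass, one dictionary, no recursion; it works only because of the ordering assumptions called out below.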

I got so many challenges to this answer, though. I relied on null being sorted first on SQL Server, and then on the fact that a parent must exist before its children. Aside from these assumptions, which I feel are fairly safe to make, I couldn’t figure out what the big deal was.

Eventually it turned out that the interviewers were trying to guide me toward a recursive solution. It never even occurred to me, since I was doing it with a single query, and a recursive solution would need many such queries.

time to read 3 min | 563 words

Following a phone screen, we typically ask candidates to complete some coding tasks. The idea is that we want to see their code, and asking a candidate to program during an interview… does not go well. Some years ago I had a candidate who was provided with a machine, an IDE, and an internet connection, and who walked out after failing for 30 minutes to reverse a string. Given that his CV said he had 8 years of experience, I consider myself very lucky.

Back to the candidate who prompted this post. He sent us answers to the coding tasks, in Node.JS and C++. Okay, weird flex, but I can manage. I don’t actually care what language a candidate knows, especially for the junior positions.

Given that we are hiring for junior positions, we’ll usually get solutions that bend the question restrictions. For example, they would do a linear scan of a file even when they were asked not to. For the most part, we can ignore those details and focus on what the candidate is showing us. Sometimes we ask them to fix a particular issue, but usually we’ll just get them to the interview and ask them about their code there.

I like asking candidates about their code, because I presume that they spent some time thinking about it and can discuss the topic in some detail. At one memorable interview I had a candidate tell me: “I didn’t write this code, I have no idea what is going on here”. I had triple-checked that this was indeed the code they sent, and followed up by sending the candidate home, sans offer. We can usually talk with the candidate about what drove them to certain decisions, what the impact of a particular constraint would be on their code, etc.

In this case, however, the code was bad enough that I called it off. I sent the candidate a notification about the issues we found in their code, detailing the 20+ critical failures that we found in the space of a few minutes of looking at it.

The breaking point for me was that the tasks did not actually work. In fact, they couldn’t work. I’m not sure if they compiled (I didn’t check), but they certainly were never even eyeballed.

For example, we asked the candidate to build a server that would translate messages to Morse code and cause the server speaker to beep in Morse code. Nothing particularly fancy, I think. But we got a particular implementation for that. For example, here is the relevant code that plays the Morse code:

[code image]

The Node.js version that I’m using doesn’t come with the relevant machine learning model to make that actually happen, I’m afraid.

The real killer for me was this part:

You might want to read this code a few times.

They pass a variable to a function, set it to a new value inside, and expect to see that new value outside. Basically, they wanted to use an out parameter here, which doesn’t exist in JavaScript.
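The shape of that mistake, rendered in Python, which behaves exactly the same way here (the names are made up for illustration):

    def load_port(port):
        # Rebinding the parameter only changes the local name;
        # the caller's variable is untouched.
        port = 8080

    port = None
    load_port(port)
    print(port)  # still None - there is no out parameter here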

That is a fairly fundamental misunderstanding of how code flows through a program. And it is something that would never have worked.

I’m okay with getting suboptimal solutions; I’m not fine with code that was never actually looked at.

time to read 1 min | 171 words

I recently got an email from a customer. It was a very strange interaction. The email basically said:

I wanted to let you know that I recently had to set up a new server for an existing application of mine. I had to find an old version of RavenDB and I was able to get it from the site.

This is the first time in quite some time (years) that I had to touch this. I thought you would want to know that.

I do want to know that. We spend an inordinate amount of time trying to make sure that Things Work. The problem with that approach is that if we do things properly, you won’t even know that there was a challenge here that we overcame.

Our usual interaction with users is when they run into some kind of a problem. Hearing about the quiet mode, where RavenDB just worked and no one paid attention to it for a few years, is a breath of fresh air for me and the team in general.
