Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 1 min | 128 words

You can hear me speaking at the Angular Show about using a document database from the point of view of full stack or front end developers.

In this episode, panelists Brian Love, Jennifer Wadella, and Aaron Frost welcome Oren Eini, founder of RavenDB, to the Angular Show. Oren teaches us about some of the key decisions around structured vs unstructured databases (or SQL vs NoSQL in hipster developer parlance). With the boom of document-driven unstructured databases, we wanted to learn why you might choose this technology, the pitfalls and benefits, and what are the options out there. Of course, Oren has a bit of a bias for RavenDB, so we'll learn what RavenDB is all about and why it might be a good solution for your Angular application.

time to read 1 min | 200 words

I ask candidates to answer the following question. Sometimes at home, sometimes during an interview with a whiteboard.

You need to create an executable that would manage a phone book. The commands you need to support are:

  • phone-book.exe /path/to/file add [name] [phone]
  • phone-book.exe /path/to/file list [skip], [limit]

The output of the list operation must be the phone book records in lexical order. You may not sort the data during the list operation, however. All such work must be done in the add operation.

You may keep any state you’d like in the file system, but there are separate invocations of the program for each step. This program needs to support adding 10 million records.

Feel free to constrain the problem in any other way that would make it easier for you to implement it. We’ll rate the solution on how much it costs in terms of I/O.
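To make the expected shape of a submission concrete, here is a minimal, deliberately naive baseline. This is my own sketch, not part of the original question: it keeps the whole book sorted in one text file and rewrites it on every add, which satisfies the rules (list does no sorting) but pays exactly the kind of I/O cost a stronger answer would try to avoid.

```csharp
// Naive baseline sketch (not the expected answer): the file is kept fully
// sorted, so "list" is a plain sequential read, but every "add" rewrites
// the entire file - O(records) of I/O per insert, painful at 10 million records.
using System;
using System.IO;
using System.Linq;

class PhoneBook
{
    static void Main(string[] args)
    {
        var file = args[0];
        if (args[1] == "add")
        {
            var existing = File.Exists(file) ? File.ReadAllLines(file) : Array.Empty<string>();
            var updated = existing
                .Append($"{args[2]}\t{args[3]}")
                .OrderBy(line => line, StringComparer.Ordinal) // all sorting happens on add
                .ToArray();
            File.WriteAllLines(file, updated);
        }
        else if (args[1] == "list")
        {
            int skip = int.Parse(args[2]), limit = int.Parse(args[3]);
            foreach (var line in File.ReadLines(file).Skip(skip).Take(limit))
                Console.WriteLine(line); // already in lexical order on disk
        }
    }
}
```

A real answer would avoid the full rewrite on every add, for example by keeping sorted segments and merging them, but that is exactly the part the question leaves to the candidate.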

A reminder: we are a database company, so this sort of question is incredibly relevant to the things that we do daily.

I give this question to candidates with no experience, fresh graduates,  etc. How would you rate its difficulty?

time to read 2 min | 295 words

I’m talking a lot about candidates and the hiring process we go through right now. I thought it would only be fair to share a story about an interview task that I failed.

That was close to 20 years ago, and I was looking for my first job. Absolutely no professional experience and painfully aware of that. I did have a few years of working on Open Source projects, so I was confident that I had a good way to show my abilities.

The question was simple: write the code to turn the contents of this table into a hierarchical XML file:

(table screenshot from the original post not reproduced)

In other words, they wanted the flat table turned into nested XML, with each child element appearing under its parent.

To answer the question, I was given pen and paper, by the way. That made my implementation choices quite hard, since I had to write it all in longhand. I tried to reproduce this from memory, and it looks like this:
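The screenshot of my reproduction is not shown here; what follows is only a hedged reconstruction of the idea, assuming an Id / ParentId / Name schema and using LINQ to XML, rather than the exact code from the post:

```csharp
// Reconstruction of the approach only (the original screenshot is not available).
// Assumptions: the table is Items(Id INT, ParentId INT NULL, Name NVARCHAR),
// NULLs sort first on SQL Server, and a parent row always has a smaller Id than
// its children, so a single ordered pass sees every parent before its children.
using System.Collections.Generic;
using System.Data.Common;
using System.Xml.Linq;

static class HierarchyBuilder
{
    public static XElement BuildTree(DbConnection connection)
    {
        var root = new XElement("root");
        var elementsById = new Dictionary<int, XElement>();

        using var command = connection.CreateCommand();
        command.CommandText = "SELECT Id, ParentId, Name FROM Items ORDER BY ParentId, Id";

        using var reader = command.ExecuteReader();
        while (reader.Read())
        {
            var element = new XElement("item", new XAttribute("name", reader.GetString(2)));
            elementsById[reader.GetInt32(0)] = element;

            // Roots (NULL parent) hang off the document root; everyone else finds
            // their parent already in the dictionary thanks to the ordering above.
            var parent = reader.IsDBNull(1) ? root : elementsById[reader.GetInt32(1)];
            parent.Add(element);
        }

        return root;
    }
}
```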

This is notepad code, and I wrote it using a modern API. At the time, I was using ADO.Net and the XmlDocument. The idea is the same, however, and it will spare you going through a mass of really uninteresting details.

I got so many challenges to this answer, though. I relied on null being sorted first on SQL Server and then on the fact that a parent must exist before its children. Aside from these assumptions, which I feel are fairly safe to make, I couldn’t figure out what the big deal was.

Eventually it turned out that the interviewers were trying to guide me toward a recursive solution. It never even occurred to me, since I was doing that with a single query and a recursive solution would need many such queries.

time to read 3 min | 563 words

Following a phone screen, we typically ask candidates to complete some coding tasks. The idea is that we want to see their code, since asking a candidate to program during an interview… does not go well. I had a candidate some years ago who was provided with a machine, IDE and internet connection and walked out after failing for 30 minutes to reverse a string. Given that his CV said he had 8 years of experience, I consider myself very lucky.

Back to the candidate that prompted this post. He sent us answers to the coding tasks. In Node.JS and C++. Okay, weird flex, but I can manage. I don’t actually care what language a candidate knows, especially for the junior positions.

Given that we are hiring for junior positions, we’ll usually get solutions that bend the question restrictions. For example, they would do a linear scan of a file even when they were asked not to. For the most part, we can ignore those details and focus on what the candidate is showing us. Sometimes we ask them to fix a particular issue, but usually we’ll just get them to the interview and ask them about their code there.

I like asking candidates about their code, because I presume that they spent some time thinking about it and can discuss the topic in some detail. At one memorable interview I had a candidate tell me: “I didn’t write this code, I have no idea what is going on here”. I had triple checked that this was indeed the code they sent and followed up by sending the candidate home, sans offer. We can usually talk with the candidate about what drove them to certain decisions, what impact a particular constraint would have on their code, etc.

In this case, however, the code was bad enough that I called it. I sent the candidate a notification about the issues we found in their code, detailing the 20+ critical failures that we found in the space of a few minutes of looking at it.

The breaking point for me was that the tasks did not actually work. In fact, they couldn’t work. I’m not sure if they compiled, I didn’t check, but they certainly were never even eyeballed.

For example, we asked the candidate to build a server that would translate messages to Morse code and cause the server speaker to beep in Morse code. Nothing particularly fancy, I think. But we got a particular implementation for that. For example, here is the relevant code that plays the Morse code:

(screenshot of the candidate’s code not reproduced)

The Node.js version that I’m using doesn’t come with the relevant machine learning model to make that actually happen, I’m afraid.

The real killer for me was this part (the screenshot is not reproduced here):

You might want to read this code a few times.

They pass a variable to a function, set it to a new value and expect to see that new value outside. Basically, they wanted to use an out parameter here, which isn’t valid in JavaScript.
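To illustrate the shape of the mistake, here is my own sketch, in C# rather than the candidate’s JavaScript, with hypothetical names: assigning to a plain parameter only changes the local copy, and unlike C#, JavaScript has no `ref` or `out` to opt out of that.

```csharp
// Illustration only, not the candidate's code. In C# this pattern needs `out`
// or `ref` to work; in JavaScript there is no such mechanism at all, so the
// value can never make it back to the caller.
using System;

class OutParameterMistake
{
    // Hypothetical stand-in for the actual translation logic.
    static string ToMorse(string message) => "... --- ...";

    // The mistake: assigning to `morse` only updates the local copy.
    static void TranslateInto(string message, string morse)
    {
        morse = ToMorse(message); // lost as soon as the method returns
    }

    // The working shape is simply to return the value.
    static string Translate(string message) => ToMorse(message);

    static void Main()
    {
        var morse = "";
        TranslateInto("sos", morse);
        Console.WriteLine(morse.Length); // 0 - the caller never sees the new value

        Console.WriteLine(Translate("sos")); // "... --- ..."
    }
}
```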

That is the kind of fairly fundamental issue in understanding the flow of code in a program. And that is something that would never have worked.

I’m okay with getting suboptimal solutions; I’m not fine with code that was never actually looked at.

time to read 1 min | 171 words

I recently got an email from a customer. It was a very strange interaction. The email basically said:

I wanted to let you know that I recently had to setup a new server for an existing application of mine. I had to find an old version of RavenDB and I was able to get it from the site.

This is the first time in quite some time (years) that I had to touch this. I thought you would want to know that.

I do want to know that. We spend an inordinate amount of time trying to make sure that Things Work. The problem with that approach is that if we do things properly, you won’t even know that there is a challenge here that we overcome.

Our usual interaction with users is when they run into some kind of a problem. Hearing about the quiet mode, where RavenDB just worked and no one paid attention to it for a few years, is a breath of fresh air for me and the team in general.

time to read 3 min | 418 words

We are hiring again (this time for Junior C# Dev positions in Israel). That means that I go through CVs (quite a few, actually). I like going over the resumes directly, to get a feel not just for a particular candidate but for what is, for lack of a better term, the state of the market.

This time, I noticed a much higher percentage of resumes with a GitHub repository link. Anytime that I see such a link, I go and look at what they have there. That is often really interesting. Then again, you run into things like this:

(screenshot of the repository code not reproduced)

On the one hand, this is non production code, it is obviously a teaching project, which is awesome. On the other hand, I find such code painful to look at.

In the past, I would rate highly anyone that would show a GitHub account in the CV, since I could expect to see some of their projects there, usually unique ones. This time? I’m seeing a lot of basically homework assignments, and those aren’t really that interesting to review or look at. Especially since a lot of the candidates apparently had the same courses, so I saw the same 5 projects repeated over and over again.

In other words, just a GitHub account with some repositories is no longer that interesting or meaningful.

Another thing that I noticed was that a lot of those candidates had profiles with profile pictures like:

(profile picture screenshot not reproduced)

A small tip: if you expect people to visit your profile (and I assume you do, since you provided the link in the resume), it is worth it to put a (professional) picture of yourself there. The profile readme on GitHub is also surprisingly attractive when looking at a candidate.

Another tip, if you see a position for a C# Junior Developer, it is acceptable to apply if you don’t have all the requirements, or if you exceed them. But if you are trying to find a new job as a lawyer specializing in family law, maybe don’t try to apply to a tech company.

And yes, I’m using this post as a way to vent while going over so many CVs.

Most CVs are dry, but one candidate just got bumped to the next stage based solely on the fact that they had "Making awesome pancakes" in the CV, which made me laugh.

time to read 2 min | 225 words

I have a consultant that did some work for me. While the majority of our people are working in Israel and Poland, we actually have people working for us all over the map.

The consultant submitted their invoice at the end of the month and we sent a wire transfer to the provided account. So far, pretty normal and business as usual. We do double verification of account details, to avoid common scams, by the way, so we know that the details we sent were correct. Except… the money never arrived.

When we inquired, it turned out that the money transfer was reversed. The reason why? This is the address that the consultant provided (not the actual address, mind, but it has the same issue). Note the street name:

Mr. Great Consultant
1234 Cuba Avenue
Alta Vista, Ottawa, K1G 1L7
Canada

The wire transfer was flagged as potential international sanctions violation and refused.

That was… very strange.

It appears that someone saw Cuba in the address, decided that this was a problem and refused the transfer. I’m not sure whether I would rather this be the case of an overactive regex or a human not applying critical thinking.
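Purely as speculation on my part about what an overactive automated screen might look like (the bank shared nothing, and the list below is made up): a word match against sanctioned country names will happily flag a street name, while checking only the country field of the address would not.

```csharp
// Speculative sketch of a naive sanctions screen, not anything the bank disclosed.
using System;
using System.Text.RegularExpressions;

class ScreeningGuess
{
    static void Main()
    {
        var address = "1234 Cuba Avenue, Alta Vista, Ottawa, K1G 1L7, Canada";

        // Overactive: any mention of a sanctioned country name anywhere flags the payment.
        var naive = new Regex(@"\b(Cuba|Iran|Syria)\b", RegexOptions.IgnoreCase);
        Console.WriteLine(naive.IsMatch(address)); // True - a street name trips it

        // Less naive: only the country component of the address is screened.
        var country = address.Split(',')[^1].Trim();
        Console.WriteLine(country is "Cuba" or "Iran" or "Syria"); // False - it's Canada
    }
}
```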

We are now on week two of trying to resolve that with the bank and it is quite annoying.

Next port of call, buying Monero on the dark web… :-)

time to read 1 min | 165 words

Among the advantages of a highly distributed system with endless edge points is that you can outsource data collection to a universe of locations, and even include them in your workflow, thereby expanding your operations. The challenge comes when you have endpoints that contribute to your organization and systems, but that you don’t exactly trust. They can be newcomers that you don’t know enough about, or entities with a history of misusing the access that inclusion in your systems gives them. You want the value they create, the information they amass and gather, to be copied from the edge up through the levels of your system, but you don’t want to give too much away for that value or pay for it in the form of greater risk. Filtered replication is the art of enabling untrusted edge points to feed into your system in a limited manner, replicating the information they produce while still treating them as untrusted.


time to read 4 min | 636 words

Yesterday I posted about Parler being banned and the likely impact of that, both legally and in terms of the technical details. My expectation is that new actors will step in to fill the existing demand created by the current social network account suspensions. I had spent some time thinking about the likely effects of this, and I think that it will lead to some interesting results.

A new social network will very likely rise as a result of those actions. That network would have to be resilient to de-platforming. That means that it cannot assume that it can run on any of the cloud services, at least not as normally understood by today’s standards. That means that we are likely to see one of two options:

  • Fully distributed systems – independent nodes collaborating with one another to create a network. Each node may be hosted and operated independently. Similar to how torrents work and other fully distributed P2P systems.
  • Distributed infrastructure – a set of servers that are running on behalf of a single entity, but are spread over multiple vendors and locations. The idea is that the shutdown of a single or multiple vendors will have little impact, because of distribution of effort.

The first option is probably something like Mastodon, but I would really like to see a return to blogs & RSS as the preferred social network. That has the advantage of a true distributed model without a single controlling actor. It is also much lower cost in terms of technology and complexity. Discovery of new blogs can be handled via recommendations, search, etc.

The reason I prefer this option is that I like to blog :-). More seriously, owning your own content and distribution platform has just become quite important. A blog is about as simple a piece of software as you can imagine. Consuming blogs is an act that requires no publication of personal information, no single actor that can observe everything you do, etc.

I don’t know if this will be the direction, although it is my favorite one. It is possible that we’ll end up with a Mastodon empire, with many actors creating networks of servers which may or may not be interconnected. I can see a future where you’ll have a network of dog owners vs. cat owners, but the two aren’t federated and there are only isolated discussions between them.

Given that you could create links from one to the other, I don’t think we have to deal with total echo chambers. Consider a post in the cats social network: The dog owners are talking about the chore of having to go for walks at “dogs://social.media/walks-are-great”, that is so high maintenance, the silly buggers. 

That would create separate communities, with their own rules and moderation. Consider this something like subreddits, but without the single organization that can enforce global rules.

The other alternative is that a social network would rise with a truly distributed backend that is resilient to de-platforming issues. From an outside perspective, this will present as something similar to the existing social networks. That has the advantage of requiring the least from users, but it is a non-trivial technical challenge.

I prefer the first option, but I believe it is more likely we’ll end up with the second. The reason for that is monetization strategies. If you have many different actors cooperating to create a network, there is a question of how you pay for that. The typical revenue model for a social network is advertising. That doesn’t work so well when there isn’t a single actor that can sell ads (and track users).

That said, it would be much faster and easier to get started with the first option and it may be that we’ll end up there with the force of inertia.

time to read 5 min | 927 words

I’m writing this post at a time when Donald Trump’s social media accounts were closed by Twitter, Facebook and pretty much every other popular social network. Parler, an alternative social network, has been kicked off the Apple and Google app stores and its AWS account was closed. It also appears that many vendors are dropping it and it will take significant time to get back online, if it is able to do so.

I’m not interested in talking about the reasons for this, mind. This is a hot topic political issue, in a country I’m not a citizen of, and I have no interest in getting bogged down with the political details.

I wish I could say that I had no dog in this fight, but I suspect that the current events will have a long term impact on the digital world. Note that all of those actions are taken by private companies, on their own volition. In other words, it isn’t a government or the courts demanding this behavior, but the companies’ own decision making process.

The thing that I can’t help thinking is that the current behavior by those companies is direct, blatant and very much short sighted. To start with, all of those companies are working at a global scale, and they have just proven that they are powerful enough to rein in the President of the United States. Sure, he is at lame duck status currently, but that is still something that upsets the balance of power.

The problem with that is that while it would appear that the incoming US administration is favorable to this course of action, there are other countries and governments that are looking at this with concern. Poland is on track to pass a law prohibiting the removal of posts in social media that do not break local laws. Israel’s parliament is also considering a similar proposal.

In both cases, mind, these proposed laws got traction last year, before the current escalation in this behavior. I feel that more governments will now consider such laws in the near future, given the threat level that this represents to them. A politician in this day and age who doesn’t use social media to its fullest extent is going to be severely hampered. Both the Obama and the Trump campaigns were lauded for their innovative use of social media, for example.

There are also other considerations to ponder. One of the most costly portions of running a social network is the monitoring and filtering of posts. You have to take into account that people will post bile, illegal and obscene stuff. That’s expensive, and one of the reasons vendors dropped Parler was its moderation policies. That means that there is a big and expensive barrier in place for future social networks that try to grow.

I’m not sure how this is going to play out in the short term, to be honest. But in the long term, I think that there is going to be a big push, both legally and from a technical perspective to fill those holes. From a legal perspective, I would expect that many lawyers will make a lot of money on the fallout from the current events, just with regards to the banning of Parler.  I expect that there are going to be a whole lot of new precedents, both in the USA and globally.

From a technical perspective, the technology to run a distributed social network exists. Leaving aside currently esoteric choices such as social networks on a blockchain (there appear to be a lot of them, search for that, wow!), people can fall back to good old Blog & RSS to get quite a bit of traction. It wouldn’t take much to get to something that looks very similar to current social networks.

Consider RSS Bandit or Google Reader vs. Twitter or Facebook. There isn’t much that you’ll need to do to go from aggregation of RSS feeds to a proper social network. One advantage of such a platform, by the way, is that it allows (and encourages) thought processes that are longer than 140 characters. I dearly miss the web of the 2000s, by the way.
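To make the “not much” concrete, here is a toy sketch of mine (not anything from an existing aggregator; the second feed URL is a placeholder) that merges a few RSS feeds into a single reverse-chronological timeline, which is most of what a read-only social feed amounts to:

```csharp
// Toy illustration: an RSS "timeline" - fetch a handful of feeds, merge the
// items, and show the newest ones first. Real feeds need more robust date and
// error handling than this sketch bothers with.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

class RssTimeline
{
    static async Task Main()
    {
        string[] feeds =
        {
            "https://ayende.com/blog/rss",
            "https://example.com/another/feed.xml", // placeholder
        };

        using var client = new HttpClient();
        var items = new List<(DateTimeOffset Published, string Title, string Link)>();

        foreach (var url in feeds)
        {
            var xml = XDocument.Parse(await client.GetStringAsync(url));
            foreach (var item in xml.Descendants("item"))
            {
                // RSS dates come in several formats; TryParse keeps the toy honest.
                DateTimeOffset.TryParse((string)item.Element("pubDate"), out var published);
                items.Add((published, (string)item.Element("title"), (string)item.Element("link")));
            }
        }

        // The "timeline": newest first, across all the feeds you follow.
        foreach (var entry in items.OrderByDescending(e => e.Published).Take(20))
            Console.WriteLine($"{entry.Published:yyyy-MM-dd} {entry.Title} ({entry.Link})");
    }
}
```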

Longer term, however, I would expect a rise of distributed social networks that are composed of independent but cooperating nodes (yes, I’m aware of Mastodon, and I’m familiar with Gab breaking out of that). I don’t know if this will be based on existing software or if we’ll end up with new networks, but I think that the die has been cast in this regard.

That means that the next social network will have to operate under an assumed hostile environment. That means running on multiple vendors, having no single point of failure, etc.

The biggest issue with getting a social network off the ground is… well, network effects. You need enough people in the network before you start getting more bang for the buck. But right now, there is a huge incentive for such a network, given the migration of many users from the established networks.

Parler’s app has seen hundreds of thousands of downloads a day in the past week, before it was taken down from the app stores. Gab is reporting 10,000+ new users an hour and more users in the past two days than they had seen in the past two years.

There is a hole there that will be filled, I think. Who will be the winner of all those users, I don’t know, but I think that this will have a fundamental impact on the digital world.
