Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.


Looking into Zig

time to read 5 min | 939 words

I think that it was The Pragmatic Programmer that recommended you learn a new language every year. For me, in 2020 that was Rust. I read a bunch of books about the language, read a significant amount of code and wrote a non-trivial amount of code in Rust. That was sufficient to get me to grok the language. I’m not a Rust developer by any means, but I walked in Rusty shoes for long enough to get the feeling.

This year, I decided to look into Zig. Both Zig and Rust are more or less in the same space, replacing C. They couldn’t be more different, however. Zig is a small language. I spent a couple of evenings going through the language reference and that was it, I had a pretty good idea about how to do things.

The learning curve is mostly flat, and that is pretty huge. I can’t help but compare Zig to Rust here: I spent a lot of effort understanding Rust, but almost no cycles trying to grok Zig. It was simple, obvious and quite clear. In terms of power, mind you, I would rate both languages at roughly the same spot. You can write really nice code in Zig; it is just that you don’t need to bend your head into funny shapes to get the compiler to accept your code.

One of the key features of Zig is comptime. This is a feature that allows Zig to run code at compilation time. That isn’t a new or exciting feature on its own, to be honest; C++ has had it for many years. The key difference is that Zig can use this feature for code generation. For example, to create a generic list, you’ll write the following kind of code:
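Something along these lines (an illustrative sketch rather than the exact snippet, and deliberately simplified; the standard library details shift a bit between Zig versions):

```zig
const std = @import("std");

// comptime in action: a function that takes a type and returns a new type,
// evaluated at compile time. List(i32) yields a concrete struct type.
fn List(comptime T: type) type {
    return struct {
        const Self = @This();

        allocator: std.mem.Allocator,
        items: []T,

        pub fn init(allocator: std.mem.Allocator) Self {
            return .{ .allocator = allocator, .items = &[_]T{} };
        }

        // Allocation can fail, so push returns an error union (!void).
        // A real implementation would grow geometrically rather than on every call.
        pub fn push(self: *Self, value: T) !void {
            const grown = try self.allocator.alloc(T, self.items.len + 1);
            std.mem.copy(T, grown, self.items);
            grown[self.items.len] = value;
            if (self.items.len > 0) self.allocator.free(self.items);
            self.items = grown;
        }

        pub fn deinit(self: *Self) void {
            if (self.items.len > 0) self.allocator.free(self.items);
        }
    };
}
```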

Note that we are writing a function that returns a type, which you can then use. That approach is incredibly powerful, but at the same time it is simple and obvious.

Zig is easy to learn, because there isn’t a whole lot more that is hidden from you behind the scenes.

That actually leads to another really important aspect of the design of Zig. There isn’t anything behind the scenes. For example, you cannot just allocate memory in Zig; there is no global function that will do that for you. You need to do so using an allocator. That means that the fact that memory allocations can fail is pervasive throughout the API, the standard library and your code. There isn’t a “this may fail on rare occasions” scenario that you somehow need to remember to handle; it is clear and in your face.

At the same time, Zig does a lot more to make things easier than C. I want to focus on a few important points:

  • Zig has the concept of errors. In the code above, the push() function may fail because of an allocation failure. In that case, the function fails with an error value. That is handled by the try keyword, which will abort the current function and return the error. Note that errors and regular values are separate channels in Zig (the ! in the function declaration marks an error union).
  • Zig has support for the defer and errdefer keywords. The defer keyword works just as you would expect: at function exit, it runs all the deferred statements in reverse order. The errdefer keyword is a lot more interesting, because it will only run if the function exits with an error (see the sketch after this list). This seemingly simple change has a huge impact on code quality and on the complexity that a developer needs to keep in their head.
  • Zig has built-in testing, to the point where test is a keyword in the language.
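To make the first two points concrete, here is a small sketch of try, defer and errdefer working together (illustrative code, not taken from any particular project):

```zig
const std = @import("std");

const Config = struct { data: []u8, lines: usize };

fn loadConfig(allocator: std.mem.Allocator, path: []const u8) !Config {
    // try: on failure, abort this function and hand the error to the caller.
    const file = try std.fs.cwd().openFile(path, .{});
    // defer: runs on every exit path, success or error.
    defer file.close();

    const data = try file.readToEndAlloc(allocator, 1024 * 1024);
    // errdefer: runs only on the error paths below, so a later failure
    // doesn't leak a buffer that the caller will never receive.
    errdefer allocator.free(data);

    const lines = std.mem.count(u8, data, "\n");
    if (lines == 0) return error.EmptyConfig;

    return Config{ .data = data, .lines = lines };
}
```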

To give you some context, when I was writing C code, I literally wrote the exact same thing (manually, with macros and nastiness) in order to help me get things done.

In the same manner, the fact that allocations are explicit and managed from the top (all types that need to allocate get the allocator from their parent) means that you get to do some really cool things with memory. It is easy to say something like “this piece of code gets 10MB of memory only” and let it run like that. It also ends up creating a more robust software system, I think, because handling memory allocation failures isn’t a rare occurrence; it happens all the time.
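For example, the standard library’s FixedBufferAllocator turns the “10MB only” idea into a few lines of code (a sketch; the exact allocator API differs a bit between Zig versions):

```zig
const std = @import("std");

pub fn main() !void {
    // Give this subsystem a hard 10MB budget and nothing more.
    const budget = try std.heap.page_allocator.alloc(u8, 10 * 1024 * 1024);
    defer std.heap.page_allocator.free(budget);

    var fba = std.heap.FixedBufferAllocator.init(budget);
    const allocator = fba.allocator();

    // Everything below draws from the 10MB buffer; once it is exhausted,
    // allocations fail with error.OutOfMemory instead of silently growing
    // the process.
    var list = std.ArrayList(u32).init(allocator);
    defer list.deinit();
    try list.append(42);
}
```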

In general, Zig feels like a much better C, with no additional mental overhead. Compared to Rust, you can get productive almost immediately, and the compilation speed is excellent, to the point where you don’t really need to think about it. Rust makes you feel the compilation cost from the get go, and it only becomes more noticeable as your system grows bigger.

Thinking about this, I actually feel that we should compare Zig to Go, because it is closer in concept to what I think Go wanted to be. In fact, looking at the most common complaints people have against Go, Zig answers them all.

If you haven’t noticed, I’m quite enjoying working with Zig.

And as an aside, the fact that a language can implement a language server and get automatic IDE support is freaking amazing. You can also debug Zig code inside VS Code, for example, with pretty much no more issues than you would have for any other native code. Zig is implemented on top of LLVM and gains a lot of benefits from that.

One thing that kept going through my mind when I looked at all that I got out of the package is: standing on the shoulders of giants.

time to read 3 min | 511 words

I’m teaching a course in university, which gives me some interesting perspective into the mind of new people who join our profession.

One of the biggest differences that I noticed was with the approach to software architecture and maintenance concerns. Frankly, some of the exercises that I had to review made my eyes bleed a little (the students got full marks, because the point was getting things done, not code quality). I talked with the students about the topic and I realized that I have a very different perspective on software development and architecture than they have.

The codebase that I work with the most is RavenDB, I have been working on the project for the past 12 years, with some pieces of code going back closer to two decades. In contrast, my rule for giving tasks for students is that I can complete the task in under two hours from an empty slate.

Part and parcel of the way that I think about software is the realization that any piece of code that I write is going to be maintained for a long period of time. A student writing code for a course doesn’t have that approach; in fact, it is rare that they use the same code across semesters. That leads to seeing a lot of practices as unnecessary or superfluous. Even some of the things that I consider the very basics (source control, tests, build scripts) are things that the students haven’t even encountered up to this point (years 2 and 3 for most of the people I interact with), and they may very well get a degree with no real exposure to those concerns.

Most tasks in university are well scoped, clear and they are known to be feasible within the given time frame. Most of the tasks outside of university are anything but.

That got me thinking about how you can get a student to realize the value inherent in industry best practices, and the only real way to do that is to immerse them in a big project, something that has been around for at least 3 – 5 years. Ideally, you could have some project that the students will do throughout the degree, but that requires a massive amount of coordination upfront. It is likely not feasible outside of specific fields. If you are learning to be a programmer in the gaming industry, maybe you can do something like produce a game throughout the degree, but my guess is that this is still not possible.

A better alternative would be to give students the chance to work with a large project, maybe even contributing code to it. The problem there is that having a whole class start randomly sending pull requests to a project is likely to cause some heartburn to the maintenance staff.

What was your experience when moving from single use, transient projects to projects that are expected to run for decades? Not as a single running instance, just a project that is going to be kept alive for a long while…

time to read 6 min | 1011 words

I want to talk about an aspect of RavenDB that is usually ignored: its pricing structure. The model we use for pricing RavenDB is pretty simple: we charge on a per-core basis, with two tiers depending on what features you want exactly. The details and the actual numbers can be found here. We recently had a potential customer contact us about doing a new project with RavenDB. They had a bunch of questions for us about technical details that pertain to their usage scenario. They also asked a lot of questions about licensing. That is pretty normal and something that we field on a daily basis, nothing to write a blog post about. So why am I writing this?

For licensing, we pointed to the buy page and that was it. We sometimes need to clarify what we mean exactly by “core”, and occasionally the customer will ask us about more specialized scenarios (such as massive deployments, running on multi-purpose hardware, etc.). The customer in question was quite surprised by the interaction. Their contact with us was one of our sales people, and they got all the information in the first couple of emails from us (we needed some clarification about what exactly their scenario was).

It turned out that this customer was considering using MongoDB for their solution, but they have had a very different interaction with them. I was interested enough in this to ask them for a timeline of the events.

It took three and a half months from initial contact until the customer actually had a number that they could work with.

 
  • Day 1: Logged into MongoDB Atlas and used the chat feature to ask for pricing for running a MongoDB cluster in an on-premise configuration. Result: got an email address at MongoDB to discuss the issue.
  • Day 1: Emailed the Sales Development Representative to explain the project and its requirements and to ask for pricing information. Result: the representative suggests staying on Atlas in the cloud and proposes a phone call.
  • Day 3: 30-minute call with the Sales Development Representative, who pushes hard to remain on Atlas in the cloud instead of an on-premise solution. Result: after insisting that only an on-premise solution is viable for the project, the representative asks for the current scenarios and projected growth; MongoDB will generate a quote based on those numbers.
  • Day 6: After researching various growth scenarios, sent an email with the details to the representative.
  • Day 37: Still no reply from MongoDB. Asked for the status of the inquiry and an ETA for the quote.
  • Day 65: After multiple emails, managed to set up another call with the same representative. Another 30-minute call; the representative cannot locate the previously sent detailed scenario and pushes the Atlas cloud option again. Result: the Sales Development Representative pulls a Corporate Account Executive into the email conversation and a new meeting is scheduled.
  • Day 72: Meeting with the Corporate Account Executive; explained the scenario again (third time) and was pressured to use the cloud option on Atlas. Result: setting up a call with MongoDB's Solution Architect.
  • Day 80: Before the meeting with the Solution Architect, another meeting at the request of the Corporate Account Executive. Tried again to push Atlas; had to explain once more why this needs to be an on-premise solution.
  • Day 87: Meeting with the Solution Architect; presented the proposed solution (again) and had to explain (yes, again) why this is an on-premise solution that cannot use Atlas. Result: the Solution Architect sends a script to run against a test MongoDB instance to verify various numbers.
  • Day 94: Seven days after all the requested details were sent, asked for the quote again and whether anything was still missing.
  • Day 101: Got the quote! It was fairly hard to figure out from it what the final number would actually be, so asked a follow-up question.
  • Day 102: Got an answer explaining how to interpret the numbers.

To repeat that: from having a scenario to having a quote (or even just a guesstimated price) took over 100 days! It also took five different meetings with various MongoDB representatives (and having to keep explaining the same thing many times over) to get a number.

During that timeframe, the customer is not sure whether the technology stack that they are considering is even going to be viable for their scenario. They can’t really proceed (since the cost may be prohibitive) and are stuck in a place of uncertainty and doubt about their choices.

For reference, the time frame from a customer asking about RavenDB for the first time to when they have at least a fairly good idea about what kind of expense we are talking about is measured in hours. Admittedly, sometimes it will be 96 hours (because of weekends, usually), but the idea that it can take so long to actually get a dollar value to a customer is absolutely insane.

Quite aside from any technical differences between the products, the fact that you have all the information upfront with RavenDB turns out to be a major advantage. For that matter, for most scenarios, customers don’t have to talk to us, they can do everything on their own.

The best customer service, in my opinion, is to not have to go through that. For the majority of the cases, you can do everything on your own. Some customers would need more details and reach out, in which case the only sensible thing is to actually make it easy for them.

We have had cases where we closed the loop (got the details from the customer, sent a quote and the quote was accepted) before competitors even respond to the first email. This isn’t as technically exciting as a new algorithm or a major performance improvement, but it has a serious and real impact on the acceptance of the product.

Oh, and as a final factor, it turns out that we are the more cost efficient option, by a lot.

time to read 1 min | 200 words

I ask candidates to answer the following question. Sometimes at home, sometimes during an interview with a whiteboard.

You need to create an executable that would manage a phone book. The commands you need to support are:

  • phone-book.exe /path/to/file add [name] [phone]
  • phone-book.exe /path/to/file list [skip], [limit]

The output of the list operation must be the phone book records in lexical order. You may not sort the data during the list operation, however. All such work must be done in the add operation.

You may keep any state you’d like in the file system, but there are separate invocations of the program for each step. This program needs to support adding 10 million records.

Feel free to constrain the problem in any other way that would make it easier for you to implement it. We’ll rate the solution on how much it costs in terms of I/O.

As a reminder, we are a database company; this sort of question is incredibly relevant to the things that we do daily.

I give this question to candidates with no experience, fresh graduates,  etc. How would you rate its difficulty?

time to read 6 min | 1093 words

I mentioned that we are currently hiring for a junior dev position and we have been absolutely swamped with candidates. Leaving aside the divorce lawyer that tried to apply to the position and the several accountants (I don’t really get it either), we typically get people with very little experience.

In fact, this position is explicitly open to people with no experience whatsoever. Given that most junior positions require a minimum of two years, I think that got us a lot of candidates.

The fact that we don’t require prior experience doesn’t mean that we don’t have prerequisites, of course. We are a database company and the fundamentals are important to us. A typical task in RavenDB carries a lot of taxes: ACID compliance, distributed computing, strict performance requirements, visibility into the actions of the database, readability of the code, etc.

I talked before about the cost of a bad hire, and in the nearly a decade that has passed since I wrote that post, I haven’t changed my mind. I would rather end up with no one than hire someone who isn’t a match for our needs.

Our interview process is composed of a phone call, a few coding questions and then an in-person interview. At this point, given that I have been doing this for over a decade, I think that I have interviewed well over 5,000 people. A job interview stresses some people out a lot. Yesterday I had to listen to a candidate speak so fast that I could barely understand the words, and I had to stop a candidate and tell them that they were in the 95th percentile of people I have spoken to, just so they wouldn’t freeze because of a flubbed question.

I tweeted (anonymously) about the ups and downs of the process and seem to have created quite a lot of noise. A typical phone call with a potential candidate takes about 15 – 30 minutes and is mostly there to serve as an explicit filter. If they don’t meet the minimum requirements that we have, there is no point in wasting either side’s time.

One of the questions that I ask is: build a phone book application that stores the data in memory and outputs the records in lexical order. This can stump some people, so we have an extra question to help. Instead of trying to output the data in lexical order, how would you ensure that you don’t have a duplicate phone number in such a system? Scanning through the entire list of records each time is obviously not the way to go. If they still can’t think of a way to do that, the next hint is to think about O(1) and what data structure would fit this requirement. On the Twitter thread, quite a few people were up in arms about that.
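The kind of answer I’m looking for boils down to picking the right containers. In C#, something along these lines would do (an illustrative sketch, not a model solution):

```csharp
using System;
using System.Collections.Generic;

public class PhoneBook
{
    // O(1) duplicate detection: a hash set of the numbers we already hold.
    private readonly HashSet<string> _numbers = new HashSet<string>();

    // Lexical ordering is maintained on insert (a sorted tree), so listing
    // the records in order requires no extra work at output time.
    private readonly SortedDictionary<string, string> _byName =
        new SortedDictionary<string, string>(StringComparer.Ordinal);

    public bool Add(string name, string phone)
    {
        if (!_numbers.Add(phone))
            return false; // duplicate phone number, detected in O(1)

        _byName[name] = phone;
        return true;
    }

    public IEnumerable<KeyValuePair<string, string>> ListInLexicalOrder() => _byName;
}
```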

Building a phone book is the kind of task that I remember doing in high school programming class as a teenager. Admittedly, that was in Pascal, but I just checked six different computer science degrees and for all of them, data structures was a compulsory course. Moreover, questions like “what is the complexity of this operation” are things that we deal with multiple times a day here. We are building a database, so operations on data are literally our bread and butter. But even for “normal” operations, that is crucial. A common example: we need to display some information to the user about their database. The information actually comes from two sources internally. One is the database list, which contains various metadata, and the other is the active database instance, which can give us the running stats, such as the number of requests for this database in the past minute.

Let’s take a look at this code:
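The shape of the problem is roughly this (hypothetical types, not the actual RavenDB code):

```csharp
using System.Collections.Generic;

// Hypothetical types standing in for the two internal sources.
public record DatabaseRecord(string Name);
public record ActiveDatabase(string Name, int RequestsPerMinute);
public record DatabaseInfo(DatabaseRecord Record, ActiveDatabase Instance);

public static class DashboardModel
{
    // For every database record we scan the entire list of active instances:
    // an O(N) search inside an O(N) loop, so O(N^2) overall.
    public static List<DatabaseInfo> GetDatabasesInfo(
        List<DatabaseRecord> records, List<ActiveDatabase> active)
    {
        var result = new List<DatabaseInfo>();
        foreach (var record in records)
        {
            ActiveDatabase instance = null;
            foreach (var candidate in active)
            {
                if (candidate.Name == record.Name)
                {
                    instance = candidate;
                    break;
                }
            }
            result.Add(new DatabaseInfo(record, instance));
        }
        return result;
    }
}
```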

The complexity of this code is O(N^2). In other words, for ten databases, it would cost us a hundred. But for 250 databases it would cost 62,500 and for 500 it would be 250,000. Almost the same code, but without the needless cost:
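Continuing the same sketch, indexing the active instances by name once removes the inner scan:

```csharp
// Reuses the types from the sketch above. Each lookup is now O(1),
// so the whole operation is O(N).
public static List<DatabaseInfo> GetDatabasesInfoFast(
    List<DatabaseRecord> records, List<ActiveDatabase> active)
{
    var byName = new Dictionary<string, ActiveDatabase>();
    foreach (var candidate in active)
        byName[candidate.Name] = candidate;

    var result = new List<DatabaseInfo>();
    foreach (var record in records)
    {
        byName.TryGetValue(record.Name, out var instance);
        result.Add(new DatabaseInfo(record, instance));
    }
    return result;
}
```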

This is neither theoretical nor rare, and it is part of every programming curriculum that you’ll find. Furthermore, I’m not even asking about the theoretical properties of various algorithms, or asking a candidate to compute the complexity of a particular piece of code. I’m presenting a fairly simple and common task (make sure that a piece of data is unique) and asking how to perform that efficiently.

From a lot of the reactions, it seems that plenty of people believe that data structures aren’t part of the fundamentals and shouldn’t be something that is deeply embedded in the mindset of developers. To me, that is like saying that a carpenter shouldn’t be aware of the difference between a nail and a screw.

Rob has an interesting thread on the topic, which I wanted to address specifically:

It does not matter what language, actually. In JavaScript, you won’t write “new Hashtable()”, you’ll use an object, sure, but that is a meaningless detail. In fact, arrays implemented as hashes in JavaScript maintain most of their important properties, O(1) access time by index being the key one. If you want to go deeper, then in practice the JS runtime will usually use an actual array if it detects that this is how you use the variable.

And I’m sorry, that actually is really important, for many reasons. The underlying structure of our data has a huge impact on what you can do with that data and how you can operate on it. That is why you have ArrayBuffer and friends in JavaScript now, because it matters, a lot.

And to the people whose 30 years of experience never included needing to know those details, I have two things to say:

  • Either your code is full of O(N^2) traps (which is sadly common), or you do know this, because the choice between a list and a hash is not something that you can really get away from.
  • In this company, implementing basic data structures is literally part of the day-to-day tasks. Over the years, we have needed customized versions of arrays, lists, dictionaries, trees and many more. This isn’t pie-in-the-sky stuff; that is Monday morning.

time to read 3 min | 418 words

We are hiring again (this time for Junior C# Dev positions in Israel). That means that I go through CVs (quite a few, actually).  I like going over the resumes directly, to get a feel for not just a particular candidate but what is, for lack of a better term, the state of the market.

This time, I noticed a much higher percentage of resumes with a GitHub repository link. Anytime that I see such a link, I go and look at what they have there. That is often really interesting. Then again, you run into things like this:

(image: a code sample from one of the candidates’ repositories)

On the one hand, this is non-production code and obviously a teaching project, which is awesome. On the other hand, I find such code painful to look at.

In the past, I would rate highly anyone that would show a GitHub account in the CV, since I could expect to see some of their projects there, usually unique ones. This time? I’m seeing a lot of basically homework assignments, and those aren’t really that interesting to review or look at. Especially since a lot of the candidates apparently had the same courses, so I saw the same 5 projects repeated over and over again.

In other words, just a GitHub account with some repositories is no longer that interesting or meaningful.

Another thing that I noticed was that a lot of those candidates had profiles with profile pictures like:

(image: an example of such a profile picture)

A small tip: if you expect people to visit your profile (and I assume you do, since you provided the link in the resume), it is worth putting a (professional) picture of yourself there. A profile README on GitHub is also surprisingly attractive when looking at a candidate.

Another tip: if you see a position for a C# Junior Developer, it is acceptable to apply if you don’t have all the requirements, or if you exceed them. But if you are trying to find a new job as a lawyer specializing in family law, maybe don’t apply to a tech company.

And yes, I’m using this post as a way to vent while going over so many CVs.

Most CVs are dry, but one candidate just got bumped to the next stage based solely on the fact that they had “Making awesome pancakes” in their CV, which made me laugh.

time to read 4 min | 705 words

I want to comment on the following tweet:

When I read it, I had an immediate and visceral reaction, because this is one of those things that sound nice but are actually a horrible dystopian trap. It confuses two very important concepts and puts them in the wrong order, resulting in utter chaos.

The two ideas are “writing tests” and “producing high quality code”. And they are usually expressed in something like this:

We write tests in order to produce high quality code.

Proper tests ensure that you can make forward progress without regressions. They are a tool you use to ensure a certain level of quality as you move forward. If you assume that the tests themselves are the target, however, and that you’ll get high quality code because of that, you end up in weird places. For example, take a look at the following set of stairs. They aren’t going anywhere and, aside from being decorative, serve no purpose.

(image: a decorative staircase that leads nowhere)

When you start considering tests themselves to be the goal, instead of a means to achieve it, you end up with decorative tests. They add to your budget and make it harder to change things, but don’t benefit the project.

There are a lot of things that you are looking for in code review that shouldn’t be in the tests. For example, consider the error handling strategy for the code. We have an invariant that says that exceptions may not escape our code when running on a background thread, because an escaped exception there will kill the process. How would you express something like that in a test? Otherwise you end up with an error raised from a write to a file when the disk is full, and that kills the server.
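The invariant lives in code review and in a shared convention rather than in a test; a hypothetical wrapper (not RavenDB’s actual infrastructure) shows the shape of it:

```csharp
using System;
using System.Threading.Tasks;

public static class BackgroundWork
{
    // The convention: background work always goes through this wrapper,
    // so an exception can never escape a background thread and take the
    // whole process down with it.
    public static Task Run(string name, Func<Task> work)
    {
        return Task.Run(async () =>
        {
            try
            {
                await work();
            }
            catch (Exception e)
            {
                // Report and swallow; letting this bubble up would kill the server.
                Console.Error.WriteLine($"Background task '{name}' failed: {e}");
            }
        });
    }
}
```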

Another example is a critical piece of code that needs to safely handle out-of-memory exceptions. You can test for that, sure, but it is hard and expensive. It also tends to freeze your design and implementation, because you are now testing implementation concerns, and that makes it very hard to change your code.

Reviewing code for performance pitfalls is another major consideration. How do you police allocations using a test? And do that without killing your productivity? For fun, the following code allocates:

Console.WriteLine(1_000_000);

There are ways to monitor and track these kinds of things, for sure, but they are very badly suited for repeatable tests.

Then there are other things that you’ll cover in the review, more tangible items. For example, the quality of error messages you raise, or the logging output.

I’m not saying that you can’t write tests for those. I’m saying that you shouldn’t. That is something that you want to be able to change and modify quickly, because you may realize that you want to add more information in a certain scenario. Freezing the behavior using tests just means that you have more work to do when you need to make the changes. And reviewing just the test code is an even bigger problem when you take into account the interactions between features and their impact on one another. Not in terms of correctness, you absolutely need to test that, but in terms of behavior.

The interleaving of internal tasks inside of RavenDB was carefully designed to ensure that we’ll be biased in favor of user-facing operations, starving background operations if needed. At the same time, it means that we need to ensure that we’ll give the background tasks time to run. I can’t really think of how you would write a test for something like that without basically setting in stone the manner in which we make that determination. That is something that I explicitly don’t want to do; it would make changing how and what we do harder. But that is something that I absolutely consider in code reviews.

time to read 3 min | 559 words

I’m very happy to announce that we have recently released version 6.0 of Entity Framework Profiler and NHibernate Profiler. Here are some of the highlights:

(image: highlights of the 6.0 release)

This new version brings quite a lot to the table. We applied a lot of lessons to optimize the performance of the profiler so it can process events faster and in a more efficient manner. We also fully integrated it with the async/await model that became so popular. For Entity Framework users, we now support all versions of EF, running on .NET 4.7 all the way to .NET 5.0 and anything in between.

What I think is the crown jewel of this release, however, is the new Azure integration feature. We initially built this feature to allow you to profile serverless code and Azure Functions, but it turned out to be useful well beyond that.

The Profiler Azure Integration allows you to set up a storage container on Azure that will accept the profiled output from your application. Pretty simple, right? The profiler application will monitor the container and show you the events as they are written, so even if you don’t have direct access to the profiled code (very common in serverless / Azure scenarios), you can still profile it. That helps a lot as well when running inside containers, Kubernetes, etc. The usual way the profiler and the application communicate is over a TCP channel, but given the myriad of network topologies that are now in common use, this can get complex. By utilizing the Azure integration, we can sidestep the whole issue and give you a seamless way to profile your code.

On-demand profiling of your code is just one part of the new feature. You can now do continuous profiling of your system. Because we throw the events into a blob container in Azure, and given that the data is compressed and cheap to store, you can now afford to simply record all the queries in your application and come back a few days or weeks later to look into interesting behaviors.

That is also a large part of why we worked on improving our performance, we expect to be dealing with a lot of data.

You are also able to specify a certain timeframe for this kind of profiling, as well as specify whether we should remove the data after looking at it or retain it. The new profiling feature gives you a way to answer: “What queries ran in production on Monday between 9:17 and 10:22 AM?” This comes with all of the usual benefits of the profiler, meaning that:

  • You can correlate each query to the exact line of code that triggered it.
  • The profiler analyzes the database interaction and raises alerts on bad behaviors.

The last part is done not on a single query but with access to the full state of the system. It can allow you to find hot spots in your program and patterns of data access that lead to much higher latency and cost a lot more money.

To celebrate the new release, we are offering the profilers with a 20% discount for the new quarter, as well as a bundling option. You can get the Entity Framework Profiler or the NHibernate Profiler bundled with our Cosmos DB Profiler.

time to read 2 min | 204 words

RavenDB Cloud is now offering HIPAA Compliant Accounts.

HIPAA stands for the Health Insurance Portability and Accountability Act, a set of rules and regulations that health care providers and their business associates need to comply with.

That refers to strictly limiting access to Personal Health Information (PHI) and Personally Identifying Information (PII), as well as audit and security requirements. In short, if you deal with medical information in the States, this is something that you need to deal with. In the rest of the world, there are similar standards and requirements.

With HIPAA compliant accounts, RavenDB Cloud takes upon itself a lot of the details around ensuring that your data is stored in a safe environment and in a manner that matches the HIPAA requirements. For example, the audit logs are maintained for a minimum of six years. In addition, there are further protections on accessing your cluster, and we enforce a set of rules to ensure that you don’t accidentally expose private data.

This feature ensures that you can easily run HIPAA compliant systems on top of RavenDB Cloud with a minimum of hassle.
