Oren Eini, CEO of RavenDB, a NoSQL Open Source Document Database

time to read 2 min | 267 words

This happened a few minutes ago: I got a call from an unknown number. It turned out to be my wife’s work number, and she called to ask me an urgent question, it seems:

 

“Can you tell me how to compress a PDF file?” she asked.

For the next part, it might be better if I paint you the whole picture. Imagine bullet time, where everything slows down, and I start to analyze the question and my possible answer. The following thoughts run through my mind during that time.

  • PDF files are already compressed by default.
  • Pretty sure that the file format is already using compression.
  • You could strip unneeded elements from the file; removing embedded fonts is one example, I think.
  • If there are images, you can probably downscale or re-sample them to reduce their size.
  • What about just running this through Zip?
  • Where did this question come from?

That took about two seconds in real time. The decision tree for any possible answer here grew exponentially. I had to make a call.

“No, that isn’t easily possible,” I answered.

I got some more details as well.

“This is for uploading a document to the XYZ system, it only accepts up to 4MB files, but this PDF is 5.5MB. I guess I can just scan this document as two separate pages instead of one, right?”

With a workaround found, and a detailed dive into lossless vs. lossy compression and file format choices avoided, I agreed that this was probably the best option and finished my coffee, pondering the ethical dilemma of answering the actual question versus the intended question.

time to read 9 min | 1696 words

Following my previous post about updating the publishing platform of this blog, I realized that I dug myself into a hole. The new workflow was pretty sweet. To the point where I wrote my blog posts a lot more frequently than before, as you can probably tell.

The problem was that I wanted to edit and process the blog post inside Google Docs, where I have a great workflow for editing, reviews, collaboration, etc. And then I want to push that same document to the blog. The killer for me is that I want that to be a smooth process, and the end text should fit into the blog. That means, if I want to emphasize something, it should be seen in the blog as bold. And if I want to write some code, that should work as well. In fact, the reason that I started this process is that it got so annoying to post code to the blog.

I’m using Google Docs’ export functionality to get the HTML back, and I did some basic cleaning to get it blog-ready instead of being focused on visual fidelity. I was using HTML Agility Pack to do that, and it turned out to be the wrong tool for the job. The issue is that it processed the data as if it were an XML document. I actually have quite a track record with XML, so that wasn’t the issue. The problem is that I wanted to do a series of non-trivial things with the HTML, and there aren’t any off-the-shelf facilities to do that in .NET that I could find.

For example, given how important it is to me to show code snippets properly, I wanted to be able to grab them from the document, figure out what language I’m actually using there and syntax highlight it properly. There isn’t anything like that in .NET, all the libraries I found were for JavaScript.

You know the adage about “Let’s rewrite it in Rust”? Well, I rewrote my entire publishing process in JavaScript. Which then led me to another adventure. How can I do two contrary things? When I’m writing this document, I want to be able to just write the code. When I publish it, I want to see the syntax-highlighted code, properly formatted and working.

Google Docs has support for writing code blocks inline (for some small number of languages), which is great for the editing process. However, the HTML that this generates is beyond atrocious. What is even worse, the HTML doesn’t align things properly, doesn’t use fixed-size fonts, etc. In other words, it is almost there, but not quite.

When analyzing the Google Docs output, I noticed a couple of funny characters in the code output. Here is what it looks like. I believe this is a bug in the export process, probably related to the way code blocks work in Google Docs.

Dear Googlers, if you are reading this, please make a note that this thing has just become subject to Hyrum’s Law. It is observable behavior, and I’m relying on it to do important tasks. Don’t break this in the future.

It turns out these are actually a pair of Unicode characters. More specifically, they are Unicode characters that are marked for private use:

  • 0xEC03 - appears to be used to mark the beginning of a code block
  • 0xEC02 - appears to be used to mark the end of a code block

Note the “appears”, and my blatant disregard for things like software maintenance discipline and all things proper and good in the world of Computer Science. This is a project where there are no rules, there is one customer, and he can code 🙂.

As mentioned earlier, while extracting the Google Doc as HTML and processing it, I encounter those Unicode markers that delineate the code section. This is good, because in terms of HTML itself, what it is doing inside is a… mess. Getting the actual text as it is supposed to be is not easy. So I exported the file again, as text. Those markers are showing up in the textual edition as well, which made things a lot easier for me.

With all of this done, allow me to show you some truly horrifying beautiful code:


// 'flourite' (language detection) and 'prismjs' (Prism) are the npm packages in use here.
// First pass, over the *text* export: pull out the code blocks, detect the language,
// and syntax highlight them up front.
let blocks = [];
for (const match of text.data.matchAll(/\uEC03(.*?)\uEC02/gs)) {
    const code = match[1].trim();
    const lang = flourite(code, { shiki: true, noUnkown: true }).language;
    const formattedCode = Prism.highlight(code, Prism.languages[lang], lang);

    blocks.push("<hr/><pre class='line-numbers language-" + lang + "'>" +
        "<code class='line-numbers language-" + lang + "'>" +
        formattedCode + "</code></pre><hr/>");
}

// Second pass, over the *HTML* export: replace the start-marker element with the
// pre-rendered block, and drop everything until the matching end marker.
let codeSegmentIndex = 0;
let inCodeSegment = false;
htmlDoc.findAll().forEach(e => {
    var text = e.getText().trim();
    if (text == "&#60419;") {                  // 0xEC03 - start of a code block
        e.replaceWith(blocks[codeSegmentIndex++]);
        inCodeSegment = true;
    }
    if (inCodeSegment) {
        e.extract();                           // discard the original (mangled) markup
    }
    if (text == "&#60418;") {                  // 0xEC02 - end of a code block
        inCodeSegment = false;
    }
})

That isn’t a lot of code, but it does plenty. We scan through the textual version of the document and find all the code blocks using a regular expression. We then try to figure out what language I’m using and apply code formatting during the publication process (this saves the need to change anything on the blog, which is nice, especially since we have to take into account syndication).

I push the code snippets into an array and then I process the actual HTML document using the DOM and find all the code snippets. I replace the start marker with the actual formatted code and continue to discard all the other elements until I hit the end of the code segment. The rest of the code remains pretty much the same as before.

I was writing this in VS Code, and Copilot suggested the following code for handling images:


// 'entries' is assumed to come from the zipped HTML export (e.g. AdmZip's getEntries()),
// which contains an images/ folder next to the HTML file.
htmlDoc.findAll('img').forEach(img => {
    if (img.attrs.hasOwnProperty('src')) {
        let src = img.attrs.src;
        let imgName = src.split('/').pop();
        let imgData = entries.find(e => e.entryName === 'images/' + imgName).getData();
        let imgType = imgName.split('.').pop();
        // Inline the image as a base64 data URI instead of uploading it as a separate file.
        let imgSrc = 'data:image/' + imgType + ';base64,' + imgData.toString('base64');
        img.replaceWith('<img src="' + imgSrc + '" style="float: right"/>');
    }
})

In other words, instead of uploading the images as separate files, I can just encode them into the blog post directly. I like that idea very much because it means that I don’t have to store the images elsewhere.

Given that I don’t have any npm packages to abandon, I don’t know if I can call myself a JavaScript developer, but I did put the full code up for people to take a peek and then recoil.

time to read 14 min | 2727 words

Fungible is a funny word, mostly because you are most likely familiar with the term from NFT (non-fungible tokens) and other similar scams. At its core, it is the idea that for certain things, the instance doesn’t matter, just the amount.

The classic example is that if I lend you a 50$ bill, and you give me back two 20$ bills and a 10$ bill, you’ve still given me back my money. That is even though you very clearly didn’t. I didn’t get the same physical 50$ paper bill back, I got bills for that same amount. On the other hand, if I give you my dog for the weekend, I would be quite upset if I got back three different dogs, even if the total weight is the same.

This is actually a lot more than I want to know about fungibility, to be honest. But it turns out that if you are running a cloud business or just use the cloud in general, you have to be well-versed in the matter. Because in the cloud, money isn’t fungible. In fact, it doesn’t behave a lot like money at all.

Let’s assume that we are a cloud company called cloud.example.com, offering VPS hosting to our users. You are in charge of writing the billing code, and it is pretty simple, right? Here is some code that can compute the charges:


function compute_charges(custId, start, end) {
  let total = 0;
  // All of the customer's instances that were running at some
  // point during the billing period [start, end).
  let predicate = instance =>
    (instance.custId === custId  && instance.started < end) &&
    (instance.ended > start      || instance.ended == null);

  for (let instance of query_instances(predicate)) {
    total += instance.hours_running(start, end) *
             instance.price_per_hour;
  }

  return total;
}

As you can see, there isn’t much there. We find all the instances that were running in the billing period and then calculate the total hours they ran during that period. Please note, this is a simplified model as we aren’t dealing with stopping & starting instances, etc.

The output of the compute_charges() function is a number, which will presumably be handed over to be charged over a credit card. There are other things that we need to do as well (generate an invoice, have a usage report, etc), but I want to focus on the money issue here.

The simplest model is that at the end of the billing period, we charge the customer (using a credit card, for example) and receive our payment. Everyone is happy and we can go home, hopefully richer.

The challenge arises when we want to offer additional options to the customer. For example, we may be willing to give the customer a discount if they are going to commit to a minimum amount of money they’ll spend each month. We may want to offer them upfront payment options or give monetary incentives to a particular aspect of the business (run on ARM instances instead of X64, for example).

Each time that we make such an offer, we are going to be turning around and (significantly) complicating the way we bill the customer. Let’s talk about something as simple as committing to run an instance for a whole year. No upfront payment, just a commitment to pay for a particular server for a year. In AWS or Azure, that would be Reserved Instances, so you are likely very familiar with the idea.

How is that going to be expressed in code? Probably something like this:


function compute_charges(custId, start, end) {
    let total = 0;
    let predicate = instance => /*..redacted.*/;

    // First pass: charge the actual usage and track hours per instance type.
    var hrsPerIns = {};
    for (let i of this.instances(predicate)) {
        let hours = i.hours_running(start, end);
        hrsPerIns[i.type] = hours + (hrsPerIns[i.type] || 0);
        total += hours * i.price_per_hour;
    }

    // Second pass: charge for committed hours that went unused.
    for (let c of this.commitmentsFor(custId, start, end)) {
        let hours = c.committed_time(start, end);
        let hoursUsed = hrsPerIns[c.type] || 0;
        let unusedCommittedHours = Math.max(0, hours - hoursUsed);
        total += unusedCommittedHours *
                 this.instance(c.type).price_per_hour;
    }

    return total;
}

To be clear, the code above is not a good way to handle such a task, but it does show in a pretty succinct way the hidden complexities. In this case, if you didn’t meet your commitment, we’ll charge you for the unused commitment as well.

A more complex system would have to account for discounted rates while using the committed values, for example. And in that case, the priority of applying such rates between different matching commitments.

Other aspects may include giving the user a discount for a particular level of usage: the first 100GB are priced differently from the rest, there is a free tier to apply, and… you get the point, I think. It gets complex.
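To make the “it gets complex” part concrete, here is a minimal sketch of tiered pricing for a single meter. The tier boundaries, rates, and names are made up for illustration, not taken from any real price list:

// Tiered pricing: first 100 GB free, the next 900 GB at one rate,
// everything beyond that at a different overage rate. Numbers are invented.
const storageTiers = [
    { upTo: 100,      pricePerGB: 0.00 },   // free tier
    { upTo: 1000,     pricePerGB: 0.10 },
    { upTo: Infinity, pricePerGB: 0.08 },
];

function tieredCharge(gbUsed, tiers) {
    let total = 0, covered = 0;
    for (const tier of tiers) {
        const inTier = Math.min(gbUsed, tier.upTo) - covered;
        if (inTier <= 0) break;
        total += inTier * tier.pricePerGB;
        covered += inTier;
    }
    return total;
}

// tieredCharge(1500, storageTiers) => 0*100 + 0.10*900 + 0.08*500 = 130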

Note that at this point, we aren’t even talking about money yet, we are discussing computing the charges. The situation is more interesting when we move to the next stage. On the face of it, this seems pretty simple, all you need to do is charge the credit card, no?

Okay, maybe you need to send an invoice, but that is about it, right?

Well… what happens if the customer made an upfront payment for one of those commitments? Or just accidentally paid twice last month and now has credit on your system.

I’m going to leave aside the whole complexity around payments bouncing (which is a whole other interesting topic) and how to deal with the actual charging. Right now I want to focus on the nature of money itself.

Imagine you have a commitment with a customer for an 8-core / 64 GB VPS server for a whole year. And they paid upfront, getting a nice discount along the way. How would you record that in your system?

The easiest is to create the notion of credit for the user, which you deduct whenever you need to charge them. So we’ll first compute the charges, then deduct the existing credits, and debit the customer if anything remains. This is simple, easy to work with, and wrong.
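To see why, here is a minimal sketch of that naive credit model (hypothetical names, not our actual billing code). Nothing in it knows what the credit was originally paid for:

// Naive settlement: credits are one undifferentiated pool of money.
function settleNaive(customer, chargesTotal) {
    let due = chargesTotal;                        // e.g. the output of compute_charges()
    const fromCredit = Math.min(customer.credit, due);
    customer.credit -= fromCredit;                 // burns credit regardless of its purpose
    due -= fromCredit;
    return due;                                    // remainder goes to the credit card
}

The prepaid VPS money would happily pay for storage charges here, which is exactly the problem described next.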

Remember that discount the user received? They paid for that particular VPS type, and if you now need to charge them for anything else (such as storage charges), that money cannot come from the funds paid for the VPS.

In other words, the money the customer paid is not fungible. It isn’t applicable for any charge, it is colored. It is dedicated to a particular purpose. And managing that turns out to be pretty complex. Mostly because we are trying to fit everything into the debits and credits on the account.

A better model is to avoid using money, in the same way that if you mix inches and centimeters you’ll eventually end up in a bad place on Mars. The solution is to treat each individual charge as its own “currency”.

In other words, when computing the charges, we aren’t trying to find the cost of running a particular instance for the billing period. We are trying to find how many “cost units” we have for that time period.

Instead of getting a single number that we’ll charge the customer, we’ll obtain a detailed set of the charges in question. Not as money, but as cost units. Think about those in a similar way to currency. Note that all the units are multiples of 730 hours (the average number of hours per month).


compute_charges(custId, start, end) => {
    custId: 'customers/3291-B',
    start: '2024-01-01', end: '2024-01-31',
    costs: [
      {type: '8Cores-64GB-hours',  qty: 2190},
      {type: '4Cores-32GB-hours',  qty:  730},
      {type: 'disk-5000-iops',     qty: 2920},
    ],
}

The next step after that is to get your allocated budget for the same billing period, which will look something like this:


compute_budget(custId, start, end) => {
    custId: 'customers/3291-B',
    start: '2024-01-01', end: '2024-01-31',
    commitments: [
      {type: '8Cores-64GB-hours',  qty: 2190},
      {type: '4Cores-32GB-hours',  qty: 1460},
      {type: 'disk-5000-iops',     qty: 730},
    ],
}

In other words, just as we compute the charges based on the actual usage for that billing period, we apply the same approach to the commitments we have. The next stage is to just add all of those together (a short sketch of that netting step follows the list below). In this case, we’ll end up with the following:

  • 8Cores-64GB-hours ⇒ 0 (we used as much as we committed to)
  • 4Cores-32GB-hours ⇒ -730 (we committed to more than we used)
  • Disk-5000-iops ⇒ 2190 (remaining use after applying commitment, priced as you go)
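A minimal sketch of that netting step, working against the two documents shown above (the function name is mine, not the real system’s):

// Net the usage against the commitments, per cost-unit "currency".
// Positive result: pay-as-you-go units left after the commitment is consumed.
// Negative result: committed units that went unused (still billable at the committed rate).
function netUnits(charges, budget) {
    const net = {};
    for (const c of charges.costs)
        net[c.type] = (net[c.type] || 0) + c.qty;
    for (const c of budget.commitments)
        net[c.type] = (net[c.type] || 0) - c.qty;
    return net;
}

// With the documents above:
// { '8Cores-64GB-hours': 0, '4Cores-32GB-hours': -730, 'disk-5000-iops': 2190 }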

We aren’t done yet, after commitments, there are other plans that we may need to run. For example, we’ll provide you with some global discounts for VM rental (which doesn’t apply to disks, however). Working at the level of cost units (or colors, or currency, whatever term you like) allows us to apply those things in a very fine-grained manner. More importantly, the end result and all its intermediate steps are very clear. That is quite important when you look at a six-figure bill with hundreds of line items and you want to see whether the billing matches your contract or not.

As you can imagine, given the inherent complexity of the system, being able to clearly “show your work” is quite important. Especially when there is a misunderstanding or questions are being raised (and there will be).

What we have done now is compute the actual charges based on their type, but we need to convert that to real money. There are several steps along this process (a small sketch of this conversion follows the list):

  1. We need to charge all the active commitments. Those may have been pre-paid (in which case there is no current charge), but they may have a (fixed) monthly cost that we need to add to the current invoice.
  2. We need to perform a “currency conversion” between the units we have and actual money. In the example above, we have a negative number of units (for 4Cores-32GB-hours), as we committed to more hours than we actually used. We are still being charged for this by applying the rate from the commitment.
  3. On the other hand, when we examine the disk costs, we used more than we committed to. Here we need to make a decision about what price we’ll charge the user. It can be the commitment price or the pay-as-you-go price. So even for the same currency we may have different rules.

After all of this is done, we are now left with a final number. The actual amount of money that we need to charge the customer. This is the point at which we check if the customer has any credit already paid in the system or if we need to make an actual charge. That aspect is complicated by whether you are charging a credit card (same for any other automatic billing option) or issuing an invoice to be paid manually.

For a manual invoice, you now have a whole other process. For example, you may offer discounts for the customer if they pay within 14 days versus the usual 30, or charge a fee for paying within 60 days, etc.

I’m not touching on collections or what to do when you fail to charge the customer. It is shockingly common to encounter payment failures. To the point where we never had a single payment run that didn’t include at least several such cases. The reasons range from deal size too big to (temporary) lack of funds to suspicious-seeming activity. You need to be able to handle that as well. But those are topics for another post.

In this post, my aim was to discuss just the issue of the complexity of money in the cloud business. I find the model of treating the charges as separate “currencies” to be a nice one overall, but I would love to hear about other people’s experiences in this matter.

time to read 4 min | 672 words

A not insignificant part of my job is to go over code. Today I want to discuss how we approach code reviews at RavenDB, not from a process perspective but from an operational one. I have been a developer for nearly 25 years now, and I’ve come to realize that when I’m doing a code review I’m actually looking at the code from three separate perspectives.

The first, and most obvious one, is when I’m actually looking for problems in the code - ensuring that I can understand what is going on, confirming the flow makes sense, etc. This involves looking at the code as it is right now.

I’m going to be showing snippets of code reviews here. You are not actually expected to follow the code, only the concepts that we talk about here.

Here is a classic code review comment:

There is some duplicated code that we need to manage. Another comment that I liked is this one, pointing out a potential optimization in the code:

If we define the code using the static keyword, we’ll avoid delegate allocation and save some memory, yay!

It gets more interesting when the code is correct and proper, but may do something weird in some cases, such as in this one:

I really love it when I run into those because they allow me to actually explore the problem thoroughly. Here is an even better example, this isn’t about a problem in the code, but a discussion on its impact.

RavenDB has been around for over 15 years, and being able to go back and look at those conversations in a decade or so is invaluable to understanding what is going on. It also ensures that we can share current knowledge a lot more easily.

Speaking of long-running projects, take a look at the following comment:

Here we need to provide some context to explain. The _caseInsensitive variable here is a concurrent dictionary, and the change is a pretty simple optimization to avoid the annoying KeyValuePair overload. Except… this code is there intentionally, we use it to ensure that the removal operation will only succeed if both the key and the value match. There was an old bug that happened when we removed blindly and the end result was that an updated value was removed.

In this case, we look at the code change from a historical perspective and realize that a modification would reintroduce old (bad) behavior. We added a comment to explain that in detail in the code (and there already was a test to catch it if this happens again).

By far, the most important and critical part of doing code reviews, in my opinion, is not focusing on what is or what was, but on what will be. In other words, when I’m looking at a piece of code, I’m considering not only what it is doing right now, but also what we’ll be doing with it in the future.

Here is a simple example of what I mean, showing a change to a perfectly fine piece of code:

The problem is that the if statement will call InitializeCmd(), but we previously called it using a different condition. We are essentially testing for the same thing using two different methods, and while currently we end up with the same situation, in the future we need to be aware that this may change.

I believe one of the major shifts in my thinking about code reviews came about because I mostly work on RavenDB, and we have kept the project running over a long period of time. Focusing on making sure that we have a sustainable and maintainable code base over the long haul is important. Especially because you need to experience those benefits over time to really appreciate looking at codebase changes from a historical perspective.

time to read 2 min | 235 words


If you are reading this blog, I assume that you are a like-minded person. My idea of relaxation is to sit and write code. Hopefully on something that I’m not familiar with. I have many such blog post series covering topics I care about. It’s my idea of meditation.

For the end of 2023, I thought that we could do something similar but on a broader scale. A while ago Alex Klaus wrote a walkthrough on how to build a complete application from scratch using modern best practices (and RavenDB). We refreshed the code and made it widely available, offering you something fun, educational, and productive to engage with.

The system is a bug tracker (allowing us to focus on the architecture rather than domain concerns), and you can play with a deployed version live. The code is available under the MIT license, and we’ll be very happy to receive any suggested improvements.

Topics that are covered:

  1. Building an enterprise application with .NET and RavenDB

  2. Non-Relational Data Modeling Through Domain Driven Design Prism

  3. Hidden side of document IDs in RavenDB

  4. Dynamic Fields for Indexing

  5. Entity Relationships in non-relational database (one-to-many, many-to-many)

  6. Multi-tenant database in NoSQL

  7. Database Integration Testing – The Secret Recipe

As usual, I would love any feedback you have to offer.

time to read 2 min | 290 words

In the previous post, I showed a very simple request router that would always select the fastest node. That worked for a long while, until it didn’t, and the challenge is figuring out why.

As it turns out, the issue is a simple one of spooky action at a distance. Here is what happens. Assume that we have three servers and 10 clients. Each server is sized to handle 4 clients. So far, so good, the system has capacity to spare.

The problem is in the manner in which clients detect the fastest node in the cluster. The only thing that is considered is the state of the node at the time of selection. At that time, we may end up with all the clients selecting one particular node as the fastest.

In other words, we have three servers, two of them have no clients talking to them and one of the servers has all the clients talking to it. That results in that node going down, obviously. The clients would then react appropriately, and select a new node to talk to. All of them would do that, find the fastest node, and… bring it down as well. Rinse & repeat.

The issue can be stated as Time Of Check vs. Time Of Use, but also as a race condition, where all the clients end up doing a synchronized “wave” operation that kills the system.

How do you prevent this?

You introduce randomness into the system. You don’t test the status just once; you re-check on a regular basis so you can respond to shifting load. You should also introduce randomness into the selection process itself, so the clients won’t all do this at exactly the same time and end up in the same position.
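Here is a minimal sketch of that idea; measureLatency and allNodes are placeholders for whatever the client already has, and the numbers are arbitrary. Instead of latching onto the single fastest node once, each client periodically re-measures and makes a weighted random pick, so the load doesn’t converge on one server:

let current = null;

async function refreshSelection(nodes) {
    const latencies = await Promise.all(nodes.map(n => measureLatency(n)));
    // Faster nodes get a heavier weight, but every healthy node keeps a chance.
    const weights = latencies.map(l => 1 / Math.max(l, 1));
    const total = weights.reduce((a, b) => a + b, 0);
    let r = Math.random() * total;
    let pick = nodes[nodes.length - 1];            // fallback for floating-point edge cases
    for (let i = 0; i < nodes.length; i++) {
        r -= weights[i];
        if (r <= 0) { pick = nodes[i]; break; }
    }
    current = pick;
}

// Re-check on a jittered interval so the clients don't all re-select at the same moment.
setInterval(() => refreshSelection(allNodes), 30_000 + Math.random() * 15_000);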

time to read 1 min | 186 words

Side note: The current state in Israel right now is bad. I’m writing this blog post as a form of escapism, so I can talk about something that makes sense and follows logic and reason. I’ll not otherwise comment on the current situation here.

Consider the following scenario. We have a bunch of servers and clients. The clients want to send requests for processing to the fastest node that they have available. But the algorithm that was implemented has an issue. Can you see what it is?

To simplify things, we are going to assume that the work that is being done for each request is the same, so we don’t need to worry about different request workloads.

The idea is that each client node will find the fastest node (usually meaning the nearest one) and if there is enough load on the server to have it start throwing errors, it will try to find another one. This system has successfully spread the load across all servers, until one day, the entire system went down. And then it stayed down.
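The original code isn’t reproduced here, so this is a hypothetical sketch of the kind of selection logic being described, with measureLatency standing in for whatever health check the client uses:

// Each client measures all the servers, picks the single fastest one, and
// sticks with it until that node starts throwing errors, then re-selects.
async function selectNode(nodes) {
    const timings = await Promise.all(nodes.map(async n => ({
        node: n,
        latency: await measureLatency(n),
    })));
    timings.sort((a, b) => a.latency - b.latency);
    return timings[0].node;
}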

Can you figure out what is the issue?

time to read 9 min | 1686 words

In the previous post, I was able to utilize AVX to get some nice speedups. In general, I was able to save up to 57%(!) of the runtime in processing arrays of 1M items. That is really amazing, if you think about it. But my best effort only gave me a 4% improvement when using 32M items.

I decided to investigate what is going on in more depth, and I came up with the following benchmark. Given that I want to filter negative numbers, what would happen if the only negative number in the array was the first one?

In other words, let’s see what happens when we could write this algorithm as the following line of code:

array[1..].CopyTo(array);

The idea here is that we should measure the speed of raw memory copy and see how that compares to our code.

Before we dive into the results, I want to make a few things explicit. We are dealing here with arrays of long: when I’m talking about an array with 1M items, I’m actually talking about an 8MB buffer, and for the 32M items, we are talking about 256MB of memory.

I’m running these benchmarks on the following machine:

    AMD Ryzen 9 5950X 16-Core Processor
    Base speed:   3.40 GHz
    L1 cache:     1.0 MB
    L2 cache:     8.0 MB
    L3 cache:     64.0 MB
    Utilization:  9%
    Speed:        4.59 GHz

In other words, when we look at this, the 1M items (8MB) can fit into L2 (barely), and are certainly backed by the L3 cache. For the 32M items (256MB), we are far beyond what can fit in the cache, so we are probably dealing with memory bandwidth issues.

I wrote the following functions to test it out:

Let’s look at what I’m actually testing here.

  • CopyTo() – using the span native copying is the most ergonomic way to do things, I think.
  • MemoryCopy() – uses a built-in unsafe API in the framework. That eventually boils down to a heavily optimized routine, which… calls Memmove() if the buffers overlap (as they do in this case).
  • MoveMemory() – uses P/Invoke to call the Windows API to do the moving of memory for us.

Here are the results for the 1M case (8MB):

Method         N        Mean      Error     StdDev    Ratio
FilterCmp      1048599  441.4 us   1.78 us   1.58 us  1.00
FilterCmp_Avx  1048599  141.1 us   2.70 us   2.65 us  0.32
CopyTo         1048599  872.8 us  11.27 us  10.54 us  1.98
MemoryCopy     1048599  869.7 us   7.29 us   6.46 us  1.97
MoveMemory     1048599  126.9 us   0.28 us   0.25 us  0.29

We can see some real surprises here. I’m using FilterCmp (the very basic implementation that I wrote) as the baseline.

I cannot explain why CopyTo() and MemoryCopy() are so slow.

What is really impressive is that the FilterCmp_Avx() and MoveMemory() are so close in performance, and so much faster. To put it in another way, we are already at a stage where we are within shouting distance from the MoveMemory() performance. That is.. really impressive.

That said, what happens with 32M (256MB) ?

Method         N         Mean         Error      StdDev     Ratio
FilterCmp      33554455  22,763.6 us  157.23 us  147.07 us  1.00
FilterCmp_Avx  33554455  20,122.3 us  214.10 us  200.27 us  0.88
CopyTo         33554455  27,660.1 us   91.41 us   76.33 us  1.22
MemoryCopy     33554455  27,618.4 us  136.16 us  127.36 us  1.21
MoveMemory     33554455  20,152.0 us  166.66 us  155.89 us  0.89

Now FilterCmp_Avx is faster than MoveMemory. That is… a pretty big wow, and a really nice close for this blog post series, right? Except that we won’t be stopping here.

Given the way the task I set out works, we are actually filtering out just the first item and then basically copying the memory. Let’s do some math: 256MB in 20.1ms means 12.4 GB/sec!

On this system, I have the following memory setup:

    64.0 GB
    Speed:              2133 MHz
    Slots used:         4 of 4
    Form factor:        DIMM
    Hardware reserved:  55.2 MB

I’m using DDR4 memory, so I can expect a maximum speed of 17GB/sec. In theory, I might be able to get more if the memory is located on different DIMMs, but for the size in question, that is not likely.

I’m going to skip the training montage of VTune, understanding memory architecture and figuring out what is actually going on.

Let’s drop everything and look at what we have with just AVX vs. MoveMemory:

Method         N         Mean         Error      StdDev     Median       Ratio
FilterCmp_Avx  1048599   141.6 us     2.28 us    2.02 us    141.8 us     1.12
MoveMemory     1048599   126.8 us     0.25 us    0.19 us    126.8 us     1.00

FilterCmp_Avx  33554455  21,105.5 us  408.65 us  963.25 us  20,770.4 us  1.08
MoveMemory     33554455  20,142.5 us  245.08 us  204.66 us  20,108.2 us  1.00

The new baseline is MoveMemory, and in this run, we can see that the AVX code is slightly slower.

It’s sadly not uncommon to see numbers shift by those ranges when we are testing such micro-optimizations, mostly because we are subject to so many variables that can affect performance. In this case, I dropped all the other benchmarks, which may have changed things.

At any rate, using those numbers, we have 12.4GB/sec for MoveMemory() and 11.8GB/sec for the AVX version. The hardware maximum speed is 17GB/sec. So we are quite close to what can be done.

For that matter, I would like to point out that the trivial code completed the task at 11GB/sec, so that is quite respectable, and it shows that the issue here is literally getting the memory to the CPU fast enough.

Can we do something about that? I made a pretty small change to the AVX version, like so:

What are we actually doing here? Instead of loading the value and immediately using it, we are loading the next value, then we are executing the loop and when we iterate again, we will start loading the next value and process the current one. The idea is to parallelize load and compute at the instruction level.

Sadly, that didn’t seem to do the trick. I saw a 19% additional cost for that version compared to the vanilla AVX one on the 8MB run and a 2% additional cost on the 256MB run.

I then realized that there was one really important test that I had to also make, and wrote the following:

In other words, let's test the speed of moving memory and filling memory as fast as we possibly can. Here are the results:

Method      N         Mean         Error      StdDev     Ratio  RatioSD  Code Size
MoveMemory  1048599   126.8 us     0.36 us    0.33 us    0.25   0.00     270 B
FillMemory  1048599   513.5 us     10.05 us   10.32 us   1.00   0.00     351 B

MoveMemory  33554455  20,022.5 us  395.35 us  500.00 us  1.26   0.02     270 B
FillMemory  33554455  15,822.4 us  19.85 us   17.60 us   1.00   0.00     351 B

This is really interesting, for a small buffer (8MB), MoveMemory is somehow faster. I don’t have a way to explain that, but it has been a pretty consistent result in my tests.

For the large buffer (256MB), we are seeing results that make more sense to me.

  • MoveMemory – 12.5 GB/sec
  • FillMemory – 15.8 GB/sec

In other words, for MoveMemory, we are both reading and writing, so we are paying for memory bandwidth in both directions. For filling the memory, we are only writing, so we can get better performance (no need for reads).

In other words, we are talking about hitting the real physical limits of what the hardware can do. There are all sorts of tricks that one can pull, but when we are this close to the limit, they are almost always context-sensitive and dependent on many factors.

To conclude, here are my final results:

Method              N         Mean         Error      StdDev     Ratio  RatioSD  Code Size
FilterCmp_Avx       1048599   307.9 us     6.15 us    12.84 us   0.99   0.05     270 B
FilterCmp_Avx_Next  1048599   308.4 us     6.07 us    9.26 us    0.99   0.03     270 B
CopyTo              1048599   1,043.7 us   15.96 us   14.93 us   3.37   0.11     452 B
ArrayCopy           1048599   1,046.7 us   15.92 us   14.89 us   3.38   0.14     266 B
UnsafeCopy          1048599   309.5 us     6.15 us    8.83 us    1.00   0.04     133 B
MoveMemory          1048599   310.8 us     6.17 us    9.43 us    1.00   0.00     270 B

FilterCmp_Avx       33554455  24,013.1 us  451.09 us  443.03 us  0.98   0.02     270 B
FilterCmp_Avx_Next  33554455  24,437.8 us  179.88 us  168.26 us  1.00   0.01     270 B
CopyTo              33554455  32,931.6 us  416.57 us  389.66 us  1.35   0.02     452 B
ArrayCopy           33554455  32,538.0 us  463.00 us  433.09 us  1.33   0.02     266 B
UnsafeCopy          33554455  24,386.9 us  209.98 us  196.42 us  1.00   0.01     133 B
MoveMemory          33554455  24,427.8 us  293.75 us  274.78 us  1.00   0.00     270 B

As you can see, the AVX version alone is comparable to, or (slightly) beating, the MoveMemory function.

I tried things like prefetching memory (the next item, the next cache line, and even the next page), using non-temporal loads and stores, and many other things, but this is a pretty tough challenge.

What is really interesting is that the original, very simple and obvious implementation clocked in at 11 GB/sec. After pulling out pretty much all the stops and tricks, I was able to hit 12.5 GB/sec.

I don’t think anyone can look / update / understand the resulting code in any way without going through deep meditation. That is not a bad result at all. And along the way, I learned quite a bit about how the lowest level of the machine architecture is working.

time to read 7 min | 1203 words

I have been doing Open Source work for just under twenty years at this point. I have been paying my mortgage from Open Source software for about 15 of them. I’m stating that to explain that I have spent quite a lot of time struggling with the inherent tension between having an Open Source project and getting paid.

I wrote about it a few times in the past. It is not a trivial problem, and the core of the issue is not something that you can easily solve with technical means. I ran into this fascinating thread on Twitter over the weekend:

And another part of that is here:

I’m quoting the most relevant pieces, but the idea is pretty simple.

Donations don’t work, period. They don’t work not because companies are evil or developers don’t want to pay for Open Source. They don’t work because it takes a huge amount of effort to actually get paid.

If you are an independent developer, your purchasing process goes something like this:

  1. I would like to use this thing
  2. I need to pay for that
  3. The price matches the value I’m getting
  4. Where is my credit card…
  5. Paid!

Did you note step 2? The part about needing to pay?

If you don’t have that step, what will happen? Same scenario, an independent developer:

  1. I would like to use this thing
  2. I use this thing
  3. It would be great to pay something to show my appreciation
  4. Where did I put the credit card? Oh, it’s down the hall… I’ll get to that later (never).

That is in the best-case scenario where the thought of donating actually crossed your mind. In most likelihood, the process is more:

  1. I would like to use this thing
  2. I use this thing
  3. Ticket closed, what is the next one… ?

Now, what happens if you are not an independent developer? Let’s say that you are a contract worker for a company. You need to talk to your contact person, they will need to get purchasing approval. Depending on the amount, that may require escalating upward a few levels, etc.

Let’s say that the amount is under 100$, so basically within the budgetary discretion of the first manager you run into. They would still need to know what they are paying for, what they are getting out of that (they need to justify that). If this is a donation, welcome to the beauty of tax codes in multiple jurisdictions and what counts as such. If this is not a donation, what do they get? That means that you now have to do a meeting, potentially multiple ones. Present your case, open a new supplier at the company, etc.

The cost of all of those is high, both in time and money. Or… you can just nuget add-package and move on.

In the case of RavenDB, it is Open Source software (a license to match, code is freely available), but we treat it as a commercial project for all intents and purposes. If you want to install RavenDB, you’ll get a popup saying you need a license, directing you to a page where you see how much we would like to get and what you get in return, etc. That means that from a commercial perspective, we are on familiar ground for companies. They are used to paying for software, and there isn’t an option to just move on to the next task.

There is another really important consideration here. In the ideal Open Source donation model, money just shows up in your account. In the commercial world, there is a huge amount of work required to get things done. That is when you have a model where “the software does not work without a purchase”. To give some context, at one large software company, around 22% goes to Sales & Marketing; they spent around 21.8 billion on it in 2022. That is literally billions being spent to make sales.

If you want to make money, you are going to invest in sales, sales strategy, etc. I’m ignoring marketing here because if you are expected to make money from Open Source, you likely already have a project well-known enough to at least get started.

That means that you need to figure out what you are charging for, how you get customers, etc. In the case of RavenDB, we use the per-core model, which is a good indication of how much use the user is getting from RavenDB. LLBLGen Pro, on the other hand, charges per seat. Particular’s NServiceBus uses a per-endpoint / messages-per-day model.

There is no one model that fits all. And you need to be able to tailor your pricing model to how your users think about your software.

So pricing strategy, creating a proper incentive to purchase (hard limit, usually) and some sales organization to actually drive all of that are absolutely required.

Notice what is missing here? GitHub. It simply has no role at all up to this point. So why the title of this post?

There is one really big problem with getting paid that GitHub can solve for Open Source (and in general, I guess).

The whole process of actually getting paid is absolutely atrocious. In the best case, you need to create a supplier record at the customer, fill out various forms (no, we don’t use child labor or slaves, indeed), and figure out all sorts of weird rules (the German tax authority requires special dispensation, and let’s not talk about getting paid from India, etc.). Welcome to Anti Money Laundering rules and GDPR compliance, with Know Your Customer and SOC 2 regulations. The last sentence is basically nonsense words, but I understand that if you chant it long enough, you get money in the end.

What GitHub can do is be a payment pipe. Since presumably your organization is already set up with them in place, you can get them to do the invoicing, collecting the payment, etc. And in the end, you get the money.

That sounds exactly like GitHub Sponsorships, right? Except that in this case, this is not a donation. This is a flat-out simple transaction, with GitHub as the medium. The idea is that you have a limit, which you enforce, on your usage, and GitHub is how you get paid. The ability to do it in this fashion may make things easier, but I would assume that there are about three books’ worth of regulations and EULAs to go through to make it actually successful.

Yet, as far as I’m concerned, that is really the only important role that we have for GitHub here.

That is not a small thing, mind. But it isn’t a magic bullet.
