Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 2 min | 380 words

I was looking into reducing the allocation in a particular part of our code, and I ran into what was basically the following code (boiled down to the essentials):
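(The snippet itself was an embedded code block that isn’t reproduced here; the following is a minimal sketch of its shape, with ToUtf8 and the parameter as my stand-in names.)

using System.Text;

byte[] ToUtf8(long value)
{
    string str = value.ToString();        // allocates a string
    return Encoding.UTF8.GetBytes(str);   // allocates a fresh byte[]
}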

As you can see, this does a lot of allocations. The actual method in question was a pretty good size, and all those operations happened in different locations and weren’t as obvious.

Take a moment to look at the code, how many allocations can you spot here?

The first one, obviously, is the string allocation, but there is another one inside the call to GetBytes(). Let’s fix that first, by allocating the buffer once (I’m leaving aside the allocation of the reusable buffer itself; you can assume it is big enough to cover all our needs):
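(A sketch of that change; _bytes stands in for the reusable buffer.)

private readonly byte[] _bytes = new byte[256]; // reusable, assumed big enough

int ToUtf8(long value)
{
    string str = value.ToString();   // the string is still allocated here
    return Encoding.UTF8.GetBytes(str, 0, str.Length, _bytes, 0);
}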

For that matter, we can also easily fix the second problem, by avoiding the string allocation:
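(A sketch of that version: format the number into a reusable char buffer with TryFormat instead of calling ToString(); buffer and len here match the names in the lowered code shown below.)

private readonly char[] buffer = new char[32]; // reusable char buffer

int ToUtf8(long value)
{
    value.TryFormat(buffer, out int len);                  // no string allocation
    return Encoding.UTF8.GetBytes(buffer[..len], _bytes);  // looks allocation-free...
}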

That is a few minutes of work, and we are good to go. This method is called a lot, so we can expect a huge reduction in the amount of memory that we allocate.

Except… that didn’t happen. In fact, the amount of memory that we allocate remained pretty much the same. Digging into the details, we allocate roughly the same number of byte arrays (how!) and instead of allocating a lot of strings, we now allocate a lot of character arrays.

I broke the code apart into multiple lines, which made things a lot clearer. (In fact, I threw that into SharpLab, to be honest). Take a look:
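(A sketch of the broken-apart lines; SharpLab shows what the first one lowers to, quoted next.)

var charBuffer = buffer[..len];  // var hides the actual type of this expression
var written = Encoding.UTF8.GetBytes(charBuffer, _bytes);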

This code: buffer[..len] is actually translated to:

char[] charBuffer = RuntimeHelpers.GetSubArray(buffer, Range.EndAt(len));

That will, of course, allocate. I had to change the code to be very explicit about the types that I wanted to use:
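(A sketch of the explicit version: spell out the span types instead of using var.)

int ToUtf8(long value)
{
    Span<char> chars = buffer;     // an explicit span over the array, no copy
    value.TryFormat(chars, out int len);
    return Encoding.UTF8.GetBytes(chars[..len], _bytes); // Span.Slice(), no allocation
}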

This will not allocate, but if you look at the changes in the code, you can see how the use of var really tripped me up here: with the number of overloads involved, the automatic coercion of types that I expected simply didn’t happen.

For that matter, note that any slicing on arrays will generate a new array, including this code:
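(The original line isn’t reproduced here; this is an illustrative equivalent.)

byte[] bytes = new byte[256];
byte[] copy = bytes[..];   // a brand-new 256-byte array, not a view into the old one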

This makes perfect sense once you realize what is going on, yet it can still be a big surprise. I looked at the code for quite a while before I figured out what was going on, and that was with a profiler output that pinpointed the fault.

time to read 7 min | 1203 words

I have been doing Open Source work for just under twenty years at this point, and I have been paying my mortgage from Open Source software for about 15 of them. I’m stating that to explain that I have spent quite a lot of time struggling with the inherent tension between having an Open Source project and getting paid.

I wrote about it a few times in the past. It is not a trivial problem, and the core of the issue is not something that you can easily solve with technical means. I ran into this fascinating thread on Twitter over the weekend:

And another part of that is here:

I’m quoting the most relevant pieces, but the idea is pretty simple.

Donations don’t work, period. They don’t work, not because companies are evil or because developers don’t want to pay for Open Source, but because it takes a huge amount of effort to actually get paid.

If you are an independent developer, your purchasing process goes something like this:

  1. I would like to use this thing
  2. I need to pay for that
  3. The price matches the value I’m getting
  4. Where is my credit card…
  5. Paid!

Did you note step 2? The part about needing to pay?

If you don’t have that step, what will happen? Same scenario, an independent developer:

  1. I would like to use this thing
  2. I use this thing
  3. It would be great to pay something to show my appreciation
  4. Where did I put the credit card? Oh, it’s down the hall… I’ll get to that later (never).

That is the best-case scenario, where the thought of donating actually crossed your mind. In all likelihood, the process is more like:

  1. I would like to use this thing
  2. I use this thing
  3. Ticket closed, what is the next one… ?

Now, what happens if you are not an independent developer? Let’s say that you are a contract worker for a company. You need to talk to your contact person, who will need to get purchasing approval. Depending on the amount, that may require escalating upward a few levels, etc.

Let’s say that the amount is under $100, so basically within the budgetary discretion of the first manager you run into. They would still need to know what they are paying for and what they are getting out of it (they need to justify the expense). If this is a donation, welcome to the beauty of tax codes in multiple jurisdictions and what counts as one. If this is not a donation, what do they get? That means that you now have to have a meeting, potentially multiple ones. Present your case, open a new supplier record at the company, etc.

The cost of all of those is high, both in time and money. Or… you can just nuget add-package and move on.

In the case of RavenDB, it is Open Source software (a license to match, code freely available), but we treat it as a commercial project for all intents and purposes. If you want to install RavenDB, you’ll get a popup saying you need a license, directing you to a page where you can see how much we would like to get paid and what you get in return, etc. That means that from a commercial perspective, we are on familiar ground for companies. They are used to paying for software, and there isn’t an option to just move on to the next task.

There is another really important consideration here. In the ideal Open Source donation model, money just shows up in your account. In the commercial world, there is a huge amount of work required to get things done, and that is when you have a model where “the software does not work without a purchase”. To give some context, at a single large software vendor, Sales & Marketing can account for around 22% of expenses, roughly 21.8 billion dollars in 2022. That is literally billions being spent to make sales.

If you want to make money, you are going to invest in sales, sales strategy, etc. I’m ignoring marketing here because if you are expected to make money from Open Source, you likely already have a project well-known enough to at least get started.

That means that you need to figure out what you are charging for, how you get customers, etc. In the case of RavenDB, we use a per-core model, which is a good indication of how much use the user is getting from RavenDB. LLBLGen Pro, on the other hand, charges per seat. Particular’s NServiceBus uses a per-endpoint / number-of-messages-a-day model.

There is no one model that fits all. And you need to be able to tailor your pricing model to how your users think about your software.

So a pricing strategy, a proper incentive to purchase (a hard limit, usually), and some sales organization to actually drive all of that are absolutely required.

Notice what is missing here? GitHub. It simply has no role at all up to this point. So why the title of this post?

There is one really big problem with getting paid that GitHub can solve for Open Source (and in general, I guess).

The whole process of actually getting paid is absolutely atrocious. In the best case, you need to create a supplier record at the customer, fill out various forms (no, we don’t use child labor or slaves, indeed), and figure out all sorts of weird rules (the German tax authority requires special dispensation, and let’s not talk about getting paid from India, etc). Welcome to Anti-Money-Laundering rules and GDPR compliance with Know Your Customer and SOC 2 regulations. The last sentence is basically nonsense words, but I understand that if you chant it long enough, you get money in the end.

What GitHub can do is be a payment pipe. Since your organization is presumably already set up with them, you can have GitHub do the invoicing, collect the payment, etc. And in the end, you get the money.

That sounds exactly like GitHub Sponsorships, right? Except that in this case, this is not a donation. This is a flat-out simple transaction, with GitHub as the medium. The idea is that you have a limit, which you enforce, on your usage, and GitHub is how you get paid. The ability to do it in this fashion may make things easier, but I would assume that there are about three books’ worth of regulations and EULAs to go through to make it actually successful.

Yet, as far as I’m concerned, that is really the only important role that we have for GitHub here.

That is not a small thing, mind. But it isn’t a magic bullet.

time to read 3 min | 533 words

Measuring the length of time that a particular piece of code takes is a surprisingly challenging task. There are two aspects to this: the first is how to ensure that the cost of getting the start and end times won’t interfere with the work you are measuring; the second is how to actually get the time (potentially many times a second) as efficiently as possible.

To give some context, Andrey Akinshin does a great overview of how the Stopwatch class works in C#. On Linux, that boils down to calling clock_gettime, except that this is not really a system call: through the vDSO mechanism, the kernel maps a small piece of code into your process, and that code cooperates with the rest of the kernel to answer time queries without a kernel-mode transition. The idea is that this call is so frequent that you cannot pay the cost of the kernel-mode transition. There is good coverage of this here.

In short, that is a very well-known problem and quite a lot of brainpower has been dedicated to solving it. And then we reached this situation:

[profiler output screenshot]

What you are seeing here is us testing the indexing process of RavenDB under the profiler. This is indexing roughly 100M documents, and according to the profiler, we are spending 15% of our time gathering metrics?

The StatsScope.Start() method simply calls Stopwatch.Start(), so we are basically looking at a profiler output that says that Stopwatch is accounting for 15% of our runtime?

Sorry, I don’t believe that. I mean, it is possible, but it seems far-fetched.

In order to test this, I wrote a very simple program, which will generate 100K integers and test whether they are prime or not. I’m doing that to test compute-bound work, basically, and testing calling Start() and Stop() either across the whole loop or in each iteration.
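Here is a minimal sketch of that benchmark (my reconstruction, not the original program):

using System.Diagnostics;

static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int i = 2; i * i <= n; i++)
        if (n % i == 0) return false;
    return true;
}

var total = Stopwatch.StartNew();
var primes = 0;
for (int i = 2; i < 100_000; i++)
{
    var sw = Stopwatch.StartNew(); // per-iteration timing; remove these two
    if (IsPrime(i)) primes++;
    sw.Stop();                     // lines to time only the loop as a whole
}
total.Stop();
Console.WriteLine($"{primes} primes, {total.ElapsedMilliseconds} ms");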

I ran that a few times, and I got:

  • Windows: 311 ms with Stopwatch per iteration and 312 ms without
  • Linux: 450 ms with Stopwatch per iteration and 455 ms without

On Linux, there is about a 5 ms difference when using a per-iteration stopwatch; on Windows, it is either the same cost or slightly cheaper with the per-iteration stopwatch.

Here is the profiler output on Windows:

[profiler output screenshot: Windows]

And on Linux:

[profiler output screenshot: Linux]

Now, that is what happens when we are doing a significant amount of work. What happens if the amount of work is negligible? I made the IsPrime() method very cheap, and I got:

[profiler output screenshot: negligible workload]

So that is a good indication that this isn’t free, but still…

Comparing the costs, it is utterly ridiculous that the profiler says that so much time is spent in those methods.

Another aspect here may be the issue of the profiler impact itself. There are differences between using Tracing and Sampling methods, for example.

I don’t have an answer, just a lot of very curious questions.

time to read 1 min | 128 words

I posted this code previously:
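(The snippet isn’t reproduced here; based on the explanation below, it was to this effect.)

var list = new List<int> { 1, 2, 3 };
List<int>.Enumerator? enumerator = list.GetEnumerator();
while (enumerator.Value.MoveNext())
{
    Console.WriteLine(enumerator.Value.Current);
}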

And asked what it prints. This is actually an infinite loop that will print an endless amount of zeros to the console. The question is why.

The answer is that we are running into two separate features of C# that interact with each other in a surprising way.

The issue is that we are using a nullable value type to hold the iterator here, and accessing the struct through the Value property. The problem is that this is a struct, and going through the property hands back a copy every time.

So the way it works, the code actually runs:
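(Roughly like this, with the hidden temporary copies made explicit.)

while (true)
{
    List<int>.Enumerator copy1 = enumerator.Value; // a fresh copy of the struct
    if (!copy1.MoveNext())                         // advances only that copy
        break;
    List<int>.Enumerator copy2 = enumerator.Value; // yet another fresh copy
    Console.WriteLine(copy2.Current);              // never moved: default(int), i.e. 0
}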

And now you can more easily see the temporary copies that are created: because we are using a value type here, we are working with a different instance each time.

time to read 4 min | 750 words

I’ve been calling myself a professional software developer for just over 20 years at this point. In the past few years, I have gotten into teaching university courses in the Computer Science curriculum. I have recently had the experience of supporting a non-techie as they went through a(n intense) coding bootcamp (aiming at full stack / front end roles). I’m also building a distributed database engine and all the associated software.

I list all of those details because I want to make an observation about the distinction between fundamental and transient knowledge.

My first thought is that there is so much to learn. Comparing the structure of C# today to what it was when I learned it (pre-beta days, IIRC), it is a very different language. I had literally decades to adjust to some of those changes, but someone who is just getting started needs to grasp everything all at once. When I learned JavaScript, there were still browsers in the market that didn’t recognize it, so you had to wrap your scripts in the “<!-- … //-->” comment trick to get things to work (don’t ask!).

This goes far beyond mere syntax and familiarity with language constructs. The overall environment is also critically important. One of the basic tasks that I give in class is something similar to: “Write a network service that would serve as a remote dictionary for key/value operations”.  Most students have a hard time grasping details such as IP vs. host, TCP ports, how to read from the network, error handling, etc. Adding a relatively simple requirement (make it secure from eavesdroppers) will take it entirely out of their capabilities.
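To make that gap concrete, here is a minimal sketch of just the happy path of that task (the protocol and names are my own invention, not what I hand out in class). Note how much of it is networking plumbing rather than dictionary logic, and that error handling, security, and a real protocol aren’t even started:

using System.Collections.Concurrent;
using System.Net;
using System.Net.Sockets;

var store = new ConcurrentDictionary<string, string>();
var listener = new TcpListener(IPAddress.Any, 9999);
listener.Start();
while (true)
{
    var client = await listener.AcceptTcpClientAsync();
    _ = Task.Run(async () =>
    {
        using var conn = client;
        using var reader = new StreamReader(conn.GetStream());
        using var writer = new StreamWriter(conn.GetStream()) { AutoFlush = true };
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            var parts = line.Split(' ', 3); // e.g. "SET key value" or "GET key"
            if (parts.Length == 3 && parts[0] == "SET")
            {
                store[parts[1]] = parts[2];
                await writer.WriteLineAsync("OK");
            }
            else if (parts.Length == 2 && parts[0] == "GET")
            {
                await writer.WriteLineAsync(store.TryGetValue(parts[1], out var v) ? v : "(nil)");
            }
            else
            {
                await writer.WriteLineAsync("ERR");
            }
        }
    });
}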

Even taking a “simple” problem, such as building a CRUD website, is fraught with many important details that aren’t really visible. Responsive design, mobile friendliness, state management, and user experience, to name a few. Add requirements such as accessibility and you are setting the bar too high to reach.

I intentionally choose the examples of accessibility and security, because those are “invisible” requirements. It is easy to miss them if you don’t know that they should be there.

My first website was a PHP page that I pushed to the server using FTP and updated live in “production”. I was exposed to all the details about DNS and IPs, understood exactly that the server side was just a machine in a closet, and worked with very low levels of abstraction. (Naturally, the solution had no security or any other -ities.) However, that knowledge from those early experiments has served me very well for decades. The same goes for details such as how TCP works or the basics of operating system design.

Good familiarity with the basic data structures (heap, stack, tree, list, set, map, queue) has paid for itself many times over. The amount of time that I spent learning WinForms… still usable and widely applicable even in other platforms and environments. WPF or jQuery? Not so much.

Learning patterns paid many dividends and was applicable on a wide range of applications and topics.

I looked into the topics that are being taught (both at bootcamps and universities) and I understand why, in many cases, those are being skipped. You can actually be a front-end developer without understanding much (if anything) about networks. And the breadth of details you need to know is immense.

My own tendency is to look at the low level stuff, and given that I work on a database engine, that is obviously quite useful. What I have found, however, is that whenever I dug deep into a topic, I found ways to utilize that knowledge at a later point in time. Sometimes I was able to solve a problem in a way that would be utterly inconceivable to me previously. I’m not just talking about being able to immediately apply new knowledge to a problem. If that were the case, I would attribute that to wanting to use the new thing I just learned.

However, I’m talking about scenarios where months or years later I ran into a problem, and was then able to find the right solution given what was then totally useless knowledge.

In short, I understand that chasing the 0.23-alpha-stage-2.3.1-dev updates on the left-pad package is important, but I found that spending time deep in the stack has a great cumulative effect.

Joel Spolsky wrote about leaky abstractions 20 years ago. I remember reading that blog post and grokking it. And it is true: being able to dig one or two layers down from where you usually live gives you a huge amount of leverage in your capabilities.

time to read 1 min | 85 words

We have just released a new stable version of the RavenDB Python client API. This puts the Python client API for RavenDB on the same level as our other clients, including support for subscriptions, cluster-wide transactions, compare exchange, conditional loading, and much more.

We also improved the ergonomics of the API and integration with the IDE.

Here is an example of writing a non-trivial query using the API; tell us what you think, and what you are doing with RavenDB & Python.

time to read 3 min | 407 words

I’m not talking about this much anymore, but alongside RavenDB, my company produces a set of tools to help you work with OR/Ms (object relational mappers) such as NHibernate or Entity Framework, as well as track what is going on with Cosmos DB.

The profilers are implemented as two separate components. We have the Appender, which runs inside the profiled process, and the Profiler, which is a WPF application that analyzes and shows you the results of the profiling. For the profilers, all the execution is done on the users’ machine.

We have crash reporting enabled and we are diligent in fixing any and all errors from the field. We recently ran into a whole spate of errors, looking something like this:

System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Windows.Controls.VirtualizingStackPanel.UpdateExtent(Boolean areItemChangesLocal)
   at System.Windows.Controls.VirtualizingStackPanel.ShouldItemsChangeAffectLayoutCore(Boolean areItemChangesLocal, ItemsChangedEventArgs args)
   at System.Windows.Controls.VirtualizingPanel.OnItemsChangedInternal(Object sender, ItemsChangedEventArgs args)
   at System.Windows.Controls.Panel.OnItemsChanged(Object sender, ItemsChangedEventArgs args)
   at System.Windows.Controls.ItemContainerGenerator.OnItemAdded(Object item, Int32 index, NotifyCollectionChangedEventArgs collectionChangedArgs)
   at System.Windows.Controls.ItemContainerGenerator.OnCollectionChanged(Object sender, NotifyCollectionChangedEventArgs args)
   at System.Windows.WeakEventManager.ListenerList`1.DeliverEvent(Object sender, EventArgs e, Type managerType)
   at System.Windows.WeakEventManager.DeliverEvent(Object sender, EventArgs args)
   at System.Collections.Specialized.NotifyCollectionChangedEventHandler.Invoke(Object sender, NotifyCollectionChangedEventArgs e)
   at System.Windows.Data.CollectionView.OnCollectionChanged(NotifyCollectionChangedEventArgs args)
   at System.Windows.WeakEventManager.ListenerList`1.DeliverEvent(Object sender, EventArgs e, Type managerType)
   at System.Windows.WeakEventManager.DeliverEvent(Object sender, EventArgs args)
   at System.Windows.Data.CollectionView.OnCollectionChanged(NotifyCollectionChangedEventArgs args)
   at System.Windows.Data.ListCollectionView.ProcessCollectionChangedWithAdjustedIndex(NotifyCollectionChangedEventArgs args, Int32 adjustedOldIndex, Int32 adjustedNewIndex)
   at System.Collections.Specialized.NotifyCollectionChangedEventHandler.Invoke(Object sender, NotifyCollectionChangedEventArgs e)
   at System.Collections.ObjectModel.ObservableCollection`1.OnCollectionChanged(NotifyCollectionChangedEventArgs e)
   at Caliburn.Micro.BindableCollection`1.OnCollectionChanged(NotifyCollectionChangedEventArgs e)
   at System.Collections.ObjectModel.ObservableCollection`1.InsertItem(Int32 index, T item)
   at Caliburn.Micro.BindableCollection`1.OnUIThread(Action action)
   at HibernatingRhinos.Profiler.Client.Sessions.SessionListModel.Add(SessionModel model)

And here is the relevant code:

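(The original is a screenshot; reconstructed from the stack trace, its shape is roughly this, not the actual source.)

public void Add(SessionModel model)
{
    // Items is an observable collection; adding to it raises a
    // CollectionChanged notification that WPF's data binding picks up.
    Items.Add(model);
}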

This is called from a timer thread (not the UI one), and the Items collection in this case is a BindableCollection<T>.

The error is happening deep in the guts of WPF and it seems like it has been triggered by some recent Windows update. Here is the “fix” for this issue:

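(A sketch of what the fix amounts to; my assumption is a targeted try/catch around the collection update.)

public void Add(SessionModel model)
{
    try
    {
        Items.Add(model);
    }
    catch (NullReferenceException)
    {
        // Thrown from deep inside WPF's VirtualizingStackPanel.
        // Swallow it and carry on; the next UI operation repairs
        // the visual state.
    }
}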

Basically: don’t report this error, and continue executing normally (the next UI operation will fix the UI state, usually within 200 ms).

This is the right call in terms of development time and effort, but I have to say, it makes me quite uncomfortable to see a change like that.
