Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

time to read 10 min | 1813 words

This is a sordid tale of chance and mystery and the nasty tricks that Murphy can play on you.

A few customers reported an error similar to the following one:

Invalid checksum for page 1040, data file Raven.voron might be corrupted, expected hash to be 0 but was 16099259854332889469

A single such case might be disk corruption, but multiple customers reporting it is an indication of a much bigger problem. That was a trigger for a STOP SHIP reaction. We consider data safety a paramount goal of RavenDB (part of the reason why I’m doing this Production Postmortem series), and we put some of our most experienced people on it.

The problem was, we couldn’t find it. Having access to the corrupted databases showed that the problem occurred at random. We use Voron in many different capacities (indexing, document storage, configuration store, distributed log, etc) and these incidents happened across the board. That narrowed the problem down to Voron itself, rather than to bad usage of Voron. This reduced the problem space considerably, but not enough for us to be able to tell what was going on.

Given that we didn’t have a lead, we started by recognizing what the issue was and added additional guards against it. In fact, the error itself was a guard we added, validating that the data on disk is the same data that we have written to it. The error above indicates that there has been a corruption in the data because the expected checksum doesn’t match the actual checksum of the data. This gives us an early warning system for data errors and prevents us from proceeding on erroneous data. We added this primarily because we were worried about physical disk corruption of data, but it turns out that this is also a great early warning system for when we mess up.
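As a rough illustration of the kind of guard involved, here is a minimal C# sketch. The page layout and hash function (FNV-1a) are stand-ins, not Voron’s actual on-disk format or checksum algorithm:

```csharp
using System;
using System.IO;

public static class PageValidator
{
    public static void ValidatePage(long pageNumber, ReadOnlySpan<byte> pageData, ulong expectedChecksum)
    {
        ulong actual = ComputeChecksum(pageData);
        if (actual != expectedChecksum)
            throw new InvalidDataException(
                $"Invalid checksum for page {pageNumber}, data file might be corrupted, " +
                $"expected hash to be {expectedChecksum} but was {actual}");
    }

    private static ulong ComputeChecksum(ReadOnlySpan<byte> data)
    {
        // FNV-1a, used here only as a placeholder for the real checksum algorithm.
        ulong hash = 14695981039346656037UL;
        foreach (byte b in data)
        {
            hash ^= b;
            hash *= 1099511628211UL;
        }
        return hash;
    }
}
```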

The additional guards were primarily additional checks for the safety of the data at various locations along the pipeline. Given that we couldn’t reproduce the issue ourselves, and none of the customers affected were able to reproduce it, we had no idea where to go from there. Therefore, we had one team that kept on trying different steps to reproduce the issue and another team that added additional safety measures for the system, to catch any such issue as early as possible.

The additional safety measures went into the codebase for testing, but we still didn’t have any luck in figuring out what was going on. We went from trying to reproduce this by running various scenarios to analyzing the code and trying to figure out what was going on. Everything pointed to it being completely impossible for this to happen, obviously.

We got a big break when the repro team managed to reproduce this error when running a set of heavy tests on 32 bits machines. That was really strange, because none of the failures we had seen to date had happened on 32 bits.

It turns out that this was a really lucky break, because the problem wasn’t related to 32 bits at all. What was going on there is that under 32 bits, we run in a heavily constrained address space, which under load can cause us to fail to allocate memory. If this happens at certain locations, it is considered to be a catastrophic error and requires us to close the database and restart it to recover. So far, this is a pretty standard, expected and desired reaction. However, it looked like sometimes this caused an issue. This also tied into some observations from customers about the state of the system when this happened (low memory warnings, etc).

The very first thing we did was to test the same scenario on the codebase with the new checks added. So far, the repro team worked on top of the version that failed at the customers’ sites, to prevent any other code change from masking the problem. With the new checks, we were able to confirm that they actually triggered and caught the situation early. That was a great confirmation, but we still didn’t know what was going on. Luckily, we were able to add more and more checks to the system and run the scenario. The idea was to trip over a guard rail as early as possible, to allow us to inspect what actually caused it.

Even with a reproducible scenario, that was quite hard. We didn’t have a reliable method of triggering it; we had to run the same set of operations for a while and hope the scenario would reproduce. That took quite a bit of time and effort. Eventually, we figured out the root cause of the issue.

In order to explain that, I need to give you a refresher on how Voron handles I/O and persists data.

Voron uses an MVCC model, in which any change to the data is actually done on a scratch buffer. This allows us to have snapshot isolation at very little cost and gives us a drastically simplified model for working with Voron. Other important factors include the need to be transactional, which means that we have to make durable writes to disk. In order to avoid doing random writes, we use a Write Ahead Journal. For these reasons, I/O inside Voron is basically composed of the following operations:

  • Scratch (MEM) – copy-on-write data for pages that are going to be changed in the transaction. Usually purely in memory. This is how we maintain the Isolation and Atomicity aspects of ACID.
  • Journal (WAL) – sequential, unbuffered writes that include all the modifications in the transaction. This is how we maintain the Atomicity and Durability aspects of ACID.
  • Flush (MMAP) – copies data from the scratch buffers to the data file, which allows us to reuse space in the scratch file.
  • Sync (FSYNC) – ensures that the data from a previous flush is stored on durable media, allowing us to delete old journal files.

In Voron 3.5, we had journal writes (which happen on each transaction commit) on one side of the I/O behavior and flush & sync on the other side. In Voron 4.0, we split this even further, so journal writes, data flushes and file syncs are all separate operations that can each happen on its own.

Transactions are written to the journal file one at a time, until it reaches a certain size (usually about 256MB), at which point we’ll create a new journal file. Flush will move data from the scratch buffers to the data file and sync will ensure that the data that was moved to the data file is durably stored on disk, at which point you can safely delete the old journals.
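The rule that makes deleting a journal safe can be sketched roughly like this. The names and structures are illustrative, not Voron’s actual code:

```csharp
public sealed class JournalFile
{
    // Highest transaction id written to this journal file.
    public long LastTransactionId;
}

public sealed class FlushSyncState
{
    // Highest transaction id whose pages the flush has copied to the data file.
    public long LastFlushedTransactionId;

    // Highest transaction id whose flushed pages a sync has made durable on disk.
    public long LastSyncedTransactionId;

    // A journal may only be deleted once everything in it has been flushed to the
    // data file AND that flush has been synced. The bug described below broke this
    // rule when a sync raced with a newer flush.
    public bool CanDelete(JournalFile journal) =>
        journal.LastTransactionId <= LastFlushedTransactionId &&
        journal.LastTransactionId <= LastSyncedTransactionId;
}
```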

In order to trigger this bug, you needed to have the following sequence of events:

  • Have enough transactions happen quickly enough that the flush / sync operations are lagging by more than a single file behind the transaction rate.
  • Have a transaction start a new journal file while the flush operation was in progress.
  • Have, concurrently, the sync operation complete an operation that includes that last journal file. Sync can take a lot of time.
  • Have another flush operation go on while the sync is in progress, which will move the flush target to the new journal file.
  • Have the sync operation complete, having only synced some of the changes that came from that journal, but because the new flush (which we didn’t sync yet) already moved on from that journal, mistakenly conclude that this journal file is completely done and delete it.

All of these steps are just the setup for the actual problem, mind you.

At this point, we are primed to hit this issue, but we haven’t actually experienced it yet. What has happened is that the persistent state (on disk) of the database is now suspect: if a crash happens, we will be missing the oldest journal that still has transactions that haven’t been properly persisted to the data file.

Once you have set the system up properly, you still aren’t done, in terms of reproducing this issue. We now have a race: the next flush / sync cycle is going to fix things. So you need the database to restart within a very short period of time.

For additional complexity, the series of steps above will cause a problem, but even if you crash in just the right location, there are still some mitigating circumstances. In many cases, you are modifying the same set of pages in multiple transactions, and if the transactions that were lost because of the early deletion of the journal file had pages that were modified in future transactions, those transactions will fill in the missing details and there will be no issue. That was one of the things that made it so hard to figure out what was going on. We needed a very specific set of timings between three separate threads (journal, flush, sync) to create the hole, then another race to restart the database at that point, before Voron fixes itself in the next cycle, all happening just at the stage where Voron moves between journal files (typically every 256MB of compressed transactions, so not very often at all) and with just the right mix of writes to different pages in transactions that span multiple journal files.

These are some pretty crazy requirements for reproducing such an issue, but as the saying goes: One in a million is next Tuesday.

What made this bug even nastier was that we hadn’t caught it earlier. We take the consistency guarantees of Voron pretty seriously and we most certainly have code to check if we are missing transactions during recovery. However, we had a bug in this case. Because obviously there can’t be a transaction prior to Tx #1, we don’t check for a missing transaction at that point. At least, that was the intention of the code. What was actually executing was a check for missing transactions on every transaction except for the first transaction of the first journal file during recovery. So instead of skipping the check just for Tx #1, we skipped it for the first transaction of every recovery.
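In pseudo-C# terms, the difference between the intended check and the one that actually shipped looks roughly like this (illustrative only, not the actual recovery code):

```csharp
public static class RecoveryChecks
{
    // What we intended: only Tx #1 has no predecessor, so only it is exempt
    // from the "is there a missing transaction before this one?" check.
    public static bool ShouldCheckForMissingTransaction_Intended(long txId) =>
        txId != 1;

    // What actually ran: the first transaction of the first journal file in *every*
    // recovery was exempt, which is exactly the transaction this bug makes disappear.
    public static bool ShouldCheckForMissingTransaction_Actual(bool isFirstTxInFirstJournalOfRecovery) =>
        isFirstTxInFirstJournalOfRecovery == false;
}
```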

Of course, this is the exact state that we have caused in this bug.

Sigh.

We added all the relevant checks, tightened the guard rails a few more times to ensure that a repeat of this issue will be caught very early and provided a lot more information in case of an error.

Then we fixed the actual problems and subjected the database to what in humans would be called enhanced interrogation techniques. Hammers were involved, as well as an irate developer with a penchant for pulling the power cord at various stages just to see what would happen.

We have released the fix in RavenDB 4.1.4 stable release and we encourage all users to upgrade as soon as possible.

time to read 6 min | 1059 words

I talk a lot about the hiring process that we go through, but there is also the other side of that: when people leave us. Hibernating Rhinos has been around for about a decade; in that time it grew from a single guy operation to a company that crossed the bridge from small to medium business a couple of years ago.

When I founded the company, I had a pretty good idea of what I wanted to have. Actually, I had a very clear idea of what I didn’t want to have. The things that I didn’t want to carry over to my own company. For example, being on call 24/7, working hours that exceed the usual norms, or being under constant pressure. By and large, looking back at our history and where we are today, I think that we did a pretty good job at upholding these values.

But that isn’t the topic of this post. I wanted to talk about people leaving the company. Given the time that we have been in business, we actually have very little turnover. Oh, we had people come and go, and I had to fire people who weren’t pulling their weight. But those were almost always people who were at the company for a short while (typically under a year).

In the past six months, we had two people leave that were with us for three and seven years (about three months apart from one another). That is a very different kind of separation. When I was told that they intended to leave, I was both sad and happy. I was sad because I hated to lose good people, and I was happy because they were going to very good places.

After getting over my surprise, I sat down and planned for their departure. Israel has a one-month notice requirement, so we had the time to do things properly. I was careful to check (very gently) whether this was a reversible decision, and once I confirmed that they had made the decision, I carried on with that.

My explicit goals for that time were:

  • Make sure that they are leaving on good terms and great spirits.
  • Ensure proper handoff of their current tasks.
  • Provide guidance about current and past tasks.
  • Map areas of responsibility and make sure that they are covered after they are gone.

The last three, I believe, are pretty common goals when people are leaving, but the most important piece was the first one. What does this mean?

I wrote each of them a recommendation letter. Note that they both already had accepted positions elsewhere at that time, so it wasn’t something they needed. It is something that they might be able to make use of in the future, and it was something that I wanted to do, formally, as an appreciation for their work and skills.

As an aside, I have an open invitation to my team. I’ll provide both recommendation letters and serve as a reference in any job search they have, while they are working for us. I sometimes get CVs from candidates that explicitly note: “sensitive, current employer isn’t aware”. I don’t want to be the kind of place that you have to hide from.

We also threw each of them a going away party, with the entire company stopping everything and going somewhere to celebrate.

I did that for several reasons. First, each of them, in very different ways, contributed significantly to RavenDB. It was a joy to work with them, I don’t see any reason why it shouldn’t be a joy to see them go. I can certainly say that not saying goodbye properly would have created a bad taste for the entire thing, and that is something that I don’t want.

Second, and a bit more cold-minded, I want to leave the door open for them to come back again. After so much time in the company, the amount of knowledge that they have in their heads would be a shame to lose for good. But even if they never come back, that is still a net benefit, because…

Third, there is the saying about “if you love someone, let them go…”. I think that a really good way to make people want to leave is to make it hard to do so. By making the separation easy and cordial,  the people who stay know that they don’t need to fear or worry about things if they want to see what else is available for them.

The last few statements came out a bit colder than I intended them to be, but I can’t really think of a way to phrase the intention that wouldn’t sound like that. I don’t like that these people left, and I would much rather have them stay. But I started out from the assumption that they are going to leave, and the goal is to make the best out of that.

I was careful to not apply any pressure on them to stay regardless. In fact, in one case, I upfront apologized to the person on the way out, saying: “I want you to know that I’m not pressuring you to stay not because I want you to go, but because I respect your decision to leave and don’t want to make it awkward”.

Fourth, and coming back to what I want to have as a value for the company, is the idea that I wouldn’t mind at all being a place that people retire from. In fact, I decidedly want that to be the case. And we do a lot of work to ensure that we are the kind of place that you can be at for long periods of time (investing in our people, working on cool stuff, ensuring that grunt work is shared and minimized, etc). However, I would also take great pride in being the place that serves as a launching pad for people’s careers.

In closing, people are going to leave. If it is because of something that you can control, that should be a warning sign and something that you should look at to see if you can do better. If it is out of your hands, you should accept it as given and make the best of it.

I was very sad to see them go, and I wish them all the best in their future endeavors.

time to read 1 min | 92 words

Last week we pushed an update to our public demo site. It is intended to walk you through using RavenDB, show code samples and provide detailed guidance on using RavenDB from your application.

Here is an example screen shot:

[Image: screenshot of the demo site]

We spent a lot of time and effort on it, and I would appreciate you taking a peek and providing feedback on how useful that is for you to learn RavenDB and how to use it.

time to read 7 min | 1259 words

After my previous two posts on the topic, it is now time to discuss how we make money from Open Source Software. Before I start, I want to clarify what I’m talking about:

  • RavenDB is a document database.
  • It is about a decade old.
  • The server is released under the AGPL / commercial license.
    • We offer free community / developer licenses without any AGPL hindrance.
  • The RavenDB client APIs are licensed under the MIT license.
  • RavenDB (the product) is created by Hibernating Rhinos (the company).

I created RavenDB because I couldn’t not do it. It was an idea that had to get out of my head. I looked up the details, and toward the end of 2008 I started to work on it as a side project. At the time I was involved in five or six active open source projects, had just gotten my NHibernate Profiler product to stable ground and had been turning the idea of getting deeper into databases in my head for a while. So I sat down and wrote some code.

I was just doing some code doodling, and it turned into deep design discussions, and at some point I was actually starting to actively look for help building the user interface for a “done” project. That was in late Feb 2010. Somehow, throwing some code at the compiler had become a journey that lasted over a year, in which I worked 16+ hour days on this project.

Around Mar 2010 I knew that I had a problem. Continuing as I did before, just writing a lot of code and trying to create an OSS project out of it, would eat up all my time (and money). The alternatives were actually making money from RavenDB or stopping work on it completely. And I didn’t want to stop working on it.

I decided that I had to make an effort to actually make a product out of this project. And that meant that I had to sit down and plan how I would actually make money from it. I firmly believe that “build it, and they will come” is a nice slogan, but it doesn’t replace planning, strategy and (at least some) luck.

  • I already knew that I couldn’t sustain the project as a labor of love, and donations are not a sustainable way (or indeed, a way) to make money.
  • Sponsorship seemed like it would be unlikely unless I got one of my clients to start using RavenDB and then have them pay me to maintain it. That seemed… unethical, so wasn’t an option.
  • Services / consulting was something that I was already doing quite heavily, and was quite successful at it. But this is a labor intensive way of making money and it would compete directly with the time that it would take to build RavenDB itself.
  • Support is a model I really don’t like, because it puts me in a conflict of interest. I take pride in what I do, and I wanted to make something that would be easy to use and not require support.
  • Open Core / N versions back – are models that I don’t like. The open core model often leaves out critical functionality (such as security) and the N versions back mean that you give the users you most want to have the best experience (since that would encourage them to give you money) the worst experience (here are all our bugs that we fixed but won’t give to you yet).

That left us with dual licensing as a way to make money. I chose the AGPL because it was an OSI approved license that isn’t friendly for commercial use, leading most users who want to use it to purchase a commercial license.

So far, this is fairly standard, I believe.

I decided that RavenDB is going to be OSS, but from most other aspects, I’m going to treat it as a commercial product. It had a paid team working on it from the moment it stopped being a proof of concept. It meant that we intentionally set out to make our money on the license. This, in turn, had a lot of implications. Support is defined as a Cost Center in Hibernating Rhinos. In other words, one of the things that we routinely do in Hibernating Rhinos is look at how we can reduce support.

One way of doing that, of course, is to not have support, or to staff the support team with students or the cheapest off shore option available. Instead, our support staff consists of dedicated support engineers and the core team that builds RavenDB. This serves several goals. First, it means that when you raise a support issue with us, you get someone who knows what they are doing. Second, it means that the core team is directly exposed to (and affected by) the support issues that are raised. I have structured things in this manner explicitly because having an insight into actual deployment and customer behavior means that the team is directly aware of the impact of their work. For example, writing an error message that will explain some issue to the user matters, because it reduces the time an engineer spends on the phone troubleshooting (not fun) and increases the amount of time they can sling code around (fun).

We had a major update between versions 3.5 and 4.0, taking almost 3 years to finish. The end result was vastly improved performance, the ability to run on multiple platforms and a whole host of other cool stuff. But the driving force behind it all? We had to make a significant change to our architecture in order for us to reduce the support burden. It worked, and the need for support went down by over 80%.

Treating RavenDB as a commercial product from the get go, even though it had an OSS license, meant that we focused on a lot of the stuff that is mostly boring. Anything from docs, setup and smoothing out all the bumps in the road, etc. The AGPL was there as a way to have your cake and eat it too. Be an OSS project with all the benefits that this entails. Confidence from our users about what we do, entry to the marketplace, getting patches from users and many more. Just having the ability to directly talk to our community with the code in front of all of us has been invaluable.

At the same time, we sell licenses to RavenDB, which is how we make money. The idea is that we provide value above and beyond whatever it is our license cost, and we can do that because we are very upfront and obvious in how we get paid.

We have a few users who have chosen to go with the AGPL version and skip paying us. I would obviously rather get paid, but I have laid out the rules of the game when I started playing and that is certainly within the rules. I believe that we’ll meet these users as customers in the future, it isn’t really that different from the community edition which we offer freely. In both cases, we aren’t getting paid, but it expands our reach, which will usually get us more customers in the long run.

We have been doing this for a decade and Hibernating Rhinos currently has about 30 people working full time on it, so it is certainly working so far!

time to read 5 min | 866 words

A large portion of my day to day tasks is to review code. I’m writing this post barely two weeks into the new year, and we already had over 150 PRs going into RavenDB alone.

As a result, I’ve gotten sensitive to certain issues. For example, the following is a suggestion made for fixing an issue in this method declaration:

[Image: the C method declaration in question]

This is a piece of code (in C) that is meant to handle some low level details for RavenDB. We use the CLR coding conventions for C#, but for C, we have chosen to use a different convention, using snake_case for methods, arguments and variables and SHOUTING_CASE for constants / defines. When reading through the code, I marked this violation of the naming convention for a fix.

This may seem minor, but it is probably annoying for the author of the code. They are interested in comments about the code functionality and utility. Why spend any time on something that doesn’t really have any impact? Both forms of the parameter name are just as readable to me, after all.

Before I get to this part, I want to show another piece of code. Or, to be rather more exact, two pieces of code. One of the reasons that we are using C code is that we can abstract large parts of the underlying platform inside the native code. That means that we have certain parts of the code that are written twice. Once for Windows and once for Linux.

Here is some code for Windows:

[Image: the Windows implementation]

And here is the same code for Linux:

[Image: the Linux implementation]

You can see that this is pretty much the same thing, just calling the different APIs for each platform. One thing to notice here is that part of this method’s task is to ensure that the file that we open is at least as big as the initially requested size.

In Windows, to increase or decrease the file size you call SetFilePointer() followed by SetEndOfFile(). On Linux, you have fallocate() and ftruncate()*. This is reflected in the code. The Windows code has a single method to do this work, while the Linux code has two methods, rvn_allocate_file_space() and rvn_truncate_file(), the latter of which isn’t shown here.

* Actually, you might have fallocate(). Some file systems do not support it, and you need to use another workaround.

One of my code review comments was that this needs to be fixed, that we should have a _resize_file() method for Linux that would simply call the appropriate method based on the file size. But the question is, why?

These are two separate implementations for two separate operating systems. We are already creating the higher level abstraction level with operations that hide many system details. Why do I want to have a passthrough method just because the Windows code has this method?

The answer here, as in the case above with the parameter name, is the same. Consistency.

This is most obvious in the naming convention, but it is the same reasoning I had when I wanted to have the same method structure for both Linux and Windows.

Consistency is key to being able to slog through a lot of code. It is how I (and the rest of the team) can go through thousands of lines of code per week and understand what is going on. Because when we look at a piece of code, it follows certain conventions and structure. Reading the code is easy because we can ignore a lot of the cruft around it and focus on what is going on.

In the case of the Windows / Linux methods, I first read one method and then the next, making sure that we are doing the same thing on all platforms. The different behavior (resize vs. allocate) was very obvious to me, which meant that I had to stop, go and look at each method’s implementation to figure out whether there was any meaningful difference between them. That was a chore, and it would only become worse over time as we add additional functionality, so anything that isn’t different because it has to be different should match.
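To show the shape I was after without quoting the native code, here is the same idea as a C# sketch. The names are made up and the real code is C using the platform APIs mentioned above; this only illustrates the structural argument:

```csharp
using System.IO;

public static class WindowsFileOps
{
    // Windows has one natural "resize" operation (SetFilePointer + SetEndOfFile),
    // so the platform code exposes a single method.
    public static void ResizeFile(FileStream file, long size) => file.SetLength(size);
}

public static class PosixFileOps
{
    // Linux exposes two primitives: one to grow the file and one to shrink it.
    public static void AllocateFileSpace(FileStream file, long size) => file.SetLength(size);
    public static void TruncateFile(FileStream file, long size) => file.SetLength(size);

    // The passthrough requested in the review. It adds nothing functionally, but it
    // gives both platforms the same shape, so reading them side by side is trivial.
    public static void ResizeFile(FileStream file, long size)
    {
        if (size >= file.Length)
            AllocateFileSpace(file, size);
        else
            TruncateFile(file, size);
    }
}
```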

In general, I like code reviews where I can scan through the changes and not see the code, but its purpose. That happens when there isn’t anything there that I really have to think about and the intent is clear.

When writing in C#, we have decades (literally) of experience and organizational inertia that push us in the right direction. When pushing C code into the repository, I started to actually pay attention to those details explicitly, because I suddenly needed to.

This is apparent in code reviews, but it isn’t just the case of me trying to make my own tasks easier. Code is read a lot more often than it is written, and focusing on making the code itself boring will pay off, because what the code is doing is what should be interesting in the long run.

time to read 6 min | 1079 words

I’m certain that the archangel responsible for statistical grouping has a strange fondness for jokes. If that isn’t the case, I can’t really explain how we routinely get clear clustering of bug reports on specific issues. We noticed that several years ago, when we suddenly started to get a lot of reports of a particular issue. There was no recent release and no reason for all of these correlated errors to happen all at once. But they did. Luckily, the critical mass of reports gave us more than enough information to figure out what the bug was and fix it. When we checked, the bug had been there, dormant, for 20 months.

The current issue we are handling is a spate of reports about hard errors in RavenDB. The kind of errors that use the term invalid data, in particular. These kinds of errors bring development to a screeching halt, as we direct all available resources to figure out what is going on. We didn’t have a reproduction, but we did have a few cases of people reporting it, and the stack traces we got told us that the problem wasn’t isolated to a single location but seemed to be fairly widespread. This is a Good News / Bad News kind of situation. It is good because it means that the problem is in a low level component; it is bad because it is going to be hard to narrow it down exactly.

The common theme around this issue was that it popped up when the system was already in a bad state. We were able to see that when we ran out of disk space or when the memory on the system was very low. The common thread was that in both cases these are expected and handled scenarios. What is more, these are the kind of errors that we want to keep on shouldering through.

In general, the error strategy we use in most of RavenDB is fairly simple. Abort the transaction and give the user an error. If the error is considered catastrophic (I/O failure when writing to disk, for example), we also force a restart of the database (which does full recovery). As it turned out, there were three completely separate issues that we found when investigating this issue.

First, and the most critical one was cleanup code that wasn’t aware of the state of the system. For example, consider:
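The offending pattern looked roughly like the following. This is a reconstructed sketch with made-up names around the FlushIndex() and WriteData() mentioned below, not the actual RavenDB code:

```csharp
using System;

public sealed class Transaction : IDisposable
{
    public void Commit() { /* ... */ }
    public void Dispose() { /* disposing an aborted transaction is the only legal operation */ }
}

public class IndexFlusher
{
    public void FlushIndex()
    {
        using (var tx = BeginWriteTransaction())
        {
            try
            {
                WriteData(tx);          // runs out of disk space, aborts the tx and throws...
                tx.Commit();
            }
            catch
            {
                // ...and the cleanup then reaches back into the aborted transaction.
                ReleaseIndexResources(tx);
                throw;
            }
        }   // the Dispose here may touch the transaction as well, compounding the problem
    }

    private Transaction BeginWriteTransaction() => new Transaction();
    private void WriteData(Transaction tx) { /* ... */ }
    private void ReleaseIndexResources(Transaction tx) { /* ... */ }
}
```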

Let’s assume that we have an error during the WriteData. For example, we run out of space on disk. We abort the transaction and throw an exception. The FlushIndex() method, however, will go into the catch clause and try to clean up its resources. When doing so, it will access the transaction (that was just aborted) and try to use it, still.

The problem is that at this point the state of the transaction is known to be bad; the only thing you are allowed to do is to dispose it, you cannot access anything on it. The code in question, however, is meant to be used on non-transactional resources and requires compensating actions to revert to the previous state.

For that matter, the code above has another problem. Do you see the using statement here? It seems like we have quite a bit of code that does stuff in the Dispose method. And if that stuff also touches the transaction, you may get what is called in the profession: “hilarity ensued”.

The problem was that our transaction code was too trusting and assumed that the calling code would not call it if it was in an invalid state. Indeed, the cases where we did such a thing were rare, but when they happened…

The second problem was when we had code like this:
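Roughly, the shape was this (again a reconstructed sketch, reusing the Transaction stand-in from above, not the real code):

```csharp
using System;
using System.Collections.Generic;

public abstract class Command
{
    public Exception Error;                     // reported back to the caller
    public abstract void Execute(Transaction tx);
}

public static class CommandProcessor
{
    public static void ExecuteCommands(IEnumerable<Command> commands, Transaction tx)
    {
        foreach (var command in commands)
        {
            try
            {
                command.Execute(tx);
            }
            catch (Exception e)
            {
                // Meant to capture *expected* errors so the caller can be told what
                // failed, but it also swallows unexpected ones, leaving things in a
                // funny state...
                command.Error = e;
            }
        }

        // ...and then we keep going and commit anyway.
        tx.Commit();
    }
}
```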

Except that the real code is a lot more complex and it isn’t as easy to see what is going on.

What you see here is that we run commands, and as we process them, we may get (expected) exceptions, which we should report to the caller. The problem is that we mixed expected exceptions with unexpected ones. And these unexpected exceptions could leave us in… a funny state. At which point we would continue executing future commands and even commit the transaction. As you can imagine, that isn’t a good place to be at.

We have gone over our error handling and injected faults and errors at every layer of the stack that we could conceive of. We have been able to fix a lot of issues, most of which have probably never been triggered, but it was a sobering moment to look at some of those changes. The most likely cause for the errors was a change that was made (by me, as it turns out) over two years ago. And in all that time, we never saw hide nor hair of it. Until suddenly we got several simultaneous cases.

The third and final problem, by the way, was related to not crashing. By default, any severe enough error should cause us to shut down the database and restart it. In the process, we re-apply the journal and ensure that we are in a consistent state. The problem is that we don’t want to do that for certain expected errors. And the issue with staying up was that while Voron (at the storage layer) handled the error properly, the higher level code did not. At that point we had a divergence of the in memory state vs. the on disk state. That usually led to either an NRE that would remain until the server was restarted or really scary messages that would typically go away when we recovered the database and reloaded the in memory state from the on disk state.

In short, handling errors in the error handling code is tough.

The primary changes we made were on the transaction side: we made it validate its own state when called, so code that erroneously tries to use a transaction that has already failed will error early. We have also added additional validation of operations at several key points. That will allow us to catch errors much more quickly and pinpoint exactly where things are breaking apart sooner.
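In terms of the Transaction stand-in from the sketches above, the fix amounts to a guard like this (again, illustrative only):

```csharp
using System;

public sealed class GuardedTransaction : IDisposable
{
    private bool _failed;

    public void MarkFailed() => _failed = true;

    public void Commit()
    {
        EnsureValidState();
        // ... actual commit work ...
    }

    public void Dispose()
    {
        // Disposing is always allowed, even for a failed transaction.
    }

    // Called at the start of every operation except Dispose, so code that
    // erroneously uses an already-failed transaction errors out early, at the call site.
    private void EnsureValidState()
    {
        if (_failed)
            throw new InvalidOperationException(
                "The transaction has already failed and can only be disposed.");
    }
}
```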

The current state is that we pushed these changes to our production system and are running the usual battery of tests to prove that the changes are valid. We’ll also be adding more faults into the process to ensure that we exercise our error handling a lot more often.

time to read 5 min | 944 words

We test RavenDB as thoroughly as we can. Beyond just tests, longevity runs, load tests and in general beating it with a stick and seeing if it bleats, we also routinely push early builds of RavenDB to our own production systems to see how they behave under real load.

This has been invaluable in catching several hard to detect bugs. A few days after we pushed a new build to production, we started getting errors about running out of disk space. The admin looked, and it seemed to have fixed itself. Then the issue repeated itself.

It took a bit of time to figure out what was going on. A recent change caused RavenDB to hang on to what are supposed to be transient files until an idle moment, where an idle moment is defined as 5 seconds without any writes. Our production systems don’t have an idle moment; here is a typical load on our production system:

[Image: request rate graph for the production system]

We range between lows in the high 30s of requests per second to peaks of 200+ requests per second.
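A rough sketch of the "cleanup on idle" behavior described above (illustrative only; the names are not the actual ones):

```csharp
using System;

public sealed class TransientFileCleaner
{
    private DateTime _lastWriteTime = DateTime.UtcNow;

    public void OnWrite() => _lastWriteTime = DateTime.UtcNow;

    public void OnIdleCheck()
    {
        // "Idle" means five seconds with no writes. At a constant 30-200 requests
        // per second this condition is simply never true, so the transient files
        // are never released and eventually the disk fills up.
        if (DateTime.UtcNow - _lastWriteTime >= TimeSpan.FromSeconds(5))
            CleanupTransientFiles();
    }

    private void CleanupTransientFiles() { /* delete files that are no longer needed */ }
}
```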

So we never have an idle moment to actually clean up those files, so they gather up. Eventually, we run out of disk space. At this point, several interesting things happen. Here is what our topology looks like:

[Image: the cluster topology]

We have three nodes in the cluster, and currently, node B is the leader of the cluster as a whole. Most of our operations relate to a single database, and for that database, we have the following topology:

[Image: the database group topology]

Note that in RavenDB, the cluster as a whole and a particular database may have different topologies. In particular, the primary node for the database and the leader of the cluster can be, and often are, different. In this case, Node A is the primary for the database and Node B is the leader of the cluster.

Under this scenario, we have Node A accepting all the writes for this database, and then replicating the data to the other nodes. Here is what this looks like for node C.

[Image: request load on Node C, which only handles replicated writes]

In other words, Node A is very busy handling requests and writes. Node C is just handling replicated writes. Just for reference, here is what Node A’s CPU looks like, the one that is busy:

[Image: CPU usage on Node A]

In other words, it is busy handling requests. But it isn’t actually busy. We have a lot of spare capacity at hand.

This makes sense; our benchmark scenarios start at tens of thousands of requests per second. A measly few hundred per second isn’t actually a meaningful load. The servers in question, by the way, are t2.large instances with 2 cores and 8GB of RAM.

So we have a bug in releasing files when we aren’t “idle”. The files just pile up and eventually we run out of disk space. Here is what this looks like in the studio:

[Image: the out of disk space error in the studio]

And clicking on details gives us:

[Image: the error details]

So this looks bad, but let’s see how this actually turns out to work in practice.

We have a failure on one node, which causes the database to be unloaded. RavenDB is meant to run in an environment where failure is likely and is built to handle that. Both clients and the cluster as a whole know that when such things happen, either to the whole node or to one particular database on it, we should respond accordingly.

The clients automatically fail over to a secondary node. In the case of the database in question, we can see that if Node A fails, the next in line would be Node C. The cluster will also detect that and re-arrange the responsibilities of the nodes to indicate that one of them failed.

The node on which the database has failed will attempt to recover from the error. At this point, the database is idle, in the sense that it doesn’t process any requests, and will be able to clean up all the files and delete them. In other words, the database goes down, restarts and recovers.

Because of the different write patterns on the different nodes, we’ll run into the out of disk space error at different times. We have been running in this mode for a few days now. The actual problem, by the way, has been identified and resolved. We just aren’t under any kind of pressure to push the fix to production.

Even under constant load, the only way we can detect this problem is through its secondary effects: the disk space on the machines, which is being monitored. None of our production systems have reported any failures and monitoring is great across the board. This is a great testament to the manner in which expecting failure and preparing for it at all levels of the stack really pays off.

My production database is routinely running out of space? No worries, I’ll fix that on the regular schedule, nothing is actually externally visible.

time to read 9 min | 1610 words

“If a tree falls in a forest and no one is around to hear it, does it make a sound?” is a well known philosophical question. The technological equivalent of this is this story. We got a report that RavenDB was failing in the field. But the details around the failure were critical.

The failure happened on the field, literally. This is a system that is running an industrial robot using a custom ARM board. The failure would only happen on the robot in the field; it would not reproduce in the user’s test environment or on our own systems. Initially, that was all the information that we had: “This particular robot works fine for a while, but as soon as there is a break, RavenDB dies and needs to be restarted”. That was the first time I ran into a system that would crash when it went idle, instead of dying under load, I have to say.

My recommendation that they just keep the robot busy at all times was shot down, but for a while, we were in the dark. It didn’t help that this was literally a custom ARM machine that we had no access to. We finally managed to figure out that the crash was some variant of SIGSEGV or SIGABRT. That was concerning. The ARM machine in question is running on 32 bits, and the worry was that our 32 bits code was somehow doing an out of bounds read. This was a crash in production, so we allocated a couple of people to investigate and try to figure out what was going on.

We started by doing a review of all our 32 bits memory management code and in parallel attempted to reproduce this issue on a Raspberry Pi (the nearest machine we had to what was actually going on). We got a lucky break when someone did manage to kill the RavenDB process in our own lab somehow. The exit code was 139 (Segmentation fault), but we weren’t sure what was actually going on. We were trying all sorts of stuff on the machine, seeing what would cause this. We basically fed it all sorts of data that we had lying around and saw if it would choke on that. One particular data export would sometimes cause a crash. Sometimes. I really really hate this word. That meant that we were stuck with trying to figure out something by repeatedly trying and relying on the law of averages.

It took several more days, but we figured out that a certain sequence of operations would reliably cause a crash within 5 – 30 minutes. As you can imagine, this made debugging pretty hard. The same sequence of operations on Intel machines, either 32 bits or 64 bits, worked without issue, regardless of how many times we repeated them.

We followed several false trails with our investigation into RavenDB’s memory management’s code in 32 bits. We had a few cases where we thought that we had something, but nothing popped up. We have instrumented the code and verified that everything seemed kosher, and it certainly did, but the system still crashed on occasion.

RavenDB usually relies on mmap() to access the data on disk, but on 32 bits, we couldn’t do that. With an addressable memory of just 2 GB, we cannot map the whole file to memory if it is too large. Because of that, we map portions of the file to memory as needed for each transaction. That led us to suspect that we were somehow unmapping memory while it was still in use or something like that. But we have gone through the code with a fine tooth comb and got nothing. We used strace to try to help point out what is going on and we could see that there were no surprise calls to unmap() that shouldn’t be there.
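In .NET terms, the 32 bits approach looks roughly like this (a simplified sketch, not the actual Voron code; the file name and sizes are only for illustration):

```csharp
using System.IO.MemoryMappedFiles;

// Instead of mapping the whole data file (impossible with ~2GB of address space),
// map only the range that the current transaction needs and release it afterwards.
using var dataFile = MemoryMappedFile.CreateFromFile("Raven.voron");
using var view = dataFile.CreateViewAccessor(offset: 0, size: 64 * 1024);
// ... read / write pages through the view for the duration of the transaction ...
```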

What was really nasty was the fact that when we failed with SIGSEGV, the error was always on an address just past the area of memory that we mapped. This led us to suspect that we had an out of bounds write, and led to a chase for that rogue pointer operation. We instrumented our code ever more heavily, but weren’t able to find any such operation. All our reads and writes were in bounds, and that was incredibly frustrating. RavenDB is a CoreCLR application. As such, debugging it on an ARM device is… challenging. We tried lldb and gdb. Both allow unmanaged debugging, but even with lldb, we couldn’t debug managed code or even just pull the managed stack properly from ARM. Eventually we found this extension which allows SSH debugging on the Raspberry Pi from a Windows machine.

That helped, and we finally figured out where in our managed code the error happened. This always happened during a copy of memory from a document write to a scratch buffer in a memory mapped file. The entire thing was wrapped in boundary checks and everything was good.

We went back to the drawing board and attempted to set it on fire, because it was no good for us. Once we put the fire out, we looked at what remained and had a Eureka! moment. One of the differences between ARM and x86/x64 machines is in how they treat alignment. On x64/x86, alignment is pretty much a non-issue for most operations. On ARM, however, an unaligned operation will cause a CPU fault. That led us to suspect that the SIGABRT error we got was indeed an alignment issue. Most of our code is already aligned in memory because while it isn’t mandatory on x64/x86, it can still get better perf in certain cases, but it is certainly possible that we missed it.

We discovered a horrifying problem:

[Image: the offending CopyBlock call]

We were using the CopyBlock method, and obviously that was the issue, right? We wrote a small test program that simulated what we were doing and used unaligned CopyBlock and it just worked. But maybe our situation is different?
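For reference, the two APIs in question, wrapped in a trivial helper (this requires compiling with unsafe code enabled; the wrapper itself is just for illustration):

```csharp
using System.Runtime.CompilerServices;

public static class Copier
{
    // Assumes the source and destination pointers are suitably aligned (cpblk semantics).
    public static unsafe void Copy(byte* dest, byte* src, uint len) =>
        Unsafe.CopyBlock(dest, src, len);

    // Makes no alignment assumptions, at a potential cost in performance.
    public static unsafe void CopyNoAlignmentAssumption(byte* dest, byte* src, uint len) =>
        Unsafe.CopyBlockUnaligned(dest, src, len);
}
```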

Using CopyBlockUnaligned on x86 led to a 40% performance drop (we call this method a lot), and initially it looked like it fixed the problem on ARM. Except that on the third or fourth attempt to reproduce the problem, we ran into our good old SIGSEGV again, so that wasn’t it. This time we went to the drawing board and broke it.

During this time, we managed to capture the error inside the debugger several times; here is what it looked like:

[Image: the crash captured in lldb, showing the ARM assembly]

Reading ARM assembly is not something that I’m used to doing, so I looked at the manual, and it looks like this instruction is meant to store multiple registers in descending order and… no clue beyond that. It didn’t make any sort of sense to us.

At this point, we were several weeks and four or five people into this investigation (we consider such issues serious). We had instrumented our code to the point where it barely ran, we could manage to reproduce the error in a relatively short time and we were fairly convinced that we were doing things properly. Going over the kernel code for memory mapping and unmapping several times, stracing, debugging, everything. We were stumped. But we also had enough data at this point to be able to paint a fairly clear picture of what was going on. So we opened an issue for the CoreCLR about this, suspecting that the issue was in the implementation of this CopyBlockUnaligned.

We got a strange response, though: “This assembly code doesn’t make any sense”. I did mention that I have no idea about ARM assembly, right?  We tried reproducing the same thing in gdb, instead of lldb and got the following assembly code:

[Image: the same crash captured in gdb, with more readable assembly]

This looked a lot more readable, to be sure. And it was extremely suspicious. Let me explain why:

[Image: the faulting instruction]

The faulting instruction is: ldr r3, [r0, #0]

What this says is basically, read a word from the address pointed to by r0 (with 0 offset) into r3.

Now, r0 in this case has this value: 0x523b3ffd. Note the last three characters, ffd.

We are running this on a 32 bits machine, so a word is 4 bytes in size. 0xFFD + 4 = 0x1001, which is past the 0x1000 page boundary.

In other words, we had a read beyond the current page boundary. In most cases, the next page is mapped, so everything goes smoothly. In some cases, the next page is not mapped, so you are going to get an access violation trying to read a byte from the next page.
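The same arithmetic as a check (assuming 4KB pages, which is what the addresses above imply):

```csharp
public static class PageBoundary
{
    // A read crosses into the next page if its first and last bytes land on different pages.
    public static bool ReadCrossesPage(ulong address, ulong readSize, ulong pageSize = 0x1000)
        => (address / pageSize) != ((address + readSize - 1) / pageSize);
}

// PageBoundary.ReadCrossesPage(0x523b3ffd, 4) == true: the 4-byte read at ...ffd touches
// the next page, which may not be mapped, and when it isn't, that is our SIGSEGV.
```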

The fix for this is here:

[Image: the one character fix in the CoreCLR]

This is literally a single character change. And probably the worst lines-of-code to time-invested ratio that I have ever seen for any bug. Actually, there wasn’t even any code change in RavenDB’s codebase, so that is 0 lines of code / 4 people x 4 weeks.

The good thing is that at least we have proven that the 32 bits memory code is rock solid, and we have a much better understanding of how to resolve the next issue.

time to read 6 min | 1154 words

I ran across this Twitter thread and I’m in… awe, I want to say, but in a really horrifying manner.

The thread started with DHH calling out this job posting:

[Image: the job posting, with the 60 hours per week requirement highlighted]

I’ve marked the important piece. You can read the twitter thread for the gory details. Based on the job posting and the interaction of the CEO on twitter, I’m going to assume that this job pays six to nine times more than the average developer can make, because otherwise I can’t really figure out why anyone would work in such a place.

I have had two (or three) separate burnouts before I was thirty. I was single, no kids, and I worked crazy hours. I was productive and I burned out. Mid-2009 was a fun time for me: I was physically nauseous every time I sat in front of a computer and seriously considered a career shift to construction. A large number of life decisions were made as a result of that.

And while I admit that being young and discovering things is its own reward, this isn’t really something new. Workplace productivity is not an unexplored subject and I would expect anyone in either an HR or management position (and certainly a CEO in a company that is busy hiring people) to be at least peripherally aware of it. You might sometimes need to work crazy hours; I fully understand crunch time and “OMG, production is DOWN”. But if you do that, you need to understand that this is done with eyes open. Such scenarios should be rare and you should be aware that any temporary increase in productivity will need to be paid off by reduced productivity down the line.

Again, this isn’t new. You can go back to Ford in the early years of the last century to read more about how to increase productivity. So asking for 60 hours per week on an ongoing basis is pretty… crazy.

Let’s assume that this is a five days a week position, shall we? Based on the job posting and the CEO’s behavior on Twitter, I assume that it isn’t the case, but humor me.

This means, Monday to Friday, you start working at 9 AM and finish at 9 PM. If you have kids, this means never reading them a bedtime story, not being able to see them perform, never coming to a PTA meeting. If you have a spouse, it means a relationship that is mostly around texts, because you ain’t going to see them. If you don’t have a significant other, good luck finding one with the time allotted. But 12 hour days are probably not going to cut it. So let’s say that you work only 10 hours a day, but we’ll include Sunday as well, to make up for the “lost time”. You still log in at 9 AM, but now you get to leave at 7 PM.

Oh, and you better not be sick, or have to drive your mom to the airport, or need to visit the DMV. Nobody got any time for that. On a more serious note: this kind of environment will make you sick. It can take a lot of time to recover from that, both mentally and physically.

I can’t imagine anyone who would be signing up for something like that. Actually, I can, quite easily. There are plenty of professions where this is normal. To pick examples off the top of my head, lawyers, nurses and doctors all work crazy hours, or at least, that is the impression I have. Let’s check, shall we? The following results are pretty much the first result of googling the question.

  • Lawyers, for example, can expect to work 60+ hours a week on average.
  • Nurses, on the other hand, are considered full time if they work about 36 – 40 hours a week.
  • Doctors, vary wildly, with 40 - 80 hours, depending on the type of specialty and where they are in their career.

The key here, for both lawyers and doctors, is that typically after a pretty harsh initial period, you can expect to gain a lot for your work. In other words, there is a high probability that there is going to be a good return on this investment.  For this job posting, again based solely on the text and the CEO’s behavior, I’m assuming there is no such upside.

Whenever we sent a job acceptance letter, I used to put in a statement about Hibernating Rhinos being a place where you didn’t have to leave the office at 9 PM. Since then we grew a bit and got a few people who are night owls, so they come to the office later. That somewhat spoils this statement, but I can still state that no one works crazy hours.

By the way, that is not because some of the devs haven’t tried. I’m very familiar with the excitement of being almost there. At the cusp of figuring out this bug or completing that feature. It sometimes makes sense to keep going and complete just this one task and get there. And this is fine, if you pace yourself. But with some people, I had to tell them that if they kept staying in the office so much, I was going to start charging them rent. That seemed to do the trick.

When I founded Hibernating Rhinos, I wanted to create a place that I would like, which would give me interesting things to work on. In the decade that Hibernating Rhinos has been around, I don’t believe that we ever had a crunch time that didn’t directly relate to a customer problem (and these tend to be rare). Whenever I had to make the choice, slipping the release date has always been the better option in my eyes, rather than sacrificing quality or keeping people at their desk to try to get more done.

This includes the last three years, in which our team basically rebuilt a distributed database from the ground up and got a minimum of 10x performance improvement across the board. This was without anyone being expected to put in sunrise to sundown shifts. Given that I think that this is about creating a marketing/sales platform, I’m… not impressed by that. In fact, I’m pretty sure that if they start out planning to squeeze their own people dry from the get go, they have the same level of cluelessness in other aspects of the business. On the other hand, assuming that they actually manage to get a viable product out, someone is going to have a field day threading through all the holes that tired, listless and demotivated developers have left in the system.

I think that I can summarize all of this post in a single word: BEWARE!

time to read 4 min | 660 words

When we start building a feature, we often have a pretty good idea of what we want to have and how to get there. And then we actually start building it and we often end up with something that is quite different (and usually much better). It has gotten to the point where we aren’t even trying to do hard specs and detailed design at anything beyond the exploratory levels. For example, in the design of RavenDB 4.0, there was not even a mention of RQL. That ended up being a very late addition to the codebase, but it improved RavenDB significantly. On the other hand, the low level mechanisms of zero copy documents from Voron all the way to the network were designed up front, but only at a fairly high level.

In this post, I want to talk about query parameters in RavenDB. Actually, let me be more specific: we have query parameters, but what we don’t have (or rather, didn’t have, because that will be merged in by the time you read this post) is the ability to run parameterized queries from the studio. We always meant to have that capability, but we ran out of time with the 4.0 release. As we are gearing up to the 4.1 release, we are clearing the major-minor issues off the table. (Major in terms of impact, minor in terms of the amount of work required.) The query parameters in the studio are one such example. Here is what this looks like:

[Image: a query using parameters in the studio]

My first thought was to just build something like this:

[Image: a grid for manually defining query arguments]

Give the user the ability to define arguments and be done with it. The task was assigned to one of our developers and I expected to get a PR in a short while.

This particular developer has a tendency to consider not just the task at hand but also other aspects of the problem. He didn’t want the user to have to manually specify each argument, since that has poor ergonomics. Instead, he wanted the studio to figure it out on its own and help the user. So the first thing he did was detect the arguments (regex: “\$\w+”) and present them in the grid. Then there was the issue of how to deal with edits, etc. Then he ran into another problem: types. Query parameters can be more than just strings, they can be any JSON data type.

Here is what he came up with:

[Image: query parameters defined inline, right above the query text]

Instead of having to define the query parameters in a separate location, just put them right in. Having the parameters grid involves pointing and clicking with the mouse, entering possibly complex values (such as long arrays) and in general much more work than just having them right above the query.

Note that this is a studio-only feature; queries from the client API already have ways to specify arguments properly. So the next question is how we are going to handle passing the arguments to the server. Remember, this is only in the studio, so we can take quite a few shortcuts. In this case, we’ll simply snip the entire first section of the query text (which contains the query parameters). We can do that by going from the start of the query to the first from or declare keyword. We do some basic pre-processing to turn “$name = …“ into “results.$name = …“ and then just execute this code in the browser, giving us a JS object with all the parameters, which we can then send to the server.
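Roughly, the preprocessing looks like this, sketched here in C# for readability. The real code runs as JavaScript in the browser and the names here are made up:

```csharp
using System.Text.RegularExpressions;

public static class QueryParameterPreprocessor
{
    // Everything before the first "from" or "declare" keyword is the parameter section.
    public static string ExtractParameterSection(string queryText)
    {
        var match = Regex.Match(queryText, @"\b(from|declare)\b", RegexOptions.IgnoreCase);
        return match.Success ? queryText.Substring(0, match.Index) : string.Empty;
    }

    // Turn "$name = ..." into "results.$name = ..." so that evaluating the section
    // (in the browser) fills a results object with all the parameter values.
    public static string ToParameterAssignments(string parameterSection)
    {
        return Regex.Replace(parameterSection, @"\$\w+(?=\s*=)", m => "results." + m.Value);
    }
}
```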

The next stage is to make this discoverable, by detecting parameters whose value is not provided and giving the user a quick fix to add them.
