May 05 2026

Learning to code, 1990s vs 2026

time to read 7 min | 1339 words

I still remember the bookstore. I was holding a 600-page brick of a book on how to build Windows applications, trying to convince my mother that I really needed it. This was 1994 or 1995. A book was how you learned to program at that time. You took it home, you read it cover to cover, you typed the examples by hand, and somewhere along the way, the ideas sank in.

From there, the tools for learning kept evolving. Printed books gave way to CD-ROMs and then to online documentation. Then came the explosion of blogs and RSS feeds. I started this blog at that time, and I still consider that era to be one of the best ones in terms of having amazing access to smart and knowledgeable people, freely sharing their insights and experiences.

Google killed Google Reader (yes, I am still angry about that), and a lot of new developers learned via Stack Overflow instead. The world entered a strange equilibrium that lasted, honestly, more than a decade. If you learned to code any time between roughly 2010 and 2022, you probably learned through some combination of Google, Stack Overflow, and maybe YouTube.

Then the floor moved again. First it was ChatGPT, where you copy-pasted code back and forth. Then the models were integrated into the IDE. Now, with Claude Code and Codex, it is something else entirely: an agent that just runs, makes decisions, and does the thing.

The arc is striking when you lay it out. You used to have to go to a physical library, pick up a physical book, read it, digest it, and think about it. Today, the prevailing message to a new developer is essentially: you do not need to know any of that. Just describe what you want, and it happens.

Hidden costs of reduced conceptual depth

This shift is not just about convenience. It changes the depth of knowledge a developer carries, and that has consequences. Here is the example I keep coming back to. Imagine you ask a developer to show you a website that they built.

If you asked that in the late nineties, it meant something. To do that, you had to purchase a domain, understand DNS well enough to wire it up correctly, set up a web server (which meant getting Apache to actually run), successfully configure PHP, and deploy scripts to production.

By the time you could point to a working URL, you had touched every layer of the stack. There was no other choice. As a result, you were at least passingly familiar with a lot more than a typical developer is today.

Ask that same question of many developers today, and the answer is a Vercel subdomain. That is not a dig at Vercel, mind you; it is a great product, and abstraction is the whole point. But some of these developers genuinely do not know what DNS is. They do not know what is running on the server versus the client. They do not know that there is even a meaningful distinction. And we have seen real security incidents come out of exactly that gap: secrets leaking into client bundles, auth logic running where it should not, and CORS misconfigurations that nobody understood well enough to notice.

Now extend that same dynamic one more step. Take the cohort of developers who will learn to program primarily through this new generation of agentic tools. The abstraction is no longer just over DNS or deployment. It is over the act of writing the code itself.

What is the role of a junior developer now?

I think we are going to end up with a genuinely different type of engineer and, as a result, a genuinely different type of system.

“If men learn this, it will implant forgetfulness in their souls; they will cease to exercise memory because they rely on that which is written, calling things to remembrance no longer from within themselves, but by means of external marks”.

Plato (c. 429-347 BCE), Phaedrus

Every generation has been accused of being softer than the previous one, as the quote above attests. In this case, Plato (through the voice of Socrates) is decrying writing as a corrupting influence on youth who no longer bother to just remember things.

Without the attribution, I don’t think you would have realized that this isn’t me talking about developers utilizing coding agents instead of learning on their own.

In software, we see much the same pattern. The person who wrote assembly looked down on the C programmer. The C programmer looked down on the Java programmer. The Java programmer looked down on the person gluing libraries together in Python. Each step up the abstraction ladder lets people build bigger, more ambitious things with less effort. That is mostly good.

But there is a real asymmetry this time. The earlier steps abstracted away mechanical work — memory management, boilerplate, deployment plumbing. This step abstracts away the reasoning itself. And reasoning is what you need when the abstraction leaks, which it always eventually does.

The question I am actually struggling with, day to day, is much more practical: how do I evaluate a junior developer in this sort of world?

The classic move was a take-home task. Build a small feature. Show me your thinking. The problem is that a capable model will produce a perfectly clean solution to any reasonable take-home in a few minutes. What you see in the submission tells you almost nothing about what the candidate actually understands. It tells you they can prompt well, which is a real skill, but it is not the skill I am trying to measure.

I can also ask them to solve a task while they are in our offices, so I can verify no AI use. But that is also stupid; I want them to use AI. After all, that is a great productivity enhancer. So I need a way to test understanding, not just the output.

The signals I care about are the ones that are hardest to fake in an agent-assisted world. Can you debug something when the model is wrong? Can you explain why a piece of generated code is subtly unsafe, or slow, or wrong in a way that only matters at the hundredth user? Can you make a reasoned call about which abstraction to reach for and which one to reject? When the system behaves unexpectedly, do you know where to look?
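To make "wrong in a way that only matters at scale" concrete, here is a hypothetical review exercise in Python (the function names are made up for this sketch, not taken from any real codebase). Both functions return the same result, and both would pass a small unit test. The first one checks membership against a list inside a loop, which makes it O(n²): invisible with a hundred items, painful with a few hundred thousand.

```python
from typing import Hashable, Iterable, List


def dedupe_slow(items: Iterable[Hashable]) -> List[Hashable]:
    """Keep the first occurrence of each item, preserving order.

    Correct, but `x not in seen` scans a list, so the whole loop is O(n^2).
    """
    seen: List[Hashable] = []
    out: List[Hashable] = []
    for x in items:
        if x not in seen:  # linear scan on every iteration
            seen.append(x)
            out.append(x)
    return out


def dedupe_fast(items: Iterable[Hashable]) -> List[Hashable]:
    """Same behavior, but membership checks hit a set: O(n) on average overall."""
    seen = set()
    out: List[Hashable] = []
    for x in items:
        if x not in seen:  # average O(1) hash lookup
            seen.add(x)
            out.append(x)
    return out


print(dedupe_slow([3, 1, 3, 2, 1]))  # [3, 1, 2]
print(dedupe_fast([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

The interesting question for a candidate is not which version is correct (both are), but why the difference only shows up once the input grows.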

At the same time, those aren’t usually qualities that you can look for in a junior developer. Having those qualities usually means that they aren’t junior anymore.

People used to train on LeetCode problems as a way to show how good they were in interviews. That was a reasonable stand-in for what they knew and understood. What is the next stage here?

What does a junior do to exercise their skills and show that they can bring value to the team? I don’t know if I have good answers to those questions. But that is something we, as an industry, need to consider carefully.

I do not want to be the old man yelling at the cloud. The tools are genuinely great, and refusing to use them is its own kind of malpractice. AI coding agents can make you meaningfully more productive.

But when I talk to developers just starting out, the thing I keep pushing is this: use the tools, and also, on a regular basis, go down a layer. Set up a server yourself. Deploy something without a platform holding your hand. Read the DNS records. Look at what your framework is actually generating. Write something in a language without a package manager that hides the sharp edges.

Not because you will do it that way at work. But because the next time something breaks in a way the agent cannot fix, you will have a mental model to fall back on. You will know where the seams are. You will know what to look at.

That mental model is, I suspect, going to be the thing that separates the engineers who compound over a career from the ones who get stuck the first time the abstraction leaks.

May 01 2026

The GPU Is the New Bangalore

time to read 5 min | 817 words

In the 2000s, the hottest move in software was offshoring. You'd ship your requirements to a development shop in India, Vietnam, or Bangladesh, pay a fraction of Western developer rates, and wait. The cost savings were real; every spreadsheet said so. The failure modes were also real; every CTO said so.

Even assuming that the teams working on your code were smart, motivated, and hardworking, the distance, the communication overhead, the time zone mismatch, and the misaligned incentives created a brutal set of constraints. If you wanted good results from offshoring, you needed to be able to specify clearly what you wanted and to validate that you got what you expected.

You couldn't just say "I need a login system." You had to write detailed specs, break work into reviewable chunks, define acceptance criteria, and actually read the code that came back. Not rubber-stamp it. Read it, make sure that it passed muster and could be accepted internally, because the delta between "looks right" and "is right" could cost you six months of production incidents.

Sound familiar? Today, instead of shipping my requirements to a dev shop overseas, I'm shipping them to a GPU somewhere. I get something back. It looks like code. It might be code. It might be a very convincing facsimile of code that will quietly fail in production under load. I genuinely don't know until I sit down and read it carefully.

The same discipline that separated successful offshore engagements from expensive disasters applies here as well:

Specification quality determines output quality. Vague prompts return vague code. The ability to articulate exactly what you want — at the right level of abstraction — is now a core engineering skill.
Validation is non-negotiable. "It passed the vibe check" is not a code review. The reviewer needs to understand what the code is doing and why, not just that it compiles and the tests are green.
Iterative delivery beats big-bang delivery. Nobody who survived offshoring tried to outsource an entire product in one shot. You stage it. You review at each stage. You course-correct before mistakes compound.

The Bottleneck Has Moved

Here's what I think is the deeper shift: for most of software history, the bottleneck was writing the code. That took time and required expensive humans. So the industry optimized heavily around it: better editors, better frameworks, and better abstractions, all in service of making the act of writing code faster and less error-prone.

That bottleneck is collapsing. What once took six months might take six hours. When the cost of implementation approaches zero, the bottleneck moves upstream: to design, specification, and verification. The expensive parts are now:

Understanding the problem clearly enough to describe it precisely.
Decomposing it into well-scoped, independently verifiable pieces.
Reviewing what comes back and actually understanding it.

These are skills we largely deprioritized during the era when coding itself was the hard part. They're about to become the most valuable things a technical person can do.

A lot of that used to happen “along the way” as you wrote the code: you would explore the problem and deepen your understanding of it while writing. That no longer happens on its own, but the work still needs to be done, so now you have to do it explicitly.

A note about the importance of proper architecture

There is this idea that the path to building big systems with AI is to spin up a swarm of specialized agents (a frontend agent, a backend agent, a database administrator agent, etc.) and somehow orchestrate them into a coherent product.

I find this baffling, because we already have a well-established protocol for coordinating the work of specialized, partially independent contributors on a complex system. It's called software design.

Module boundaries. Interface contracts. Separation of concerns. Dependency management. SOLID principles and more. These patterns exist precisely because complex systems built by multiple contributors without clear interfaces turn into unmaintainable messes. This is true whether those contributors are humans, offshore teams, or language models.

The instinct to throw orchestration complexity at a coordination problem is exactly backwards. The answer isn't a smarter message bus between your agents. The answer is better system design that minimizes how much the pieces need to talk to each other in the first place.
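As a minimal sketch of what an "interface contract" buys you, here is a hypothetical Python example; `Order`, `OrderStore`, `InMemoryOrderStore`, and `apply_discount` are invented names for illustration only. The billing function is written against a narrow contract, so whoever implements storage (a human, an offshore team, or an agent) can work independently as long as they honor it.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderStore(Protocol):
    """Hypothetical contract: all the billing code is allowed to know about storage."""

    def get(self, order_id: str) -> Optional[Order]: ...

    def save(self, order: Order) -> None: ...


class InMemoryOrderStore:
    """One implementation of the contract; a database-backed one could be built separately."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order


def apply_discount(store: OrderStore, order_id: str, percent: int) -> Order:
    """Billing logic depends only on the OrderStore contract, never on a concrete store."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    order = store.get(order_id)
    if order is None:
        raise KeyError(order_id)
    discounted = Order(order.order_id, order.total_cents * (100 - percent) // 100)
    store.save(discounted)
    return discounted


store = InMemoryOrderStore()
store.save(Order("A-1", 10_000))
print(apply_discount(store, "A-1", 15))  # Order(order_id='A-1', total_cents=8500)
```

The design choice that matters is how narrow the contract is: the less each side needs to know about the other, the less coordination (and orchestration) the pieces need.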

We have literally decades of experience in how to build large software systems (and thousands of years of experience in how to handle large projects in general). There isn’t anything inherently new here to deal with.

The developers who will thrive in this environment aren't necessarily the ones who write the most elegant code. They're the ones who can hold a complex system design in their head and communicate it clearly, break the work into well-specified, verifiable increments, and actually read the code that comes back and hold it to a real standard of quality.

These are, in large part, the same skills that made the best engineering leads effective during the offshoring era. The context has changed completely. The discipline hasn't.

The GPU is the new Bangalore. Time to dust off the playbook.

Apr 29 2026

Putting Claude up against our test suite

time to read 6 min | 1001 words

I’m convinced that in hell, there is a special place dedicated to making engineers fix flaky tests.

Not broken tests. Not tests covering a real bug. Flaky tests. Tests that pass 999 times out of 1,000 and fail on the 1,000th run for no reason you can explain with a clean conscience.

If you've ever shipped a reasonably complex distributed system, you know exactly what I'm talking about. RavenDB has, at last count, over 32,000 tests that are run continuously on our CI infrastructure. I just checked, and in the past month, we’ve had hundreds of full test runs.

That is actually a problem for our scenario, because with that many tests and that many runs, the law of large numbers starts to apply. Assuming we have tests that have 99.999% reliability, that means that 1 out of every 100,000 test runs may fail. We run tens of millions of those tests in a month.

In a given week, somewhere between ten and twenty of those tests will fail. Given the number of test runs, that is a very good number in percentage terms. But each such failure means that we have to investigate it.
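The arithmetic behind this is worth spelling out. Here is a minimal Python sketch that assumes every test execution fails independently with the same probability (real flaky tests are neither independent nor uniform), and that takes 10 million executions a month as an illustrative lower bound for "tens of millions":

```python
def expected_failures(executions: int, reliability: float) -> float:
    """Expected failures if each execution fails independently with p = 1 - reliability."""
    return executions * (1.0 - reliability)


def prob_any_failure(executions: int, reliability: float) -> float:
    """Probability that at least one of `executions` independent runs fails."""
    return 1.0 - reliability ** executions


RELIABILITY = 0.99999            # 99.999%: one failure per 100,000 executions
MONTHLY_EXECUTIONS = 10_000_000  # assumed lower bound for "tens of millions"
TESTS_PER_FULL_RUN = 32_000      # size of one full suite run

monthly = expected_failures(MONTHLY_EXECUTIONS, RELIABILITY)
weekly = monthly * 7 / 30
per_full_run = prob_any_failure(TESTS_PER_FULL_RUN, RELIABILITY)

print(f"expected failures per month: {monthly:.0f}")
print(f"expected failures per week:  {weekly:.0f}")
print(f"chance a full run has >= 1 failure: {per_full_run:.1%}")
```

Under those assumptions, that works out to about 100 expected failures a month, or roughly 23 a week, the same ballpark as the ten to twenty we actually see. It also means a single full run of 32,000 tests at that reliability has roughly a 27% chance of containing at least one failure.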

Those test failures are expensive. Every ticket is a developer staring at logs, trying to figure out whether this is a genuine bug in the product, a bug in the test itself, or something broken in the environment. In almost all cases, the problem is with the test itself, but we have to investigate.

A test that consistently fails is easy to fix. A test that occasionally fails is the worst.

With a flaky test, you don't just fix something and move on. You spend two days isolating it. Reproducing it. Building a mental model of a race condition that only manifests under specific timing, load, and cosmic alignment.

The tests that do this are almost always the integration tests. The ones that test complex distributed behavior across many parts of the system simultaneously. By definition, they are also the hardest to reason about.

The fact that, in most cases, those test failures add nothing to the product (i.e., they didn’t actually discover a real bug) is just crushed glass on top of the sewer smoothie. You spend a lot of time trying to find and fix the issue, and there is no real value except that the test now consistently passes.

We have a script that runs weekly, collects all test failures, and dumps them into our issue tracker. This is routine maintenance hygiene, to make sure we stay in good shape.

I was looking at the issue tracker when the script ran, and the entire screen lit up with new issues.

Just looking at that list of new annoyances was enough to ruin my mood.

And then, without much deliberate planning, I did something dumb and impulsive: I copy-pasted all of those fresh issues into Claude and told it to fix them. Then I went and did other things. I had very low expectations about this, but there was not much to lose.

A few hours later, I got a notification about a pull request. To be honest, I expected Claude to mark the flaky tests as skipped, or remove the assertions to make them pass.

To my shock, I got an actual pull request with real fixes. Some of them were fixes applied to the test logic. Some were actual fixes in the underlying code.

And then there was this one that stopped me cold. Claude had identified that in one of our test cases, we were waiting on the wrong resource. Not wrong in an obvious way — wrong in the kind of way that works perfectly 99.9998% of the time and silently fails 0.0002% of the time.

The (test) code looked right. We were waiting for something to happen; we just happened to wait on the wrong thing, and usually the value we asserted on was already set by the time we were done waiting.
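The pattern is easier to see in a stripped-down form. The following is a hypothetical Python sketch, not the actual RavenDB test: `Replicator`, `flaky_wait`, and `correct_wait` are invented names. The component signals that it has started before it writes the value the test asserts on, so a test that waits on the start signal is racing the worker. In the real case the window was tiny, so the wrong wait almost always looked fine; the `sleep` here deliberately widens the window to make the race easy to reason about.

```python
import threading
import time


class Replicator:
    """Hypothetical component under test.

    Sets `started` as soon as it begins, and `completed` only after
    `replicated_count` (the value the test asserts on) has been written.
    """

    def __init__(self) -> None:
        self.started = threading.Event()
        self.completed = threading.Event()
        self.replicated_count = 0

    def run(self, items: int) -> None:
        self.started.set()         # fires BEFORE the work is done
        time.sleep(0.01)           # simulated work; widens the race window
        self.replicated_count = items
        self.completed.set()       # fires AFTER replicated_count is written


def flaky_wait(r: Replicator) -> int:
    # Waits on the wrong resource: `started` says nothing about replicated_count.
    # This only returns the right value if the worker happens to finish first.
    r.started.wait(timeout=5)
    return r.replicated_count


def correct_wait(r: Replicator) -> int:
    # Waits on the event that is set after the asserted value is written.
    if not r.completed.wait(timeout=5):
        raise TimeoutError("replication did not complete in time")
    return r.replicated_count


r = Replicator()
t = threading.Thread(target=r.run, args=(3,))
t.start()
print(correct_wait(r))  # prints 3
t.join()
```

The fix is not a longer timeout or a retry; it is waiting on the signal that is causally after the value you assert on.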

Claude found it. In one pass. For the price of a subscription I was already paying. For reference, that single “let me throw Claude at it” decision probably saved enough engineering time to cover the cost of Claude for the entire team for that month.

Let me be precise about what happened and what didn't. Claude did not fix everything. Some of the "fixes" it produced were pretty bad: surface-level patches that didn't address the real cause, or changes that were legitimately out of scope.

You still need an engineer reviewing the output. And you still need judgment.

But it got things fixed, quickly, without needing two days to context-switch into the problem space. And the things it did fix well, it fixed really well.

The work it compressed would have realistically taken one developer a week or two to grind through — and that's assuming you could get a developer to focus on it for that long in the first place. Flaky test investigation is the kind of work that quietly kills team morale.

Engineers start dreading CI. They start treating red builds as background noise. That's how quality degrades silently. Leaving aside new features or higher velocity, being able to offload the most annoying parts of the job to a machine is… wow.

Based on this, we're building this into our actual workflow as an integral part of how we handle test maintenance. Failures are collected, routed to Claude, and it takes a first pass at triage and repair. Then we create an issue in the bug tracker with either an actual fix or a summary of Claude’s findings.

By the time a human reviews this, significant progress has already been made.

It doesn't replace the engineer. But it means the engineer is doing the interesting part of the work: judgment, review, architectural reasoning. Skipping the part that requires staring at race condition logs until your vision blurs.

This isn’t the most exciting aspect of using a coding agent, I’m aware. But it may be one of the best aspects in terms of quality of life.

Apr 27 2026

15+ years of working with coding agents

time to read 6 min | 1198 words

No, the title is not a mistake, nor did I use my time travel pass to give you insights from the future. Bear with me for a moment while I explain my thinking.

From individual contributor to oversight role

I started writing RavenDB in a spare bedroom, which turned into an office. The project grew from a sparkle in my head that wouldn’t let me sleep into a major project in very short order.

Today, I want to talk about a pretty important stage that happened during that growth phase. Somewhere between having five and ten full-time developers working on RavenDB, I lost the ability to keep track of every single line of code that was going into the project.

I had been the primary developer for years at this point, I wrote the majority of the code, and I was the person making all the key decisions in the project. And then, gradually, I… wasn't that guy anymore.

There were too many moving parts, too many developers, too many decisions happening in parallel for me to have my hands on all of it. That was the whole point of growing the team, dividing the tasks among the team members, and getting good people to do things so I didn’t have to do it all myself.

What I didn't expect was how much it would bother me. Moving from being the primary developer to a supervisory role didn’t mean that I lost the ability to write code. In fact, in many cases, I could “see” what the solution for each issue should be.

I just didn’t have the time to do that, nor the capacity to sit with every single developer on every single issue and craft the right way to solve it. I'd hand a feature to a developer knowing that the way they were going to handle it would not be mine.

That doesn’t mean it would be wrong, but it wouldn’t be the same. It might need a review cycle or two to get to the right level for the product, or they wouldn’t consider how it fits into the grand scheme of things, etc.

And let’s not talk about the time estimates I got. I’m willing to assume that my personal timing estimates are highly subjective and influenced by my deep familiarity with the codebase.

But still. Multiple days for something that felt like it should be a two-hour job was hard to sit with.

I carried around a background level of frustration for quite some time. It killed me that the pace of development wasn’t up to what I wanted it to be. “If I could just have the time to sit and write this”, I kept thinking, “we would be done by the end of the week.”

There was progress, to be clear, but nothing was moving fast enough. Everywhere I looked, we had stalled.

And then something happened. It didn’t happen all at once, but in the space of a month or two, features started to land. Each team had been heads-down on something for quite a while, and by some coincidence of timing, they all finished around the same time.

Suddenly, we moved from “we have nothing to ship” to “we can’t have so many new features all at once”. Sure, I could have shipped any single one of those features faster myself. I might have done two new features, maybe even three, in that same time frame. That would have required heads-down coding for the entire duration, of course.

Reading that last paragraph again, I have to admit that I may be letting some hubris color my perception 🤷😏.

I wouldn’t be able to deliver the sheer quantity of features that the team was able to deliver.

What had felt like months of stagnation turned out to be parallelism in action.

Yes, some of the code wasn't the same code that I would write. And some of the architectural decisions weren't the ones I'd have made. That didn’t make them wrong, mind. And those developers were working on things I was not working on. And the sum total of what got built was something I could never have done solo.

Treating coding agents as junior developers?

I think about that experience constantly now, because I'm living a version of it again, except the new team member is Claude. Working with AI coding agents today feels remarkably like working with a junior developer who is also a savant.

They've read everything. They know an enormous amount. They can produce working code quickly and confidently across a staggering range of domains. And yet they're also genuinely ignorant in ways that will surprise you: missing context, misreading intent, optimizing for the wrong thing, occasionally producing something that is confidently and completely broken.

This is not a criticism. This is just what it's like. And I've dealt with this before. There are clear parallels between mentoring junior engineers and looking at the output from an AI agent.

There is an assumption that you need to get perfect output from a coding agent. But you are not likely to get perfect output from a human developer. Even experienced developers benefit greatly from reviews, guidance, etc. Junior developers need more of that, of course, but they can still bring value, even if their output goes through several iterations.

For coding agents to bring real value, you need to consider them in the same light.

The shift that happened with my developer team is the same shift that's happening now with AI agents.

Instead of writing every line yourself, you start spending time on the bigger picture: here's the overall direction, here's the architectural constraint, here's what done looks like. Then you review the outputs.

Talking to a coding agent is not that different from discussing a feature with a dev and reviewing their code days later, except that the agent delivers the output in the time it takes to get coffee.

The fact that this cycle is done in a short amount of time means that you still have all the knowledge in your head. You can catch drift before it becomes technical debt.

The cost of going in the wrong direction is greatly reduced, which means that you can be far more radical about how you approach these tasks.

Unnatural impulses as a developer

I wonder if a lot of developers are facing challenges in this area specifically because they don’t have the managerial experience needed for this new aspect of the work.

I have been writing code with Claude recently. And the short feedback cycle means that I’m loving it. I'm not abdicating the technical judgment, mind. I'm applying it differently.

I'm writing the high-level design, not the implementation. I'm doing the review, not the first draft. And I'm being honest with myself that the output, while it isn’t always what I would write, is covering ground I simply would not have covered otherwise.

I have been doing this for a long time and it feels quite natural. I also remember that this was a difficult transition for me at the time.

For those who want to better understand how they can get the most value from coding agents, you are probably better off looking into project management theory rather than optimizing your agents.md file.

Apr 23 2026

Expertise in the age of AI, or: Matt's Claude'll handle this

time to read 4 min | 650 words

One of our team leads has been working on a major feature using Claude Code. He's been at it for a few days and is nearly done. To put that in context: this feature would normally represent about a month of a senior developer's time.

He did the backend work himself — working with Claude to build it out, applying his knowledge of how the system should behave, reviewing, adjusting, and iterating. He handled only the backend, and when I asked him about the frontend, he said: "I'm going to let Matt’s Claude handle that."

Context: Matt is the frontend team lead.

Note the interesting phrasing. He didn't say "I'll do the UI later" or "Claude’ll handle the UI." He deferred to the frontend lead who has the domain expertise to drive that part.

That's not a throwaway comment. That's an important statement about how work should be divided in the age of AI agents.

Here's the thing: I've told Claude to build a UI for a feature and pointed it at the codebase, and it figured out how the frontend is structured and what patterns we use, and generated something I could work with. It wasn’t a sketch or a wireframe diagram; it was actually usable.

I got a functional UI from Claude in less time than it would take to write up the issue describing what I want.

That UI was enough for me to explore the feature, do a small demo, etc. I’m not a frontend guy, and I didn’t even look at the code, but I assume the output roughly matched the rest of our frontend code.

We won’t be using the UI Claude generated for me, though. The gap in polish between what I got and what a real frontend developer produces is enormous. I got something I could play with, but it was very evident that it wasn’t something that had received real attention.

For the time being, it was more than sufficient. The problem is that even leaning heavily on AI, the investment of time for me to do it right would be significant. I'd need to understand our frontend architecture, our conventions, our component library, how state flows, and what our designers expect. All of that would take real time, even with an AI doing most of the code generation.

That is leaving aside the things about frontend work that I don’t know and wouldn’t even realize I need to handle. I wouldn’t know what to ask the AI about, even if it could do the right thing given the right prompt.

Contrast that with the frontend team. They know the architecture of the frontend, of course, and they know how things should slot together and what concerns they should address. They know when Claude's suggestion is on the right track and when it's going to create a mess three layers down. Effectively, they know the magic incantation that the agent needs in order to do the right thing.

What does this say about AI usage in general? Given two people with the same access to a smart coding agent like Claude or Codex, both performing the same task, their domain knowledge will lead to very different results. In other words, it means that Claude and its equivalents are tools. And the wielder of the tool has a huge impact on the end result.

The role of expertise hasn't diminished. It's shifted. The expert is no longer the person who can produce the artifact. They're the person who can direct the production of the artifact correctly and efficiently. That's a different skill profile, but it's no less valuable, and the leverage is higher.

We're still figuring out what this means structurally. But the instinct to say "that's not my domain, let the person who knows it handle the AI that does it" is correct. Domain knowledge determines the quality of the output, even when the AI is doing all the typing.

Apr 21 2026

Using AI agents in long-lived software projects

time to read 6 min | 1146 words

You've read the story a hundred times: “I told Codex (or Claude, or Antigravity, etc.) to build me a full app to run my business, and 30 minutes later, it’s done”. These types of stories usually celebrate the new ecosystem and the ability to build complex systems without having to dive into the details.

The benchmarks celebrate "one-shotting" entire applications, as if that's the relevant metric. I think this is the wrong framing entirely. Mostly because I care very little about disposable software, stuff that you stop using after a few days or a week. I work on projects whose lifetime is measured in decades.

AI agent-driven development isn't about the ability to use a one-shot prompt to generate a full-blown app that matches exactly what the user wants. That is a nice trick, but nothing more, because after you generate the application, you need to maintain it, add features (and ensure stability over time), fix bugs, and adjust what you have.

The process of using AI agents to build long-lived applications is distinctly different from what I see people bandying about. I want to dedicate this post to discussing some aspects of using AI agents to accelerate development in long-lived software projects.

Code quality only matters in the long run

The key difference between one-off work and long-lived systems is that we don’t care about code quality at all for the one-off stuff. It's a throwaway artifact. Run it, get your answer, move on. I am usually not even going to look at the code that was generated; I certainly don’t care how it is structured.

If I need to make any changes, or have to come back to it in six months, it is usually easier to just regenerate the whole thing from scratch rather than trying to maintain or evolve it.

When you're talking about an application that will live for a decade or more - or worse, an existing application with decades of accumulated effort baked into it - what happens then? The calculus changes completely. How do you even begin to bring AI into that kind of system?

It turns out that proper software architecture becomes more relevant, not less.

Software architecture as context management for AI

Think about what good software architecture actually gives you: components, layers, clear boundaries, and well-defined responsibilities. The traditional justification is that this lets you make small, careful, targeted changes. You know where to go, and you can change one thing. You slowly evolve things over time. Your changes don't break ten others because not everything is intermingled.

Now think about how an AI operates on a codebase. It works within a context window. That constraint isn't unique to AI; people have it too. There is only so much you can keep in your head, and proper architecture means that you are separating concerns so you can work with just the relevant details in mind.

When your architecture is clean, the AI can focus on exactly the right piece of the system. When it isn't, you're either feeding the AI irrelevant noise or hiding the context it actually needs from it.

Good architecture, it turns out, is also a good AI interface. And the reason this works is the same as for people: it reduces the cognitive load you have to carry while understanding and modifying the system. For AI, we just call it the context window. For people, it is cognitive load. Different terms, same concept.

Beyond the mechanical benefits, good architecture gives you two things that I think are underappreciated in this conversation.

The first is structural comprehension. You don't need to have every line of a large codebase in your head. But you do need a genuine mental model of how data flows, how components relate, and where things live. That's only possible if the architecture actually reflects the system's intent.

When using AI to generate code, you need to have a proper understanding of the flow of the system. That allows you to look at a pull request and understand the changes, their intent, and how they fit into the greater whole. Without that, you can't meaningfully review the code. You're just rubber-stamping diffs you don't have a hope of understanding.

The second is that the work has shifted. We're moving from "how do I write this code?" to "how do I review all of this code?". Nobody is going to meaningfully maintain 30,000 lines of dense AI-generated code a day. At that point, the codebase has escaped human comprehension, and you've lost the game. This isn’t your project anymore, and sooner or later, you’ll face the Big Decision.

Turtles all the way down

I hear the proposed solution constantly: "I have an agent that writes the code, an agent that tests it, an agent that reviews the reviews, and so on." This is, I think, genuinely insane for anything that matters.

We already have evidence from the field that this doesn’t work. Amazon has had production failures from AI-generated code produced through exactly these kinds of layered-AI pipelines. Microsoft's aggressive approach to AI integration has shown what happens when AI-generated code enters production with minimal meaningful human oversight.

In both of those cases, the “proper oversight” was also provided by AI. And the end result wasn’t encouraging for this pattern of behavior. For critical systems that carry real consequences, "AI supervising AI" is not a thing.

AI works when you treat it as a tool in your hands, not as an autonomous system you've delegated to. An engineer who understands architecture and can look at a diff and say "this is right" or "this is wrong, and here's why" is much more capable with AI than without it.

An engineer who has offloaded comprehension to the machine is flying blind; worse, they are flying very fast directly into a cliff wall.

What should you do about it?

When we treat AI agents as a tool, it turns out that not all that much needs to change. The current processes you have in place (CI/CD, testing, review cycles, etc.) are all about being able to generate trust in the new code being written. Whether a human wrote it or a GPU did is less interesting.

At the same time, we have decades of experience building big systems. We know that a Big Ball of Mud isn’t sustainable. We know that proper architecture means breaking the system into digestible chunks. Yes, with AI you can throw everything together, and it will sort of work for a surprisingly long time. Until it doesn’t.

With a proper architecture, the scope you need to keep track of is inherently limited. That allows you to evolve the system over time through changes that are small in scope (and thus reviewable, actionable, etc.).

“The more things change, the more they stay the same.” It is a nice saying, but it also carries a fundamental truth. Using AI doesn’t absolve us from the realities on the ground, after all.

Apr 17 2026

Agents, Code Reviews, and the Bottleneck Shift, Oh My!

time to read 5 min | 813 words

Like everyone else, we have been using AI in various forms for a while now, from asking ChatGPT to write a function to asking it to explain an error, then graduating to running it on our code in the IDE, and finally to full-blown independent coding assistants.

Recently, we shifted into a much higher gear, rolling it out across most of the teams at RavenDB. I want to talk specifically about what that looks like in practice in real production software.

RavenDB is a mature codebase, with about 18 years of history behind it. The core team is a few dozen developers working on this full-time. We also care very deeply about correctness, performance, and maintainability.

With all the noise about Claude, Codex, and their ilk recently, we decided to run some experiments to see how we can leverage them to help us build RavenDB.

The numbers that got my attention

We started with features that were relatively self-contained — ambitious enough to be real work, but isolated enough that an AI agent could take them end-to-end without stepping on core aspects of RavenDB.

The first one was estimated at about a month of work for a senior developer. We completed it in two days. To be fair, a significant portion of that time was spent learning how to work effectively with Claude as an agent, learning the ropes and the right discipline and workflows, not just the task itself.

The second was estimated at roughly three months for an initial version. It was delivered in about a week. And we didn't just hit the target — we significantly exceeded the planned feature set.

In terms of efficiency, we are talking about a proper leap from what we previously could expect.

This isn't vibe coding

I want to be direct about something: this is not "prompt it and ship it." There is a discipline required here. The AI can move very fast, explore a lot of ground, and generate code that looks right, but isn’t. Code ownership and engineering responsibility don't go away; they become much more demanding.

I personally sat and read 30,000 lines of code. I had to understand what was there, push back on decisions, redirect the approach, and enforce the standards that RavenDB has built up over many years.

Those 30,000 lines of code didn’t appear out of thin air. They were the final result of a lot of planning, back and forth with the agent, incremental steps in the right direction (and many wrong ones, etc.).

To be fair, 30,000 lines of code sounds like a lot, right? About 60% of that is actually tests, and about half of the remaining code is boilerplate infrastructure that we need to have, but isn’t really interesting. The juicy parts are only around 5,000 lines or so.

In many respects, this isn’t prompt-and-go but feels a lot more like a pair programming session on steroids.

What AI agents give you is the ability to explore the problem space cheaply and quickly. After we had something built, I had a different idea about how to go about implementing it. So I asked it to do that, and it gave me something that I could actually explore.

Being able to evaluate multiple different approaches to a solution is crazy valuable. It is transformative for architectural decisions.

On top of that, using a coding agent to take on all the boilerplate meant that I was able to focus on the “fun parts”, the pieces that actually add the most value, not everything else that I need to do to get to that part.

What this means going forward

AI agents are going to amplify your existing engineering culture, for better or worse.

A lot of the cost of writing good software is going to move from actually writing code to reviewing it. For many people, the act of writing the code was also the part where they thought about it most deeply.

Now the thinking part moves either upfront, at the planning phase, or to the end, when you look at the pull request. Previously, when reading a pull request, you could reasonably expect to see code that had already been reasoned about and properly tamed.

Now, in some cases, this is the first time that a human is actually going to properly walk through the whole thing. To ensure proper quality, you also need to shift a lot of your focus to that part.

The bottleneck for good software is going to be the review cycle, the architectural approach, and an experienced team that can actually evaluate the output and ensure consistent high quality.

Without that, you can go very fast, but just generating code quickly is a losing proposition. You’ll go very fast directly into a painful collision with a wall.

We are still settling down and trying to properly understand the best approach to take, but I have to say that this experiment was a major success.

Feb 27 2026

The 'Million AI Monkeys' Hypothesis & Real-World Projects

time to read 8 min | 1542 words

Tags:

architecture

I have run into this post by John Rush, which I found really interesting, mostly because I so vehemently disagree with it. Here are the points that I want to address in John’s thesis:

1. Open Source movement gonna end because AI can rewrite any oss repo into a new code and commercially redistribute it as their own.
2. Companies gonna use AI to generate their none core software as a marketing effort (cloudflare rebuilt nextjs in a week).

Can AI rewrite an OSS repo into new code? Let’s dig into this a little bit.

AI models today do a great job of translating code from one language to another. We have good testimonies that this is actually a pretty useful scenario, such as the recent translation of the Ladybird JS engine to Rust.

At RavenDB, we have been using that to manage our client APIs (written in multiple languages & platforms). It has been a great help with that.

But that is fundamentally the same as the Java to C# converter that shipped with Visual Studio 2005. That is 2005, not 2025, mind you. The link above is to the Wayback Machine because the original link itself is lost to history.

AI models do a much better job here, but they aren’t bringing something new to the table in this context.

Claude C Compiler

Now, let’s talk about using the model to replicate a project from scratch. And here we have a bunch of examples. There is the Claude C Compiler, an impressive feat of engineering that can compile the Linux kernel.

Except… it is a proof of concept that you wouldn’t want to use. It produces code that is significantly slower than GCC, and its output is not something that you can trust. And it is not in a shape to be a long-term project that you would maintain over the years.

For a young project, being slower than the best-of-breed alternative is not a bad thing. You’ve shown that your project works; now you can work on optimization.

For an AI project, on the other hand, you are in a pretty bad place. The key issue here is long-term maintainability. There is a great breakdown of the Claude C Compiler from the creator of Clang that I highly recommend reading.

The amount of work it would require to turn it into actual production-level code is enormous. I think that it would be fair to say that the overall cost of building a production-level compiler with AI would be in the same ballpark as writing one directly.

Many of the issues in the Claude C Compiler are not bugs that you can “just fix”. They are deep architectural issues that require a very different approach.

Leaving that aside, let’s talk about the actual use case. The Linux kernel’s relationship with its compiler is not a trivial one. Compiler bugs and behaviors are routine issues that developers run into and need to work on.

See the occasional “discussion” on undefined behavior optimizations by the compiler for surprisingly straightforward code.

Cloudflare’s vinext

So Cloudflare rebuilt Next.js in a week using AI. That is pretty impressive, but that is also a lie. They might have done some work in a week, but that isn’t something that is ready. Cloudflare is directly calling this highly experimental (very rightly so).

They also have several customers using it in production already. That is awesome news, except that within literal days of this announcement, multiple critical vulnerabilities were found in this project.

A new project having vulnerabilities is not unexpected. But some of those vulnerabilities were literal copies of (fixed) vulnerabilities in the original Next.js project.

The issue here is the pace of change and the impact. If it takes an agent a week to build a project and then you throw that into production, how much real testing has been done on it? How much is that code worth?

John stated that this vinext project for Cloudflare was a marketing effort. I have to note that they had to pay bug bounties as a result and exposed their customers to higher levels of risk. I don’t consider that a plus. There is also now the ongoing maintenance cost to deal with, of course.

The key here is that a line of code is not something that you look at in isolation. You need to look at its totality. Its history, usage, provenance, etc. A line of code in a project that has been battle-tested in production is far more valuable than a freshly generated one.

I’ll refer again to the awesome “Things You Should Never Do” from Spolsky. That is over 25 years old and is still excellent advice, even in the age of AI-generated code.

NanoClaw’s approach

You’ve probably heard about the Clawdbot ⇒ Moltbot ⇒ OpenClaw, a way to plug AI directly into everything and give your CISO a heart attack. That is an interesting story, but from a technical perspective, I want to focus on what it does.

A key part of what made OpenClaw successful was the number of integrations it has. You can connect it to Telegram, WhatsApp, Discord, and more. You can plug it into your Gmail, Notes, GitHub, etc.

It has about half a million lines of code (TypeScript), which were mostly generated by AI as well.

To contrast that, we have NanoClaw with ~500 lines of code. Not a typo, it is roughly a thousand times smaller than OpenClaw. The key difference between these two projects is that NanoClaw rebuilds itself on the fly.

If you want to integrate with Telegram, for example, NanoClaw will use the AI model to add the Telegram integration. In this case, it will use pre-existing code and use the model as a weird plugin system. But it also has the ability to generate new code for integrations it doesn’t already have. See here for more details.

On the one hand, that is a pretty neat way to reduce the overall code in the project. On the other hand, it means that each user of NanoClaw will have their own bespoke system.

Contrasting the OpenClaw and NanoClaw approaches, we have an interesting problem. Both of those systems are primarily built with AI, but NanoClaw is likely going to show a lot more variance in what is actually running on your system.

For example, if I want to use Signal as a communication channel, OpenClaw has that built in. You can integrate Signal into NanoClaw as well, but it will generate code (using the model) for this integration separately for each user who needs it.

A bespoke solution for each user may sound like a nice idea, but it just means that each NanoClaw is its own special snowflake. Just thinking about supporting something like that across many users gives me the shivers.

For example, OpenClaw had an agent takeover vulnerability (reported literally yesterday) that would allow a simple website visit to completely own the agent (with all that this implies). OpenClaw’s design means that it can be fixed in a single location.

NanoClaw’s design, on the other hand, means that for each user, there is a slightly different implementation, which may or may not be vulnerable. And there is no really good way to actually fix this.

Summary

The idea that you can just throw AI at a problem and have it generate code that you can then deploy to production is an attractive one. It is also by no means a new one.

The notion of CASE tools used to be the way to go about it. The book Application Development Without Programmers was published in 1982, for example. The world has changed since then, but we are still trying to get rid of programmers.

Generating code quickly is easy these days, but that just shifts the burden. The cost of verifying code has become a lot more pronounced. Note that I didn’t say expensive. It used to be the case that writing the code and verifying it were almost the same task. You wrote the code and thus had a human verifying that it made sense. Then there are the other review steps in a proper software lifecycle.

When we can drop 15,000 lines of code in a few minutes of prompting, the entire story changes. The value of a line of code on its own approaches zero. The value of a reviewed line of code, on the other hand, hasn’t changed.

A line of code from a battle-tested, mature project is infinitely more valuable than a newly generated one, regardless of how quickly it was produced. The cost of generating code approaches zero, sure.

But newly generated code isn’t useful. In order for me to actually make use of that, I need to verify it and ensure that I can trust it. More importantly, I need to know that I can build on top of it.

I don’t see a lot of people paying attention to the concept of long-term maintainability for projects. But that is key. Otherwise, you are signing up upfront to be a legacy system that no one understands or can properly operate.

Production-grade software isn’t a prompt away, I’m afraid to say. There are still all the other hurdles that you have to go through to actually mature a project to be able to go all the way to production and evolve over time without exploding costs & complexities.

Feb 25 2026

AI & the movie Eagle Eye

time to read 2 min | 203 words

miscellaneous, development

In 2008, the movie Eagle Eye came out. I remember watching that at the time and absolutely loving this movie. It is an action movie, so enjoying it once is the sole criterion that I have. Surprisingly, I got flashbacks of this movie repeatedly in the past few weeks.

I think it is safe to talk about “spoilers” for a movie that is old enough to drive, so the core idea in this movie is that an AI wants to perform a certain action, but is prevented from doing so. It then comes up with a pretty convoluted approach to bypassing those limits. I’m intentionally vague here, because the movie is actually good and you should watch it.

The key here, which is the reason that I remember an 18-year-old movie, is that we are actually seeing this behavior today with AI agents. It is an entirely relatable phenomenon to see the agent running into an obstacle, and then trying to bypass it using crazier and crazier techniques.

The movie aged particularly well in this regard, because what was a plot device in there is a daily occurrence in our lives now. For reference, see this Tweet.

Feb 05 2026

The hole in my fallocation

time to read 9 min | 1674 words

I am working a bit with sparse files, and I need to output the list of holes in my file.

To my great surprise, I found that my file had more holes than I put into it. This probably deserves a bit of explanation.

If you know what sparse files are, feel free to skip this explanation:

A sparse file reduces disk space usage by storing only the non-zero data blocks. Zero-filled regions ("holes") are recorded as file system metadata only.
The file still has the same “size”, but we don’t need to dedicate actual disk space for ranges that are filled with zeros, we can just remember that there are zeros there. This is a natural consequence of the fact that files aren’t actually composed of linear space on disk.
Filesystems grow files using extents (contiguous disk chunks). A file initially gets a single extent (e.g., 1MB). Fast I/O is maintained as sequential data fills this contiguous block. Once the extent is full, the filesystem allocates a new, separate extent (which, most likely, will not reside next to the previous one). The file's logical size grows continuously, but physical allocation occurs in discrete bursts as new extents are dynamically added.
If you are old enough to remember running defrag, that was essentially what it did: it ensured that the whole file was a single contiguous allocation on disk. Because files are already tracked as a list of extents, it is very simple for a file system to just record holes, and the only file system that you’ll find in common use today that doesn’t support it is FAT.
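To make the size-versus-allocation distinction concrete, here is a minimal sketch in C. It assumes Linux and a file system with sparse-file support (so not FAT); the make_sparse_file() helper and the file name used with it are hypothetical, for illustration only. The helper grows a file with ftruncate(), which creates a hole rather than allocating blocks, writes a few bytes in the middle, and returns the fstat() results so you can compare the logical size (st_size) with the space actually allocated (st_blocks, which Linux counts in 512-byte units).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (illustration only): create `path` with a logical
// size of `size` bytes, write `len` bytes of `data` at `data_off`, and
// leave everything else as holes. Fills *st from fstat().
// Returns 0 on success, -1 on error.
static int make_sparse_file(const char *path, off_t size, off_t data_off,
                            const void *data, size_t len, struct stat *st)
{
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    // Growing a file with ftruncate() does not allocate blocks: the new
    // range is a hole that reads back as zeros. Only the pwrite() below
    // causes real allocation, and only around `data_off`.
    if (ftruncate(fd, size) != 0 ||
        pwrite(fd, data, len, data_off) != (ssize_t)len ||
        fstat(fd, st) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling it as make_sparse_file("sparse-demo.dat", 64LL << 20, 32LL << 20, "hello", 5, &st) should leave st.st_size at 64 MB, while st.st_blocks * 512 stays in the kilobyte range; the exact figure depends on the file system's block size.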

At any rate, I had a problem. My file has more holes than expected, and that is not a good thing. This is the sort of thing that calls for a “Stop, investigate, blog” reaction. Hence, this post.

Let’s see a small example that demonstrates this:

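#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>


int main()
{
    const off_t file_size = 1024LL * 1024 * 1024; // 1 GB
    int fd = open("test-sparse-file.dat", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    // Reserve disk space for the whole file (mode 0 also extends the file size)
    if (fallocate(fd, 0, 0, file_size) != 0) {
        perror("fallocate");
        close(fd);
        return 1;
    }
    
    off_t offset = 0;
    while (offset < file_size) {
        // Find the next hole at or after offset; EOF counts as an implicit hole
        off_t hole_start = lseek(fd, offset, SEEK_HOLE);
        if (hole_start < 0 || hole_start >= file_size) break;
        
        // The hole ends where the next data region starts (ENXIO: no more data)
        off_t hole_end = lseek(fd, hole_start, SEEK_DATA);
        if (hole_end < 0) hole_end = file_size;
        
        printf("Start: %.2f MB, End: %.2f MB\n", 
               hole_start / (1024.0 * 1024.0),
               hole_end / (1024.0 * 1024.0));
        
        offset = hole_end;
    }
    
    close(fd);
    return 0;
}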

If you run this code, you’ll see this surprising result:

Start: 0.00 MB, End: 1024.00 MB

In other words, even though we just used fallocate() to reserve the disk space, as far as lseek() is concerned, the whole file is one big hole. What is going on here?

Let’s dig a little deeper, using filefrag:

$ filefrag -b1048576 -v test-sparse-file.dat 
Filesystem type is: ef53
File size of test-sparse-file.dat is 1073741824 (1024 blocks of 1048576 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      23:     165608..    165631:     24:             unwritten
   1:       24..     151:     165376..    165503:    128:     165632: unwritten
   2:      152..     279:     165248..    165375:    128:     165504: unwritten
   3:      280..     407:     165120..    165247:    128:     165376: unwritten
   4:      408..     535:     164992..    165119:    128:     165248: unwritten
   5:      536..     663:     164864..    164991:    128:     165120: unwritten
   6:      664..     791:     164736..    164863:    128:     164992: unwritten
   7:      792..     919:     164608..    164735:    128:     164864: unwritten
   8:      920..    1023:     164480..    164583:    104:     164736: last,unwritten,eof
test-sparse-file.dat: 9 extents found

You can see that the file is made of 9 separate extents. The first one is 24MB in size, then 7 extents that are 128MB each, and the final one is 104MB.

Amusingly enough, the physical layout of the file is in reverse order relative to its logical layout. That is just how the file system happened to allocate the extents; nothing requires the physical order to match the logical one.

Now, let’s try to figure out what is going on here. Do you see the flags on those extents? They say unwritten. That means this is physical space that was allocated to the file, but the file system is aware that it never wrote to that space. Therefore, as far as the file is concerned, that space must read as zeros.

In other words, conceptually, this unwritten space is no different from a sparse region in the file. In both cases, the file system can just hand me a block of zeros when I try to access it.

The question is, why is the file system behaving in this manner? The answer is that this is an optimization. When fallocate() reserves the space, the file system doesn't need to write zeros to it; it just marks the extents as unwritten. Later, instead of reading the data (which we know to be zeros) from the disk, it can hand zeros to the application directly. That saves on I/O, which is quite nice.

Consider the typical scenario of allocating a file and then writing to it. Without this optimization, we would literally double the amount of I/O we have to do: once to zero the space during allocation, and again to write the actual data.

It turns out that this optimization also applies to Windows and Mac, but the reason I ran into it on Linux is that I used lseek(SEEK_HOLE), which reports the unwritten portions as holes, just like sparse regions. This makes sense: if I want to copy data and I am aware of sparse regions, I should skip the unwritten portions as well, since they read back as zeros anyway.
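The copy scenario is easy to sketch. The copy_sparse() helper below is hypothetical and is not how any particular tool implements sparse copying; it assumes Linux and a file system that implements SEEK_DATA/SEEK_HOLE (where it doesn't, lseek() reports the whole file as data and the sketch degrades to a plain full copy). It walks the source with SEEK_DATA and SEEK_HOLE, copies only the data ranges with pread()/pwrite(), and recreates everything else as holes by sizing the destination with ftruncate().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: copy `in` to `out` (`out` must be writable),
// transferring only the ranges that SEEK_DATA/SEEK_HOLE report as data.
// Holes, including unwritten extents that lseek() reports as holes, are
// recreated by sizing `out` with ftruncate().
// Returns the number of data bytes copied, or -1 on error.
static long long copy_sparse(int in, int out)
{
    struct stat st;
    char buf[64 * 1024];
    long long copied = 0;

    // Reset the destination, then give it the full logical size as one big hole.
    if (fstat(in, &st) != 0 || ftruncate(out, 0) != 0 ||
        ftruncate(out, st.st_size) != 0)
        return -1;

    off_t off = 0;
    while (off < st.st_size) {
        off_t data = lseek(in, off, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? copied : -1; // ENXIO: only a hole remains

        off_t hole = lseek(in, data, SEEK_HOLE); // EOF counts as a hole
        if (hole < 0)
            return -1;

        // Copy the data range [data, hole) in buffer-sized chunks.
        for (off_t pos = data; pos < hole;) {
            size_t want = sizeof buf;
            if ((off_t)want > hole - pos)
                want = (size_t)(hole - pos);
            ssize_t n = pread(in, buf, want, pos);
            if (n <= 0 || pwrite(out, buf, (size_t)n, pos) != n)
                return -1;
            pos += n;
            copied += n;
        }
        off = hole;
    }
    return copied;
}
```

Pointed at a freshly fallocate()d file like the one above, it would copy zero bytes: the unwritten extents are reported as holes, and the destination simply reads back as zeros. Note that the copy is an ordinary sparse file, so the disk-space reservation itself is not carried over.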

If you actually care about the difference, you can use ioctl(FS_IOC_FIEMAP) to inspect the actual file extents (this is what filefrag does).
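As a rough sketch of what that looks like, the hypothetical print_extents() helper below issues FS_IOC_FIEMAP with the struct fiemap definitions from <linux/fiemap.h>, prints each extent, and counts the ones flagged FIEMAP_EXTENT_UNWRITTEN, the flag behind the unwritten label in the filefrag output above. It assumes Linux, and for brevity it maps only the first max extents in a single call; a complete tool would repeat the ioctl with an advanced fm_start until it sees FIEMAP_EXTENT_LAST.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

// Hypothetical helper: print the first `max` extents of `path` using
// FS_IOC_FIEMAP and return how many carry FIEMAP_EXTENT_UNWRITTEN
// (allocated but never written, so they read back as zeros).
// Returns -1 on error, e.g. when the file system lacks FIEMAP support.
static int print_extents(const char *path, unsigned max)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    // struct fiemap ends in a flexible array of fiemap_extent entries.
    struct fiemap *fm = calloc(1, sizeof(struct fiemap) +
                                      max * sizeof(struct fiemap_extent));
    if (fm == NULL) {
        close(fd);
        return -1;
    }
    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET; // map the whole file
    fm->fm_flags = FIEMAP_FLAG_SYNC;   // flush dirty data before mapping
    fm->fm_extent_count = max;

    int rc = ioctl(fd, FS_IOC_FIEMAP, fm);
    close(fd);
    if (rc != 0) {
        free(fm);
        return -1;
    }

    int unwritten = 0;
    for (unsigned i = 0; i < fm->fm_mapped_extents; i++) {
        const struct fiemap_extent *fe = &fm->fm_extents[i];
        int is_unwritten = (fe->fe_flags & FIEMAP_EXTENT_UNWRITTEN) != 0;
        unwritten += is_unwritten;
        // Offsets and lengths are reported in bytes.
        printf("%u: logical %llu, physical %llu, length %llu%s%s\n", i,
               (unsigned long long)fe->fe_logical,
               (unsigned long long)fe->fe_physical,
               (unsigned long long)fe->fe_length,
               is_unwritten ? " unwritten" : "",
               (fe->fe_flags & FIEMAP_EXTENT_LAST) ? " last" : "");
    }
    free(fm);
    return unwritten;
}
```

On a file system without FIEMAP support, the ioctl fails and the sketch returns -1. Unlike the filefrag invocation above, which used 1 MB blocks, the offsets and lengths here are printed in bytes.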

Oren Eini

CEO of RavenDB
