What kind of problems you’ll find only when you are dog fooding

time to read 4 min | 666 words

Just minutes after I posted the previous post about dog fooding and the issues this has, we run into major trouble. Actually, several issues.

The first of which was that we started to get authentication issues, but even though we upgraded the server, we didn’t change any security related configuration. So what was going on? I tried logging in from my own machine, using the same credentials as the server, and everything worked! I then tried replacing the credentials to ones I knew worked, because they were being used by another system with no issues, and that didn’t work either.

Looking in Fiddler, there were no 401 requests, either. Very strange.

This ended up being an issue of lazy requests. Basically, a lazy request is one that carry multiple requests in a single round trip for the server. We changed how we handle those internally, so they look pretty much like any other request for the code, but it appears that we also forced them to go through authorization again, and obviously that didn’t work. Once we knew what was going on, fixing this was very easy.

The next issue was a little crazier. Some things were failing, and we weren’t sure why. We got an illegal duplicate key from somewhere, but it made no sense. What was worse, it appear to be happening in random places. It took me a while to figure out the root cause of this issue. In RavenDB 2.5, we tracked indexes using the index name. In RavenDB 3.0, we are tracking indexes using numeric ids. However, the conversion process between 2.5 to 3.0 didn’t take that into account, and while it gave the existing indexes ids, it didn’t set the next index id to the correct value.  When we would try to create a new index, it would generate an index id that was already existing, and that failed.  The error could crop up when you run a dynamic query that had to create a new index, so that was kind of funky.

The last error, however, is something that I consider to be purely evil. RavenDB 3.0 added a limitation that an index cannot output more than a set amount of index entries per document. The idea is that we want to stop the fan out problem. An index with high fan out can consume a lot of memory and other resources without control.

The idea was that we want to explicitly stop that early on. But what about existing indexes? We added a way to increase the limit explicitly, but old indexes obviously wouldn’t have this option. The problem was that this is only triggered on indexing, so you start the application, and everything works. Then indexing starts to fail. Which is fine, but then we have another RavenDB feature that applies, and if an index has too many errors, then it would be marked as failing. Failed indexes throw when queried.

The reason that this is evil is because it actually takes quite a bit of time for this error to surface. So you run your tests, and everything works, and a few hours or days later, everything crashes.

All of those issues has been resolved. I found that the index id issue was actually properly there, but I appear to have removed it during an unrelated fix to the previous problem without noticing. The lazy requests now know that they are already authenticated and the maximum size when loading from an existing system is 32K, which is big enough for pretty much anything you can think of. The behavior when you exceed the max fan out is also more consistent, it will skip this document, and only if you have a lot of them will it actually disable the index.

And yes, pushing things to ayende.com and seeing what breaks is a pretty good testing strategy, egg on face and all.