Ayende @ Rahien


Optimizing select projections: Part I

time to read 2 min | 240 words

I spoke about designing for performance in my previous post, and I thought it would make for an interesting series of blog posts. The task: we have a data source of some kind; in our case, to make things as simple as possible, I decided to define the source data as a Dictionary<string, object>, without allowing nesting of data.

We want the user to be able to provide a custom way to project data from that dictionary. Here are a few ways to do that:
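The projections themselves appeared as a screenshot in the original post. As a stand-in, assume three hypothetical projections of increasing complexity (the field names are invented, matching the sample document below):

// 1. trivial: copy a single field
({ Email: source.Email })

// 2. medium: combine a few fields
({ FullName: source.FirstName + ' ' + source.LastName, Email: source.Email })

// 3. more involved: conditionals and method calls
({ Location: source.City.toUpperCase(), Status: source.Age >= 18 ? 'adult' : 'minor' })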

And here is the data source in question:
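For the data source, assume something along these lines (the values are made up):

var source = new Dictionary<string, object>
{
    ["FirstName"] = "John",
    ["LastName"] = "Doe",
    ["Email"] = "john.doe@example.com",
    ["Age"] = 32,
    ["City"] = "London"
};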

Obviously the user can make the select projections as complex as they wish, but I think that these three samples give us a good representation of the complexity. We’ll also treat them a bit differently with regards to their likelihood. So when testing, we’ll check that the first projection is run 10,000 times, the second projection is run 5,000 times and the last projection is run 1,000 times.

With the lay of the land settled, let us see how we can actually get this working. I’m going to use Jint and solve the issue in a very brute force manner.

Here is the initial implementation:
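A minimal sketch of that brute force approach, assuming Jint 2.x (Execute / GetCompletionValue) and none of the real implementation’s details; the point is simply that everything is handed to the JS engine, once per document:

// using Jint; using System.Collections.Generic;
static object Project(Dictionary<string, object> source, string projectionScript)
{
    var engine = new Jint.Engine();
    engine.SetValue("source", source);

    // wrap the object literal in parentheses so it parses as an expression
    return engine.Execute("(" + projectionScript + ")")
                 .GetCompletionValue()
                 .ToObject();
}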

And this runs in 4.4 seconds on the 10K, 5K, 1K run.

  • 10K in 2,756 ms
  • 5K in 1,310 ms
  • 1K in 314 ms

I’m pretty sure that we can do better, and we’ll look at that in the next post.

Future proofing for crazy performance optimizations

time to read 2 min | 306 words


I talked about the new query language and JavaScript performance quite a bit, but I wanted to post about a conversation that we had at the office. It involves the following design decision:

[image: the object literal projection syntax]

We chose to use an object literal syntax as the complex projection mechanism for several reasons. First, it is easy to implement right now, since we can throw it at the JS engine and let it handle things. Second, it is very readable, and given that we are a JSON document database, it should be very familiar to users. But the third reason isn’t so obvious: with this syntax we can optimize the most common scenarios quite easily.

I can take the object literal and parse it into an AST, and instead of sending it to a JS engine, I can execute it directly. Even if I implement a trivial AST traverser that can handle only the most basic operations, that would be enough to cover a lot of scenarios, and I don’t have to be 100% complete. I can implement an AST walker and run that, and if I find something that I can’t support, I can just fall back to the JS engine.
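To make the fallback idea concrete, here is a rough sketch; the node types and helper names are invented for illustration and are not RavenDB’s actual classes:

// using System; using System.Collections.Generic;
abstract class Node { }
class PropertyAccessNode : Node { public string PropertyName; }
class ConcatNode : Node { public Node Left, Right; }
class UnsupportedNode : Node { public string Script; }

static object Evaluate(Node node, Dictionary<string, object> source,
    Func<string, Dictionary<string, object>, object> jsEngineFallback)
{
    switch (node)
    {
        case PropertyAccessNode p:    // e.g. source.Name
            return source[p.PropertyName];
        case ConcatNode c:            // e.g. source.FirstName + ' ' + source.LastName
            return string.Concat(
                Evaluate(c.Left, source, jsEngineFallback),
                Evaluate(c.Right, source, jsEngineFallback));
        case UnsupportedNode u:       // anything the walker doesn't handle goes to the JS engine
            return jsEngineFallback(u.Script, source);
        default:
            throw new NotSupportedException(node.GetType().Name);
    }
}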

Since my AST interpreter wouldn’t need any of the JS engine behaviors, it can be implemented much more efficiently and with greater focus. For that matter, I don’t have to limit myself to just interpreting this AST. I can also generate IL code on the fly to evaluate it instead, giving me an even bigger speed boost.

Basically, this syntax gives us at least three levels of optimization opportunities to explore as needed.

With performance, test, benchmark and be ready to back out

time to read 5 min | 979 words

[image]

Last week I spoke about our attempt to switch our internal JS engine to Jurassic from Jint. The primary motivation was speed: Jint is an interpreter, while Jurassic is compiled to IL and eventually machine code. That is a good thing from a performance standpoint, and the benchmarks we looked at, both external and internal, told us that we could expect anything between twice as fast and ten times as fast. That was enough to convince me to go for it. I have a lot of plans for doing more with javascript, and if it can be fast enough, that would be just gravy.

So we did that, we took all the places in our code where we were doing something with Jint and moved them to Jurassic. Of course, that isn’t nearly as simple as it sounds. We have a lot of places like that, and a lot of already existing code. But we also took the time to do this properly, making sure that there is a single namespace that is aware of JS execution in RavenDB and hides that functionality from the rest of the code.

Now, one of the things that we do with the scripts we execute is expose to them both functions that they can call and documents to look at and mutate. Consider the following patch script:

this.NumberOfOrders++;

This is on a customer document that may be pretty big, as in, tens of KB or higher. We don’t want to have to serialize the whole document into the JS environment and then serialize it back; that road leads to a lot of allocations and extreme performance costs. Instead, already with Jint, we had implemented a wrapper object that we expose to the JS environment, which does lazy evaluation of just the properties that are needed by the script and tracks all changes so we can reserialize things more easily.
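The real wrapper plugs into the engine’s object model; a minimal sketch of the idea, with invented names, is essentially a cache plus a change log:

// using System; using System.Collections.Generic;
class LazyDocumentWrapper
{
    private readonly Func<string, object> _readProperty;   // reads from the raw stored document
    private readonly Dictionary<string, object> _cache = new Dictionary<string, object>();
    private readonly HashSet<string> _modified = new HashSet<string>();

    public LazyDocumentWrapper(Func<string, object> readProperty)
    {
        _readProperty = readProperty;
    }

    public object Get(string name)
    {
        if (_cache.TryGetValue(name, out var value) == false)
            _cache[name] = value = _readProperty(name);    // materialize only what the script touches
        return value;
    }

    public void Set(string name, object value)
    {
        _cache[name] = value;
        _modified.Add(name);                               // track changes so reserialization stays cheap
    }

    public IEnumerable<string> ModifiedProperties => _modified;
}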

Moving to Jurassic broke all of that, so we had to rewrite it all. The good thing is that we already knew what we wanted and how we wanted to do it; it was just a matter of figuring out how Jurassic allows it. There was an epic coding montage (see image on the right) and we got it all back into a working state.

Along the way, we paid down a lot of technical debt around consistency and exposure of operations and made it easier all around to work with JS from inside RavenDB. I hate javascript.

After a lot of time, we had everything stable enough so we could now test RavenDB with the new JS engine. The results were… abysmal. I mean, truly and horribly so.

But that wasn’t right, we specifically sought a fast JS engine, and we did everything right. We cached the generated values, we reused instances, we didn’t serialize / deserialize too much and we had done our homework. We had benchmarks that showed very clearly that Jurassic was the faster engine. Feeling very stupid, we started looking at the before and after profiling results and everything became clear and I hate javascript.

Jurassic is the faster engine, if most of your work is actually done inside the script. But most of the time, the whole point of the script is to do very little and direct the rest of the code in what it is meant to do. That is where we actually pay the most costs. And in Jurassic, this is really expensive. Also, I hate javascript.

It was really painful, but after considering this for a while, we decided to switch back to Jint. We also upgraded to the latest release of Jint and got some really nice features in the box. One of the things that I’m excited about is that it has an ES6 parser, even if it doesn’t fully support all the ES6 features. In particular, I really want to look at implementing arrow functions for Jint, because it would be very cool for our use case, but this will be later, because I hate javascript.

Instead of reverting the whole of our work, we decided to take this as a practice run of the refactoring to isolate us from the JS engine. The answer is that it was much easier to switch from Jurassic to Jint than it was the other way around, but it is still not a fun process. There is too much that we depend on. But this is a really great lesson in understanding what we are paying for. We got a great deal of refactoring done, and I’m much happier about both our internal API and the exposed interfaces that we give to users. This is going to be much easier to explain now. Oh, and I hate javascript.

I had to deal with three different javascript engines (two versions of Jint and Jurassic) at a pretty deep level. For example, one of the things we expose in our scripts is the notion of null propagation. So you can write code like this:

return user.Address.City.Street.Number.StopAlready;

And even if you have missing properties along the way, the result will be null, and not an error, like normal. That requires quite deep integration with the engine in question and led to some tough questions about the meaning of the universe and the role of keyboards in our life versus the utility of punching bugs (not a typo).
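The semantics are roughly those of the helper below; the actual engine integration is far more involved, this just illustrates the behavior:

// using System.Collections.Generic;
static object GetPath(IDictionary<string, object> root, params string[] path)
{
    object current = root;
    foreach (var segment in path)
    {
        var asObject = current as IDictionary<string, object>;
        if (asObject == null || asObject.TryGetValue(segment, out current) == false)
            return null;    // a missing property (or a null along the way) yields null, not an error
    }
    return current;
}

// GetPath(user, "Address", "City", "Street", "Number", "StopAlready") => null if anything is missing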

The actual code we ended up with for this feature is pretty small and quite lovely, but getting there was a job and a half, causing me to hate javascript, and I wasn’t that fond of it in the first place.

I’ll have the performance numbers comparing all three editions sometime next week. In the meantime, I’m going to do something that isn’t javascript.

Breaking the language barrier

time to read 5 min | 916 words

In RavenDB 4.0, we decided to finally bite the bullet and write our own query language. That led to a lot of really complex decisions that we had to make. I already posted about the language and you saw some first drafts. RQL is meant to be instantly familiar to anyone who used SQL before.

At the same time, this led to issues. In particular, the major issue is that SQL as a language is meant to handle rectangular tables, and RavenDB isn’t storing data in tables. For that matter, we also want to give our users the ability to express themselves fully in the query language, which means support for complex queries and complex projections.

For a while, we explored the option of supporting nested selects as the way to express these semantics, but that was pretty horrible, both in terms of the complexity of the implementation and in terms of the complexity of the language. Instead, we decided to take only the good parts out of SQL.

What do I mean by that? Well, here is a pretty simple query:

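Something along these lines (the collection and field names here and in the rest of this post are illustrative, Northwind style, since the original screenshots aren’t reproduced):

from Orders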

And here is how we can ask for specific fields:

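A plausible reconstruction:

from Orders
select Company, OrderedAt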

You’ll note that we moved the select clause to the end of the query. The idea being that the syntax will hint to the user about the order of operations in the pipeline.

Next we add some filtering, giving us:

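For instance, something like:

from Orders
where ShipTo.City = 'London'
select Company, OrderedAt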

This isn’t really interesting except for the part where implementing this was a lot of fun. Things become both more interesting and more obvious when we look at a full query, with almost everything there:

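Roughly:

from Orders
where ShipTo.City = 'London'
order by OrderedAt desc
select Company, OrderedAt, Freight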

And again, this is pretty boring, because except for the clauses’ location, you might as well write SQL. So let us see what happens when we start mixing in some of RavenDB’s own operations.

Here is how we can traverse the object graph on the database side:

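Along the lines of:

from Orders as o
where o.ShipTo.City = 'London'
load o.Company as c
select o.OrderedAt, c.Name, c.Address.Country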

This is an interesting example, because you can see that we have traversed the path of the relation and are able to fetch not just the data from the documents we are looking for, but also the related documents.

It is important to note that this happens after the where, so you can’t filter on it (you can plug this into the index, but that is a story for another time). I like this syntax, because it is very clear about what is going on, but at the same time, it is also very limited. If I want anything that isn’t rectangular, I need to jump through hoops. Instead, let us try something else…

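Something like:

from Orders as o
load o.Company as c, o.Employee as e
select {
    OrderedAt: o.OrderedAt,
    Customer: c.Name + ' (' + c.Address.City + ')',
    Contact: e.FirstName + ' ' + e.LastName + ' (' + e.Address.City + ')'
}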

The idea is that we can use a JavaScript object literal in place of the select. Instead of fighting with the language, we have a very natural way to express projections from RavenDB.

The cool part is that you can apply logic as well, so things like string concatenation or logic in the select clause are absolutely permitted. However, take a look at the example: we have code duplication here, in the formatting of the customer and contact names. We can do something about it, though:

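For example (formatName is an invented helper name):

declare function formatName(entity, name) {
    return name + ' (' + entity.Address.City + ')';
}
from Orders as o
load o.Company as c, o.Employee as e
select {
    OrderedAt: o.OrderedAt,
    Customer: formatName(c, c.Name),
    Contact: formatName(e, e.FirstName + ' ' + e.LastName)
}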


The idea is that we also have a way to define functions to hold common logic and handle the more complex details if we need to. In this manner, instead of having to define and deploy transformers, you can define that directly in your query.

Naturally, if you want to calculate the Mandelbrot set inside the query, that is… suspect, but the idea is that having JavaScript applied to the results of the query gives us a lot more freedom and ease of use.

The actual client side API in C# is going to remain unchanged (it has already been updated to talk to the new backend), but in our dynamic language clients, I think we’ll work to expose this more, since it is a much better fit in such environments.

Finally, here is the full query, with everything that we can do:

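Putting it together, still with illustrative names:

declare function formatName(entity, name) {
    return name + ' (' + entity.Address.City + ')';
}
from Orders as o
where o.ShipTo.City = 'London'
order by o.OrderedAt desc
load o.Company as c, o.Employee as e
select {
    OrderedAt: o.OrderedAt,
    Customer: formatName(c, c.Name),
    Contact: formatName(e, e.FirstName + ' ' + e.LastName)
}
include o.Lines[].Product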

Don’t focus too much on the actual content of the queries, they are mostly there to show off the syntax. The last query now has the notion of include directly in the query. As an aside, you can now use load and include to handle tertiary includes. I didn’t actually set out to build them, but they came naturally from the syntax, and they are probably going to stay.

I have a lot more to say about this, but I’m so excited about this feature that I just had to get it out there. And yes, as you can see, you can declare several functions, and you can also call them from one another, so there is a lot of power on the table right now.

Optimizing JavaScript and solving the halting problem: Part II

time to read 3 min | 544 words

In the previous post I talked about the issues we had with wanting to run untrusted code and wanting to use Jurassic to do so. The problem is that once the IL code is generated, it is JITed and run on the machine directly; we have no control over what it is doing. And that is a pretty scary thing to have. A simple mistake causing an infinite loop is going to take down a thread, which requires us to Abort it, which means that we are now in a funny state. I don’t like funny states in production systems.

So we were stuck, and then I realized that Jurassic is actually generating IL for the scripts. If it generates the IL, maybe we can put something in the IL? Indeed we can; here is the relevant PR for the Jurassic code. Here is the most important piece of code:

The key here is that we can provide Jurassic with a delegate that will be called at specific locations in the code. Basically, we do that whenever you evaluate a loop condition or just before you jump using goto. That means that our code can be called, which means that we can then check whether limits such as time / number of iterations / memory used have been exceeded and abort.

Now, the code is written in a pretty funny way. Instead of invoking the delegate we gave to the engine, we are actually embedding the invocation directly in each loop prolog. Why do we do that? Calling a delegate is something that has a fair amount of cost when you are doing that a lot. There is a bit of indirection and pointer chasing that you must do.

The code above, however, is actually statically embedding the call to our methods (providing the instance pointer if needed). If we are careful, we can take advantage of that. For example, if we mark our method with aggressive inlining, that will be taken into account. That means that the JavaScript code will turn into IL and then into machine code and our watchdog routine will be there as well. Here is an example of such a method:
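Something along these lines; the names, thresholds and exception type here are illustrative rather than RavenDB’s actual code:

// using System; using System.Diagnostics; using System.Runtime.CompilerServices;
public class ScriptWatchdog
{
    private int _ticks;
    private readonly Stopwatch _duration = Stopwatch.StartNew();
    private readonly long _allocatedAtStart = GC.GetAllocatedBytesForCurrentThread();

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void OnLoopIteration()
    {
        // stage 1: small enough to be inlined into every loop prolog
        if (++_ticks < 1024)
            return;
        Stage2();
    }

    private void Stage2()
    {
        // stage 2: the expensive checks, run only once in a while
        _ticks = 0;
        if (_duration.ElapsedMilliseconds > 5000 ||
            GC.GetAllocatedBytesForCurrentThread() - _allocatedAtStart > 64 * 1024 * 1024)
            throw new TimeoutException("Script exceeded its time or memory limits");
    }
}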

We register the watchdog method with the engine. You’ll notice that the code smells funny. Why do we have a watchdog and then a stage 2 watchdog? The reason for that is that we want to inline the watchdog method, so we want it to be small. In the case above, here are the assembly instructions that will be added to a loop prolog.

These are 6 machine instructions that are going to have great branch prediction and in general be extremely lightweight. Once in a while we’ll do a more expensive check. For example, we can check the time or use GC.GetAllocatedBytesForCurrentThread() to see that we didn’t allocate too much. The stage 2 check is still expected to be fast, but by running it only rarely, we can actually afford to do this in each and every place and we don’t have to worry about the performance of that.

This means that our road is now open to move to Jurassic fully, since the major limiting factor, trusting the scripts, is now handled.

Optimizing JavaScript and solving the halting problem: Part I

time to read 3 min | 537 words

RavenDB is a JSON document database, and the natural way to process such documents is with JavaScript. Indeed, there is quite a lot of usage of JS scripts inside RavenDB. They are used for ETL, Subscription filtering, patching, resolving conflicts, managing administration tasks and probably other stuff that I’m forgetting.

The key problem that we have here is that some of the places where we need to call out to JavaScript are called a lot. A good example would be a patch request applied on a query. The result can be a single modified document, or 50 million of them. If it is the latter case, given our current performance profile, it turns out that the cost of evaluating JavaScript is killing our performance.

We are using Jint as the environment that runs our scripts. It works, it is easy to understand and debug, and it is an interpreter. That means that it is more focused on correctness than performance. Over the years, we were able to extend it a bit to do all sorts of evil things to our scripts, but the most important thing is that it isn’t actually executing machine code directly; it is always Jint code running and handling everything.

Why is that important? Well, these scripts that we are talking about can be quite evil. I have seen anything from 300 KB of script (yes, that is 300 KB to modify a document that was considerably smaller) to just plain O(N^6) algorithms (a document has 6 collections; iterate over each of them while iterating over each of the others). These are just the complex ones. The evil ones do things like this:
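The example itself isn’t shown above; assume any busy loop, for instance:

while (true) {
    // never terminates, and without outside intervention never will
}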

We have extended Jint to count the number of operations and abort after a certain limit has passed, as well as to prevent stack overflow attacks. This means that it is much easier to deal with such things. They are just going to run for a while and then abort.

Of course, there is the perf issue of running an interpreter. We turned our eyes to Jurassic, which took a very different approach: it generates IL on the fly and then executes it. That means that as far as we are concerned, most operations are going to end up running as fast as the rest of our code. Indeed, benchmarks show that Jurassic is significantly faster (as in, an order of magnitude or more). In our own scenario, we saw anything from simply doubling the speed to an order of magnitude performance improvement.

However, that doesn’t help very much. The way Jurassic works, code like the one above is going to hang or die. In fact, the Jurassic documentation calls this out explicitly as an issue and recommends dedicating a thread or a thread pool for this and calling Thread.Abort if needed. That is not acceptable for us. Unfortunately, trying to fix this in a smart way takes us to the halting problem, and I think that mess has already been settled.

This limiting issue was the reason why we kept using Jint for a long time. Today we finally had a breakthrough and were able to fix this issue. Here is the PR, but it tells only part of the story; the rest I’ll leave for tomorrow’s post.

Public Service Announcement: ConcurrentDictionary.Count is locking

time to read 2 min | 233 words

During a performance problem investigation, we discovered that the following innocent looking code was killing our performance.
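The shape of the offending code, with invented names, was roughly:

// using System; using System.Collections.Concurrent;
private readonly ConcurrentDictionary<long, Action<string>> _subscribers =
    new ConcurrentDictionary<long, Action<string>>();

public void NotifyAboutChange(string change)
{
    if (_subscribers.Count == 0)    // looks cheap, but Count locks the entire dictionary
        return;

    foreach (var subscriber in _subscribers)
        subscriber.Value(change);
}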

This is part of the code that allows users to subscribe to changes in the database using a WebSocket. This is pretty rare, so we check whether there are any subscribers and skip all the work if there aren’t.

We had a bunch of code that runs on many threads and ended up calling this method. Since there are no subscribers, this should be very cheap, but it wasn’t. The problem was that the call to Count was locking, and that created a convoy that killed our performance.

We did some trawling through our code base and through the framework code and came up with the following conclusions.

ConcurrentQueue:

  • Clear locks
  • CopyTo, GetEnumerator, ToArray, Count create a snapshot (consumes more memory)
  • TryPeek, IsEmpty are cheap

Here is a piece of problematic code, we are trying to keep the last ~25 items that we looked at:
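The snippet itself isn’t included here; the pattern, with invented names, was roughly:

// using System.Collections.Concurrent;
private readonly ConcurrentQueue<string> _recentItems = new ConcurrentQueue<string>();

public void Record(string item)
{
    _recentItems.Enqueue(item);
    while (_recentItems.Count > 25)    // per the list above, Count creates a snapshot every time
        _recentItems.TryDequeue(out _);
}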

The problem is that this kind of code ensures that there will be a lot of snapshots and increases memory utilization.

ConcurrentDictionary:

  • Count, Clear, IsEmpty, ToArray – lock the entire thing
  • TryAdd, TryUpdate, TryRemove – lock the bucket for this entry
  • GetEnumerator does not lock
  • Keys, Values both lock the table and force a temporary collection

If you need to iterate over the keys of a concurrent dictionary, there are two options:
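Sketched out, where dictionary is any ConcurrentDictionary<TKey, TValue>:

// Option 1: dictionary.Keys – locks the whole table and materializes a temporary collection
foreach (var key in dictionary.Keys)
{
    // ...
}

// Option 2: enumerate the dictionary itself – no lock, no temporary collection
foreach (var kvp in dictionary)
{
    var key = kvp.Key;
    // ...
}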

Iterating over the entire dictionary is much better than iterating over just the keys.

Production postmortem: 30% boost with a single line change

time to read 7 min | 1231 words

This isn’t actually about a production system, per se. This is about a performance regression we noticed in a scenario for 4.0. We recently started doing structured performance testing as part of our continuous integration system.

There goes the sad trombone sound, because what that told us is that at some point, we had a major performance regression. To give you that in numbers, we used to be around 50,000 writes per second (transactional, fully ACID, hitting the disk, data is safe) and we dropped to around 35,000 writes per second. Fluctuations in performance tests are always annoying, but that big a drop couldn’t be explained by just a random load pattern causing some variance in the results.

This was a consistent and major difference from the previous record. Now, to explain things, the last firm numbers we had for this performance test were from Dec 2016. Part of the release process for 4.0 calls for structured and repeatable performance testing, and this is exactly the kind of thing that it is meant to catch. So I’m happy we caught it, not so happy that we had to go through about 7 months of changes to figure out what exactly drove our performance down.

The really annoying thing is that we put a lot of effort into improving performance, and we couldn’t figure out what went wrong there. Luckily, we could scan the history and track our performance over time. I’m extremely oversimplifying here, of course. In practice this meant running bisect on all the changes in the past 7 – 8 months and running benchmarks on each point. And performance is usually not something as cut and dried as a pass / fail test.

Eventually we found the commit, and it was small, self contained and had a huge impact on our performance. The code in question?

ThreadPool.SetMinThreads(MinThreads, MinThreads);

This change was, ironically, part of a performance run that focused on increasing the responsiveness of the server under a specific load. More specifically, it handled the case where under a harsh load we would run into thread starvation. The problem is that this specific load can only ever happen in our test suite. At some point we would run up to 80 parallel tests, each of which might spawn multiple servers. The problem is that we run them all in a single process, so we get a lot of sharing between the different tests.

That was actually quite intentional, because it allowed us to see how the code behaves in a bad environment. It exposed a lot of subtle issues, but eventually it got very tiring to have tests fail because there just weren’t enough threads to run them fast enough. So we set this value to a high enough number to not have to worry about this.

The problem is that we set it in the server, not in the tests. And that led to the performance issue. But the question is, why would this setting cause such a big drop? Well, the actual difference in performance went from 20 μs to 28.5 μs per write, which isn’t that much when you think about it. The reason why this happened is a bit complex, and requires understanding several different aspects of the scenario under test.

In this particular performance scenario, we are testing what happens when a lot of clients are writing to the database all at once. Basically, this is a 100% small writes scenario. Our aim here is to get sustained throughput and latency throughout the scenario. Given that we ensure that every write to RavenDB actually hits the disk and is transactional, that means that every single request needs to physically hit the disk. If we tried doing that for each individual request, that would lead to about 100 requests a second on a good disk, if that.

In order to avoid the cost of having to go to the physical disk, we do a lot of batching. Basically, we read the request from the network, do some basic processing and then send it to a queue where it will hit the disk along with all the other concurrent requests. We make heavy use of async to allow us to consume as little system resources as possible. This means that at any given point, we have a lot of requests that are in flight, waiting for the disk to acknowledge the write. This is pretty much the perfect setup for a high throughput server, because we don’t have to hold up a thread.

So why would increasing the minimum number of threads hurt us so much?

To answer that we need to understand what exactly it is that SetMinThreads controls. The .NET thread pool is a lazy one; even if you told it that the minimum number of threads is 1000, it will do nothing with this value. Instead, it will just accept work and distribute it among its threads. So far, so good. The problem starts when we have more work at hand than threads to process it. At this point, the thread pool needs to make a decision: it can either wait for one of the existing threads to become available, or it can create a new thread to handle the new load.

This is when SetMinThreads comes into play. Up until the point the thread pool has created enough threads to reach the min count, whenever there is more work in the pool than there are threads, a new thread will be created. Once the minimum number of threads has been reached, the thread pool will wait a bit before adding new threads. This allows the existing threads time to finish whatever they were doing and take work from the thread pool. The idea is that given a fairly constant workload, the thread pool will eventually adjust to the right number of threads to ensure that there isn’t any work that sits in the queue for too long.

With the SetMinThreads change, however, we increased the minimum number of threads significantly, and that caused the thread pool to create a lot more threads than it would otherwise. Because our workload is perfect for the kind of tasks that the thread pool is meant to handle (short: do something, schedule more work, then yield), it is actually very rare that the thread pool would need to create an additional thread, since a thread will be available soon enough.

This is not an accident; we very carefully set out to build it in this fashion, of course.

The problem is that with the high number of minimum threads, it was very easy to get the thread pool into a situation where it had more work than threads. Even though we would have a free thread in just a moment, the thread pool would create a new thread at this point (because we told it to do so).

That led to a lot of threads being created, and that led to a lot of context switches and thread contention as the threads fought over processing the incoming requests. In effect, that took a significant amount of the actual compute time that we have for processing the request. By removing the SetMinThreads calls, we reduced the number of threads, and were able to get much higher performance.

The cost of an extension point

time to read 3 min | 450 words

Recently we had a discussion on server side extension points in RavenDB 4.0. Currently, outside of the ability to load custom analyzers for Lucene, we have none. That is in stark contrast to the architecture we have in 3.x.

That is primarily because we found out several things about our extension-based architecture over the years.

Firstly, the extension-based approach allowed us to develop a lot of standard extensions (called bundles) in 3.x. In fact, replication, versioning, unique constraints and a lot of other cool features are all implemented as extensions. On the one hand, it is a very convenient manner of adding additional functionality, but it also led to what I can’t help but term an isolationist attitude. In other words, whenever we built a feature, we tended to focus on that particular feature on its own. And a lot of the trouble we have seen has been the result of feature intersection. Versioning and replication in a multi master cluster, for example, can have fairly tricky behavior.

And the only way to resolve such issues is to actually teach the different pieces about each other and how they should interact with one another. That sort of defeats the point of writing independent extensions. Another issue is that the interfaces that we have to expose are pretty generic. In many cases, that means that we have to do extra work (such as materializing values) in order to send the right values to the extension points, even if in most cases that extra work isn’t used.

Another issue is that in RavenDB 4.0, we have been treating all the pieces as a single whole, instead of tacking them on after the fact. Versioning, for example, as a core feature, has implications for maintaining transaction boundaries in a distributed environment, and that requires tight integration with the replication code and other parts of the environment. It is much easier to do so when we are building it as a single coherent piece.

There is another aspect here: not having external extensions also makes it much easier to lay down the law with regards to what can and cannot happen in our codebase. We can make assumptions about what things will do, and this can result in much better code and behavior.

Now, there are things that we are going to have to do (allowing you to customize your indexes with additional code, for example), and we can allow that by letting you upload additional code to be compiled with your indexes, but that is a very narrow use case.

Overloading the Windows Kernel and locking a machine with RavenDB benchmarks

time to read 4 min | 781 words

During benchmarking RavenDB, we have run into several instances where the entire machine would freeze for a long duration, resulting in utter non-responsiveness.

This has been quite frustrating to us, since a frozen machine makes it kinda hard to figure out what is going on. But we finally figured it out, and all the details are right here in the screenshot.

[image: screenshot of the benchmark run, showing disk activity on drives C and D]

What you can see is us running our current benchmark, importing the entire StackOverflow dataset into RavenDB. Drive C is the system drive, and drive D is the data drive that we are using to test RavenDB’s performance.

Drive D is actually a throwaway SSD. That is, an SSD that we use purely for benchmarking and not for real work. Given the amount of workout we give the drive, we expect it to die eventually, so we don’t want to trust it with any other data.

At any rate, you can see that due to a different issue entirely, we are now seeing data syncs in excess of 8.5 GB. So basically, we wrote 8.55 GB of data very quickly into a memory mapped file, and then called fsync. At the same time, we started increasing our scratch buffer usage, because calling fsync(8.55 GB) can take a while. Scratch buffers are a really interesting thing; they were born because of Linux’s crazy OOM design, and are basically a way for us to avoid paging. Instead of allocating memory on the heap like normal, which would then subject us to paging, we allocate a file on disk (mark it as temporary & delete on close) and then we mmap the file. That gives us a way to guarantee that Linux will always have a space to page out any of our memory.

This also has the advantage of making it very clear how much scratch memory we are currently using, and on Azure / AWS machines, it is easier to place all of those scratch files on the fast temp local storage for better performance.
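As a rough illustration of that trick (heavily simplified, and not Voron’s actual code), allocating a delete-on-close scratch file and mapping it looks something like this:

// using System.IO; using System.IO.MemoryMappedFiles;
using (var file = new FileStream("scratch.buffers", FileMode.Create, FileAccess.ReadWrite,
           FileShare.None, 4096, FileOptions.DeleteOnClose))
{
    file.SetLength(256 * 1024 * 1024);    // 256 MB of scratch space, backed by the file
    using (var mmf = MemoryMappedFile.CreateFromFile(file, null, file.Length,
               MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, leaveOpen: false))
    using (var view = mmf.CreateViewAccessor())
    {
        // hand out ranges of this mapping instead of allocating on the managed heap;
        // the OS now has a dedicated place to page this memory out to
    }
}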

So we have a very large fsync going on, a large number of memory mapped files, a lot of activity (that modifies some of those files) and memory pressure.

That forces the Kernel to evict some pages from memory to disk, to free something up. And under normal conditions, it would do just that. But here we run into a wrinkle: the memory we want to evict belongs to a memory mapped file, so the natural thing to do would be to write it back to its original file. This is actually what we expect the Kernel to do for us, and while for scratch files this is usually a waste, for the data file this is exactly the behavior that we want. But that is beside the point.

Look at the image above: we are supposed to be using only drive D, so why is C so busy? I’m not really sure, but I have a hypothesis.

Because we are currently running a very large fsync, I think that drive D is not currently processing any additional write requests. The “write a page to disk” operation has pretty strict runtime requirements; it can’t just wait for the I/O to return whenever that might be. Consider the fact that you can open a memory mapped file over a network drive; I think it is very reasonable that the Kernel will have a timeout mechanism for this kind of I/O. When the pager sees that it can’t write to the original file fast enough, it shrugs and writes those pages to the local page file instead.

This turns an otherwise very bad situation (very long wait / crash) into a manageable situation. However, with the amount of work we put on the system, that effectively forces us to do heavy paging (on the order of GBs) and that in turn leads us to a machine that appears to be locked up due to all the paging. So the fallback error handling is actually causing this issue by trying to recover, at least that is what I think.

When examining this, I wondered if this could be considered a DoS vulnerability, and after careful consideration, I don’t believe so. This issue involves using a lot of memory to cause enough paging to slow everything down; the fact that we are getting there in a somewhat novel way isn’t opening us up to anything that wasn’t there already.
