Optimizing select projections, Part II
In the previous post, I showed how we can handle select projections and set up a perf test. The initial implementation ran in about 4.4 seconds for our test, but it gave no thought to performance.
Let us see if we can do better. The first thing to do is to avoid building the Jint engine and parsing the code every time. The way we set things up, we wrap the actual object literal in a function, and there is no state, so we can reuse the previous engine instance without any issues.
That means that we don’t need to pay the cost of creating a new engine, parsing the code, etc. Here is what this looks like:
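The actual code is C# using Jint, but the caching idea can be sketched like this (the `Engine` class and `get_engine` helper here are hypothetical stand-ins, not the real implementation):

```python
# Sketch of caching a parsed engine per projection (real code: C#/Jint).
# Engine and get_engine are illustrative names, not the actual API.

class Engine:
    """Stand-in for a script engine that parses once and can be re-run."""
    def __init__(self, projection_source):
        # Wrap the object literal in a function once, at parse time.
        self.source = "function project(doc) { return " + projection_source + "; }"

    def run(self, doc):
        # Placeholder for actually invoking the parsed script on a document.
        return doc

# Cache keyed by the *raw* projection string, so the cached path
# does not pay for wrapping the projection again on every call.
_engine_cache = {}

def get_engine(projection_source):
    engine = _engine_cache.get(projection_source)
    if engine is None:
        engine = Engine(projection_source)   # pay the parse cost only once
        _engine_cache[projection_source] = engine
    return engine
```

On a cache hit, no new engine is built and no new strings are allocated; we just look up the raw projection text and reuse the parsed instance.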
Note that the cache key here is the raw projection, not the function we send to the engine. This allows us to avoid any string allocations in the common (cached) path.
And this runs in 0.75 seconds on the 10K, 5K, 1K run.
- 10K in 574 ms
- 5K in 137 ms
- 1K in 51 ms
Just this small change boosted our performance by a major margin.
Note that because the engines are not thread safe, to use this in a real application we'll need to ensure thread safety. The way I did that is to have a pool of these engines for each projection and just use that, so an engine is always accessed in single-threaded mode.
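A minimal sketch of that per-projection pool, again in Python rather than the actual C# (the `EnginePool` name and grow-on-demand policy are my assumptions; the post only says each engine is used by one thread at a time):

```python
import queue

class EnginePool:
    """Pool of single-threaded engines for one projection.

    A thread takes an engine out, runs it, and puts it back, so each
    engine instance is only ever used by one thread at a time.
    """
    def __init__(self, factory):
        self._factory = factory          # creates a new engine on demand
        self._idle = queue.SimpleQueue() # engines not currently in use

    def acquire(self):
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return self._factory()       # grow the pool under contention

    def release(self, engine):
        self._idle.put(engine)
```

Because an engine is removed from the pool while in use, concurrent callers never share one, and the parse-once benefit from the cache is preserved across threads.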
More posts in "Optimizing select projections" series:
- (01 Sep 2017) Part IV–Understand, don’t do
- (31 Aug 2017) Interlude, refactoring
- (30 Aug 2017) Part III
- (29 Aug 2017) Part II
- (28 Aug 2017) Part I
Comments
@Oren, I don't know if this code is a simplified version of the real one, but shouldn't it use some sort of LRU/MRU (or a time-based) logic to discard old entries in the dictionary? Otherwise it will grow ad infinitum and saturate memory
njy, Yes, it is simplified; we do a more complex discard in the real code.
is the pool a simple round-robin, or does each engine keep a state of its run state etc?
Peter, The actual implementation is a bit complex, because it has multiple levels. But basically, we have a pool per db of the last 2048 (configurable) scripts that we ran. Each of those scripts maintains a pool of engines that have already been started with it. The idea is that it is very common for a script to run in parallel, and the engines are single threaded, so we keep a queue of them and reuse them per script.
We track usages in the db pool, and when the db pool is too big, we discard a quarter of it and half the usage count of all existing instances. The idea is that this will age instances and will cause previously popular scripts to age out if they aren't actively used.
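The aging policy described above can be sketched as follows (a Python illustration, not the real C#; in particular, dropping the *least-used* quarter is my assumption about which quarter gets discarded):

```python
def age_pool(pool, max_size):
    """Sketch of the aging policy: when the db pool is too big,
    discard a quarter of it and halve the usage count of survivors,
    so previously popular scripts age out unless actively used.

    `pool` maps script text -> usage count.
    """
    if len(pool) <= max_size:
        return
    # Discard the least-used quarter of entries (assumed policy).
    by_usage = sorted(pool, key=pool.get)
    for script in by_usage[:len(pool) // 4]:
        del pool[script]
    # Halve the usage count of everything that survived.
    for script in pool:
        pool[script] //= 2
```

Halving the counts is the key part: a script that was hot last week but is no longer used keeps losing half its score on each pass, while currently hot scripts keep earning usage back.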