Continuing our dive into the indexing optimization routines: when we last left off, we had the following system:
This worked well: it could predictively decide when to increase the batch size and smooth over spikes easily. But note where the costs are.
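The post doesn't show the batch sizing logic itself, but the idea can be sketched roughly as follows. This is a minimal illustration, assuming a simple grow-fast / shrink-on-pressure policy; the names and thresholds are mine, not the actual implementation.

```python
# Illustrative adaptive batch sizing: grow when the last batch
# finished quickly, back off when it took too long. All constants
# here are made up for the example.
MIN_BATCH = 128
MAX_BATCH = 16_384
FAST_SECONDS = 1.0   # batch finished quickly -> we can take more
SLOW_SECONDS = 5.0   # batch took too long -> smooth back down

def next_batch_size(current: int, last_duration: float) -> int:
    if last_duration < FAST_SECONDS:
        return min(current * 2, MAX_BATCH)   # ramp up to absorb spikes
    if last_duration > SLOW_SECONDS:
        return max(current // 2, MIN_BATCH)  # shrink under pressure
    return current                           # steady state

print(next_batch_size(1024, 0.5))  # 2048
```

The point of the clamping is that a spike can double the batch size a few times in a row, while a slow disk or a heavy batch walks it back down just as quickly.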
The next step was this:
Prefetching, basically. What we noticed is that we were spending a lot of time just loading the data from the disk, so we changed our behavior to load the next batch while we are still indexing the current one. On the next indexing batch, we will usually find all of the data we need already in memory and ready to rock.
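The shape of this overlap can be sketched as a simple producer/consumer split: a background thread reads the next batch from disk while the main thread indexes the current one. This is only an illustration of the idea; `load_batch` and `index_batch` are hypothetical stand-ins for the real I/O and indexing work.

```python
# Sketch of prefetching: overlap disk reads with indexing work.
import threading
import queue

def load_batch(batch_id):
    # stand-in for reading a batch of documents from disk
    return [f"doc-{batch_id}-{i}" for i in range(3)]

def index_batch(docs):
    # stand-in for the actual indexing work
    return len(docs)

def prefetching_indexer(batch_ids, max_prefetched=2):
    # bounded queue: caps how much prefetched data sits in memory
    batches = queue.Queue(maxsize=max_prefetched)

    def loader():
        for bid in batch_ids:
            batches.put(load_batch(bid))  # blocks when the queue is full
        batches.put(None)                 # sentinel: no more batches

    threading.Thread(target=loader, daemon=True).start()

    indexed = 0
    while (docs := batches.get()) is not None:
        # by the time we get here, the loader has usually already
        # fetched the next batch, so disk I/O overlaps indexing
        indexed += index_batch(docs)
    return indexed

print(prefetching_indexer(range(4)))  # 12
```

The bounded queue is the interesting design choice here: it is what keeps the prefetcher from racing ahead of the indexer and piling up data in memory.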
This gave us a pretty big boost in how fast we can index things, but we aren't done yet. In order to make this feature viable, we had to do a lot of work there. For starters, we had to make sure we wouldn't take too much memory, that we wouldn't impact other aspects of the database, etc. Interesting work, all around, even if I am just focusing on the high level optimizations. There is still a fairly obvious optimization waiting for us, but I'll discuss that in the next post.