Ayende @ Rahien

Refunds available at head office

Hunting the performance bottleneck

We got a support request from a customer, and this time it was quite a doozy. They were using the FreeDB dataset as a test bed for a lot of experiments, and they found very slow indexing speed with it. In fact, what they found was utterly unacceptable indexing speed, to be frank. It took days to complete. However, that ran contrary to all of the optimizations that we have done in the past few years.

So something there was quite fishy. Luckily, it was fairly simple to reproduce. And the culprit was very easily identified. It was the SQL Replication Bundle. But why? That turned out to be a pretty complex answer. 

The FreeDB dataset currently contains over 3.1 million records, and the sample the customer sent us had about 8 indexes, varying in complexity from the trivial to full text analyzed and map/reduce indexes. We expect such a workload, plus the replication to SQL, to take a while. But it should be a pretty fast process.

What turned out to be the problem was the way the SQL Replication Bundle works. Since we don't know what changes you are going to replicate to SQL, the first thing we do is delete all the data that might have previously been replicated. In practice, it means that we execute something like:

DELETE FROM Disks WHERE DocumentId in (@p1, @p2, @p3)

INSERT INTO Disks (...) VALUES( ... )
INSERT INTO Disks (...) VALUES( ... )
INSERT INTO Disks (...) VALUES( ... )

And over time, this became slower and slower. Now, SQL Replication needs access to a lot of documents (potentially all of them), so we use the same prefetcher technique that we use for indexing. And we also use the same optimizations to decide how much to load.

However, in this case, SQL Replication was slow, and because we used the same optimizer, it looked to the optimizer like we had a very slow index. That calls for reducing the load on the server, so we can have greater responsiveness and reduce overall resource utilization. And that impacted the indexes. In effect, SQL Replication being slow forced us to feed the data into the indexes in small tidbits, and that drastically increased the I/O costs that we had to pay.

So the first thing to do was to actually break it apart. We now have different optimizer instances for indexes and for SQL Replication (and RavenDB Replication, for that matter), and they cannot impact one another.

But the root cause was that SQL Replication was slow. And I think you should be able to figure out why from the information outlined above.

As we replicated more and more data into SQL, we increased the table size. And as we increased the table size, statements like our DELETE would take more and more time. SQL was doing a lot of table scans.

To be honest, I never really thought about it. RavenDB in the same circumstances would just self-optimize, and things would get faster quickly. SQL Server (and any other relational database) would just be dog slow until you came and added the appropriate index.
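Using the hypothetical Disks table from the DELETE example above, the "appropriate index" is simply a non-clustered index on the column the replication statements filter on, something along these lines:

```sql
-- Assumes the Disks table and DocumentId column from the example above.
-- An index on DocumentId lets the DELETE seek directly to the matching
-- rows instead of scanning the entire (and ever-growing) table on every
-- replication batch.
CREATE NONCLUSTERED INDEX IX_Disks_DocumentId
    ON Disks (DocumentId);
```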

Once that was done, our performance was back on track and we could run things speedily both for indexes and for SQL Replication.

Comments

Simon, 10/18/2013 09:09 AM

Would it be possible to 'self-optimize' SQL Server and have the replication bundle automatically create the relevant indexes?? Realize this might be outside the scope of RavenDB, but could possibly be a configuration option that defaults to false.

Catalin Pop, 10/18/2013 11:24 AM

So RDBMSes are slow by default, thanks for reminding us :)

Khalid Abuhakmeh, 10/18/2013 12:01 PM

Naive question, but why not use the same identifier that is in RavenDB as the Id in SQL Server?

In map reduce indexes, you could allow the user to specify a unique key, like the key used to reduce on.

Additionally, knowing you now have an Identifier, you can create a clustered index on the Id by default.

njy, 10/18/2013 08:06 PM

@Oren: naive question, me too. Is it true that you can always recreate an entire SQL replica from scratch if, just as a stupid example, I delete the entire SQL db to restart from scratch? I'm asking because if that is so, there may be a good way to speed things up even more.

Afif Mohammed, 10/19/2013 04:02 AM

If I am to pitch RavenDB as the primary store for Enterprise scale mission critical apps, I am confident I'll have happy end users, but internal users like admins, ops, finance etc. want to run crazy adhoc queries and tons and tons of reports day in and day out, which I have to concede a NoSQL store is not optimized or ideal for. Hence the above pitch can only succeed when you introduce an RDBMS for reporting, that you replicate into (albeit with some 'acceptable' lag). This is an acceptable compromise and the most happy path (at least to start with) for most Enterprises that wish to go bleeding edge. Given the SQL Replication Bundle is not used as a first class citizen when the database is tested with millions of records over many days before a stable build is put out, and given it's not given the same thorough introspection as other database features like indexing, replication etc., can I ask: does the RavenDB team believe they are building a product that is most suitable as the primary database for Enterprise scale mission critical apps?

Afif Mohammed, 10/19/2013 04:28 AM

My ideal scenario would be if RavenDB, out of the box, without writing a single line of code or messy configuration, would simply use intelligent conventions to map JSON documents to tables and start spitting them out to an RDBMS. It may not have primary keys, it may not have any referential integrity, but the data replicates to an acceptable RDBMS schema 'out of the box'. Call it an extra product, charge an extra license fee for it, but make it work. IMHO this feature alone will take you farthest in breaking the ice with Enterprise systems. Surely it shouldn't be that hard. If not, can you please highlight the major challenges involved here.

Rafal, 10/19/2013 02:03 PM

Afif, why use a NoSQL database if the only way to make it useful to the company is to copy all the data to SQL? This makes quite good ammo for all corporate DB admins to shoot at you.

Afif Mohammed, 10/19/2013 11:26 PM

Rafal, I would use NoSQL because it makes the most sense for the application it is consumed in. It makes the most sense when delivering value to end users. It makes the most sense for developer productivity. Internal users, i.e. business users, have more reporting requirements. With any long standing enterprise business, there are often many internally who are used to running queries over some SQL instance they have been given as a replica. This has at least been my experience. They are two different problems: one of modelling the business domain, the other of reporting and analysis over data.

Jon, 10/19/2013 11:45 PM

I agree with Afif in that having reliable SQL replication will push this product into more mainstream use. I had a scenario a few months ago where SQL Replication was the cause of the organisation choosing to ditch Raven and go with SQL Server (causing a major re-architecture of the system and costing a lot of money). It came down to the fact that the decision makers don't like their reports being wrong; they lost faith in the data, and that spelled the end for Raven at that organisation (other factors were indexing time and it not playing well with DTC). Out of the box SQL Replication that 'just works' would be a godsend.

Ayende Rahien, 10/20/2013 07:42 AM

Simon & Khalid & njy,

We aren't actually in control of the DB schema. That is pretty much the point of the SQL Replication Bundle. You can replicate the data from RavenDB to your DB of choice without limitation. Since we don't control the schema, we can't generate the right indexes. Moreover, even verifying that those indexes exist is a problem, since we support multiple databases. Finally, there is a real issue with trying to automatically create indexes in an RDBMS. That is very slow and can easily create table or database locks. In fact, unless you are doing things very carefully, it is going to generate a table lock. And the only way to avoid that is usually to have the Enterprise version and use magic incantations.
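For reference, on SQL Server that "magic incantation" is the ONLINE option, which is an Enterprise edition feature; a sketch, reusing the hypothetical Disks table from the post:

```sql
-- WITH (ONLINE = ON) builds the index while still allowing concurrent
-- reads and writes; without it, building an index on a large table
-- typically holds a lock that blocks the table for the duration.
-- Enterprise edition only, which is the point being made above.
CREATE NONCLUSTERED INDEX IX_Disks_DocumentId
    ON Disks (DocumentId)
    WITH (ONLINE = ON);
```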

We cannot use the same identifier in SQL and in RavenDB, because we don't control the schema. We just require that you'll have a field for the document id. Note that this doesn't have to be unique. Indeed, this cannot be unique. Think about the scenario in which we replicate order information. The document id is going to be the same for all of the lines.
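As a sketch (the table and column names here are invented for illustration), a single order document with three lines fans out to three SQL rows that all carry the same document id:

```sql
-- Hypothetical OrderLines table; DocumentId is deliberately NOT unique,
-- because one RavenDB document can produce many SQL rows.
CREATE TABLE OrderLines (
    DocumentId NVARCHAR(50) NOT NULL,  -- e.g. 'orders/42' for every line
    Product    NVARCHAR(100),
    Quantity   INT
);

INSERT INTO OrderLines (DocumentId, Product, Quantity) VALUES ('orders/42', 'Milk',  2);
INSERT INTO OrderLines (DocumentId, Product, Quantity) VALUES ('orders/42', 'Bread', 1);
INSERT INTO OrderLines (DocumentId, Product, Quantity) VALUES ('orders/42', 'Eggs', 12);

-- Which is also why the replication delete targets all rows for a document:
-- DELETE FROM OrderLines WHERE DocumentId IN ('orders/42')
```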

Another issue is map/reduce. We aren't doing replication on that. We are replicating documents, not indexes.

Recreating the entire DB from scratch isn't actually a scenario that we support. We always assume that we start from the middle. The reason for that is that we usually need to talk to existing systems. Beyond that, this problem would still be there for updates, and it is usually better to be able to spot this behavior early on during bulk insert rather than have it creep up on you slowly in production.

Afif,

What gave you the impression that we aren't testing the SQL Replication Bundle as well as anything else? This is a core feature, distributed as part of Raven.Database.dll. The tests that we have for it included the indexes on the schema as a matter of course. It didn't occur to us not to do that, and we assumed that the customer would have done the same. It was only when we saw the problem that we realized that we didn't properly document that requirement.

What you request isn't going to work. You are going to need at least indexes on the tables. But more importantly, there are a LOT of JSON features that are going to be very hard to translate to RDBMS schemas.

Tags: ["RavenDB", "NoSQL"]

is a common example. Is it going to a separate table? Is it going to be a comma separated string? Is it going to be an association table?

Too many choices to actually do something like that automatically. That is why we have the SQL Replication Bundle that allows you to write a script that will do the translation. Beyond that, it is all managed for you.
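To make the choice concrete, here is what just one of those options, the association table, might look like (table and column names are invented for illustration):

```sql
-- One of several equally valid mappings for a JSON array like
-- Tags: ["RavenDB", "NoSQL"]: a child table keyed by the document id.
-- A comma separated string column on the parent table would be just as
-- "correct", which is why no single convention can be picked automatically.
CREATE TABLE DocumentTags (
    DocumentId NVARCHAR(50)  NOT NULL,
    Tag        NVARCHAR(100) NOT NULL
);

INSERT INTO DocumentTags (DocumentId, Tag) VALUES ('posts/1', 'RavenDB');
INSERT INTO DocumentTags (DocumentId, Tag) VALUES ('posts/1', 'NoSQL');
```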

Rafal,

Standard recommendation for ages, even for RDBMS systems, has been to have an OLTP database and an OLAP database. One database to handle transactions, one database for reporting.

Jon, this is the first I am hearing of that, and it doesn't sound right. SQL Replication is pretty robust, and while there might be a lag time, that is usually not a problem for reports.

Simon, 10/20/2013 12:45 PM

Thanks for the response Ayende. Makes sense that you can't control the DB schema.

Comments have been closed on this topic.