Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,128 | Comments: 45,550

filter by tags archive

Table scans, index scans and index seeks, on my!

time to read 5 min | 979 words

In general, when you break it down to the fundamentals, a data base is just a fancy linked list + some btrees. Yes, I am ignoring a lot, but if you push aside a lot of the abstraction, that is what is actually going on.

If you ever dealt with database optimizations you are familiar with query plans, like the following (from NHProf):

You can see that we have some interesting stuff going on here:

  • Index Seek
  • Index Scan

And if you are unlucky, you are probably familiar with the dreaded “your !@&!@ query is causing a table scan!” scream from the DBA. But most of the time, people just know that table scan is slow, index scan is fast and index seek is fastest. I am ignoring things like clustered vs. unclustered indexes, since they aren’t really important for what I want to do.

For simplicity sake, I’ll use the following in memory data structure:

public class DocumentDatabase
    public List<JObject> Documents = new ...;
    public IDictionary<string, IDictionary<JToken, JObject>> Indexes = new ...; 

To keep things simple, we will only bother with the case of exact matching. For example, I might store the following document:

{ "User": "ayende", "Name": "Ayende Rahien", "Email": "Ayende@ayende.com" }

And define an index on Name & Email. What would happen if I wanted to make a query by the user name?

Well, we don’t really have any other option, we have to do what amounts to a full table scan:

foreach (var doc in Documents)
    if(doc.User == "ayende")
          yield return doc;

As you can imagine, this is an O(N) operation, which can get pretty expensive if we are querying a large table.

What happen if I want to find data by name & email? We have an index that is perfectly suited for that, so we might as well use it:

Indexes["NameAndEmail"][new{Name="Ayende Rahien", Email = “Ayende@ayende.com”}];

Note that what we are doing here is accessing the NameAndEmail index, and then making a query on that. This is essentially an index seek.

What happens if I want to query just by email? There isn’t an index just for emails, but we do have an index that contains emails. We have two options, use a table scan, or and index scan. We already saw what a table scan is, so let us look at what is an index scan:

var nameAndEmailIndex = Indexes["NameAndEmail"];
foreach (var indexed in nameAndEmailIndex)
   if(indexed.Email == "ayende@ayende.com")
             yield return indexed;

All in all, it is very similar to the table scan, and when using in memory data structures, it is probably not worth doing index scans (at least, not if the index is over the entire data set).

Where index scans prove to be very useful is when we are talking about persistent data sets, where reading the data from the index may be more efficient than reading it from the table. That is usually because the index is much smaller than the actual table. In certain databases, the format of the data on the disk may make it as efficient to do a table scan in some situations as it to do an index scan.

Another thing to note is that while I am using hashes to demonstrate the principal, in practice, most persistent data stores are going to use some variant of trees.



You seem thinking only about finding one entity, but I'll tell you what else the index scans are useful for: to be able to enumerate entities in a pre-sorted way. Either to extract a pre-sorted sublist of the entire list, or to retrieve the entire list presorted. This could be useful as is (because for example the query has an ORDER BY) or because another part of the query could benefit from presorting (e.g. a merge join).

Ayende Rahien


This is pretty much because I am focusing on just internal details on what this actually matter.

I am using index scans quite often, actually. Good point

Simon Labrecque

Did you evaluate keeping Lucene as the index engine with your new Transactional Storage? If so, maybe you could write a post about that, explaining why (it seems) you chose to write your own indexing engine on top of your storage instead of just optimising the storage.

Ayende Rahien


Indexing is a bit more complex than a binary choice between Lucene or not.

For example, when implementing the managed storage for Raven, I need to store documents in the disk, and then I need to read them back.

How do you find document "users/ayende" ? That is what I refer to as Index Seek.

How do you find all the documents that were changed after a certain point in time? That is an Index Scan.

Those aren't tasks that I can shell to Lucene.

Simon Labrecque


"How do you find all the documents that were changed after a certain point in time"

Just thinking out loud, but if your Storage had that knowledge (ie, LastUpdated) and Lucene was hooked deeply to it, then Lucene would know about it, wouldn't it?

Just to make it clear, I find it VERY interesting that you are implementing your own indexing engine, I'm just wondering what could be done with Lucene in this particular case.

Ayende Rahien


Theoretically, yes, you could.

The problem is that I am updating Lucene in the background, and that are some questions that I have to have a transactional answer to

The last update one, for example, is important for replication


Where can I find the query plans window in NHProf? Can't find it.

Ayende Rahien


It is right next to show query results, in the bottom of the screen.

Comment preview

Comments have been closed on this topic.


  1. The worker pattern - 2 days from now

There are posts all the way to May 30, 2016


  1. The design of RavenDB 4.0 (14):
    26 May 2016 - The client side
  2. RavenDB 3.5 whirl wind tour (14):
    25 May 2016 - Got anything to declare, ya smuggler?
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats