An interesting RavenDB bug

Jul 22 2010

An interesting RavenDB bug

time to read 4 min | 777 words

I got a very strange bug report recently,

The following index:

from movie in docs.Movies
from actor in movie.Actors
select new { Actor = actor }

Will produce multiple results from a single document, which poses a pretty big problem when you try to page through that. Imagine that each movie has 10 actors, and you are trying to page through this index for the first two documents of movies by Charlie Chaplin. The first movie that matches Charlie Chaplin will have ten results returned from the index, and simple paging at the index level will give us the wrong results.

Here is my solution for that, which works, but make me just a tad uneasy:
public IEnumerable<IndexQueryResult> Query(IndexQuery indexQuery)
{
    IndexSearcher indexSearcher;
    using (searcher.Use(out indexSearcher))
    {
        var previousDocuments = new HashSet<string>();
        var luceneQuery = GetLuceneQuery(indexQuery);
        var start = indexQuery.Start;
        var pageSize = indexQuery.PageSize;
        var skippedDocs = 0;
        var returnedResults = 0;
        do
        {
            if(skippedDocs > 0)
            {
                start = start + pageSize;
                // trying to guesstimate how many results we will need to read from the index
                // to get enough unique documents to match the page size
                pageSize = skippedDocs * indexQuery.PageSize; 
                skippedDocs = 0;
            }
            var search = ExecuteQuery(indexSearcher, luceneQuery, start, pageSize, indexQuery.SortedFields);
            indexQuery.TotalSize.Value = search.totalHits;
            for (var i = start; i < search.totalHits && (i - start) < pageSize; i++)
            {
                var document = indexSearcher.Doc(search.scoreDocs[i].doc);
                if (IsDuplicateDocument(document, indexQuery.FieldsToFetch, previousDocuments))
                {
                    skippedDocs++;
                    continue;
                }
                returnedResults++;
                yield return RetrieveDocument(document, indexQuery.FieldsToFetch);
            }
        } while (skippedDocs > 0 && returnedResults < indexQuery.PageSize);
    }
}

0 comments

Tags:

Raven

Oren Eini

Oren Eini

CEO of RavenDB

An interesting RavenDB bug

Comments

Comment preview

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed

Oren Eini

CEO of RavenDB

Related posts that you may find interesting:

Comments

Comment preview

Markdown formatting

Phrase Emphasis

Links

Images

Headers

Lists

Blockquotes

Horizontal Rules

Manual Line Breaks

Fenced Code Blocks

Header IDs

Tables

Definition Lists

Footnotes

Abbreviations

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication