﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Andrew commented on An interesting RavenDB bug</title><description>Have you considered the formula often used when re-allocating memory (under C++)?
  
  
You start with, say, 10 MB, and if you find it's not enough you double it to 20 MB; if you hit that wall again you double it to 40 MB, and so on.
  
  
The theory is that you do the fewest reallocations (because the amount doubles each time, a task that does need a large amount of memory reaches it very quickly), and the worst case is that you use a bit too much memory for an operation.
  
  
In your case, the worst case is that you'd fetch too many documents (oh well), but you can reduce the number of fetches by requesting double the number of documents you requested in the last call.
  
  
I hope that makes sense :) 
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment8</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment8</guid><pubDate>Sat, 24 Jul 2010 03:44:12 GMT</pubDate></item><item><title>Ayende Rahien commented on An interesting RavenDB bug</title><description>Ajai,
  
Because I am indexing 10 results per document, you can't do the paging inside Lucene.
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment7</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment7</guid><pubDate>Fri, 23 Jul 2010 06:21:53 GMT</pubDate></item><item><title>Ajai Shankar commented on An interesting RavenDB bug</title><description>I do not understand why the index would return 10 documents. Why is this a Cartesian product? Is it a Lucene thing?
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment6</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment6</guid><pubDate>Thu, 22 Jul 2010 23:08:33 GMT</pubDate></item><item><title>configurator commented on An interesting RavenDB bug</title><description>Ayende, the thing is, if I read this code correctly, skippedDocs is the number of extra copies we got.
  
  
Suppose the page size is 10 and we got: 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 10. If there is usually a similar number of copies, another 10 results should be more than enough - but skippedDocs is 2, so we'd get 20 results.
  
Now what if we got 1, 1, 2, 2, 3, 3, 4, 4, 5, 5? Another 10 results should be exactly enough, but we're getting 50 (skippedDocs is 5, times a page size of 10).
  
  
Seems to me like the heuristically-accurate function would be:
  
var wasteMultiplier = (returnedResults + skippedDocs) / (float)returnedResults;
  
pageSize = (int)(wasteMultiplier * (indexQuery.PageSize - returnedResults));
  
  
Of course, this is too heuristically-accurate - you want to favour getting a few more results than needed over making extra queries. Seems to me like a better heuristic would be:
  
  
var wasteMultiplier = (returnedResults + skippedDocs) / (float)returnedResults;
  
pageSize = (int)(wasteMultiplier * (indexQuery.PageSize - returnedResults / X));
  
where X is some (float) constant that should be tweaked to get an optimal result. Probably somewhere between 1 and 2.
  
  
If using this heuristic, skippedDocs needs to be the running total, so it shouldn't be reset to 0 each time. Add a boolean to record whether a skip happened in the last loop iteration.
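
To make the arithmetic concrete, here is a small Python sketch of the two formulas above (Python rather than C# only for brevity). The function next_read_size and its parameter names are hypothetical, and the guard for returned_results == 0 is an assumption of this sketch, not something the formulas specify.

```python
def next_read_size(page_size, returned_results, skipped_docs, x=1.0):
    """Estimate how many index entries to read next.

    returned_results: unique documents collected so far
    skipped_docs:     running total of duplicate entries skipped so far
    x:                tuning constant; x == 1 gives the first formula,
                      larger values deliberately over-fetch
    """
    if returned_results == 0:
        # Assumption of this sketch: with no unique results yet there is
        # no observed ratio to extrapolate from, so ask for a full page.
        return page_size
    # Index entries read per unique document so far.
    waste_multiplier = (returned_results + skipped_docs) / returned_results
    # Unique documents still missing, padded when x is above 1.
    remaining = page_size - returned_results / x
    return int(waste_multiplier * remaining)


# The 1, 1, 2, 2, 3, 3, 4, 4, 5, 5 case: 5 unique documents, 5 skipped.
exact = next_read_size(10, 5, 5)          # 10
padded = next_read_size(10, 5, 5, x=1.5)  # 13
original = 5 * 10                         # 50, from skippedDocs * PageSize
```

With X = 1 the estimate is exactly the 10 entries needed, X = 1.5 pads it to 13, and the original guesstimate asks for 50.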
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment5</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment5</guid><pubDate>Thu, 22 Jul 2010 15:02:15 GMT</pubDate></item><item><title>Ayende Rahien commented on An interesting RavenDB bug</title><description>configurator,
  
The guess is that most documents would have a similar number of results per doc.
  
Since we already know that the current read contained duplicates, we want to read the next range. And we want to limit the number of index calls that we make.
  
If you can think of a more accurate formula, I would love to see it.
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment4</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment4</guid><pubDate>Thu, 22 Jul 2010 14:50:33 GMT</pubDate></item><item><title>Ayende Rahien commented on An interesting RavenDB bug</title><description>Thilak,
  
That is pretty much what would have to happen; you _are_ reading a lot more data.
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment3</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment3</guid><pubDate>Thu, 22 Jul 2010 14:48:29 GMT</pubDate></item><item><title>configurator commented on An interesting RavenDB bug</title><description>                // trying to guesstimate how many results we will need to read from the index
  
                // to get enough unique documents to match the page size
  
                pageSize = skippedDocs * indexQuery.PageSize; 
  
I don't see how that guesstimate is supposed to be close to the actual value needed. Seems to me like you're querying for way too much data once you've skipped anything.
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment2</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment2</guid><pubDate>Thu, 22 Jul 2010 09:46:40 GMT</pubDate></item><item><title>Thilak Nathen commented on An interesting RavenDB bug</title><description>Had the exact same Cartesian-product-type problem recently with a Lucene search. My solution was almost exactly what you got - launch a full-blown brute-force attack on the index.
  
  
Did you profile memory usage after introducing this? Mine almost doubled.
</description><link>http://ayende.com/4558/an-interesting-ravendb-bug#comment1</link><guid>http://ayende.com/4558/an-interesting-ravendb-bug#comment1</guid><pubDate>Thu, 22 Jul 2010 09:37:08 GMT</pubDate></item></channel></rss>