time to read 1 min | 130 words

I just got the following bug report. I’ll just let you read it, with my additional commentary below:

[Screenshot: the bug report, a pull request that includes a failing test]

The great fun in getting a pull request with a failing test is that the whole process of working with it is pretty seamless.

For the one above, GitHub told me that I need to run the following command:

git pull https://github.com/benjamingram/ravendb.git DynamicFieldsBug

I did, and got the failing test. From there, it was just a matter of fixing the actual bug, which was rather simple, and nothing that even smelled like ceremony.

time to read 7 min | 1306 words

Originally posted at 1/18/2011

We got some reports that there was an O(N) issue when loading a large number of documents from Raven.

I wrote the following test code:

var db = new DocumentDatabase(new RavenConfiguration
{
    DataDirectory = "Data"
});
Console.WriteLine("Ready");
Console.ReadLine();
while (true)
{
    var sp = Stopwatch.StartNew();

    db.Query("Raven/DocumentsByEntityName", new IndexQuery
    {
        Query = "Tag:Item",
        PageSize = 1024
    });

    Console.WriteLine(sp.ElapsedMilliseconds);
}

With 1,024 documents in the database, I could clearly see that most requests took on the order of 300 ms. Not content with that speed, I decided to dig deeper, pulled out my trusty profiler (dotTrace, and no, I am not getting paid for this), and got this:

[Profiler screenshot: dotTrace shows deserialization from byte[] to JObject taking most of the query time]

As should be clear, it seems like de-serializing the data from byte[] to a JObject instance is taking a lot of time (relatively speaking).

To be more precise, it takes 0.4 ms to do two deserialization operations for the following document:

{
    "cartId": 666,
    "otherStuff": "moohahaha",
    "itemList": [{
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    },
    {
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    },
    {
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    },
    {
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    },
    {
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    },
    {
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    },
    {
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    },
    {
        "productId": 42,
        "name": "brimstone",
        "price": 6.66,
        "quantity": 2000
    }]
}

I found it quite surprising, to tell you the truth. I wrote the following test case to prove this:

// This needs Json.NET (Newtonsoft.Json), which provides JObject and BsonReader.
using System;
using System.Diagnostics;
using System.IO;
using Newtonsoft.Json.Bson;
using Newtonsoft.Json.Linq;

static void Main()
{
    // The base64 below holds two consecutive BSON documents, as stored by
    // RavenDB: the metadata ("Raven-Entity-Name") followed by the document.
    var st =
        "IQAAAAJSYXZlbi1FbnRpdHktTmFtZQAFAAAASXRlbQAAvgIAABJjYXJ0SWQAmgIAAAAAAAACb3RoZXJTdHVm"+
        "ZgAKAAAAbW9vaGFoYWhhAARpdGVtTGlzdACFAgAAAzAATQAAABJwcm9kdWN0SWQAKgAAAAAAAAACbmFtZQAKA"+
        "AAAYnJpbXN0b25lAAFwcmljZQCkcD0K16MaQBJxdWFudGl0eQDQBwAAAAAAAAADMQBNAAAAEnByb2R1Y3RJZAA"+
        "qAAAAAAAAAAJuYW1lAAoAAABicmltc3RvbmUAAXByaWNlAKRwPQrXoxpAEnF1YW50aXR5ANAHAAAAAAAAAAMy"+
        "AE0AAAAScHJvZHVjdElkACoAAAAAAAAAAm5hbWUACgAAAGJyaW1zdG9uZQABcHJpY2UApHA9CtejGkAScXVhbn"+
        "RpdHkA0AcAAAAAAAAAAzMATQAAABJwcm9kdWN0SWQAKgAAAAAAAAACbmFtZQAKAAAAYnJpbXN0b25lAAFwcmljZ"+
        "QCkcD0K16MaQBJxdWFudGl0eQDQBwAAAAAAAAADNABNAAAAEnByb2R1Y3RJZAAqAAAAAAAAAAJuYW1lAAoAAABi"+
        "cmltc3RvbmUAAXByaWNlAKRwPQrXoxpAEnF1YW50aXR5ANAHAAAAAAAAAAM1AE0AAAAScHJvZHVjdElkACoAAAAAA"+
        "AAAAm5hbWUACgAAAGJyaW1zdG9uZQABcHJpY2UApHA9CtejGkAScXVhbnRpdHkA0AcAAAAAAAAAAzYATQAAABJwcm9"+
        "kdWN0SWQAKgAAAAAAAAACbmFtZQAKAAAAYnJpbXN0b25lAAFwcmljZQCkcD0K16MaQBJxdWFudGl0eQDQBwAAAAAAAA"+
        "ADNwBNAAAAEnByb2R1Y3RJZAAqAAAAAAAAAAJuYW1lAAoAAABicmltc3RvbmUAAXByaWNlAKRwPQrXoxpAEnF1YW50a"+
        "XR5ANAHAAAAAAAAAAAA";
    var buffer = Convert.FromBase64String(st);

    while (true)
    {
        var sp = Stopwatch.StartNew();
        for (int i = 0; i < 1024; i++)
        {
            DoWork(buffer);
        }
        Console.WriteLine(sp.ElapsedMilliseconds);
    }
}

private static void DoWork(byte[] buffer)
{
    var ms = new MemoryStream(buffer);
    // Two loads, because the buffer contains two BSON documents back to
    // back: the document metadata, then the document itself.
    JObject.Load(new BsonReader(ms));
    JObject.Load(new BsonReader(ms));
}

On my machine, this runs at around 70 ms for each batch of 1,024 iterations. In other words, it takes significantly less than I would have thought: roughly 0.06 ms per iteration.

Note: the first number (0.4 ms) is measured under the profiler, while the second number (0.06 ms) is measured outside it. You can routinely see order-of-magnitude differences between running inside and outside the profiler!

So far, so good, but we can see that this is adding almost 100 ms to the request processing. That is good news, because it is fairly simple to fix.

What I did was introduce a cache inside the serialization pipeline, which made that entire cost go away. Indeed, running the same test code showed much better performance: an average of 200 ms per request.
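
The shape of that cache is something along these lines (a minimal sketch, assuming documents are keyed by an etag that changes on every write; the names here are hypothetical, not RavenDB’s actual code):

// Requires System.Collections.Concurrent, System.IO,
// Newtonsoft.Json.Linq and Newtonsoft.Json.Bson.
private static readonly ConcurrentDictionary<Guid, JObject> documentCache =
    new ConcurrentDictionary<Guid, JObject>();

public static JObject GetDocument(Guid etag, byte[] data)
{
    // Parse the BSON only the first time we see this document version.
    var cached = documentCache.GetOrAdd(etag, _ =>
    {
        using (var ms = new MemoryStream(data))
            return JObject.Load(new BsonReader(ms));
    });
    // JObject is mutable, so hand out a copy instead of the shared cached instance.
    return (JObject)cached.DeepClone();
}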

The next step is to figure out where the next cost factor is… For that, we use the profiler.

[Profiler screenshot: the deserialization cost has dropped drastically; the remaining cost is the search for each document by its key]

And… we can see that the cost of deserializing went down drastically. Now the actual cost is just the search for each document by its key.

You might have noticed that those results are for Munin. I have run the same test with Esent, with remarkable similarity in the overall performance.

time to read 1 min | 70 words

Originally posted at 1/17/2011

Occasionally I get asked what code review tools I use.

The truth is, I tend to use TortoiseGit’s Log Viewer and just read the history, but now I am leaning toward this baby:

[Screenshot of the code review tool]

This is really nice!

RavenMQ update

time to read 2 min | 372 words

It wasn’t planned so much as it happened, but RavenMQ just slipped into private beta. The API is still in a somewhat clumsy state, but it is working quite nicely :-)

You can see an example of the client API below:

using(var connection = RavenMQConnection.Connect("http://reduction:8181"))
{
    connection.Subscribe("/queues/abc", (context, message) => 
        Console.WriteLine(Encoding.UTF8.GetString(message.Data)));

    connection.PublishAsync(new IncomingMessage
    {
        Queue = "/queues/abc",
        Data = Encoding.UTF8.GetBytes("Hello Ravens")
    });

    Console.ReadLine();
}

Please note that this is likely to be subject to many changes.

This is written about 10 minutes after I posted the code above:

using(var connection = RavenMQConnection.Connect("http://localhost:8181"))
{
    connection.
        Subscribe<User>("/queues/abc", (context, message) => Console.WriteLine(message.Name));

    connection
        .StartPublishing
        .Add("/queues/abc", new User {Name = "Ayende"})
        .PublishAsync();

    Console.ReadLine();
}

I told you it would change… :-D

time to read 7 min | 1297 words

After finally managing to get to Inbox Zero, I had the chance to tackle some problems that were on the back burner. One of the more acute ones was RavenDB performance for map/reduce indexes.

Standard indexes are actually very fast, especially since I just gave them an additional boost, but map/reduce indexes had two problems:

  • The reduce operation is currently single threaded (for all indexes).
  • The reduce operation blocks standard indexing.

In order to avoid that, we split the work so that reduce runs in a separate thread from standard indexing operations (a rough sketch of the idea follows). That done, I started to take a look at the actual cost of map/reduce operations.
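
Conceptually, the split looks something like this (hypothetical method names; the actual RavenDB scheduling is more involved):

// Each kind of work gets a dedicated thread, so a long-running reduce
// no longer stalls standard document indexing.
var indexingThread = new Thread(() =>
{
    while (keepRunning)
        RunStandardIndexing(); // hypothetical: map / standard index work
});
var reduceThread = new Thread(() =>
{
    while (keepRunning)
        RunReduceWork();       // hypothetical: reduce work only
});
indexingThread.Start();
reduceThread.Start();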

It quickly became apparent that while the map part was pretty fast, it was the reduce operation that was killing us.

After some narrowing down, I was able to figure out that this is the code at fault:

public override void Execute(WorkContext context)
{
    if (ReduceKeys.Length == 0)
        return;

    var viewGenerator = context.IndexDefinitionStorage.GetViewGenerator(Index);
    if (viewGenerator == null)
        return; // deleted view?

    context.TransactionaStorage.Batch(actions =>
    {
        IEnumerable<object> mappedResults = null;
        foreach (var reduceKey in ReduceKeys)
        {
            IEnumerable<object> enumerable = actions.MappedResults.GetMappedResults(Index, reduceKey, MapReduceIndex.ComputeHash(Index, reduceKey))
                .Select(JsonToExpando.Convert);

            if (mappedResults == null)
                mappedResults = enumerable;
            else
                mappedResults = mappedResults.Concat(enumerable);
        }

        context.IndexStorage.Reduce(Index, viewGenerator, mappedResults, context, actions, ReduceKeys);
    });
}

Can you see the problem?

My first thought was that we had a problem with the code inside the foreach, since it effectively generates something like:

select * from MappedResults where Index = "MapReduceTestIndex" and ReduceKey = "Oren"
select * from MappedResults where Index = "MapReduceTestIndex" and ReduceKey = "Ayende"
select * from MappedResults where Index = "MapReduceTestIndex" and ReduceKey = "Arava"

And usually you’ll have about 2,500 of those.

Indeed, I modified the code to look like this:

public override void Execute(WorkContext context)
{
    if (ReduceKeys.Length == 0)
        return;

    var viewGenerator = context.IndexDefinitionStorage.GetViewGenerator(Index);
    if (viewGenerator == null)
        return; // deleted view?

    context.TransactionaStorage.Batch(actions =>
    {
        IEnumerable<object> mappedResults = new object[0];
        foreach (var reduceKey in ReduceKeys)
        {
            IEnumerable<object> enumerable = actions.MappedResults.GetMappedResults(Index, reduceKey, MapReduceIndex.ComputeHash(Index, reduceKey))
                .Select(JsonToExpando.Convert);

            mappedResults = mappedResults.Concat(enumerable);
        }

        var sp = Stopwatch.StartNew();
        Console.WriteLine("Starting to read {0} reduce keys", ReduceKeys.Length);

        var results = mappedResults.ToArray();

        Console.WriteLine("Read {0} reduce keys in {1} with {2} results", ReduceKeys.Length, sp.Elapsed, results.Length);

        context.IndexStorage.Reduce(Index, viewGenerator, results, context, actions, ReduceKeys);
    });
}

And got the following:


Starting to read 2470 reduce keys
Read 2470 reduce keys in 00:57:57.5292856 with 2499 results

Yes, reading those 2,470 reduce keys took almost an hour!!

I started planning how to fix this by moving to what is effectively an “IN” approach, when I realized that I had skipped a very important step: I hadn’t run this through the profiler. And as we know, when we are talking about performance testing, if it hasn’t run through the profiler, it isn’t real.

And the profiler led me to this method:

public IEnumerable<JObject> GetMappedResults(string view, string reduceKey, byte[] viewAndReduceKeyHashed)
{
    Api.JetSetCurrentIndex(session, MappedResults, "by_reduce_key_and_view_hashed");
    Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
    if (Api.TrySeek(session, MappedResults, SeekGrbit.SeekEQ) == false)
        yield break;


    Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
    Api.JetSetIndexRange(session, MappedResults, SetIndexRangeGrbit.RangeUpperLimit | SetIndexRangeGrbit.RangeInclusive);
    if (Api.TryMoveFirst(session, MappedResults) == false)
        yield break;
    do
    {
        // we need to check that we don't have hash collisions
        var currentReduceKey = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["reduce_key"]);
        if (currentReduceKey != reduceKey)
            continue;
        var currentView = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["view"]);
        if (currentView != view)
            continue;
        yield return Api.RetrieveColumn(session, MappedResults, tableColumnsCache.MappedResultsColumns["data"]).ToJObject();
    } while (Api.TryMoveNext(session, MappedResults));
}

You might have noticed the part that stands out :-). Yes, we actually had an O(n) algorithm here (effectively a table scan) that was absolutely meaningless and didn’t need to be there. I am pretty sure that it was test code that I wrote (yes, I ran git blame on that and ordered some crow for tomorrow’s lunch).
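
My reading of the bug, for clarity: TrySeek already positions the cursor on the first matching index entry, but the TryMoveFirst call then jumps back to the first entry of the whole index, so the do/while walks the entire table. The fix is simply to drop that call; roughly:

Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
if (Api.TrySeek(session, MappedResults, SeekGrbit.SeekEQ) == false)
    yield break;

Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
Api.JetSetIndexRange(session, MappedResults, SetIndexRangeGrbit.RangeUpperLimit | SetIndexRangeGrbit.RangeInclusive);

// TrySeek already put us on the first match; iterate from here,
// not from the start of the table.
do
{
    // ... same hash-collision checks and yields as above ...
} while (Api.TryMoveNext(session, MappedResults));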

Once I removed that, things changed for the better :-D. To give you an idea how much, take a look at the new results:

Starting to read 2470 reduce keys
Read 2470 reduce keys in 00:00:00.4907501 with 2499 results

And just for fun, I tested how long it takes to reduce each batch of results:

Indexed 2470 reduce keys in 00:00:01.0533390 with 2499 results

And that is without doing any actual optimizations; that is just from removing the brain-dead code that had no business being there.

time to read 5 min | 973 words

We are currently building the RavenDB Silverlight Client, and along the way we ran into several problems. Some of them were solved, but others proved to be much harder to deal with.

In particular, it seems that the way Silverlight handles HTTP caching is absolutely broken: Silverlight aggressively caches HTTP responses (which is good & proper), but it ignores just about all of the cache control mechanisms that the HTTP spec specifies.

Note: I am talking specifically about the Client HTTP stack, the Browser HTTP stack behaves properly.

Here is the server code:

static void Main()
{
    var listener = new HttpListener();
    listener.Prefixes.Add("http://+:8080/");
    listener.Start();

    int reqNum = 0;
    while (true)
    {
        var ctx = listener.GetContext();

        Console.WriteLine("Req #{0}", reqNum);

        if(ctx.Request.Headers["If-None-Match"] == "1234")
        {
            // The client sent a conditional GET; tell it the cached copy is still valid.
            ctx.Response.StatusCode = 304;
            ctx.Response.StatusDescription = "Not Modified";
            ctx.Response.Close();
            continue;
        }

        // Otherwise, send the body along with an ETag the client can use to re-validate.
        ctx.Response.Headers["ETag"] = "1234";
        ctx.Response.ContentType = "text/plain";
        using(var writer = new StreamWriter(ctx.Response.OutputStream))
        {
            writer.WriteLine(++reqNum);
            writer.Flush();
        }
        ctx.Response.Close();
    }
}

This is a pretty simple implementation, let us see how it behaves when we access the server from Internet Explorer (after clearing the cache):

[Fiddler screenshot: the two requests made by Internet Explorer]

The first request will hit the server to get the data, but the second request will ask the server if the cached data is fresh enough, and use the cache when we get the 304 reply.

So far, so good, and exactly what we would expect, given the presence of the ETag.
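
Reconstructed from the server code above, the two exchanges look roughly like this on the wire:

First exchange (cold cache):

GET / HTTP/1.1
Host: localhost:8080

HTTP/1.1 200 OK
ETag: 1234
Content-Type: text/plain

1

Second exchange (conditional GET; the client re-uses its cached copy after the 304):

GET / HTTP/1.1
Host: localhost:8080
If-None-Match: 1234

HTTP/1.1 304 Not Modified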

Let us see how this behaves with an Out of Browser Silverlight application (running with elevated security):

public partial class MainPage : UserControl
{
    public MainPage()
    {
        InitializeComponent();
    }

    private void StartRequest(object sender, RoutedEventArgs routedEventArgs)
    {
        var webRequest = WebRequestCreator.ClientHttp.Create(new Uri("http://ipv4.fiddler:8080"));

        webRequest.BeginGetResponse(Callback, webRequest);
    }

    private void Callback(IAsyncResult ar)
    {
        var webRequest = (WebRequest) ar.AsyncState;
        var response = webRequest.EndGetResponse(ar);

        var messageBoxText = new StreamReader(response.GetResponseStream()).ReadToEnd();
        Dispatcher.BeginInvoke(() => MessageBox.Show(messageBoxText));
    }
}

The StartRequest method is wired to a button click on the window.

Before starting, I cleared the IE cache again. Then I started the SL application and hit the button twice. Here is the output from fiddler:

[Fiddler screenshot: only the first button click produces a request; the second click never hits the server]

Do you notice what you are not seeing here? That is right, there is no second request to the server to verify that the resource has not been changed.

Well, I thought to myself, that might be annoying behavior, but we can fix it; all we need to do is specify must-revalidate in the Cache-Control header. And so I did just that:

[Fiddler screenshot: even with Cache-Control: must-revalidate, the second click still never hits the server]

Do you see Silverlight pissing all over the HTTP spec? The only aspect of Cache-Control that the ClientHttp stack in Silverlight seems to respect is no-cache, which completely ignores ETags.

As it turns out, there is one way of doing this. You need to send an Expires header set in the past, as well as an ETag header. The Expires header will force Silverlight to make the request again, and the ETag will be used to re-validate the request, resulting in a 304 reply from the server, which causes Silverlight to load the data from its cache.
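
Applied to the server sample above, the workaround amounts to one extra header (a sketch):

// An already-expired Expires header forces the ClientHttp stack to go back
// to the server; the ETag then lets the server answer with a 304.
ctx.Response.Headers["ETag"] = "1234";
ctx.Response.Headers["Expires"] = DateTime.UtcNow.AddYears(-1).ToString("R");
ctx.Response.ContentType = "text/plain";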

The fact that there is a workaround doesn’t mean it is not a bug, and it is a pretty severe one, making it much harder to write proper REST clients in Silverlight.
