Ayende @ Rahien

RavenDB Awesome Feature of the Day, Formatted Indexes

There is a chance that you’ll look at me strangely for calling this the “Feature of the Day”. But this is actually quite an important little feature.

Here is the deal, let us say that you have the following index:

public class Orders_Search : AbstractIndexCreationTask<Order, Orders_Search.ReduceResult>
{
    public class ReduceResult
    {
        public string Query { get; set; }
        public DateTime LastPaymentDate { get; set; }
    }

    public Orders_Search()
    {
        Map = orders => from order in orders
                        let lastPayment = order.Payments.LastOrDefault()
                        select new
                        {
                            Query = new object[]
                            {
                                order.FirstName, 
                                order.LastName, 
                                order.OrderNumber, 
                                order.Email, 
                                order.Email.Split('@'),
                                order.CompanyName,
                                order.Payments.Select(payment => payment.PaymentIdentifier),
                                order.LicenseIds
                            },
                            LastPaymentDate = lastPayment == null ? order.OrderedAt : lastPayment.At
                        };
    }
}

And you are quite happy with it. But that is the client-side perspective. We don’t have any types on the server, so you can’t just execute this there. Instead, we send a string representing the index to the server. That string is actually the output of the LINQ expression, which looks like this:

[image: the index definition string produced from the LINQ expression]

This is… somewhat hard to read, I think you’ll agree. So we had some minimal work done to improve this, and right now what you’ll get is (you’ll likely see it roll off the screen, that is expected):

docs.Orders
    .Select(order => new {order = order, lastPayment = order.Payments.LastOrDefault()})
    .Select(__h__TransparentIdentifier0 => new {Query = new System.Object []{__h__TransparentIdentifier0.order.FirstName, __h__TransparentIdentifier0.order.LastName, __h__TransparentIdentifier0.order.OrderNumber, __h__TransparentIdentifier0.order.Email, __h__TransparentIdentifier0.order.Email.Split(new System.Char []{'@'}), __h__TransparentIdentifier0.order.CompanyName, __h__TransparentIdentifier0.order.Payments
    .Select(payment => payment.PaymentIdentifier), __h__TransparentIdentifier0.order.LicenseIds}, LastPaymentDate = __h__TransparentIdentifier0.lastPayment == null ? __h__TransparentIdentifier0.order.OrderedAt : __h__TransparentIdentifier0.lastPayment.At})

This is still quite confusing, actually. But still better than the alternative.

As I said, it seems like a little thing, but those things are important. An index in its compiled form that is hard to understand for a user is a support issue for us. We needed to resolve this issue.

The problem is that source code beautifying is non-trivial. I started playing with parsers a bit, but it was all way too complex. Then I had an epiphany. I didn’t actually care about the code, I just wanted it sorted. There aren’t many C# code beautifiers around, but there are a lot for JavaScript.

I started with the code from http://jsbeautifier.org/, which Rekna Anker had already ported to C#. From there, it was a matter of making sure that, for my purposes, the code generated the right output. I had to teach it C# idioms such as @foo, null coalescing and lambda expressions, but that sounds harder than it actually was. With that done, we got this output:

docs.Orders.Select(order => new {
    order = order,
    lastPayment = order.Payments.LastOrDefault()
}).Select(__h__TransparentIdentifier0 => new {
    Query = new System.Object[] {
        __h__TransparentIdentifier0.order.FirstName,
        __h__TransparentIdentifier0.order.LastName,
        __h__TransparentIdentifier0.order.OrderNumber,
        __h__TransparentIdentifier0.order.Email,
        __h__TransparentIdentifier0.order.Email.Split(new System.Char[] {
            '@'
        }),
        __h__TransparentIdentifier0.order.CompanyName,
        __h__TransparentIdentifier0.order.Payments.Select(payment => payment.PaymentIdentifier),
        __h__TransparentIdentifier0.order.LicenseIds
    },
    LastPaymentDate = __h__TransparentIdentifier0.lastPayment == null ? __h__TransparentIdentifier0.order.OrderedAt : __h__TransparentIdentifier0.lastPayment.At
})
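To give a feel for the kind of formatting pass that turns the one-line string into the layout above, here is a minimal sketch. It is not the jsbeautifier port and not what RavenDB ships; the class name IndexFormatterSketch and its Format method are made up for illustration. It only breaks lines after "{", before "}", and after commas that sit directly inside braces, and it assumes plain string and char literals (no verbatim @"..." strings, no comments).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not RavenDB's formatter.
public static class IndexFormatterSketch
{
    public static string Format(string code, int indentSize = 4)
    {
        var output = new StringBuilder();
        var open = new Stack<char>();   // currently open brackets, innermost on top
        int braceDepth = 0;             // number of open '{', drives the indentation
        char quote = '\0';              // non-zero while inside a string/char literal
        bool skipWhitespace = false;    // swallow input whitespace after a forced line break

        for (int i = 0; i < code.Length; i++)
        {
            char c = code[i];

            if (quote != '\0')
            {
                // Copy literals verbatim so '{' or ',' inside them is never touched.
                output.Append(c);
                if (c == '\\' && i + 1 < code.Length)
                    output.Append(code[++i]);
                else if (c == quote)
                    quote = '\0';
                continue;
            }

            if (skipWhitespace && char.IsWhiteSpace(c))
                continue;
            skipWhitespace = false;

            switch (c)
            {
                case '\'':
                case '"':
                    quote = c;
                    output.Append(c);
                    break;
                case '{':
                    open.Push(c);
                    braceDepth++;
                    output.Append(c);
                    NewLine(output, braceDepth * indentSize);
                    skipWhitespace = true;
                    break;
                case '}':
                    if (open.Count > 0) open.Pop();
                    if (braceDepth > 0) braceDepth--;
                    TrimEnd(output);
                    NewLine(output, braceDepth * indentSize);
                    output.Append(c);
                    break;
                case '(':
                case '[':
                    open.Push(c);
                    output.Append(c);
                    break;
                case ')':
                case ']':
                    if (open.Count > 0) open.Pop();
                    output.Append(c);
                    break;
                case ',':
                    output.Append(c);
                    // Only break after commas that separate members of a { ... } block;
                    // commas inside method calls or indexers stay on the same line.
                    if (open.Count > 0 && open.Peek() == '{')
                    {
                        NewLine(output, braceDepth * indentSize);
                        skipWhitespace = true;
                    }
                    break;
                default:
                    output.Append(c);
                    break;
            }
        }
        return output.ToString();
    }

    private static void NewLine(StringBuilder sb, int spaces)
    {
        sb.Append('\n').Append(' ', spaces);
    }

    private static void TrimEnd(StringBuilder sb)
    {
        while (sb.Length > 0 && char.IsWhiteSpace(sb[sb.Length - 1]))
            sb.Length--;
    }
}
```

Calling IndexFormatterSketch.Format("new {a = 1, b = Foo(x, y)}") puts a and b on their own indented lines while leaving the arguments of Foo alone, which is the same shape as the output above. A real beautifier has to know far more about the language, which is why starting from an existing one was the easier path.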

And this is actually much better. Still not good enough, mind. We can do better than that. It is a simple change:

docs.Orders.Select(order => new {
    order = order,
    lastPayment = order.Payments.LastOrDefault()
}).Select(this0 => new {
    Query = new System.Object[] {
        this0.order.FirstName,
        this0.order.LastName,
        this0.order.OrderNumber,
        this0.order.Email,
        this0.order.Email.Split(new System.Char[] {
            '@'
        }),
        this0.order.CompanyName,
        this0.order.Payments.Select(payment => payment.PaymentIdentifier),
        this0.order.LicenseIds
    },
    LastPaymentDate = this0.lastPayment == null ? this0.order.OrderedAt : this0.lastPayment.At
})

And now we got to something far more readable :)
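The rename itself can be as small as a single regular expression. The sketch below is an assumption about how one might do it, not the code that went into RavenDB; the class and method names are hypothetical. It matches the __h__TransparentIdentifierN form shown in the output above and keeps the number, so distinct identifiers stay distinct.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not RavenDB's implementation.
public static class TransparentIdentifierRenamer
{
    // Matches __h__TransparentIdentifier0, __h__TransparentIdentifier1, ...
    // and captures the trailing number.
    private static readonly Regex Pattern =
        new Regex(@"\b__h__TransparentIdentifier(\d+)\b", RegexOptions.Compiled);

    public static string Rename(string code)
    {
        // ${1} is the captured number, so identifier 0 becomes this0, 1 becomes this1, etc.
        return Pattern.Replace(code, "this${1}");
    }
}
```

For example, TransparentIdentifierRenamer.Rename("__h__TransparentIdentifier0.order.Email") returns "this0.order.Email". Note that a plain text replace like this does not check whether a name such as this0 is already used in the index.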

Posted By: Ayende Rahien

Comments

08/17/2012 09:25 AM by Roy

Nice! Perhaps you could even leave out the "System." prefixes and "System.Object" altogether?

08/17/2012 09:58 AM by configurator

There are two changes I would make:

  1. Substitute C# type aliases where appropriate. This isn't hard, and just needs a hard-coded list of the 15-or-so aliases that exist. (I wouldn't leave out the "System." or any other namespace in other types, myself).
  2. Use 'x', 'y' and 'z' instead of this0 (and presumably this1 and this2) - once you run out of those three - which I'm guessing you won't in 95% of cases - you can use an identifier with a number like this0 or x0. But using xyz first would make the entire thing a bit more readable.

All in all, this is an awesome feature.

08/17/2012 11:15 AM by Daniel Grunwald

You could also use the code from ILSpy that transforms C# LINQ calls back into query expressions. (IntroduceQueryExpressions and CombineQueryExpressions transforms) Those two are purely syntactic transformations, they don't consume any additional information from previous decompiler stages.

Although pulling in a full-blown C# parser as a dependency might be overkill for this problem :)

08/17/2012 02:44 PM by Mariolino

Hi,

broken url... I can't see the url

http://ayende.com/blog/157665/data-virtualization-lazy-loading-stealth-pagingndash-whatever-you-want-to-call-it-herersquo-s-how-to-do-it-in-silverlight?key=f69eddad-8e64-4363-94ac-2da433d52515&utmsource=feedburner&utmmedium=feed&utm_campaign=Feed%3A+AyendeRahien+%28Ayende+%40+Rahien%29

08/17/2012 03:36 PM by Matt Johnson

"Then I had an affiany." I believe the word you wanted was "epiphany"

One thing I don't quite get: In the reduce result, you specify Query as a string, but in the mapping it's clearly an array of objects. I thought that these had to match?

Also (to repeat one of your favorite lines) - What are you actually trying to do here? If this is just an index of all of those properties, why do you need the array of objects at all?

If I was to guess, it looks like the index is such that you can search across all of these fields at the same time? If so, is this the recommended approach, and is it written up somewhere that I can't seem to find?

08/17/2012 04:23 PM by Matt Warren

@Matt

That index is from this blog post http://ayende.com/blog/152833/orders-search-in-ravendb.

And yes, the idea is that you can search across several fields at the same time.

Also it's not a Map/Reduce query, it's just using ReduceResult as the type for the shape of the Map output.

08/21/2012 10:20 AM by Ayende Rahien

Roy, Good idea, I'll see if that can be made to work.

08/21/2012 10:21 AM by Ayende Rahien

configurator,

1) Will be done. 2) Cannot really work. What happens if you already use x or y in your lambdas? this0 is much less likely.

08/21/2012 10:26 AM by Ayende Rahien

Daniel, we already have a dependency on NRefactory, although on the server, and not on the client, which is where this code is running. Any reference for how to use those two?

08/21/2012 10:28 AM by Ayende Rahien

Matt, Thanks, typo fixed.

Regarding the results, RavenDB has an indexing model and a query model. They don't have to quite match from a types perspective, because we do a lot of funky stuff.

This index is explained here: http://ayende.com/blog/152833/orders-search-in-ravendb

08/21/2012 11:03 AM by Daniel Grunwald

I've extracted the query expression decompiler logic into a standalone program: https://gist.github.com/3414523

It might be a bit too aggressive though, sometimes it would be more readable to keep the lambdas around.

08/21/2012 02:20 PM by Matt Johnson

Thanks for the clarification on the search index. I actually have several places in my app where this technique will be useful. Thanks.

08/21/2012 05:07 PM by Ayende Rahien

Daniel, Thanks, looks awesome. I wonder if there is a good way to get this without bringing the full parser in.
