RavenDB 3.5 Features: Data Exploration
RavenDB does a pretty great job as a production database. In fact, we designed it upfront to include only features that make sense for robust production systems.
In particular, we don’t have any form of ad-hoc queries. A query always hits an index, so it is very fast. Even what we call dynamic queries in RavenDB actually create an index behind the scenes. This is pretty awesome for normal production usage, but it has some limitations when you want to explore the data. Maybe you are a developer trying to find a particular something, and you just want to quickly fire off random queries. You don’t care about the cost, and you don’t want to generate indexes. Or maybe you are an admin who needs to get a particular report out of the system, and you want to play around with the details until you get everything right.
To serve those needs, RavenDB 3.5 is going to have a really nice feature: explicit data exploration.
For example, let us say that I want to count the number of unique words in all of my posts. I can do it using the following:
Note that the actual query is pretty meaningless, and I’m writing this at 1 AM with a baby nearby making funny noises, so the Linq statement there works, but could probably be better.
The point here is to demo what is going on. We write a simple Linq statement, run it against our database, and gather the results back. It is like having LinqPad directly inside the RavenDB Studio. In fact, that is the number one scenario we envision for this feature: replacing LinqPad usage with a native capability.
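For readers who want to see the shape of that computation without the Studio, here is a minimal Python sketch of counting unique words across a set of post documents. It is not RavenDB's API and not the query from the screenshot; it runs over in-memory dictionaries, and the `Content` field name and the `count_unique_words` helper are hypothetical. In Linq terms it corresponds roughly to a `SelectMany` over the words of each post, followed by `Distinct` and `Count`.

```python
import re

# Hypothetical sample documents; the "Content" field name is an assumption.
posts = [
    {"Id": "posts/1", "Content": "RavenDB indexes make queries fast."},
    {"Id": "posts/2", "Content": "Dynamic queries create an index behind the scenes."},
]

def count_unique_words(docs):
    """Count distinct lower-cased words across all documents.

    Roughly: SelectMany(words) -> Distinct() -> Count().
    """
    # \w+ approximates a "word"; punctuation is discarded
    return len({
        word.lower()
        for doc in docs
        for word in re.findall(r"\w+", doc["Content"])
    })

print(count_unique_words(posts))  # 12
```

The second post repeats only "queries", so the two posts contribute 5 and 7 new words respectively.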
Now, some caveats. As you can see, you can choose to limit the query duration as well as the number of documents it will operate on. That gives us a quick way to explore the data without putting too much load on the server. You can even take the output here and throw it directly into Excel. “Sam, can you give me a breakdown of orders this year by month and country? Just email me the Excel spreadsheet.”
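To make the two limits and the “orders by month and country” request concrete, here is a hedged Python sketch. It is not how RavenDB enforces these limits internally; it simply stops feeding documents to the query once a document count or a wall-clock budget is reached, then groups the results and writes CSV, a format Excel opens directly. The `explore`, `orders_by_month_and_country`, and `to_csv` helpers and the `OrderedAt` and `Country` field names are all hypothetical.

```python
import csv
import io
import time
from collections import Counter
from datetime import datetime

def explore(docs, query, max_docs=1000, max_seconds=5.0):
    """Run `query` over at most `max_docs` documents, stopping early once
    `max_seconds` of wall-clock time have elapsed (checked between documents)."""
    deadline = time.monotonic() + max_seconds

    def bounded():
        for i, doc in enumerate(docs):
            if i >= max_docs or time.monotonic() > deadline:
                return
            yield doc

    return query(bounded())

def orders_by_month_and_country(orders, year):
    """Count orders per (month, country) for the given year.
    Field names "OrderedAt" (ISO date string) and "Country" are assumptions."""
    counts = Counter()
    for order in orders:
        ordered_at = datetime.fromisoformat(order["OrderedAt"])
        if ordered_at.year == year:
            counts[(ordered_at.month, order["Country"])] += 1
    return counts

def to_csv(counts):
    """Render the grouped counts as CSV text, sorted by month then country."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Month", "Country", "Orders"])
    for (month, country), n in sorted(counts.items()):
        writer.writerow([month, country, n])
    return buf.getvalue()

# Usage with made-up sample data
orders = [
    {"OrderedAt": "2016-01-15", "Country": "Israel"},
    {"OrderedAt": "2016-01-20", "Country": "Israel"},
    {"OrderedAt": "2016-02-03", "Country": "UK"},
    {"OrderedAt": "2015-12-31", "Country": "UK"},
]
result = explore(orders, lambda docs: orders_by_month_and_country(docs, 2016), max_docs=100)
print(to_csv(result))
```

Lowering `max_docs` to 1 would make the same query see only the first order, which mirrors the trade-off in the Studio: you get a partial answer quickly instead of a full scan that loads the server.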
Note that this is intended as a user feature; it isn’t something we provide an API for. It is there for admins or developers who are figuring things out. It is an admin feature, not something you want to use in production.
Comments
Is this based on the Raven/DocumentsByEntityName index? It would be great to be able to specify LastModified sort and ranges.
BTW, it would also be great to be able to process data from any index, so there would be no need to use Load in such a query.
Vlko, yes, this is based on that index. You can do sorting and filtering in the Linq, so why do that externally?
And we won't support running this over any index; that would add more complexity. Just do that with a transformer on any index.
Re filtering in Linq: sometimes I want to analyze the last x documents and sometimes the first x documents, but in most cases I want to analyze the last ones, sorted by LastModified descending.
Re transformers: those I need to store and update, and they don't allow custom grouping. Sure, they are fine for common things, but having something like "just copy'n'paste this, change a param, and see the result immediately for something I need only once" would be great :)
Hey, nice feature, guys! I've wanted to do ad hoc C# queries in the Studio, not worrying about indexes. Too often I've had to dip into Lucene queries, which really should just be an implementation detail.
This simplifies all that. Nice work.
That is really amazing! Thank you for that.