Ayende @ Rahien

And some people will INSIST on shooting their own foot off

Because, clearly, that is what is missing: a RavenDB GetAll extension method.

Comments

06/14/2013 09:43 AM by Damien

Creating it in the first place is a bit WTF. Deciding to hold all of the results in a List and only return it after all of the calls complete, despite being inside an IEnumerable method just... elevates it to another level.

06/14/2013 09:52 AM by Patrick Huizinga

I can somewhat understand wanting to get all documents. But: var results = new List<T>(); Really..?

Btw, what do you think of my addition? public static IEnumerable GetRange(this IDocumentStore documentStore, int start, int count) { var results = new List(); for (int i = 0; i < count; i++) { result.Add(documentStore.GetAll().ElementAt(start + i)); } return results; }

:trollface:

06/14/2013 09:53 AM by Patrick Huizinga

Ugh, no preview and no edit >.< Let's see if this works:

public static IEnumerable<T> GetRange<T>(this IDocumentStore documentStore, int start, int count)
{
    var results = new List<T>();
    for (int i = 0; i < count; i++)
    {
        results.Add(documentStore.GetAll<T>().ElementAt(start + i));
    }
    return results;
}

:trollface:

06/14/2013 11:12 AM by Joel

Can someone clarify what's wrong with this, please? I'm new to RavenDB and understand the basic dos and don'ts, but a rundown of why this is bad would be great, for myself as well as anyone else, especially those who might come to this page after googling 'ravendb getall'.

06/14/2013 11:24 AM by Ayende Rahien

Joel, look at unbounded result sets, as well as the real reason why we don't allow this in RavenDB. Basically, what happens if you have 1 million results?

06/14/2013 11:42 AM by Wyatt Barnett

For the record, I agree with the design impetus for making this so. Then again, sometimes one just wants to get all the Ts. Many times you know you won't have 1m or even 1000 records in a collection, but you could well have more than 128, and you don't want to write a pager loop to handle it.

Now, I recall seeing somewhere there was a new 'stream me all the T' api option but that doesn't help people on older versions.

06/14/2013 11:53 AM by Duckie

I have some collections with many small documents, and I just need all of them, easy. As I work a lot with moving/importing data (~2000 docs) around, I had to do the same workaround. Forcing users to do stupid things themselves and then blaming them for it is, I find, quite silly.

06/14/2013 12:15 PM by David Zidar

I agree that most of the time you don't want unbounded result sets. But there are legitimate reasons for wanting to retrieve all the data in a collection. For instance when exporting data in some other format or when generating a sitemap.xml with all pages and such.

There are exceptions to every rule.

06/14/2013 01:19 PM by Scott Scowden

I agree, there are definitely cases where you need more than 1024 records. Even worse, when using a hosted RavenDB, you can't easily change this value to retrieve more.

For example, I need to list all Zip Codes in a state to allow users to multi-select them.

Not saying his implementation is good, but there are definitely cases where it's needed.

06/14/2013 01:23 PM by Frank

@Duckie,

Having to move/import data in batches already sounds like a "workaround". Sending a message to the target system as soon as the entity represented by the document changes would turn that batch process into a real-time interface and remove the need to query all documents.

06/14/2013 01:38 PM by Kijana Woodard

Yield return would at least prevent complete waste when the calling code does Take(x).

The "pager code" is pretty simple to write and is a good warning that you are doing something potentially dangerous.
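As a rough illustration of both points, here is a minimal sketch of such a pager loop written with yield return. It deliberately avoids any real RavenDB calls: `Paging`, `PageThrough` and the `fetchPage(skip, take)` delegate are hypothetical names standing in for one bounded query against the store, so treat this as an outline under those assumptions rather than the code from the original extension method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // fetchPage(skip, take) is a hypothetical stand-in for a single bounded
    // query against the store. It is NOT a RavenDB API.
    public static IEnumerable<T> PageThrough<T>(
        Func<int, int, IReadOnlyList<T>> fetchPage, int pageSize)
    {
        // Because this is an iterator, the check runs on first enumeration.
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize));

        var skip = 0;
        while (true)
        {
            var page = fetchPage(skip, pageSize);

            // Hand items to the caller one at a time instead of buffering
            // every page in a List<T> first.
            foreach (var item in page)
                yield return item;

            // A short (or empty) page means there is nothing left to fetch.
            if (page.Count < pageSize)
                yield break;

            skip += page.Count;
        }
    }
}
```

With this shape, a caller that writes `Paging.PageThrough(fetch, 128).Take(10)` only triggers the first page request, whereas a version that fills a list before returning would fetch every page regardless of how much the caller consumes.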

Trying to make GetAll generic and reusable is much much more difficult. What I've seen is that soon you want to add a Where condition, then you want custom skip/take, then you want to get the Statistics, then you want to Include some other document, then you want to WaitForStale...

Soon this GetAll method and its overloads are a pretty substantial API for which each combination of parameters has exactly one usage in the system.

And then there's this: http://ayende.com/blog/161249/ravendbs-querying-streaming-unbounded-results

06/14/2013 01:41 PM by Kijana Woodard

@Scott - Each zip code has its own document? I would think they would be grouped into far fewer docs.

@duckie - import/export could be done via the smuggler api. It would be interesting to see what Studio is doing here and emulate that.

06/14/2013 02:17 PM by João Bragança

What's wrong with this? I mean theoretically a windows server can 'scale' up to 4TB of memory. That way you don't have to pay developers to think and write good code!

06/14/2013 03:01 PM by Ayende Rahien

Wyatt, What is the actual user scenario that requires all the data, when the data can be many thousands of records?

06/14/2013 03:02 PM by Ayende Rahien

Duckie, we have explicit support for bulk insert / reads. That prevents you from loading everything into memory.

06/14/2013 03:02 PM by Ayende Rahien

Scott, why are you storing all the zip codes as separate documents?

06/14/2013 03:15 PM by Daniel Lang

... and I don't understand why you don't understand it. There are scenarios beyond OLTP web applications where you just need this: GetAll(). I'm using it heavily in a desktop application that runs on RavenDB embedded. I know the performance implications of every other approach and yes, I think GetAll is the best in our situation. I'm sure there are other valid use-cases as well which you could have addressed with a better implementation of the streaming API.

06/14/2013 03:25 PM by Duckie

Ayende, I need all data in memory so I can use whatever LINQ commands, filtering, querying, sorting, etc. I want. Performance here is not an issue at all. I've got loads of data I need to do manipulation on.

06/14/2013 03:26 PM by Ayende Rahien

Duckie, Whatever for? Filtering, querying & sorting are db tasks, not in memory tasks.

06/14/2013 03:35 PM by jdn

@Duckie, @Daniel:

Don't worry. Ayende has been wrong about this from the start but implemented this auto-handcuff for marketing reasons.

There are sound technical reasons for wanting GetAll(). There used to be a way to override the "dumb by default" behavior in RavenDB, not sure if it is still in the code base or not.

06/14/2013 04:11 PM by Judah Gabriel Himango

I wonder how many hundreds or thousands of apps are actually efficient because RavenDB forced them to be, and forced lazy developers to do proper paging and/or document structure.

RavenDB has forced me to think about performance from the start, when normally I'd be lazy about it with SQL+O/RM.

06/14/2013 04:16 PM by Kijana Woodard

@Daniel and @jdn

Sure. And it's pretty easy to roll yourself with the exact "flavor" you need (from my other comment). A GetAll in the API doesn't add much value to the common case.

For "embedded and not that much data and I understand" scenarios, I personally have used LoadStartingWith and avoid the query issues altogether.

LoadStartingWith + the new Streaming API + Smuggler + roll your own while loop = a lot of ways to handle these situations without having a simple, but dangerous, method exposed on the api.

06/14/2013 04:20 PM by Kijana Woodard

Also, Dynamic Reporting takes care of another set of cases: http://ayende.com/blog/162339/ravendbs-dynamic-reporting

Facets solve for still others.

The difference being that these choices address specific concerns regarding working with the entire dataset instead of exposing a seemingly simple api method and hoping the user understands the intersection between the subtleties of what they are actually trying to achieve and what the api is actually doing.

06/14/2013 04:23 PM by jdn

@Kijana:

If I say "Select * from", I want select *.

If I want "select top 1024 from", then I will write that.

"LoadStartingWith + the new Streaming API + Smuggler + roll your own while loop = " a pain in the kiester.

At some point, it went from "running with scissors" to "crawling with pillows."

06/14/2013 04:28 PM by Tim Murphy

@Judah is quite right that Raven makes you think about performance and therefore paging.

My only beef is I think an exception should be thrown if the number of documents requested is greater than the default 128.

06/14/2013 04:31 PM by Kijana Woodard

@jdn - Sure. If I was writing sql, fine. The problem is we're using abstractions on top of abstractions.

Code like that GetAll extension method is one of the primary reasons so many people (DBAs) say "EF Sucks". EF is fine, but once you abstract away what's going on past a certain point, it will just lead to painful "surprises" down the road.

I once worried about this and typed up a post for the forum. I then realized that the while loop to page the results was shorter than the post I was writing.

06/14/2013 04:32 PM by Duckie

Ayende, the DB cannot do what I want without a lot of investment in time. I just need my data out so I can work with it myself.

I understand the desire in optimal use of Ravendb by limiting the API, but forcing users to do stupid things is .. stupid.

Maybe just make a method called query.GetAllWhileUnderstandingThisIsStupid() ..

06/14/2013 04:34 PM by Kijana Woodard

@Tim, you mean if the total document count is greater than 128 and you haven't specified a Take?

I like to explicitly define a Take for all queries, but I'd probably say log WARN instead of throw.

06/14/2013 06:20 PM by Foo

This reminds me of a technical lead at a Fortune 500 company explaining to me how having a web service exposing something like public dataset execute(string query, string connectionstring) was great to speed up development and deployments. Yes, you can; no, you shouldn't.

06/14/2013 06:50 PM by João Bragança

@David

The 'I might need to get everything because of sitemap' is questionable. Google doesn't NEED sitemap to index your site. You just need to ensure that all of your pages are reachable from the bookmark url. Oren's blog has lots of dynamic content too, a lot more than 1024 posts - see the sidebar. But of course it is all indexed by google. Someone should write an article about this...

06/14/2013 08:02 PM by Duckie

Sitemaps are not only about making a list of links for indexing, but also about showing Google the structure of the site. Besides, if they want to expose a sitemap, why is this questionable?

Fact is, if you want to load many documents into memory, you have to do special stuff with RavenDB, no matter what valid reason you might have for it.

This is what users experience / what I experienced.

You only get a limited number of records. You increase this. You run into the maximum limit of records. You start paging it out, but you run into the maximum queries per session exception. You increase the number of allowable requests, or you create multiple sessions.

Since streams were added, it is of course easier to do.

06/15/2013 02:08 PM by Sarmaad

At the beginning I had the same thoughts.. but now, no way.. I'd rather write a while loop than just blindly get all documents.

I found myself asking.. do I need this here, is the model designed correctly, or should this be a map/reduce..

don't change a thing.

06/17/2013 08:27 PM by Karg

We actually have some legacy APIs that we've converted over to use RavenDB on the back end, but we still have to maintain the non-paged methods.

We have the following (better) extension method to get all. It obeys skipped results and returns an IEnumerable so you can avoid materializing the whole thing if you're just operating over the whole set.

This is with Raven 1.0, we'll use Streams when we upgrade.

http://pastebin.com/AqaAu6DC

06/18/2013 11:15 AM by Sean Kearon

I'm using embedded in a desktop application and I have to agree completely with @Daniel here. "GetAll" is absolutely essential for my use cases, as is ensuring that the query does not wait for any stale results.

I'm also using 1.0 currently, but will likely move to streams when I get time to upgrade.

06/20/2013 02:16 PM by Jon Canning

Oh dear, how embarrassing, I know it's wrong but I needed a quick hack and had just read this:

http://stackoverflow.com/questions/11268955/retrieving-entire-data-collection-from-a-raven-db

I put it on my blog in case I needed it again; honestly didn't expect anyone to find it! I'll remove it for fear of encouraging others.

Comments have been closed on this topic.