And some people will INSIST on shooting their own foot off

Because, clearly, that is what is missing. RavenDB GetAll extension method

Tags:

raven
wtf?!

Comments

14 Jun 2013
09:43 AM

Damien

Creating it in the first place is a bit WTF. Deciding to hold all of the results in a List and only return it after all of the calls complete, despite being inside an IEnumerable method just... elevates it to another level.

14 Jun 2013
09:52 AM

Patrick Huizinga

I can somewhat understand wanting to get all documents. But: var results = new List<T>(); Really..?

Btw, what do you think of my addition? public static IEnumerable<T> GetRange<T>(this IDocumentStore documentStore, int start, int count) { var results = new List<T>(); for (int i = 0; i < count; i++) { results.Add(documentStore.GetAll().ElementAt(start + i)); } return results; }

:trollface:

14 Jun 2013
09:53 AM

Patrick Huizinga

Ugh, no preview and no edit >.< Let's see if this works:

public static IEnumerable<T> GetRange<T>(this IDocumentStore documentStore, int start, int count)
{
    var results = new List<T>();
    for (int i = 0; i < count; i++)
    {
        results.Add(documentStore.GetAll().ElementAt(start + i));
    }
    return results;
}

:trollface:

14 Jun 2013
11:12 AM

Joel

Can someone clarify what's wrong with this please? I'm new to RavenDB and understand the basic dos and don'ts, but a rundown of why this is bad would be great, for myself as well as anyone else, especially those who might come to this page after googling 'ravendb getall'.

14 Jun 2013
11:24 AM

Ayende Rahien

Joel, Look at unbounded result sets, as well as the real reason why we don't allow this in RavenDB. Basically, what happens if you have 1 million results.

14 Jun 2013
11:42 AM

Wyatt Barnett

For the record I agree with the design impetus for making this so. Then again, sometimes one just wants to get all the Ts and many times you know you won't have 1m or even 1000 records in a collection but you could well have more than 128 and you don't want to write a pager loop to handle it.

Now, I recall seeing somewhere there was a new 'stream me all the T' api option but that doesn't help people on older versions.

14 Jun 2013
11:53 AM

Duckie

I have some collections with many small documents, and I just need all of them, easy. As I am working a lot with moving/importing data (~2000 docs) around, I had to do the same workaround. Forcing users to do stupid things themselves, and then blaming them for it, I find quite silly.

14 Jun 2013
12:15 PM

David Zidar

I agree that most of the time you don't want unbounded result sets. But there are legitimate reasons for wanting to retrieve all the data in a collection. For instance when exporting data in some other format or when generating a sitemap.xml with all pages and such.

There are exceptions to every rule.

14 Jun 2013
13:19 PM

Scott Scowden

I agree, there are definitely cases that you need more than 1024 records. Even worse, when using a hosted RavenDB, you can't easily change this value to retrieve more.

For example, I need to list all Zip Codes in a state to allow users to multi-select them.

Not saying his implementation is good, but there are definitely cases where it's needed.

14 Jun 2013
13:23 PM

Frank

@Duckie,

having to move/import data in batches already sounds like a "workaround". Sending a message to the target system as soon as the entity represented by the document changes would turn that batch process into a real-time interface, and remove the need to query all documents.

14 Jun 2013
13:38 PM

Kijana Woodard

Yield return would at least prevent complete waste when the calling code does Take(x).

The "pager code" is pretty simple to write and is a good warning that you are doing something potentially dangerous.

Trying to make GetAll generic and reusable is much much more difficult. What I've seen is that soon you want to add a Where condition, then you want custom skip/take, then you want to get the Statistics, then you want to Include some other document, then you want to WaitForStale...

Soon this GetAll method and its overloads are a pretty substantial API for which each combination of parameters has exactly one usage in the system.

And then there's this: http://ayende.com/blog/161249/ravendbs-querying-streaming-unbounded-results
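
As an illustration of the yield return point in this comment, here is a minimal sketch, not RavenDB code. `FetchPage` is a hypothetical in-memory stand-in for one paged query round trip, and `RoundTrips` only exists to make the difference visible. The buffered version mirrors the "collect everything into a List, then return" shape criticized earlier in the thread; the lazy version fetches a page only when the caller asks for more items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Counts calls to FetchPage so the two approaches can be compared.
    public static int RoundTrips;

    // Hypothetical stand-in for a paged query: returns up to pageSize
    // integers starting at 'start', out of 'total' items overall.
    public static List<int> FetchPage(int start, int pageSize, int total)
    {
        RoundTrips++;
        int count = Math.Max(0, Math.Min(pageSize, total - start));
        return Enumerable.Range(start, count).ToList();
    }

    // Buffered: every page is fetched before the caller sees a single item.
    public static IEnumerable<int> GetAllBuffered(int pageSize, int total)
    {
        var results = new List<int>();
        for (int start = 0; ; start += pageSize)
        {
            var page = FetchPage(start, pageSize, total);
            if (page.Count == 0) break;
            results.AddRange(page);
        }
        return results;
    }

    // Lazy: a page is fetched only when enumeration actually reaches it.
    public static IEnumerable<int> GetAllLazy(int pageSize, int total)
    {
        for (int start = 0; ; start += pageSize)
        {
            var page = FetchPage(start, pageSize, total);
            if (page.Count == 0) yield break;
            foreach (var item in page) yield return item;
        }
    }
}
```

With this sketch, `PagingSketch.GetAllLazy(128, 1000).Take(10)` touches only the first page, while `PagingSketch.GetAllBuffered(128, 1000).Take(10)` pulls every page before `Take` gets a chance to stop. Neither version bounds the total work if the caller enumerates to the end, which is the concern raised in the post.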

14 Jun 2013
13:41 PM

Kijana Woodard

@Scott - Each zip code has its own document? I would think they would be grouped into far fewer docs.

@duckie - import/export could be done via the smuggler api. It would be interesting to see what Studio is doing here and emulate that.

14 Jun 2013
14:17 PM

João Bragança

What's wrong with this? I mean theoretically a windows server can 'scale' up to 4TB of memory. That way you don't have to pay developers to think and write good code!

14 Jun 2013
15:01 PM

Ayende Rahien

Wyatt, What is the actual user scenario that requires all the data, when the data can be many thousands of records?

14 Jun 2013
15:02 PM

Ayende Rahien

Duckie, We have explicit support for bulk insert / reads. That prevents you from loading everything into memory.

14 Jun 2013
15:02 PM

Ayende Rahien

Scott, Why are you storing all the zip codes as separate documents?

14 Jun 2013
15:15 PM

Daniel Lang

... and I don't understand why you don't understand it. There are scenarios beyond OLTP web applications where you just need this: GetAll(). I'm using it heavily in a desktop application that runs on RavenDB embedded. I know the performance implications of every other approach and yes, I think GetAll is the best in our situation. I'm sure there are other valid use-cases as well which you could have addressed with a better implementation of the streaming API.

14 Jun 2013
15:25 PM

Duckie

Ayende, I need all data in memory, so I can use whatever LINQ commands, filtering, querying, sorting etc. I want. Performance here is not an issue at all. I've got loads of data I need to do manipulation on.

14 Jun 2013
15:26 PM

Ayende Rahien

Duckie, Whatever for? Filtering, querying & sorting are db tasks, not in memory tasks.

14 Jun 2013
15:35 PM

jdn

@Duckie, @Daniel:

Don't worry. Ayende has been wrong about this from the start but implemented this auto-handcuff for marketing reasons.

There are sound technical reasons for wanting GetAll(). There used to be a way to override the "dumb by default" behavior in RavenDB, not sure if it is still in the code base or not.

14 Jun 2013
16:11 PM

Judah Gabriel Himango

I wonder how many hundreds or thousands of apps are actually efficient because RavenDB forced them to be, and forced lazy developers to do proper paging and/or document structure.

RavenDB has forced me to think about performance from the start, when normally I'd be lazy about it with SQL+O/RM.

14 Jun 2013
16:16 PM

Kijana Woodard

@Daniel and @jdn

Sure. And it's pretty easy to roll yourself with the exact "flavor" you need (from my other comment). A GetAll in the API doesn't add much value to the common case.

For "embedded and not that much data and I understand" scenarios, I personally have used LoadStartingWith and avoid the query issues altogether.

LoadStartingWith + the new Streaming API + Smuggler + roll your own while loop = a lot of ways to handle these situations without having a simple, but dangerous, method exposed on the api.

14 Jun 2013
16:20 PM

Kijana Woodard

Also, Dynamic Reporting takes care of another set of cases: http://ayende.com/blog/162339/ravendbs-dynamic-reporting

Facets solve for still others.

The difference being that these choices address specific concerns regarding working with the entire dataset instead of exposing a seemingly simple api method and hoping the user understands the intersection between the subtleties of what they are actually trying to achieve and what the api is actually doing.

14 Jun 2013
16:23 PM

jdn

@Kijana:

If I say "Select * from", I want select *.

If I want "select top 1024 from", then I will write that.

"LoadStartingWith + the new Streaming API + Smuggler + roll your own while loop = " a pain in the kiester.

At some point, it went from "running with scissors" to "crawling with pillows."

14 Jun 2013
16:28 PM

Tim Murphy

@Judah is quite right that Raven makes you think about performance and therefore paging.

My only beef is I think an exception should be thrown if the number of documents requested is greater than the default 128.

14 Jun 2013
16:31 PM

Kijana Woodard

@jdn - Sure. If I was writing sql, fine. The problem is we're using abstractions on top of abstractions.

Code like that GetAll extension method is one of the primary reasons so many people (DBAs) say "EF Sucks". EF is fine, but once you abstract away what's going on past a certain point, it will just lead to painful "surprises" down the road.

I once worried about this and typed up a post for the forum. I then realized that the while loop to page the results was shorter than the post I was writing.

14 Jun 2013
16:32 PM

Duckie

Ayende, the DB cannot do what I want, without a lot of investment in time. I just need my data out, so I can work with it myself.

I understand the desire for optimal use of RavenDB by limiting the API, but forcing users to do stupid things is .. stupid.

Maybe just make a method called query.GetAllWhileUnderstandingThisIsStupid() ..

14 Jun 2013
16:34 PM

Kijana Woodard

@Tim, you mean if the total document count is greater than 128 and you haven't specified a Take?

I like to explicitly define a Take for all queries, but I'd probably say log WARN instead of throw.

14 Jun 2013
18:20 PM

Foo

This reminds me of a technical lead in a Fortune 500 company explaining to me how having a web service exposing something like public dataset execute(string query, string connectionstring) was great to speed up development and deployments. Yes you can, no you shouldn't.

14 Jun 2013
18:50 PM

João Bragança

@David

The 'I might need to get everything because of sitemap' is questionable. Google doesn't NEED sitemap to index your site. You just need to ensure that all of your pages are reachable from the bookmark url. Oren's blog has lots of dynamic content too, a lot more than 1024 posts - see the sidebar. But of course it is all indexed by google. Someone should write an article about this...

14 Jun 2013
20:02 PM

Duckie

Sitemaps are not only about making a list of links for indexing, but also about showing Google the structure of the site. Besides, if they want to expose a sitemap, why is this questionable?

Fact is, if you want to load many documents into memory you have to do special stuff with RavenDB, no matter what valid reason you might have for it.

This is what users experience / what I experienced.

You only get a limited number of records. You increase this. You run into the maximum limit of records. You start paging it out, but you run into the maximum queries per session exception. You increase the number of allowable requests, or you create multiple sessions.

Since streams were added, it is of course easier to do.
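
The sequence described in this comment (a capped page size, then a per-session request budget, then multiple sessions) can be sketched roughly as follows. This is an illustration under stated assumptions, not RavenDB's API: `FakeSession`, its `Page` method, and both limits are hypothetical in-memory stand-ins, and the limit values are parameters rather than RavenDB's actual defaults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a session that caps page size and request count.
public sealed class FakeSession
{
    private readonly IReadOnlyList<int> _data;
    private readonly int _maxRequests;
    private readonly int _maxPageSize;
    private int _requests;

    public FakeSession(IReadOnlyList<int> data, int maxRequests, int maxPageSize)
    {
        _data = data;
        _maxRequests = maxRequests;
        _maxPageSize = maxPageSize;
    }

    public bool CanRequest => _requests < _maxRequests;

    public List<int> Page(int skip, int take)
    {
        if (!CanRequest)
            throw new InvalidOperationException("Request budget for this session exceeded.");
        _requests++;
        // The page size is silently capped, however much the caller asks for.
        return _data.Skip(skip).Take(Math.Min(take, _maxPageSize)).ToList();
    }
}

public static class SessionPagingSketch
{
    // Counts how many sessions were needed to read everything.
    public static int SessionsOpened;

    public static IEnumerable<int> ReadAll(
        IReadOnlyList<int> data, int pageSize, int maxRequestsPerSession, int maxPageSize)
    {
        var session = new FakeSession(data, maxRequestsPerSession, maxPageSize);
        SessionsOpened++;
        int skip = 0;
        while (true)
        {
            // Open a fresh session once the current one has used up its budget.
            if (!session.CanRequest)
            {
                session = new FakeSession(data, maxRequestsPerSession, maxPageSize);
                SessionsOpened++;
            }
            var page = session.Page(skip, pageSize);
            if (page.Count == 0) yield break;
            foreach (var item in page) yield return item;
            // Advance by what actually came back, not by what was requested.
            skip += page.Count;
        }
    }
}
```

Advancing `skip` by `page.Count` rather than by the requested `pageSize` is the detail that keeps the loop correct when the cap is smaller than the request. A real session would also need to be disposed, which this sketch leaves out.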

15 Jun 2013
14:08 PM

Sarmaad

At the beginning I had the same thoughts.. but now, no way.. I'd rather write a while loop than just blindly get all documents.

I found myself asking.. do I need this here, is the model designed correctly or should this be a map/reduce..

Don't change a thing.

17 Jun 2013
20:27 PM

Karg

We actually have some legacy APIs that we've converted over to use RavenDB on the back end, but we still have to maintain the non-paged methods.

We have the following (better) extension method to get all. It obeys skipped results and returns an IEnumerable<T> so you can avoid materializing the whole thing if you're just operating over the whole set.

This is with Raven 1.0, we'll use Streams when we upgrade.

http://pastebin.com/AqaAu6DC

18 Jun 2013
11:15 AM

Sean Kearon

I'm using embedded in a desktop application and I have to agree completely with @Daniel here. "GetAll" is absolutely essential for my use cases, as is ensuring that the query does not wait for any stale results.

I'm also using 1.0 currently, but will likely move to streams when I get time to upgrade.

20 Jun 2013
14:16 PM

Jon Canning

Oh dear, how embarrassing, I know it's wrong but I needed a quick hack and had just read this:

http://stackoverflow.com/questions/11268955/retrieving-entire-data-collection-from-a-raven-db

I put it on my blog in case I needed it again; honestly didn't expect anyone to find it! I'll remove it for fear of encouraging others.

Oren Eini

CEO of RavenDB