RavenDB Retrospective: Unbounded result sets
We spent some time recently looking into a lot of our old design decisions. Some of them make very little sense today (JSON vs. blittable is a good example), but they made perfect sense at the time and were essential to actually getting the product out.
Some of those design decisions, however, are still something that I very firmly believe in. This series of posts is going to explore those decisions, their background and how they played out in the real world. So, without further ado, let us talk about unbounded result sets.
The design of RavenDB was heavily influenced by my experience as That NHibernate Guy (I got started with NHibernate over a decade ago, if you can believe that), where I saw the same types of errors repeated over and over again. Then I read Release It! and suddenly discovered that I wasn't alone in fighting those kinds of demons. When I designed RavenDB, I set out explicitly to prevent as many of them as I possibly could.
One of the major issues that I wanted to address was Unbounded Result Sets. Simply put, this is when you have:
SELECT * FROM OrderLines WHERE OrderID = 1555
And you don’t realize that this order has three million line items (or, worse, that most of your orders have a few thousand line items each, so you are generating a lot of load on the database only to throw most of them away).
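The same trap is just as easy to fall into from application code. Here is a minimal C# sketch of the pattern (the OrderLine class, its OrderId property, and the open session are illustrative assumptions, not from the post):

// Assumption: 'session' is an open document/ORM session and OrderLine
// is an illustrative entity with an OrderId property.
var lines = session.Query<OrderLine>()
    .Where(line => line.OrderId == 1555)
    .ToList(); // no limit anywhere - if the order has three million
               // line items, all of them are pulled into memory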
In order to prevent this type of issue, RavenDB has the notion of mandatory page sizes (sketched in code after the list).
- On the client side, if you don’t specify a limit, we’ll implicitly add one (by default set to 128).
- On the server side, there is a database wide maximum page size (by default set to 1024). The server will trim all page sizes to the max if they are larger.
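In client code, the two limits play out roughly like this (a sketch; 128 and 1024 are the defaults mentioned above, and both are configurable):

// No Take() supplied: the client implicitly adds one, so this returns
// at most 128 results by default instead of the whole collection.
var implicitPage = session.Query<Order>().ToList();

// Explicit Take() supplied: honored up to the server-wide maximum
// page size (1024 by default); anything above it is trimmed down.
var explicitPage = session.Query<Order>().Take(500).ToList();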
I think that this is one of the more controversial decisions in RavenDB's design, and one that generated a lot of heated discussion. But I still think it is a good idea, because I have seen what happens when you don't do it. The arguments were mostly along the lines of “RavenDB should trust developers to know what they are doing.” One particularly irate guy called me while I was out shopping to complain about how I broke the sacred contract of LINQ, under which “queries should return everything by default, even if that is ten billion results.” I pointed out that this is actually configurable, and that he could set the default to any size he wanted, but apparently it is supposed to be a “shoot my own foot first, then think” kind of deal.
Even though I still think this is a really good idea, we have added some features over the years to make it easy for people to access the entire dataset when they need it. Streaming has been around since 2.5 or so, giving you a dedicated API for unbounded results. Streams were built to make it efficient to process large sets of data, and they allow both client and server to process the data in parallel, instead of batching huge responses on the server and then consuming ridiculous amounts of memory on the client before handing you the full result set. Instead, you get each result as soon as it arrives from the server, and you can process it and pass it onward.
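For reference, here is roughly what that looks like with the 3.x client's streaming API (a sketch, assuming an open session and an illustrative Order class):

// Stream the results instead of materializing them all at once.
var query = session.Query<Order>();
using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        var order = enumerator.Current.Document;
        // Process each order as it arrives from the server; memory
        // usage stays flat because nothing is batched up client-side.
    }
}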
In 4.0, we are going to change the behavior of the paging limits as follows (see the sketch after the list):
- If you don’t specify a limit, we’ll supply a limit clause of 25 items. If there are more than 25 items, we’ll throw an exception (unless you have configured otherwise in the conventions).
- If you supply a limit explicitly, it will work as expected and page through the data.
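A sketch of how the two cases behave (25 is the default mentioned above; the exact exception and convention names are not specified here):

// Case 1 - no limit: behaves as if Take(25) was supplied, and throws
// if the query matches more than 25 results (unless the conventions
// were configured to allow it).
var firstFew = session.Query<Order>().ToList();

// Case 2 - explicit limit: works as expected and pages through the data.
int pageSize = 50, pageNumber = 2;
var page = session.Query<Order>()
    .Skip(pageNumber * pageSize)
    .Take(pageSize)
    .ToList();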
The idea is that we want to reduce the surprise for users by failing early, giving them experience to draw upon from the start. Another thing that we'll do is make sure that the operations folks can also change that limit, likely with an environment variable or something like that. If you need to modify the conventions on the fly, you usually have a hard time deploying a new version, and immediate action is needed.
In this manner, we can help users avoid expensive requests to the server, and they can be explicit about what they need to do.
More posts in "RavenDB Retrospective" series:
- (17 Oct 2016) The governors
- (14 Oct 2016) Explicit indexes & auto indexes
- (12 Oct 2016) BASE Indexes
- (28 Sep 2016) Unbounded result sets
Comments
Silent swallowing of the rest of the data is what bothered me for a long time. I think exceptions are a much better solution here, due to the fail-fast principle.
“RavenDB should trust developers to know what they are doing” :) yeah about that ....
Still, I would very much prefer a model where RavenDB only throws in Debug mode, in Release mode and on production servers it should just log a Warning. But that's just my preference.
Pop, There are problems here: a) no one ever looks at the logs; b) that is what you have the conventions for.
Ayende, it's true no one looks at logs. Period.
However, when you present the information elsewhere, like a dashboard (e.g., "Performance Alerts (5)", in red), then people start paying attention.
I know of an example where a company was gathering in excess of 300 GB of logs per week and did nothing with them for over 6 years. One day, when a simple dashboard was created with the Top 5 slowest queries, Top 5 slowest pages, Top 5 most common errors, pages taking over 5 seconds to load, etc., people started jumping out of their chairs to fix the issues, even though the issues had been there for years and no one had bothered checking.
Pop Catalin, We are actually doing that, but the problem is that this is stuff generated by the client; there isn't a dashboard that I can generate there.
So what's keeping sysadmins from setting the page limit to 999,999 records in order to eliminate the errors from production systems? That's what a typical sysadmin would do as soon as they receive complaints and find the "solution" online.
Pop Catalin, Absolutely nothing, but then they accept the responsibility for this setting.
@Pop Catalin, taking responsibility for it means that it will be marked in red (a "this is freaking unsafe, use at your own risk" kind of warning), and when a support engineer comes and looks at the unsafe endpoint because you are saying RavenDB is consuming X GB of memory, he can say: "Solved, you have a very unsafe parameter there; it is working as designed."
Pop Catalin, are you really saying you prefer auto-truncation of result sets (or, as Vlad calls it, "silent swallowing") as the right thing to do in production? Honest question.
@njy, I prefer just a warning in production, and neither truncation nor raising errors is acceptable.
If there's an honest mistake somewhere, I prefer a spike in resource usage or a slowdown over, let's say, incomplete data leading to wrong results, or a service disruption due to errors. There are far worse things than a slow-running query from a business perspective: a) not running at all, b) giving wrong results.
Pop Catalin oh, I see, so you meant a warning without truncation, ok.
I just don't get why the fix should be to configure the server limit to a higher value. It's so easy to do paging in RavenDB; all our web services just provide paging, full stop! Also, on a UI level it just doesn't make sense to have too many items. No unbounded .ToList() crap.
I think Pop Catalin hit the nail on the head - "There are far worse things than a slow running query from a business perspective: a) not running at all, b) giving wrong results."
Wrong results end up costing (a) someone noticing they're wrong, (b) troubleshooting time, and (c) fixing time. All on an emergency schedule.
Not running at all ends up costing (c) fixing time. Again, on an emergency schedule. Better, but you're still forcing emergencies.
A Debug.Assert and production logging seems the best compromise. Let customers fix their slow queries according to their own business needs, not your support costs.
Or just change the API, and require an explicit limit. Principle of Least Surprise and all that....
Mark, A debug assert won't actually do much, and no one is looking at the logs.
Take a look at this issue: http://indexoutofrange.com/Debugging-high-memory-usage.Part-2-DotNetMemoryProfiler/
That is the kind of thing that takes a very long time to figure out, and it is a silly oversight that is causing this.
That is the whole reason why we have those limits in place.
And it isn't about trying to save my support costs (take into account that if this were a support issue, it would be trivial); I'm charging for support, so easy support issues are basically profit.
The reason to prevent it is to keep users from getting into problems.
Mark, The most commonly used API is LINQ, which doesn't require it. Also, the default approach by users is to put in int.MaxValue.
Apologies for going long, but this is actually something that's bothered me for quite a while with RavenDB. I love the ideas (and engineering) behind it, but I could never quite get on board with some of the more opinionated (and controversial) design choices.
For me, the relevant expectations for good .NET libraries are:
Previously, unbounded results in RavenDB would break all three of those; the change to throwing an exception makes it better, but still breaks the latter two.
No one proactively looks at logs, but they do when troubleshooting. Or at least they should, well before they start doing offline memory dumps.
I see your point with the Debug.Assert; with the change to throw an exception at 25, that's no longer really relevant unless you also require an explicit limit (and then could be used for backwards compatibility).
Unless you're aiming for portability, can't RavenDB require it regardless? I've never built a LINQ provider, but you obviously have enough info to know if a limit was supplied. Sure, it's a little leaky - but pretty much all the LINQ providers are (I mean, you're competing with EF... it's not like you can do much worse). Even for portability, an explicit flag to
DisableUnboundedResultChecking
for those porting apps would be explicit enough, I think. As for int.MaxValue... just throw on that too. Hell, set a max size of 25 or whatever and throw on anything larger (regardless of actual result length). That again moves the problem to dev, where it's cheap to fix.
I haven't had this issue in production, but my thought process would be that the "fix" everyone will do is (1) increase the limit in config/code/whatever, (2) create a backlog item to do paging the right way, and (3) move on to work that adds business value. Maybe, just maybe, the business runs out of valuable ideas and the devs go back to actually working on that backlog item... but hopefully (for the sake of their jobs) not. As both a developer and a manager, I have to admit that's what I would probably do.
All that said - I applaud the good fight, and think the change to throwing an exception is a welcome one. I don't like the fact that it only occurs at the magic 25 boundary, but I've never been a big convention-over-configuration guy either. I prefer my limits to kick me in the face while writing code, not to wake me up at 2am wondering what magic config setting I need to tweak now and looking like an amateur for releasing code that's so fragile as to break at 26 records.
Mark, For common queries, requiring Take() will seriously disrupt the workflow.
Moving this configuration to the server side also gives admins a "quick fix" and devs a way to catch this early (set the implicit take limit to 1).