The Guts n’ Glory of Database Internals: The curse of old age…
This is a fun series to write, but I’m running out of topics where I can talk about the details at a high level without getting into nitty-gritty details that will make no sense to anyone but database geeks. If you have any suggestions for additional topics, I would love to hear about them.
This post, however, is about another aspect of running a database engine. It is all about knowing what is actually going on in the system. A typical web application has very little state (maybe some caches, but that is pretty much it) and can be fairly easily restarted if you run into some issue (memory leak, fragmentation, etc.) to recover from most problems while you investigate exactly what is going on. A surprising number of production systems actually rely on this and just restart on a regular basis. IIS, for example, will restart a web application every 29 hours, and I have seen production deployments of serious software where the team was simply unaware of that. It did manage to reduce a lot of the complexity, because the application never lived long enough to actually accumulate that much garbage.
A database tends to be different. A database engine lives for a very long time, typically weeks, months or years, and it is pretty bad when it goes down. It isn’t a single node in the farm that is temporarily missing or slow while it is filling its cache; it is the entire system being down without anything that you can do about it (note, I’m talking about single node systems here; distributed systems have high availability mechanisms that I’m ignoring at the moment). That tends to give you a very different perspective on how you work.
For example, if you are using Cassandra, it (at least) used to have an issue with memory fragmentation over time. It would still have a lot of available memory, but certain access patterns would chop that up into smaller and smaller slices, until just managing the memory at the JVM level caused issues. In practice, this can cause very long GC pauses (multiple minutes). And before you think that this is a situation unique to managed databases, Redis is known to suffer from fragmentation as well, which can lead to higher memory usage (and can even kill the process, eventually) for pretty much the same reason.
Databases can’t really afford to use common allocation patterns (so no malloc / free or the equivalent) because they tend to hold on to memory for a lot longer, and their memory consumption is almost always dictated by the client. In other words, saving increasingly large records will likely cause memory fragmentation, which a client can then exploit further by doing another round of allocations, slightly larger than the round before (forcing even more fragmentation, and so on). Most databases use dedicated allocators (typically some form of arena allocator) with limits, which allows them to have better control over memory and mitigate that issue (for example, by blowing away the entire arena on failure and allocating a new one, which doesn’t have any fragmentation).
But you actually need to build this kind of thing directly into the engine, and you need to account for it in operations as well. When you have a customer calling with “why is the memory usage going up?”, you need to have some way to inspect this and figure out what to do about it. Note that we aren’t talking about memory leaks; we are talking about when everything works properly, just not in the ideal manner.
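To make that concrete, here is a minimal sketch of the arena idea in C. This is not RavenDB’s allocator or any particular engine’s code (all the names here are made up for illustration); it just shows the shape of it: reserve one big block, hand out allocations by bumping an offset, keep a counter you can show to that customer on the phone, and release everything in one shot instead of freeing individual pieces.

```c
/* Minimal arena allocator sketch (illustrative only, not RavenDB's code).
 * The engine grabs a large block up front, serves allocations by bumping an
 * offset, and drops or resets the whole arena in one shot, so per-allocation
 * fragmentation never accumulates. */
#include <stdlib.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    char   *buffer;     /* one large block reserved up front            */
    size_t  capacity;   /* total bytes reserved for this arena          */
    size_t  used;       /* bump offset; doubles as a usage counter      */
} arena_t;

static int arena_init(arena_t *a, size_t capacity) {
    a->buffer = malloc(capacity);
    a->capacity = capacity;
    a->used = 0;
    return a->buffer != NULL;
}

/* Returns NULL when the arena is exhausted; the caller's recovery is to
 * discard the whole arena and start a fresh one, not to free pieces. */
static void *arena_alloc(arena_t *a, size_t size) {
    size = (size + 7) & ~(size_t)7;          /* keep allocations 8-byte aligned */
    if (a->used + size > a->capacity)
        return NULL;
    void *p = a->buffer + a->used;
    a->used += size;
    return p;
}

static void arena_reset(arena_t *a) { a->used = 0; }   /* release everything at once */
static void arena_free(arena_t *a)  { free(a->buffer); a->buffer = NULL; }

/* The kind of counter you expose when someone asks "why is memory going up":
 * not a leak report, just an honest view of what the engine is holding. */
static void arena_report(const arena_t *a, const char *name) {
    printf("%s: %zu / %zu bytes in use\n", name, a->used, a->capacity);
}

int main(void) {
    arena_t records;
    if (!arena_init(&records, 1 << 20)) return 1;    /* 1 MB arena */
    void *rec = arena_alloc(&records, 300);          /* a record-sized allocation */
    (void)rec;
    arena_report(&records, "records arena");
    arena_reset(&records);                           /* wholesale release, no fragmentation */
    arena_free(&records);
    return 0;
}
```

The point of the reset is that when something goes wrong, or the arena fills up, the engine can drop the whole thing and start from a clean block, instead of inheriting whatever fragmentation the previous workload created.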
Memory is just one aspect of that, if one that is easy to look at. Other things that you need to watch for include anything whose cost grows with how long the system has been running. For example, if you have a large LRU cache, you need to make sure that after a couple of months of running, pruning that cache isn’t going to be an O(N) job running every 5 minutes, never finding anything to prune, but costing a lot of CPU time. The number of file handles is also a good indication of a problem in some cases; some database engines have a lot of files open (typically LSM ones), and they can accumulate over time until the server runs out of them.
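As a sketch of the cache point (again, illustrative rather than taken from any particular engine): if the cache entries are already kept in a list ordered by last access, the pruning job can walk from the oldest end and stop at the first entry that is still fresh, so a pass that finds nothing to evict costs almost nothing instead of scanning every entry every five minutes.

```c
/* Illustrative sketch of cheap cache pruning (hypothetical types, not from a
 * real engine).  Entries live in a doubly linked list ordered by last access:
 * an entry is moved to the head on every hit (not shown), so the tail is
 * always the oldest.  The background job walks from the tail and stops at the
 * first live entry, so "nothing to prune" costs O(1), not O(N). */
#include <stddef.h>
#include <time.h>

typedef struct cache_entry {
    struct cache_entry *prev, *next;   /* intrusive LRU list links   */
    time_t              last_access;   /* updated on every cache hit */
    /* ... key, value, etc. ... */
} cache_entry_t;

typedef struct {
    cache_entry_t *head;   /* most recently used  */
    cache_entry_t *tail;   /* least recently used */
    size_t         count;
} lru_cache_t;

/* Unlink an entry from the list (the caller frees or recycles it). */
static void lru_unlink(lru_cache_t *c, cache_entry_t *e) {
    if (e->prev) e->prev->next = e->next; else c->head = e->next;
    if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
    c->count--;
}

/* Evict expired entries.  Because the list is ordered by last access, the
 * loop breaks as soon as it meets a live entry; it never scans the whole
 * cache just to discover there is nothing to evict. */
static size_t lru_prune(lru_cache_t *c, time_t now, double max_age_seconds) {
    size_t evicted = 0;
    while (c->tail && difftime(now, c->tail->last_access) > max_age_seconds) {
        cache_entry_t *victim = c->tail;
        lru_unlink(c, victim);
        /* release victim's key/value here */
        evicted++;
    }
    return evicted;
}

int main(void) {
    cache_entry_t a = {0}, b = {0};
    lru_cache_t cache = {0};
    time_t now = time(NULL);

    a.last_access = now - 3600;        /* stale entry at the tail */
    b.last_access = now;               /* fresh entry at the head */
    cache.head = &b; cache.tail = &a;
    b.next = &a; a.prev = &b;
    cache.count = 2;

    return lru_prune(&cache, now, 600.0) == 1 ? 0 : 1;   /* evicts only 'a' */
}
```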
Part of the job of the database engine is to consider not only what is going on now, but also how to deal with (sometimes literally) abusive clients that try to do very strange things, and how to handle them. In one particular case, a customer was using a feature that was designed for a maximum of a few dozen entries per query to pass 70,000+ entries. The amazing thing is that this worked, but as you can imagine, all sorts of assumptions internal to that feature were viciously violated, requiring us to consider whether to put a hard limit on the feature, so it stays within its design specs, or to see if we can redesign the entire thing so it can handle this kind of load.
And the most “fun” part is that those sorts of bugs only show up after a couple of weeks of harsh production use. So even when you know what is causing the problem, actually reproducing the scenario (you need memory fragmented in a certain way, and a certain number of cache entries, and the application requesting a certain load factor) can be incredibly hard.
More posts in "The Guts n’ Glory of Database Internals" series:
- (08 Aug 2016) Early lock release
- (05 Aug 2016) Merging transactions
- (03 Aug 2016) Log shipping and point in time recovery
- (02 Aug 2016) What goes inside the transaction journal
- (18 Jul 2016) What the disk can do for you
- (15 Jul 2016) The curse of old age…
- (14 Jul 2016) Backup, restore and the environment…
- (11 Jul 2016) The communication protocol
- (08 Jul 2016) The enemy of thy database is…
- (07 Jul 2016) Writing to a data file
- (06 Jul 2016) Getting durable, faster
- (01 Jul 2016) Durability in the real world
- (30 Jun 2016) Understanding durability with hard disks
- (29 Jun 2016) Managing concurrency
- (28 Jun 2016) Managing records
- (16 Jun 2016) Seeing the forest for the trees
- (14 Jun 2016) B+Tree
- (09 Jun 2016) The LSM option
- (08 Jun 2016) Searching information and file format
- (07 Jun 2016) Persisting information
Comments
I would actually really like to see your take on the nitty-gritty details. You already have regular visitors who will understand those posts, even if I may not. BTW, one of the ways your blog stands out is the regularity/frequency of your posts, besides the value of the posts themselves.
Peter, Thanks. The problem is that I tend to either paint with broad strokes, as in this series, where I explain what is going on and the full reasoning, or go into super specific stuff (like the cost of transaction logs).
It is hard for me to find topics in the middle, because I'm not usually working at that level. If you have suggestions for topics, I would love to hear them.
What about covering locking (all the different kinds of locks and escalation)? There is probably enough there that you could write a whole series of posts if you wanted to.
I happily concur with what Peter said. Your blog for years has been a beacon of incredibly useful and valuable information.
If you're looking for post ideas - have you done anything on packet-level security? For example, if you have two disparate systems passing sensitive data back and forth via TCP/IP, what are some good approaches for keeping the data secure? You may have already covered this, but I couldn't think of anything else at the moment.
I'd like to understand the process you go through to determine the high level client API. For example, why do you create a connection and then a session within that? Why do you set limits around the number of records returned from a query? How do you decide on the naming convention to use?
Chris B, Did you see this post about locking: https://ayende.com/blog/174562/the-guts-n-glory-of-database-internals-managing-concurrency
The problem with talking about locking at this level is that it really depends on the actual db you are using. In relational databases, you have row locks, page locks, etc. In NoSQL databases you (maybe) have record locks and typically a writer lock. Very little to talk about.
Lucas, While that is an interesting topic, that is quite outside of my own area of expertise. I know enough about security to know that I don't know enough about security.
My probable recommendation is to put SslStream around the connection, verify the certificate on both ends, and assume that the SSL implementation that I'm using is safe and secure enough.
Beyond that, I would go to an expert in that particular field
Stuart, Thanks, I'll think about how to answer those questions as a full post
An example for a topic "in the middle", that relates to this one, would be to run through a scenario in which a number of concurrent transactions (e.g. 2 read-only with one that is "slow", and 2 read-write) are performed and how this is handled in Voron in terms of allocation/use/freeing of memory and files. I.e. journal files being created, filled and recycled, scratch memory pages being allocated / freed, MVCC finding pages from scratch buffers or main data file, writing to and growing the main data file, etc.
This could complement and to some extent tie together a recent set of blog posts on how "logical" pages and their contents are found using B+Trees, the "copy-on-write" MVCC snapshotting mechanism, the recent "durability" series and this one about management of allocated memory and file handles.
A similar alternative scenario could be to run through the steps occurring (again referring to journal files, scratch pages, data sync checkpoint, main data files, ...) when Voron recovers after a crash.
Stuart, The high level client API started out by looking at both the NHibernate session API and the Entity Framework API. The idea is to give users something that they are familiar with, with very few surprises.
The naming convention was the standard .NET one, again, for fewer surprises.
The limitations on unbounded result sets and other stuff like that mostly came out of reading Release It! and from actual experience of seeing applications fail to perform (or just fail) as a result of those common mistakes. One of the things that we tried to do is to ensure that we don't have those hard pitfalls in RavenDB.
What about writing about arena memory management? You do not define the word Arena in the text, even though you write for a broader audience.
Database defragmentation could be another subject.
Database statistics for building indexes.
Another memory issue is stack overflow due to recursive calls - as in the Toyota Camry case: http://embeddedgurus.com/state-space/2014/02/are-we-shooting-ourselves-in-the-foot-with-stack-overflow/
Alex, Thanks, lots of good ideas here. I added a new series here: https://ayende.com/blog/posts/series/
It is a bit far into the queue, though.
Carsten,
Arena allocation is actually explained pretty nicely in Wikipedia, I'm not seeing much to add there.
DB Defrag - You mean how the database handles reducing the file size after mass deletes?
What do you mean by db stats for index building? In RavenDB? In General?
That issue is related to safety critical code. In those cases, you don't even consider such things as managed languages. You use C / Ada and have very little freedom (no memory allocations, no recursion, no function pointers, no double pointers, etc.).
Yes, I found information about Arena when reading your article: https://en.wikipedia.org/wiki/Region-based_memory_management
You could have provided the link or a short explanation.
How is it used in RavenDB? Do you have your own UNSAFE memory management with custom new/delete of different block size? https://ayende.com/blog/173089/measuring-baseline-costs
DB Defrag, yes, auto shrink. I know MS Exchange does it online during the night, and you have offline as well: https://support.microsoft.com/en-us/kb/328804
Both in general and in RavenDB. I know from SQL that the stats need to be updated to make the query optimizer work.
The issue is that you can compare a DB engine with embedded software because it has to run 24x7 for years.
NB: Query optimizer is another subject which you can write about.
Carsten, Arena allocation is covered there pretty well. We allocate a bunch of memory and release it all at once; I don't think we do anything special there, but yes, we do use native memory for that.
I'll do a post about defrag, yes.
The query optimizer in relational databases and in RavenDB are very different, and statistics, for example, play no role in RavenDB.