The metrics calculation methods
Any self-respecting database needs to be able to provide a whole host of metrics for the user.
Let us talk about something simple, like the requests/second metric. This seems like a pretty easy metric to have, right? Every second, you have N requests, and you just show that.
But it turns out that just showing the latest req/sec number isn't very useful, primarily because a lot of traffic actually has valleys & peaks. So you want to have the req/sec not for a specific second, but over some period (like the req/sec over the last minute & the last 15 minutes).
One way to do that is to use an exponentially weighted moving average. You can read about its use in Unix in these articles. But the idea is that as we add samples, we put more weight on the recent samples, while still taking historical data into account.
That has the nice property of reacting quickly to changes in behavior while smoothing them out, so you see a gradual change over time. The bad thing about it is that it isn't accurate (in the sense that it isn't easy to correlate the value to exact numbers), and it smooths out changes.
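To make that concrete, here is a minimal sketch of the technique (my own names and parameters, not RavenDB's actual code):

```csharp
// Minimal EWMA sketch: tick once per second with the latest request count.
// The smoothing factor decides how much weight the newest sample gets.
public class Ewma
{
    private readonly double _alpha; // 0 < alpha <= 1; higher = reacts faster
    private double _rate;
    private bool _initialized;

    // For a one-minute horizon ticked every second: alpha = 1 - e^(-1/60).
    public Ewma(double alpha) => _alpha = alpha;

    public void Tick(long requestsThisSecond)
    {
        if (!_initialized)
        {
            _rate = requestsThisSecond; // seed with the first sample
            _initialized = true;
            return;
        }
        // Nudge the running rate toward the newest sample.
        _rate += _alpha * (requestsThisSecond - _rate);
    }

    public double RequestsPerSecond => _rate;
}
```

This is the same shape of calculation the Unix load average uses; only the smoothing factor and tick interval differ.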
On the other hand, you can keep exact metrics. Going back to the req/sec number, we can allocate an array of 900 longs (enough for 15 minutes, with one measurement per second) and just use it as a cyclic buffer to store the per-second counts. The good thing about that is that it is very accurate, and we can easily correlate the results to external numbers (such as the results of a benchmark).
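A sketch of what that cyclic buffer might look like (again, illustrative names, not the actual implementation):

```csharp
// Exact per-second request counts for the last 15 minutes in a ring buffer.
public class RequestsRing
{
    private readonly long[] _buckets = new long[900]; // 15 min * 60 sec

    // Call once per second; overwrites the bucket from 15 minutes ago.
    // A real implementation also has to zero buckets for idle seconds.
    public void Record(long unixSeconds, long requests) =>
        _buckets[unixSeconds % 900] = requests;

    // Average req/sec over the trailing window, e.g. 60 or 900 seconds.
    public double Average(long nowUnixSeconds, int windowSeconds)
    {
        long total = 0;
        for (int i = 0; i < windowSeconds; i++)
            total += _buckets[(nowUnixSeconds - i) % 900];
        return (double)total / windowSeconds;
    }
}
```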
With the exact metrics, we get the benefit of having per-second data, so we can look at peaks & valleys and measure them. With the exponentially weighted moving average, we get a more immediate response to changes, but it is never actually accurate.
It is a bit more work, but it is much more understandable code. On the other hand, it can result in strangeness. If you have a burst of traffic, let's say 3,000 requests over 3 seconds, then the average req/sec over the last minute will stay fixed at 50 req/sec for nearly a whole minute. Which is utterly correct and completely misleading.
I'm not sure how to handle this specific scenario in a way that is both accurate and in line with what the user expects.
Comments
How about calculating the last-minute average like this: TotalReq = TotalReq - Req61SecondsAgo + CurrentReq; LastMinuteAvg = TotalReq / 60
Whenever a second passes, you subtract the request count from 61 seconds ago (which leaves the window) from the total and add the current request count (which enters the window), then compute the average. This way it's an average over the window of the last 60 seconds.
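A sketch of that suggestion in code (hypothetical names, just to show the shape of it):

```csharp
// Rolling one-minute average maintained incrementally, O(1) per second.
public class RollingMinute
{
    private readonly long[] _perSecond = new long[60];
    private long _totalReq;
    private int _pos;

    // Call once per second with that second's request count.
    public double OnSecondElapsed(long currentReq)
    {
        _totalReq -= _perSecond[_pos]; // count falling out of the window
        _perSecond[_pos] = currentReq; // count entering the window
        _totalReq += currentReq;
        _pos = (_pos + 1) % _perSecond.Length;
        return _totalReq / 60.0;       // avg req/sec over the last minute
    }
}
```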
Pop Catalin, We already have the number of requests per minute available (since we have the number of req per second in that time frame). We don't want the req per minute; we want the req per second in the last minute.
"we want the req per second in the last minute"
Isn't that a rolling window with the average number of requests in the past 60 seconds from the current time?
Pop Catalin,
Requests per minute don't really indicate the actual load on the server; req/sec is much more accurate, but you want to see it over time ranges, and that is the problem.
"but you want to see if over time ranges, that is the problem" You want to see it Aggregated (Avg, Max, Sum) ? or as series (graph)?
Provide several measures - total requests in the last second, last 10 seconds, last 5 minutes - then users choose what they want to see. Usually data collection tools record a sample every 5 minutes or so, so 5-minute resolution accommodates them well, and 1- or 10-second resolution is good for watching the status online. Further aggregation can be done by the monitoring tool itself.
You could track a rolling average with its standard deviation and additionally record "peaks" and "cliffs" over some time period, the peaks and cliffs signifying loads outside of the average ± n SD. Then next to the average and SD you could present the number of peaks/cliffs (or their ratio to "in-band" loads) and their most extreme values.
Also, or alternatively, you could make use of something like a simple process control statistic/chart (e.g. CUSUM or Shewhart) to detect and visualize whether there are certain "out-of-control" trends.
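A sketch of the first part of that idea, using Welford's online algorithm for the running mean and variance (hypothetical code; n = 3 is an arbitrary default):

```csharp
using System;

// Counts samples falling outside mean ± n standard deviations.
public class BandTracker
{
    private readonly double _n;  // band width in standard deviations
    private long _count;
    private double _mean, _m2;   // Welford running mean / sum of squared deltas

    public long Peaks, Cliffs;

    public BandTracker(double n = 3) => _n = n;

    public void Add(double sample)
    {
        if (_count > 1)
        {
            double sd = Math.Sqrt(_m2 / (_count - 1));
            if (sample > _mean + _n * sd) Peaks++;
            else if (sample < _mean - _n * sd) Cliffs++;
        }

        // Welford's update: numerically stable running mean and variance.
        _count++;
        double delta = sample - _mean;
        _mean += delta / _count;
        _m2 += delta * (sample - _mean);
    }
}
```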
Since you want to know the server load per second, I'm thinking about the most accurate values. That means saving every request; lots of data, of course, but you could just save all incoming requests and insert them into the database every X records or after a certain time.
Once you have the most accurate data, you can calculate the requests per second every minute or two.
Think about how Azure does it: they show the data after a certain amount of time, so they can do some calculations too.
I'd go for a per-interval (second?) histogram (HdrHistogram?) approach, since you can extract almost anything from histograms. They are of course larger than a single long integer, but having detailed information is usually well worth the effort when trying to meet some kind of internal SLA (like "no request should be slower than 50 ms").
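To illustrate the idea (this is a crude hand-rolled stand-in, not the real HdrHistogram API, which offers configurable precision at much lower cost):

```csharp
using System;

// Crude per-interval latency histogram: one bucket per millisecond up to 1s.
public class LatencyHistogram
{
    private readonly long[] _buckets = new long[1001]; // 0..999 ms + overflow

    public void Record(long milliseconds) =>
        _buckets[(int)Math.Clamp(milliseconds, 0, 1000)]++;

    // e.g. ValueAtPercentile(99) => "99% of requests took at most this many ms".
    public long ValueAtPercentile(double percentile)
    {
        long total = 0;
        foreach (long b in _buckets) total += b;

        long target = (long)Math.Ceiling(total * percentile / 100.0);
        long seen = 0;
        for (int i = 0; i < _buckets.Length; i++)
        {
            seen += _buckets[i];
            if (seen >= target) return i;
        }
        return 1000;
    }
}
```

Keeping one such histogram per interval would also answer the percentile question raised in the next comment.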
You want to show an average request rate and a current request rate. That's two metrics. One number just can't combine these two properties in a meaningful way, for the reasons you state. Why is your last example misleading? What number would you like to see there?
Have you considered percentiles? The 95th, 50th (median), and 5th are pretty standard. The spacing between them gives an indication of how "choppy" the underlying data is without the peak value blowing out the chart scale... or you can plot the peak using a different scale.
This reminds me of a post of yours from a few years back
https://ayende.com/blog/162273/raven-xyz-trying-out-some-ideas
Are they related? Did anything come of that thought?
Piers, No, this is about metrics for the req/sec on RavenDB.
Have you ever thought of pushing or pulling these "metrics" out of process and letting someone else do the actual math? A lot of systems provide such events to 3rd-party systems. For example, etcd, Kubernetes, and SkyDNS provide stats for Prometheus by default. Due to the nature of your process this has to be super optimized, of course...
Sotirios, We also do that (via SNMP), but the idea is that we also want to have some basic stats available in the product.