Production postmortem: The case of the man in the middle

Aug 14 2015

One of the most frustrating things when you are dealing with production issues is when the problem is not in our product, but elsewhere. This post, in particular, is dedicated to the hard work done by many antivirus products to make our lives harder.

Let us take a look at the following quote, taken from the ESET NOD32 Antivirus knowledge base (emphasis mine):

By default, your ESET product automatically detects programs that are used as web browsers and email clients, and adds them to the list of programs that the internal proxy scans. This can cause loss of internet connectivity or other undesired results with applications that use network features but are not web browsers/email clients.

Yes, it can. In fact, it very often does.

Previously, we looked at a similar issue, with antivirus software slowing down I/O enough to cause us to slowly die. In this case, the issue is a lot more subtle.

Because the internal proxy is doing content filtering, it puts a much higher overhead on system resources, which means that, as far as the user is concerned, RavenDB is slow. We actually developed features specifically to handle this scenario. Traffic watch mode tells you how much time is spent on the server side, and we have added a feature that makes RavenDB account for the internal work each query is doing, so we can tell where the actual cost is.

You can enable that by issuing:

GET databases/Northwind/debug/enable-query-timing
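For scripted diagnostics, the same request can be issued outside the browser. The following is a minimal Python sketch using only the standard library. The helper names `build_query_timing_url` and `enable_query_timing` and the server address used below are hypothetical, and the sketch assumes the server accepts a plain, unauthenticated GET on that path; a secured deployment would need credentials added.

```python
import urllib.parse
import urllib.request


def build_query_timing_url(server_url: str, database: str) -> str:
    """Build the debug URL that turns on query timing for one database."""
    base = server_url.rstrip("/")
    # Quote the database name so spaces or slashes stay inside one path segment.
    db = urllib.parse.quote(database, safe="")
    return f"{base}/databases/{db}/debug/enable-query-timing"


def enable_query_timing(server_url: str, database: str, timeout: float = 10.0) -> int:
    """Issue the GET request and return the HTTP status code.

    Assumes an unauthenticated server; this sketch sends no credentials.
    """
    url = build_query_timing_url(server_url, database)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

Against a local test server, a call would look like `enable_query_timing("http://localhost:8080", "Northwind")`, where the address is only an example.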

Once that is set up, you can get a good idea of what is costly in the query, as far as RavenDB is concerned. Here is an example of a very slow query:

You can see that the problem is a very wide range query, so most of the time is spent inside Lucene. Other examples might be ridiculously complex queries, which result in high parsing time (we have seen queries in the hundreds of KB range), or loading a lot of big documents, or… you get the drift. If the server thinks that a query is fast, but the overall time is slow, we know to blame the network.
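The rule of thumb above (fast on the server, slow overall, so blame the network) can be made mechanical. Below is a minimal Python sketch, assuming you already have two numbers per request: the wall-clock time measured by the client and the time the server reports spending on the query. The `QueryTiming` type, the `is_network_suspect` function, and both threshold values are hypothetical illustrations, not RavenDB APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTiming:
    total_ms: float   # wall-clock time observed by the client
    server_ms: float  # time the server reports spending on the query


def is_network_suspect(
    timing: QueryTiming,
    min_gap_ms: float = 50.0,
    min_gap_fraction: float = 0.5,
) -> bool:
    """Return True when most of the elapsed time happened outside the server.

    The "gap" is everything the server did not account for: the network,
    proxies, and anything else sitting in the middle (such as an AV filter).
    Both thresholds are arbitrary defaults chosen for illustration.
    """
    if timing.total_ms <= 0:
        raise ValueError("total_ms must be positive")
    gap_ms = max(timing.total_ms - timing.server_ms, 0.0)
    return gap_ms >= min_gap_ms and gap_ms / timing.total_ms >= min_gap_fraction
```

For example, a request that took 2,000 ms end to end while the server reports 40 ms is flagged, whereas one where the server reports 1,900 ms is not. Requiring both an absolute and a relative gap avoids flagging fast requests, where a few milliseconds of network latency are normal.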

But an even more insidious issue is that the proxy would drop requests, consistently and randomly (and yes, I know that this is a contradiction: it was consistently dropping requests in a random pattern that seemed explicitly designed to thwart any attempt to figure out what was going on). That led to things breaking, and to escalated support calls. “RavenDB is broken” leads to a lot of headaches, and to a burning desire to hit something when you figure out that not only isn’t it your fault, but the underlying cause is actively trying to prevent you from figuring it out (I assume this is to deal with viruses that try to shut it off), which led to really complex fact-finding sessions.

That is all the more annoying because it seems the root cause was a bug in the AV product in question: under some scenarios, it did not respect keep-alive sessions for authenticated requests! Absolutely not fun!

Comments

14 Aug 2015
10:41 AM

orbitz

I'm confused by the anti-virus related posts. Why does someone run anti-virus software on server infrastructure? In the *NIX world, what gets installed on servers is generally heavily regulated via package and configuration managers.

14 Aug 2015
11:18 AM

mark

Maybe you could add a "RavenPing" feature where you fire 1M nop requests to the server and measure timing profile and loss. Or, send one ping every second forever and leave it running for a day. This might be very handy.

Maybe you can even send suspicious content to make the AV bite. Include some dirty words or fake credit card numbers.

14 Aug 2015
12:14 PM

Robert Labrie

Hi Oren, thanks for the deep dive, looking forward to (some day) upgrading so that I can leverage these timing features.

Remember that not all anti-virus products are created equal, and that all require tuning. You run anti-virus on server infrastructure so that when your public tier, or the workstation of some user with elevated privileges is compromised, you've got some protection on your critical infrastructure. Malicious software doesn't ask politely, it exploits un-patched (possibly previously unknown) vulnerabilities to install itself, and even if it takes days for the definitions to get caught up, it's great to have AV come along and remove it. In the Windows world, what gets installed on servers is generally heavily regulated by fine grained access control, group policy objects, and automated software deployment tools controlled by policies and templates. It's not an accident that my servers run AV, and in fact, I find it rather alarming that someone wouldn't. Microsoft best practices tell you to run it on Domain Controllers, Exchange servers and SQL servers; with the caveat that you tune it so that it doesn't interfere with the applications the server is hosting.

14 Aug 2015
2:52 PM

Alois Kraus

How did you find it this time? With ETW or by shutting down the AV solution?

14 Aug 2015
7:59 PM

Oren Eini

Orbitz, Some of those were actually encountered on developers' machines, where they are testing things out. But I usually think about it like this: http://imgs.xkcd.com/comics/voting_machines.png

This is a blog post (admittedly from an AV company) that discusses this topic: https://blogs.sophos.com/2013/12/09/do-you-need-antivirus-on-linux-servers/

And from Microsoft: https://support.microsoft.com/en-us/kb/309422

A lot of that depends on the position of the server: whether it is externally accessible, etc.

Honestly, a lot of the time that is done to CYA.

See a great example of the details why here: http://serverfault.com/questions/643099/run-antivirus-software-on-linux-dns-servers-does-it-make-sense?answertab=votes#tab-top

14 Aug 2015
8:00 PM

Oren Eini

Mark, RavenPing would work just fine: it would be a small request and wouldn't trigger additional work. It is when you send big requests/responses that you see those issues.

14 Aug 2015
8:06 PM

Oren Eini

Robert, I don't have a problem with running AV on servers. I have big problems with running AV on servers, not excluding the database, then opening critical support calls for performance / stability issues. That happens all too often, I'm afraid.

Just to give you some idea, here is the MS recommended practice for running AV on a SQL Server machine: https://support.microsoft.com/en-us/kb/309422

That is ten pages of text to explain that. Conscientious admins would follow it, but a lot of the time, it is ignored. And then there are problems.

14 Aug 2015
8:06 PM

Oren Eini

Alois, The first thing we do when we find an AV solution is to ask to shut it down, then see if the problem goes away :-)

14 Aug 2015
10:56 PM

njy

@oren: the GET request to enable the option was a typo, right? I mean, it is a POST request, correct?

15 Aug 2015
12:04 AM

Oren Eini

njy, No, this is a GET. We want to make it easy to enable that via the browser directly.

16 Aug 2015
7:02 PM

njy

I highly suggest avoiding changing a system's state with a GET request, for reasons of standards compliance, security (CSRF, just to name one) and more: http://stackoverflow.com/questions/705782/why-shouldnt-data-be-modified-on-an-http-get-request

16 Aug 2015
7:11 PM

Oren Eini

njy, That isn't changing the system state. It modifies no behavior. It simply causes us to track additional data for a period of time. Note that pretty much everything in your link refers to standard web apps; RavenDB isn't one.

16 Aug 2015
8:28 PM

njy

@Oren: A couple of points:
- you said "It causes us to track additional data for a period of time", and that means flipping a switch, which is a state change, even if a "lite" one;
- you said "We want to make it easy to enable that via the browser directly", and that means someone using a browser, which is basically web app behaviour. If a user can open a browser and do a GET on a URL, the same can be done via a hidden iframe, an img's src, or something else;
- one of the points was standards compliance, which seems pretty important;
- I personally have never seen a command for a webapp/webservice/webwhatever launched via a GET request.
Cheers.

16 Aug 2015
9:22 PM

mark

Next idea: Make Raven be able to create a "system summary" consisting of number of CPUs, RAM etc. and presence of AV. Ask customers to send in that string with every support call. Maybe you can even make the raven server interact with the major AV vendors and check their configuration at runtime to make sure raven is excluded. Otherwise, make the dashboard issue a warning.

16 Aug 2015
10:32 PM

Oren Eini

Mark, We actually do that, we have a Gather Debug Info which collects all major data points we need. AV is a problem, because there is no standard way to detect it (especially on server OS).

16 Aug 2015
10:33 PM

Oren Eini

Njy, If you have a user that can use a hidden frame to go to RavenDB, you have other issues.

We have a whole set of commands used for information gathering, which are often invoked from production machines where the ability to invoke REST commands (except via a browser GET) is limited. Yes, this is stupid. Yes, there is a command line, but we found that it creates a big barrier to solving problems, compared to just sending URLs to open in the browser.

And again, RavenDB is not exposed to users in any way. This isn't a web app.

22 Aug 2015
03:56 AM

Ivan

I am with njy on this one (about GET request). I was equally surprised. If the request has side effects like "cause us to track additional data", it means it IS changing some state. GET requests should not do that. Think of a search engine accidentally finding a link to your GET request and visiting it. Normal search engines will never do POSTs, but GETs are fair game.

23 Aug 2015
08:08 AM

Oren Eini

Ivan, A search engine shouldn't have access to RavenDB. It is a database, not a web app that is exposed to the world.

Oren Eini

CEO of RavenDB
