Production Postmortem: The Razor Suicide

Jan 27 2016

Unlike previous posts in this series, this is actually something that happened to our own production server today. It resulted in our website being inaccessible for a couple of hours and, like most such stories, its root cause is tremendously simple; through a series of unfortunate accidents, it escalated into a major issue.

First, this post is dedicated to this book, which should be read by any self-respecting developer whose code is expected to hit production.

This post is also being written only a few hours after the incident was resolved, before we have implemented anything except a temporary workaround. I’ll probably have another post in a couple of days to talk about the steps we are going to take to prevent a repeat of this incident.

The incident started innocently enough, when one of the guys on the team discovered that the startup time of a certain instance had jumped by a lot. Investigating why, he realized that the issue was extremely slow responses from our server. That was a cause for triple concern, actually. First, why are we accepting such slow responses instead of time limiting them? Second, why are we making a remote synchronous call during startup? And third, why on Earth is our server so slow?
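As a rough illustration of the first question (time limiting a slow call), here is a minimal sketch in Python rather than the .NET code our services actually run. The names `call_with_timeout` and `slow_remote_call`, the timeout value, and the idea of falling back to a cached value are all assumptions made for the example, not what our code does.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s, fallback):
    """Run fn() on a worker thread; return fallback if it exceeds timeout_s."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback
    finally:
        # Return to the caller right away instead of waiting for the slow call.
        executor.shutdown(wait=False)


def slow_remote_call():
    # Hypothetical stand-in for the slow server response.
    time.sleep(0.5)
    return "fresh"


# Startup continues with the fallback instead of stalling on the remote call.
value = call_with_timeout(slow_remote_call, timeout_s=0.05, fallback="cached")
```

Note that the abandoned call keeps running on its worker thread until it finishes; the sketch only stops the caller from waiting on it.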

Logging into the server, it didn’t take long to see what the problem was. The www.RavenDB.net website (the code that runs the RavenDB website, not RavenDB itself) was consuming a lot of CPU and quite a bit of memory. In a bid to restore the other services which reside on the same box, we reset the process. Our main concern at the time was to restore service as soon as possible, and we planned on investigating further through the logs.

However, within a few minutes, the www.RavenDB.net website started consuming more and more resources again. At that point, we started considering a DoS attack of some sort and looked at the logs. The logs did show a very high number of requests, much more than I would expect. But looking further into them, it looked like they were mostly bots of various kinds indexing our site.

Considering that this might be the case of Google hammering us, we configured a robots.txt on the site and waited to see if this would have an impact. It didn’t.

The next step was to take a dump of the process and then analyze it. During this period, we had to shut down www.RavenDB.net because it was killing all other services running on the server.

Looking at the dump in WinDbg, we started with the obvious commands:

- !runaway, to find out the CPU time of each thread
- switching to the busiest threads
- !clrstack, to see what each of them is doing

Honestly, this is a much nicer way of looking at this, though:

As you can see, the threads are actually parsing a Razor template, and seem to be doing that on a fairly continuous basis, consuming all system resources.

At that point, I started getting concerned for the well-being of the poor guy’s inbox as a result of this code. That was the point where we actually did what should probably have been our first action, and looked at the error log of the website.

Previously, we had looked at the live metrics and the request log, but didn’t consider looking into the error log for the system. The error log for the website, for today only, was 6 GB in size, and was pretty full of errors such as:

- error: (221, 88) 'HibernatingRhinos.Orders.Common.EmailProcessing.EmailTemplates.RavenDBWebsite.Models.DownloadQuestionMailInput' does not contain a definition for 'Unsubscribe' and no extension method 'Unsubscribe' accepting a first argument of type

And at that point, we had enough to point a suspicious finger. We have an email that we send out, and we used to have a valid template. At some point, the code was changed and the Unsubscribe member was removed. Nothing broke because the template is just a file, not compiled code. However, in production, when we tried to send the email, Razor would parse the text, fail compilation because of the missing member, and basically throw a hissy fit.
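The general shape of this failure, a template referencing a member that the model no longer has, can be caught before deployment by checking templates against their models. The following is only an analogy in Python using `str.format`-style placeholders, not Razor and not our actual code. The model name comes from the error message above, but its fields (`Name`, `Product`), the template text, and the `missing_fields` helper are hypothetical.

```python
import string
from dataclasses import dataclass, fields


@dataclass
class DownloadQuestionMailInput:
    # Hypothetical fields; 'Unsubscribe' is assumed to have been removed.
    Name: str
    Product: str


def missing_fields(template_text, model_cls):
    """Return placeholder names used by the template that the model lacks."""
    used = {
        name.split(".")[0].split("[")[0]
        for _, name, _, _ in string.Formatter().parse(template_text)
        if name
    }
    available = {f.name for f in fields(model_cls)}
    return sorted(used - available)


TEMPLATE = "Hi {Name}, thanks for downloading {Product}. Unsubscribe: {Unsubscribe}"

# A build step or unit test could fail whenever this list is not empty.
problems = missing_fields(TEMPLATE, DownloadQuestionMailInput)
print(problems)  # ['Unsubscribe']
```

Run as part of the build, a check like this turns a production-time compilation failure into a failed build.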

Update: We investigated further, and it looks like the following was the actual “solution” to the outage:

The “solution” is in quotes because this fixes the problem, but we still need to implement steps to ensure that something like this doesn’t repeat.

Unfortunately, at that point, we would consider this email as failed and move on to the next one. That next one would also fail, and so would the one after it, and so on. Because all of them failed, they would all get picked up again the next time this ran.
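One common way to break this kind of loop is to cap the number of attempts per email and park anything that keeps failing. The sketch below is in Python and is not the fix we deployed or have decided on. `process_queue`, `broken_send`, and the limit of three attempts are assumptions made for illustration.

```python
from collections import deque

MAX_ATTEMPTS = 3  # Assumed limit, chosen only for the example.


def process_queue(pending, send, max_attempts=MAX_ATTEMPTS):
    """Send each queued email; park repeat failures instead of retrying forever.

    Returns a (sent, dead_letter) pair.
    """
    queue = deque((email, 0) for email in pending)
    sent, dead_letter = [], []
    while queue:
        email, attempts = queue.popleft()
        try:
            send(email)
            sent.append(email)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Give up on this email so it is not picked up again.
                dead_letter.append((email, repr(exc)))
            else:
                queue.append((email, attempts))
    return sent, dead_letter


def broken_send(email):
    # Hypothetical sender whose template no longer compiles.
    raise RuntimeError("template compilation failed")


sent, dead_letter = process_queue(["a@example.com", "b@example.com"], broken_send)
```

With a broken template, every email ends up in `dead_letter` after three attempts and the run terminates, instead of the same emails being retried on every pass.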

Once we knew where the problem was, the workaround was to deploy a version with no email sending. For this weekend, that will work. But come Sunday, someone is going to go over this piece of code with a very fine comb. I’ll post more about it once that actually rolls around.

Tags:

development

Comments

27 Jan 2016
11:29 AM

Vasili

We've had a lot of similar issues with Razor Engine, as well as some others that were hard to diagnose. As a result, instead of using runtime template parsing, I decided to generate everything upfront, at the moment when one edits an email template. This solution won't work if you support updating email templates at run time. But if you don't support that and create the templates upfront, then RazorEngine.Generator ( https://visualstudiogallery.msdn.microsoft.com/4cebc8ac-50dc-4381-8f2a-634f318f06df ) will help. It's a VS extension, and after you install it you'll need to set the custom tool for your razor files to RazorEngineGenerator.

27 Jan 2016
11:54 AM

David Keaveny

With vanilla ASP.NET MVC, you could catch those errors by using ASP.NET pre-compilation in your build process (or Resharper while still in dev). Since you are using Razor Engine, I am keen to see how you fix it properly, especially as I am also using Razor Engine for email generation, albeit in a Windows service which won't flatten the website when it goes titsup.

27 Jan 2016
12:07 PM

Catalin Pop

My first thought when I read this was that it is a Razor bashing post ...

What you described is not by any means a Razor issue. Razor behaves very well here: it can't parse, so it gives an error, as specific as possible. It's not even an ASP.Net issue; the fact that Razor views are just-in-time compiled on access is a well known behavior. The fact that the Application Email Sender uses dynamically compiled Razor views should have been well known as well.

It is more of a technical implementation issue.

Visual Studio and tools (Resharper) all have support for refactoring razor views and compiling them. The fact that the solution was structured in such a way that model renaming was not detected during a release build (if not a debug one) kind of defeats the purpose of using a Strong Typed View Engine imho.

Another, more important issue is the fact that there is an infinite retry in place. Infinite retries are a disaster waiting to happen in any application/system, and among the biggest anti patterns there are: from installers that infinitely retry to access files locked by other installers that wait for the first ones to unlock some other files, to unending reboot cycles, to applications that infinitely try to save a file, and others ...

People say multi threading is hard .... it's way easier than proper error handling ...

27 Jan 2016
12:29 PM

mark

Nice post. I'd also say that the fact that you were not alerted to production errors is also a problem.

27 Jan 2016
13:02 PM

Anders Feldt

Don't you have any kind of tests on the code that generates the content of the mail body? That would have found the compile error in the razor view before it reached the production.

27 Jan 2016
15:04 PM

I would separate the process for sending emails from the website. One idea: introduce an asynchronous queue. Whenever you need to send an email, put a message on a queue somewhere. A listener on the queue would process the emails asynchronously and independently from any other process. This can be on a separate server and independently scaled. After a few retries, a failed message should go to a dead letter queue. You can monitor both queues in various ways. If anything goes wrong with sending emails, your main website is unaffected.

27 Jan 2016
21:11 PM

jte

Might this be the solution to ensure this does not repeat?

http://stackoverflow.com/questions/20156472/how-can-i-make-builds-fail-in-vs-2013-if-theres-a-razor-c-error-in-a-cshtml-f

27 Jan 2016
21:29 PM

Anon

This project can help to find razor's problems at compile time https://github.com/RazorGenerator/RazorGenerator

27 Jan 2016
21:58 PM

Thomas Lauzi

I have to agree with Marc.

The fact that you were not alerted to production errors is IMHO the real problem. Anything can happen, and the Razor suicide is only one possible error. Errors always happen, and you'll never catch them all (you cannot write that many tests), but the most important thing is to get notified when something goes really wrong. We regularly scan our log files or database tables with logs for errors (batch/powershell/sql, ... scripts), and send simple status emails to an internal log email address. One person has to check them in the morning (over a cup of coffee). I can recommend this.

28 Jan 2016
13:04 PM

Tim

<MvcBuildViews>true</MvcBuildViews> http://haacked.com/archive/2011/05/09/compiling-mvc-views-in-a-build-environment.aspx/

I agree with JTE. I wish there were a UI for this, and I continually forget to set it for new projects since it defaults to false, but it can be a lifesaver.

03 Feb 2016
16:50 PM

Pedro

A tool that my team and I use to prevent such scenarios is Resharper.

So first, we would see if the class had any uses for that Property and then Remove it. OR you could just check for Errors in the solution - even non-compiling errors like that would appear in ReSharper.

Oren Eini

CEO of RavenDB