It should come as no surprise that our entire internal infrastructure is running on RavenDB. I wholly believe in the concept of dogfooding and it has served us very well over the years.
I was speaking to a colleague just now and it occurred to me that it is surprising that we do certain things wrong, intentionally. It is fair to say that we know what the best practices for using RavenDB are, the things that you can do to get the most out of it.
In some of our internal systems, we are doing things in exactly the wrong way. We are doing things that are inefficient in RavenDB. We take the expedient route to implement things. A good example of that is that we have a set of documents that can grow to be multiple MB in size. They are also some of the most commonly changed documents in the system. Proper design would call for breaking them apart to make things easier for RavenDB.
We intentionally modeled things this way. Well, I gave the modeling task to an intern with no knowledge of RavenDB and then I made things worse for RavenDB in a few cases where he didn’t get it out of shape enough for my needs.
Huh?! I can hear you thinking. Why on earth would we do something like that?
We do this because it serves as an excellent proving ground for misuse of RavenDB. It shows us how the system behaves in non-ideal situations: not just when the user is able to match everything to the way RavenDB would like things to be, but when they build their system the way they are likely to, unaware of what is going on behind the scenes and what the optimal solution would be. We want RavenDB to be able to handle that scenario well.
An example that comes to mind was having all the uploads in the system be attachments on a single document. That surfaced an O(N^2) algorithm very deep in the bowels of RavenDB for placing a new attachment. It was completely invisible in the normal case, because it was fast enough under any normal or abnormal situation that we could think of. But when we started getting high latency from uploads, we realized that adding the 100,002nd attachment to a document required us to scan through the whole list… it was obvious that we needed a fix. (And please, don’t put hundreds of thousands of attachments on a document. It will work, and it is fast now, but it isn’t nice.)
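To make the cost concrete, here is a minimal sketch of how a per-insert scan turns into quadratic work overall. This is not RavenDB’s actual code; the function names and the `file-N` naming are made up for illustration, and the only thing taken from the story above is the "scan through the whole list on every insert" behavior.

```python
def insert_with_scan(names, new_name):
    """Add new_name after comparing it against every existing entry.

    Returns the number of comparisons this single insert needed.
    """
    comparisons = 0
    for existing in names:
        comparisons += 1
        if existing == new_name:
            return comparisons  # already present, nothing to add
    names.append(new_name)
    return comparisons


def total_scan_cost(n):
    """Total comparisons for n inserts when each insert scans the list."""
    names = []
    return sum(insert_with_scan(names, f"file-{i}") for i in range(n))


def total_indexed_cost(n):
    """Total lookups for n inserts when a hash set answers 'is it there?'."""
    seen = set()
    lookups = 0
    for i in range(n):
        name = f"file-{i}"
        lookups += 1  # one hash lookup per insert, regardless of size
        if name not in seen:
            seen.add(name)
    return lookups
```

With the scanning version, n inserts cost n(n-1)/2 comparisons in total, so each extra attachment gets a little slower than the last. That is exactly the kind of thing you never notice at a few dozen items and can’t miss at a hundred thousand.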
Doing the wrong thing on purpose means that we can be sure that when users are doing the wrong thing accidentally, they get good behavior.
I have written an article about the uses of the profiler and the benefits it brings. And here is the 30-second video:
You can watch the full demo in the Cosmos DB webinar.
The recording is now available here:
We got a feature request that we don’t intend to implement, but I thought the reasoning is interesting enough for a blog post. The feature request:
If there is a critical error or major issue with the current state of the database, for instance when the data is not replicated from Node C to Node A due to some errors in the database or network it should send out mail to the administrator to investigate on the issue. Another example is, if the database not active due to some errors then it should send out mail as well.
On its face, the request is very reasonable. If there is an error, we want to let the administrator know about it, not hide it in some log file. Indeed, RavenDB has the concept of alerts just for that reason, to surface any issues directly to the admin ahead of time. We also have a mechanism in place to allow for alerts for the admin without checking in with the RavenDB Studio manually: SNMP. The Simple Network Management Protocol is designed specifically to enable this kind of monitoring, and RavenDB exposes a lot of state via SNMP that you can act upon in your monitoring system.
Inside your monitoring system, you can define rules that will alert you. Send an SMS if the disk space is low, or email on an alert from RavenDB, etc. The idea of actively alerting the administrator is something that you absolutely want to have.
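As a rough sketch of what such rules look like on the monitoring side, here is a toy rule evaluator. Everything in it is an assumption made for illustration: the metric names (`disk_free_percent`, `alert_count`), the thresholds and the channels are hypothetical, and a real monitoring system would collect the values (over SNMP, for example) and handle delivery itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    name: str
    channel: str  # where the notification goes, e.g. "sms" or "email"
    triggered: Callable[[Dict[str, float]], bool]


def evaluate(rules: List[Rule], metrics: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (channel, rule name) for every rule that fires on these metrics."""
    return [(rule.channel, rule.name) for rule in rules if rule.triggered(metrics)]


# Hypothetical rules mirroring the two examples in the text.
RULES = [
    Rule("low disk space", "sms",
         lambda m: m.get("disk_free_percent", 100.0) < 10.0),
    Rule("database alert raised", "email",
         lambda m: m.get("alert_count", 0) > 0),
]
```

The point of the split is that the database only has to publish numbers. Which numbers matter, at what threshold, and who gets woken up at 3 AM are all policy decisions that live in the monitoring system.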
Having RavenDB send those emails, not so much. RavenDB exposes monitoring endpoints and alerts; it doesn’t act or report on them. That is the role of your actual monitoring system. You can set up Zabbix or talk to your Ops team, which likely already has one installed.
Let’s talk about the reason that RavenDB isn’t a monitoring system.
Sending email is actually really hard. What sort of email provider do you use? What options are required to set up a connection? Do you need an X509 certificate or a user/pass combo? What happens if we can’t send the email? That is leaving aside the fact that actually getting the email delivered is hard enough. Spam, SPF, DKIM and DMARC are where things start. In short, that is a lot of complications that we’ll have to deal with.
For that matter, what about SMS integration? Surely that would also help. But no one uses SMS today, we want WhatsApp integration, and Telegram, and … You get the point.
Then there are social issues. How will we decide if we need to send an email or not? There should be some policy, and ways to configure that. If we don’t have that, we’ll end up sending either too many emails (which will get flagged / ignored) or too few (why aren’t you telling me about XYZ issue?).
A monitoring system is built to handle that sort of issue. It is able to aggregate reports and give you a single email with the current status, open issues for you to fix and do a whole lot more that is simply outside the purview of RavenDB. There is also the most critical alert of all: if RavenDB is down, it will not be able to report that it is down, because it is down.
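That last point is why the check has to live outside the thing being checked. Here is a minimal sketch of an external liveness probe. The helper names (`check_liveness`, `tcp_probe`) are made up for this post, and a plain TCP connect is only a stand-in for whatever health check your monitoring system actually runs.

```python
import socket
from typing import Callable


def check_liveness(probe: Callable[[], bool], attempts: int = 3) -> str:
    """Run the probe up to `attempts` times from OUTSIDE the server.

    Returns "up" as soon as one attempt succeeds, otherwise "down".
    """
    for _ in range(attempts):
        try:
            if probe():
                return "up"
        except OSError:
            # Refused connections and timeouts are OSError subclasses;
            # treat them as a failed attempt and try again.
            pass
    return "down"


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a probe that succeeds if a TCP connection can be opened."""
    def probe() -> bool:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    return probe
```

You would call it as `check_liveness(tcp_probe(host, port))` with your own server’s address. A "down" result is produced by the watcher, not by the database, so it still shows up when the database process is gone entirely.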
The proper way to handle this is to set up integration with a monitoring system, so we’ll not be implementing this feature request.