Feature Rejection: sending an email alert from RavenDB
We got a feature request that we don’t intend to implement, but I thought the reasoning was interesting enough for a blog post. The feature request:
If there is a critical error or major issue with the current state of the database, for instance when the data is not replicated from Node C to Node A due to some errors in the database or network it should send out mail to the administrator to investigate on the issue. Another example is, if the database not active due to some errors then it should send out mail as well.
On its face, the request is very reasonable. If there is an error, we want to let the administrator know about it, not hide it in some log file. Indeed, RavenDB has the concept of alerts just for that reason, to surface any issues directly to the admin ahead of time. We also have a mechanism in place that lets the admin receive alerts without manually checking the RavenDB Studio: SNMP. The Simple Network Management Protocol is designed specifically to enable this kind of monitoring, and RavenDB exposes a lot of state through it that you can act upon in your monitoring system.
Inside your monitoring system, you can define rules that will alert you. Send an SMS if the disk space is low, or email on an alert from RavenDB, etc. The idea of actively alerting the administrator is something that you absolutely want to have.
Having RavenDB send those emails, not so much. RavenDB exposes monitoring endpoints and alerts; it doesn’t act or report on them. That is the role of your actual monitoring system. You can set up Zabbix, or talk to your Ops team, which likely already has a monitoring system installed.
Let’s talk about the reason that RavenDB isn’t a monitoring system.
Sending email is actually really hard. What sort of email provider do you use? What options are required to set up a connection? Do you need an X509 certificate or a user/pass combo? What happens if we can’t send the email? That is leaving aside the fact that actually getting the email delivered is hard enough. Spam filtering, SPF, DKIM and DMARC are where things start. In short, that is a lot of complications that we’ll have to deal with.
For that matter, what about SMS integration? Surely that would also help. But no one uses SMS today, we want WhatsApp integration, and Telegram, and … You get the point.
Then there are social issues. How will we decide if we need to send an email or not? There should be some policy, and ways to configure that. If we don’t have that, we’ll end up sending either too many emails (which will get flagged / ignored) or too few (why aren’t you telling me about XYZ issue?).
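To make the "too many emails" half of that concrete, here is a minimal sketch of the kind of policy a monitoring system applies before it notifies anyone. Nothing here is RavenDB code: the AlertThrottle class, the cooldown value and the alert key are all made up for illustration, and a real monitoring system does far more (escalation, aggregation, acknowledgements).

```python
import time


class AlertThrottle:
    """Suppress repeats of the same alert within a cooldown window.

    Hypothetical helper for illustration only, not part of RavenDB.
    """

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the policy can be tested
        self._last_sent = {}  # alert key -> time the last notification went out

    def should_send(self, key):
        """Return True if a notification for this alert key may go out now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # same alert was sent recently, stay quiet
        self._last_sent[key] = now
        return True


if __name__ == "__main__":
    throttle = AlertThrottle(cooldown_seconds=600)
    for _ in range(3):
        # Only the first of these three identical alerts would be sent.
        print(throttle.should_send("replication C -> A failed"))
```

Even this toy version raises configuration questions (how long is the cooldown, what counts as "the same" alert), which is exactly the sort of thing that would have to be designed, documented and supported.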
A monitoring system is built to handle those sorts of issues. It is able to aggregate reports and give you a single email with the current status, open issues for you to fix, and do a whole lot more that is simply outside the purview of RavenDB. There is also the most critical alert of all: if RavenDB is down, it will not be able to report that it is down, because it is down.
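That last point is why the check has to live outside the database. As a rough sketch, assuming nothing more than the nodes answering plain HTTP requests, an external probe could look like the following. The node URLs are placeholders, the function names are invented for this example, and in practice your monitoring system already ships this kind of check.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, HTTP error status, etc.
        return False


def find_down_nodes(node_urls, probe=http_probe):
    """Return the nodes that failed the probe, in the order they were given."""
    return [url for url in node_urls if not probe(url)]


if __name__ == "__main__":
    # Placeholder addresses; replace with the URLs of your own nodes.
    nodes = ["http://node-a.example:8080/", "http://node-c.example:8080/"]
    for url in find_down_nodes(nodes):
        print(f"DOWN: {url}")
```

The important property is where this runs: on a different machine than the one being watched, so a dead node or a dead network link still results in someone being told.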
The proper way to handle this is to set up integration with a monitoring system, so we’ll not be implementing this feature request.
Comments
We have a system at work which we extended to send mails on critical errors via a log4net mail sender and we managed to send soooo many mails because of a repeated error that someone from our mail team called me (My name was in the from field). ... oops :-)
it is already some years in the past.
Another lesser reason not to do this feature is to prevent a false sense of security.
If Node C can't replicate to Node A there might be a network error which also prevents the email from going out. So now you need an external system to check if Node C can be reached. But that's not enough, since the fact that you can connect to Node C does not mean Node C can send emails. So you would need the external system to check if the email subsystem works. At which point you can simply cut out the middle man and simply check RavenDB itself.
Patrick,
That is a good point, yes. You can't monitor yourself, because if you are down, you can't tell that you are down :-)
First thing coming to my mind is SRP. In short, monitoring features are out of scope for a db. RavenDB should just provide monitoring capabilities.