re: RavenDB. Two years of pain and joy
I was pointed to this blog post which talks about the experience of using RavenDB in production. I want to start by saying that I love getting such feedback from our users, for a whole lot of reasons, not the least of which is that it is great to hear what people are doing with our database.
Alex has been using RavenDB for a while, so he had the chance to use both RavenDB 3.5 and 4.2. That is good from my perspective, because it means that he had the chance to see what the changes were and how they impacted his routine usage of RavenDB. I’m going to call out (and discuss) some of the points that Alex raises in the post.
Speaking about .NET integration:
Raven’s .NET client API trumps MongoDB .NET Driver, CosmosDB + Cosmonaut bundle and leaves smaller players like Cassandra (with DataStax C# Driver), CouchDB (with MyCouch) completely out of the competition.
When I wrote RavenDB 0.x, way before the 1.0 release, it took two months to build the core engine, and another three months to build the .NET client. Most of that time went on Linq integration, by the way. Yes, it literally took more time to build the client than the database core. We put a lot of effort into that. I was involved for years in the NHibernate project and I took a lot of lessons from there. I’m very happy that it shows.
Speaking about technical support:
RavenDB has a great technical support on Google Groups for no costs. All questions, regardless of the obtained license, get meaningful answers within 24 hours and quite often Oren Eini responds personally.
Contrary to the Google Groups, questions on StackOverflow are often neglected. It’s a mystery why Raven sticks to a such archaic style of tech support and hasn’t migrated to StackOverflow or GitHub.
I care quite deeply about the quality of our support, to the point where I’ll often field questions directly, as Alex notes. I have an article on LinkedIn that talks about my philosophy in that regard, which may be of interest.
As to Alex’s point about Stack Overflow vs. Google Groups, the key difference is the way we can discuss things. In Stack Overflow, the focus is on an answer, but that isn’t our usual workflow when providing support. Here is a question that would fit Stack Overflow very well: there is a well-defined problem with all the details, and we are able to provide an acceptable answer in a single exchange. That kind of interaction, however, is quite rare. It is far more common to need a lot of back and forth, and we tend to try to give a complete solution, not just answer the initial question.
Another issue is that Stack Overflow isn’t moderated by us, which means that we would be subject to rules that we don’t necessarily want to adhere to. For example, we get asked similar questions all the time. On Stack Overflow they would be marked as duplicates and closed, but we want to actually answer people.
GitHub issues are a good option for this kind of discussion, but they tend to cause people to raise issues, and one of the reasons that we have the Google Group is to create discussion. I guess it comes down to the different communities that spring up around each communication medium.
Speaking about documentation:
RavenDB does have the official docs, which are easily navigable and searchable. Works well for beginners and provides good samples to start with, but there are gaps here and there, and it has far less coverage of the functionality than many popular free and open source projects.
…
Ultimately, as a developer, I want to google my question and it’s acceptable to have the answer on a non-official website. But if you’re having a deep dive with RavenDB, it’s unlikely to find it in the official docs, nor StackOverflow, nor GitHub.
Documentation has always been a chore for me. The problem is that I know what the software does, so it can be hard to even figure out what we need to explain. This year we have hired a couple more technical writers specifically to address the missing pieces in our documentation. I think we are doing quite well at this point.
What wasn’t available at the time of this post and is available now is the book. All of the details about RavenDB that you could care about, and more, are covered there. The book is also available to Google, so in many cases your Google search will point you to the right section in the book that may answer your question.
I hope that these actions covered the gaps that Alex noted in our documentation. And there is also this blog, of course.
Speaking about issues that he had run into:
It’s reliable and does work well. Unless it doesn’t. And then a fix gets promptly released (a nightly build could be available within 24 hours after reporting the issue). And it works well again.
…
All these bugs (and others I found) have been promptly fixed.
… stability of the server and the database integrity are sacred. Even a slight risk of losing it can keep you awake at night. So no, it’s a biggy, unless the RavenDB team convinces me otherwise.
I didn’t include the list of issues that Alex pointed to on purpose. The actual issues don’t matter that much, because he is correct: from Alex’s perspective, RavenDB ought to Just Work, and anything else is our problem.
We spend a lot of time on ensuring a high quality for RavenDB. I had a two-part discussion with Jeffrey Palermo about just that, and you might be interested in this keynote that goes into some of the challenges that are involved in making RavenDB.
One of the issues that he raised was RavenDB crashing (or causing Windows to crash) because of a bug in the Windows Kernel that was deployed in a hotfix. The hotfix was quietly patched some time later by Microsoft, but in the meantime, RavenDB would crash. And a user would blame us, because we crashed.
Another issue (RavenDB upgrade failing) was an intentional choice by us in the upgrade, however. We had a bug that could cause data corruption in some cases. We fixed it, but we had to deal with the potentially problematic state of existing databases. We chose to be conservative and ask the user to take an explicit action in this case, to prevent data loss. It isn’t ideal, I’m afraid, but I believe that we have done the best that we could after fixing the underlying issue. When in doubt, we pretty much always have to favor preventing data loss over availability.
Speaking about Linq & JS support:
let me give you a sense of how often you’ll see LINQ queries throwing NotSupportedException in runtime. …
But in front of us a special case — a database written in the .NET! There is no need in converting a query to SQL or JavaScript.
I believe that I mentioned already that the initial Linq support for RavenDB literally took more time than building RavenDB itself, right? Linq is an awesome feature, for the consumer. For the provider, it is a mess. I’m going to quote Frans Bouma on this:
Something every developer of an ORM with a LINQ provider has found out: with a LINQ provider you're never done. There are always issues popping up due to e.g. unexpected expressions in the tree.
Now, as Alex points out, RavenDB is written in .NET, so we could technically use something like Serialize.Linq and support any arbitrary expression easily, right?
Not really, I’m afraid, and for quite a few reasons:
- Security – you are effectively allowing a user to send arbitrary code to be executed on the server. That is never going to end up well.
- Compatibility – we want to make sure that we are able to change our internals freely. If we are forced to accept (and then execute) code from the client, that freedom is limited.
- Performance – issuing a query in this manner means that we’ll have to evaluate the query on each document in the relevant collection. A full table scan. That is not a feature that RavenDB even has, and for a very good reason.
- Limited to .NET only – we currently have clients for .NET, JVM, Go, Python, C++ and Node.js. Having features just for one client is not something that we want; it really complicates our lives.
We think about queries using RQL, which are abstract in nature and don’t tie us down with regards to how we implement them. That means that we can use features such as automatic indexes, build fast queries, etc.
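To make the performance point concrete, here is a toy sketch in Python. It is not how RavenDB implements indexes or queries; the in-memory `docs` collection, the `City` field, and both query functions are assumptions made up for illustration. It only shows why an opaque predicate forces the server to look at every document, while a declarative query can be answered from a structure built ahead of time.

```python
from collections import defaultdict

# Hypothetical in-memory "collection": 1000 documents, 10 of them in London.
docs = {
    f"users/{i}": {"City": "London" if i % 100 == 0 else "Paris"}
    for i in range(1000)
}

# A toy index built ahead of time: City value -> set of document ids.
city_index = defaultdict(set)
for doc_id, doc in docs.items():
    city_index[doc["City"]].add(doc_id)


def query_opaque(predicate):
    """Arbitrary client code: the only option is to run it on every document."""
    examined = 0
    hits = set()
    for doc_id, doc in docs.items():
        examined += 1
        if predicate(doc):
            hits.add(doc_id)
    return hits, examined


def query_declarative(city):
    """A declarative 'City = ?' query: the server may answer it from the index."""
    hits = set(city_index.get(city, set()))
    return hits, len(hits)  # only the matching entries are touched


scan_hits, scan_examined = query_opaque(lambda d: d["City"] == "London")
index_hits, index_examined = query_declarative("London")
print(scan_examined, index_examined)
```

Both calls return the same ten documents, but the scan examines all 1000 of them while the index lookup touches only the matches.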
Speaking about RQL:
Alex points out some issues with RQL as well. The first issue relates to the difference between a field existing and having a null value. RavenDB makes a distinction between these states: a field can have a null value or it can be missing entirely. This is similar to the behavior of NULL in SQL, which often creates the same kind of confusion. The difference with RavenDB is that a schema isn’t required, so different documents can have different fields, and in our case there is an additional level. A field can have a value, be null, or not exist. And we reflect that in our queries. Unfortunately, while the behavior is well defined and documented, just like NULL behavior in SQL, it can be surprising to users.
Another issue that Alex brings up is that negation queries aren’t supported directly. This is because of the way we process queries, and it is one of the ways we ensure that users are aware of the impact of the query. With a negation query, we have to first match all documents and then exclude all those that match the negation. For a large number of documents, that can be expensive. Ideally, the user has a way to limit the scope of the matches that are being negated, which can really help performance.
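The three states are easy to see on plain JSON documents. The following Python sketch is only an illustration of the idea, not RavenDB or RQL code; the documents, the `Owner` field, and the `field_state` helper are all hypothetical.

```python
import json

# Sentinel meaning "the field is not present in the document at all".
MISSING = object()


def field_state(doc: dict, field: str) -> str:
    """Classify a field as having a value, being null, or not existing."""
    value = doc.get(field, MISSING)
    if value is MISSING:
        return "missing"
    if value is None:  # JSON null is parsed to Python None
        return "null"
    return "value"


# Three hypothetical documents from the same schemaless collection.
raw_docs = [
    '{"Name": "Arava", "Owner": "Oren"}',
    '{"Name": "Phoebe", "Owner": null}',
    '{"Name": "Stray"}',
]
docs = [json.loads(raw) for raw in raw_docs]
states = [field_state(doc, "Owner") for doc in docs]
print(states)
```

A query that asks for "Owner is null" and one that asks for "Owner does not exist" select different documents here, which is the distinction that tends to surprise people.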
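A rough way to picture the cost, sketched in Python with plain sets. This is an assumption-laden toy, not the RavenDB query engine: the `docs` collection and the `match`, `negate_all`, and `negate_scoped` helpers are invented for the example.

```python
# Hypothetical collection of four documents keyed by id.
docs = {
    "users/1": {"City": "London", "Active": True},
    "users/2": {"City": "London", "Active": False},
    "users/3": {"City": "Paris", "Active": True},
    "users/4": {"City": "Paris", "Active": False},
}


def match(predicate):
    """Ids of all documents that satisfy the predicate."""
    return {doc_id for doc_id, doc in docs.items() if predicate(doc)}


def negate_all(predicate):
    """Pure negation: start from every document, then exclude the matches."""
    return set(docs) - match(predicate)


def negate_scoped(scope, predicate):
    """Scoped negation: start from a (hopefully small) positive match instead."""
    return match(scope) - match(predicate)


def is_active(doc):
    return doc["Active"]


not_active = negate_all(is_active)
londoners_not_active = negate_scoped(lambda d: d["City"] == "London", is_active)
print(sorted(not_active), sorted(londoners_not_active))
```

In `negate_all` the starting set is the whole collection, so its size grows with the database. In `negate_scoped` the starting set is only as large as the positive clause makes it, which is why limiting the scope helps.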
Speaking about safe by default:
RavenDB is a lot less opinionated than it used to be. Alex rightfully points that out. As we got more users, we had to expand what you could do with RavenDB. It still pains me to see people do things that are going to be problematic in the end (extremely large page sizes are one good example), but our users demanded that. To quote Alex:
I’d preferred a slowed performance in production and a warning in the server logs rather than a runtime exception.
Our issue with this approach is that no one looks at the logs, and that this usually comes to a head at 2 AM, resulting in a support call from the ops team about a piece of software that broke. Because of this, we have removed many such features, turning them into alerts, and the very few that remained (mostly just the number of requests per session) can be controlled globally by the admin directly from the Studio. This ensures that the ops team can do something if you hit the wall, and of course, you can also configure this from the client side globally easily enough.
As a craftsman, it pains me to remove those limits, but I have to admit that it significantly reduced the number of support calls that we had to deal with.
Conclusion:
Overall, I can say RavenDB is a very good NoSQL database for .NET developers, but the ”good” is coming with a bunch of caveats. I’m confident in my ability to develop any enterprise application with RavenDB applying the Domain-driven design (DDD) philosophy and practices.
I think that this is really the best one could hope for. I think that Alex’s review is honest and to the point. Moreover, it is focused and detailed. That makes it very valuable, because he got to his conclusions not from a brief tour of RavenDB but from actually using it in the trenches.
Thanks, Alex.
Comments
Thank you for the feedback on my feedback :) I can talk a lot about great things in Raven, but devs get used to them pretty quickly and the lack of friction becomes unnoticeable. I'll briefly comment on the key points below and I'm happy to elaborate later if required.
The .NET integration: I do believe in the 50/50 (or even 40/60) split between the amount of work on the Server and the .NET Client.
Technical support & docs: Everyone appreciates your tech support, but I'm not buying the arguments in support of Google Groups. Yes, Stack Overflow is not a good fit, but GitHub issues are way better than Google Groups. The main reason: it's much easier to search in GitHub than in Google Groups. It's not even comparable. You would benefit eventually by having fewer tech support requests.
Bugs & issues: In my post I provided just a small number of the issues we had experienced. Some of them (like negation in RQL) are perceived as minor obstacles, but others cause major frustration, especially the ones related to server stability. For example, just 2 days ago, by running some tests on a dev instance of RavenDB in the Cloud, I managed to bring it down for 30+ minutes. Nothing was responding (timed out, HTTP code 502), even the Raven Studio. Such things are beyond Halloween spookiness. I did send a support request while it was down.
Linq & JS support: About the better LINQ support... I would understand if it was economically not viable. But for most of the reasons I have counterarguments. Security: JavaScript code is already allowed, so running .NET code perhaps doesn't make it worse. Compatibility: yep, I asked for special treatment of the .NET clients, thinking they are most of your customers. Performance: the query can be run against the index, and it's possible to run validations on expression trees.
RQL: I agree that the discussed RQL issues can be considered minor ones. Keeping in mind the underlying JSON structure is important for any NoSQL DB.
Speaking about safe by default: I understand your sentiments. Perhaps the problem could be addressed differently by adding an API for reading warnings at the Session and/or DatabaseStore levels, along with admin notifications from the server (via email, etc.).
Also, google groups are ancient - I am halfway a neckbeard myself (mid thirties) and I cannot remember when I used any other google groups. Younger coworkers stare at it like a steam machine. They usually do not have accounts or anything else. They have github. And as you can see ( e.g. in the net core githubs where you are an active member), discussion happens there a lot.
Re: Google Groups vs Stack Overflow. Disclaimer - I work on RabbitMQ and help users via several support forums, including Google Groups (the rabbitmq-users list) and Stack Overflow. These are all my own opinions.
Stack Overflow is a quagmire of poorly-asked questions presented in a format that makes it almost impossible to have a useful discussion. It is impossible to help a user to get to the root of an issue on Stack Overflow. It only works if the issue or question at hand can immediately be answered without any further information.
Google Groups allows such discussions. I agree that the search function in GG is terrible. The RMQ core team does not use GitHub for discussions or support because we limit GH issues to actionable work only. This is an effort to keep GitHub representing the work we have done and would like to do.
We have discussed migrating from Google Groups to Discourse but that would add yet another system for us to manage. Hosted Discourse may mitigate that ... for a cost.
Add my +1 for getting off of Google Groups. It's so bad and archaic. I would say GitHub issues would be MUCH better. Or possibly Discourse.
Another vote to get away from Google Groups. Maybe you guys should take a look at https://spectrum.chat for customer support; it is easy to navigate and find answers.
Great -- and honest! -- post from Alex. Thanks for responding here, Oren.
I'll +1 on migrating away from Google Groups. StackOverflow isn't the right place for support; as others mention, it's not suited for discussion. I'd suggest GitHub Issues are a good fit. Much of the open source work happening in the .NET space uses GitHub Issues; I think it'd work well for Raven.
Alex,
The point about the lack of friction becoming unnoticeable is probably the best compliment we could get. It is also one of the pain points in marketing RavenDB. It is quite hard to point out where there isn't pain, after you got used to it.
1) I'm not sure where you are going with this, can you explain in more detail?
2) Yes, we'll likely open up GitHub in the near future.
3) One of the problems we deal with is that there are quite a lot of things that can affect a database. In the case of a cloud machine, like yours, one of the most common reasons for such behavior is exhausting the credits (CPU / storage, etc.) that are allotted to the machine. RavenDB will alert on that, but if you didn't see it in time, it may be too late (in a cluster, we'll fail over between machines when this happens). I could wish to not offer burstable systems, and that would make the systems much more predictable. But the problem is that at that point, we'll probably be priced much higher than what the market wants.
Tradeoffs everywhere.
4) JS code runs in a sandbox, limited to specific APIs that we provide and limited in the amount of damage that you can do. Running .NET code, on the other hand, has no such limitations. Note that RavenDB indexes using C# are limited to DB Admin, and we recommend not running with such privileges, specifically for that reason. If we allowed .NET code to run as normal users, we would have no way to secure the system. Note that in general, it would take me about 1 minute from having access to a RavenDB server with DB Admin privileges to having full access to the server.
As for running the query against the index, you merely shifted things around. If I'm not executing the code directly, I would need to parse / understand it, which leads me right back to the complexities of Linq. You can run validations on expression trees, but that is complex, costly and futile. See: halting problem.
6) Adding an API to read the warnings is something that we can do, sure. No one would use that. Case in point, are you aware that SQL has warnings? For example: https://mariadb.com/kb/en/library/show-warnings/ . This is very rarely used, unfortunately. Especially after the first time that the query was run, I don't think that I have ever seen any code to read the warnings and do something with them in production code.
Luke, I agree with you on the different semantics of issues vs. discussions. They lead to very different styles of interaction. This is something that we are going to spend some time thinking about.
Elías, I think that this would just add another tool that the users have to sign up for.
Judah, Christian & Eric, Yes, we will likely enable GitHub issues as well.
Hi Oren,
Alex,
The control process in this case will not be able to recover. It is the whole machine that ran out of credits. Think about it like adding a driving instructor. The instructor can help if you are about to do something bad (run a red light), but it isn't going to do anything if you run out of gas. Note that we are already doing quite a bit internally to monitor and reduce resource utilization, but we need to balance availability here; we don't want to put the brakes on too soon. You are using RavenDB to do something, and being very fast in rejecting requests just leads to a broken system.
It is also important to note that you are meant to be running RavenDB in a cluster for high availability scenarios. In such a case, you should be able to fail over to another node transparently to your system. Instead of you having to build fault handling, it is already built into RavenDB.
I'm very happy that you noted the increased stability of RavenDB. It has been a priority for a while and I'm gratified to see that it is being noted.
4) DotNetFiddle can run their scripts as a separate user, probably on a scratch machine. Trying to bring that level of separation to running queries would kill any performance we have. That said, we have continuously worked to make our queries better. Please note that I'm not rejecting your input. I agree that we need to do better, and we have worked to extend what you can do in the queries. See here for some of that work: https://github.com/ravendb/ravendb/blob/v4.2/src/Raven.Client/Util/JavascriptConversionExtensions.cs
6) Side by side failing and keeping the existing index alive is a feature. The whole idea is that we'll keep the index there until the new version is completed. And there are alerts and logs for index errors. They are pretty visible. See: https://ravendb.net/docs/article-page/4.1/csharp/studio/database/indexes/indexes-list-view#indexes-list-view---errors
An index that fails to process some records is unfortunately possible in a schemaless database. What happens if you have:
Rate = user.Dogs.Count/ user.Cats.Count
But for a catless user? We'll get an error, a predictable one, but something that probably shouldn't halt the entire index. There is another mechanism that measures the rate of errors, and if there are too many of them, we'll mark the whole index as errored and fail queries. But until that point, we'll alert / notify in the UI for that.
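The general shape of that behavior can be sketched in a few lines of Python. This is an illustration only, not RavenDB's indexing code: the `build_index` function, the `max_error_ratio` parameter, and its default of 0.5 are assumptions invented for the example, and the real thresholds and mechanics may differ.

```python
def build_index(users: dict, max_error_ratio: float = 0.5):
    """Index every user, recording per-document errors instead of halting.

    Returns (entries, errors, errored). The index is only flagged as errored
    when the share of failing documents exceeds max_error_ratio, which is a
    made-up threshold for this sketch.
    """
    entries = {}
    errors = []
    for user_id, user in users.items():
        try:
            # Same computation as the Rate example above.
            entries[user_id] = len(user["Dogs"]) / len(user["Cats"])
        except (ZeroDivisionError, KeyError) as exc:
            # A catless (or otherwise oddly shaped) document: note it, move on.
            errors.append((user_id, type(exc).__name__))
    errored = bool(users) and len(errors) / len(users) > max_error_ratio
    return entries, errors, errored


# Hypothetical documents; users/3 has no cats.
users = {
    "users/1": {"Dogs": ["Arava", "Oscar"], "Cats": ["Tom"]},
    "users/2": {"Dogs": ["Rex"], "Cats": ["Felix", "Misty"]},
    "users/3": {"Dogs": ["Phoebe"], "Cats": []},
}
entries, errors, errored = build_index(users)
print(entries, errors, errored)
```

With one failing document out of three, the other two are still indexed and queryable, the failure is recorded for an alert, and the index as a whole is not marked as errored.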