Request for comments: Removing graph queries from RavenDB
In version 4.2 we added an experimental feature to RavenDB: Graph Queries. That was quite a bit of effort and we were really excited about it. The feature was marked as experimental and has been in the product in that state for the past 4 years or so.
Unfortunately, while quite impressive, it didn’t graduate from an experimental feature to a stable one. Mostly because there wasn’t enough usage of graph queries to warrant it. We have seen its usage in some cases, but it seems that our target audience isn’t interested in graph queries for RavenDB.
Given that there isn’t much use of graph queries, we also aren’t spending much time there. We are looking at the 6.0 release (scheduled around July 2022) and we realize that this feature makes our life more complicated and that the support burden of keeping it outweighs its benefits.
For that reason, we have made the decision to remove the experimental Graph Queries from RavenDB in the 6.0 release. Before we actually pull the trigger on that, I wanted to get your feedback on the feature and its usage. In particular, if you are using it and if so, what are you using it for?
The most common scenarios for this feature are already covered via projection queries in RavenDB, which often can be easier to express for developers.
Regardless, the feature will remain in the 5.x branch and the 5.2 version LTS will support it until at least 2024.
More posts in "Request for comments" series:
- (10 Mar 2022) Removing graph queries from RavenDB
- (10 Oct 2008) Changing the way dynamic mocks behave in Rhino Mocks
Comments
Please don't do it.
It's a great selling point when choosing a database - even though you could do a lot of things without it, the official support for graph queries gives a peace of mind that once it is needed, it is there.
And the competition has it too:
https://www.mongodb.com/databases/mongodb-graph-database
https://docs.microsoft.com/en-us/azure/cosmos-db/graph/graph-introduction
https://age.apache.org/
https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-overview
Graph storage and queries are super cool, but niche in their application/usage scenarios. I wouldn't mind if the feature is separated out to a new project, or is dropped.
Though I am still waiting for easier sharding support (like the Raven 3.x style) to return...
Great decision, you don't need more features. Keep your focus and do what you're best at: give me the fastest database. Period. Replacing Lucene is a big enough challenge; you don't need more weight.
This reminds me of IE6 support: yes, there are some users with it, but for 99% of users it's not relevant.
Our product does not yet utilize that feature, so the impact for us is none. Would we choose another database if RavenDB removes it? The answer is no. It is not the core value for which we initially chose RavenDB.
Also, I would prefer a specialized database rather than a jack of all trades, master of none.
I remember the original implementation of Twitter used multiple databases for different purposes. The same goes for RavenDB. As long as ETL can be done easily and accurately on the database, the data can be exported and reshaped into a graph-specific database, which is more specialized and more performant. Even Microsoft's Data Lake + PowerBI can be a lower-performance alternative for the job of generating user reports.
The same can be said of full text search. Of course, having that as a baseline makes RavenDB shine and we do depend on it, but there are other options, such as ETL to other full text search services. It does raise the cost, but some of those full text search services are easy for ordinary developers to use. They are also able to support different languages easily, including languages that don't depend on spaces to separate words, such as Asian languages.
I'll admit we fall into the class of users that have intentions to use graph capabilities for one of our use cases, but haven't gotten around to it yet. That said, even then I don't believe we need the full capabilities that graph queries cover. So yeah, cull it.
As a token replacement though, a nice addition may be a more intrinsic way of creating labelled relationships between documents. No, I'm not talking about a relational DB. Imagine an ecommerce DB with a list of a few million products. Some of these products can be linked together in some novel ways:
So far, so good. All doable within standard modelling and queries. But what about going far beyond that and being able to create an arbitrary number of different types of links between different products and then navigating those links. For example:
These can be modeled and queried using a collection of documents per relationship type and product (e.g. for the first case, a document per product in a PWBAB collection that contains a list of other products). It could also be modeled in a graph db with labelled edges between nodes. But it can get a bit clunky in a pure document db maintaining all those relationship documents because there's no referential integrity between the docs - deleting a product and cleaning up the relationships it is involved in falls onto application logic. So we CAN do relationships, we just need to manage them in the app, not the DB.
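To make the "manage it in the app" point concrete, here is a minimal in-memory sketch of that model. It is not the RavenDB API: `ProductStore`, `link`, `related`, and `delete_product` are hypothetical names, and the nested dictionaries only stand in for per-relationship collections such as PWBAB. The part worth noticing is that `delete_product` has to sweep every relationship type by hand.

```python
from collections import defaultdict


class ProductStore:
    """Toy in-memory stand-in for a document store (hypothetical, not the RavenDB API)."""

    def __init__(self):
        # product id -> product document
        self.products = {}
        # relationship type -> source product id -> set of related product ids
        self.relations = defaultdict(lambda: defaultdict(set))

    def add_product(self, product_id, name):
        self.products[product_id] = {"id": product_id, "name": name}

    def link(self, relation, source_id, target_id):
        """Record a labelled, directed link between two existing products."""
        if source_id not in self.products or target_id not in self.products:
            raise KeyError("both products must exist before linking")
        self.relations[relation][source_id].add(target_id)

    def related(self, relation, source_id):
        """Return the sorted ids linked from source_id under one relationship type."""
        return sorted(self.relations.get(relation, {}).get(source_id, set()))

    def delete_product(self, product_id):
        """Delete a product and clean up every relationship it takes part in.

        Nothing enforces this for us: the application must remember to do it.
        """
        self.products.pop(product_id, None)
        for by_source in self.relations.values():
            # Drop the relationship document owned by the deleted product.
            by_source.pop(product_id, None)
            # Drop dangling references held by other products.
            for targets in by_source.values():
                targets.discard(product_id)
```

With more relationship types the cleanup loop does not change, but every code path that deletes a product has to go through it, which is the maintenance burden described above.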
Maybe that's ok. Maybe adding some kind of first-class-citizen reference between documents is a step too close to being a relational DB. Maybe people who come from a relational background would end up abusing it. You're probably absolutely right in https://ayende.com/blog/4584/ravendb-includes ("disallow associations between documents").
But also, maybe some enhancements to includes could replace graph queries with something more lightweight. Maybe the referential cleanup could be BASE instead of ACID. Maybe there could be a more intrinsic way of loading 2-3 levels deep of a relationship or querying what relationships a product has. Maybe it's yet another differentiator to other document DB's without as big a footprint as graph queries.
Just thinking out loud to how we'll eventually implement our product db without graph queries. It'll be easy, but maybe it could be easier in some simple ways.
Jon's idea of extraction to a separate project would be a nice compromise indeed - the weight of keeping it up-to-date could be moved onto the community. Keeping it (at least as a separate project) would also allow retaining the current version of https://ravendb.net/why-ravendb/multi-model
Jason's idea to use ETL to send data to a "real" graph database is somewhat calming. Obviously using an additional database increases the cost of development and administration, but maybe it is an inevitable trade-off.
Milosz,
Features have cost, and they have to pay for themselves. Note that we have the same set of features with the graph API or not. We can do the same sort of lookups, it is an issue of what runs the query and what we are guiding for.
And as a personal note, I would rather RavenDB be an awesome database for its core competencies rather than provide an 80% solution.
Jon,
Part of the reason for dropping graph queries is actually that the 6.0 edition has built in sharding support. Providing sharded graph queries is a huge cost, and we didn't see the pickup on the feature to justify it.
Jason,
I certainly agree with you about master of none. In addition, I just wanted to point out that we now also have ETL to Elasticsearch, if you want to go that route. And we do support full text search on non-Latin languages, including Asian ones.
Trev,
Those are actually possible right now.
Take a look at this post, which seems to be almost exactly what you are looking for:
https://ravendb.net/articles/product-recommendations-in-ravendb
Note that this is actually a map/reduce, so you aren't querying the raw data (which is good, since you get faster responses).
Note that for your needs, those aren't actually associations between documents. Those are emerging relationships, not between two documents, but between classes of those. That is why an index approach works better in that regard.
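As a rough illustration of why an index fits here, the sketch below runs a map/reduce over orders in plain Python. It is not the index definition from the linked article and not RavenDB code: the order shape (`{"products": [...]}`) and the function names are assumptions made for the example. The map step emits one entry per ordered product pair in an order, the reduce step sums the counts per pair, and a lookup then reads the precomputed totals instead of scanning the raw orders.

```python
from collections import Counter
from itertools import permutations


def map_order(order):
    """Map step: emit ((product, other_product), 1) for each ordered pair in one order."""
    products = sorted(set(order["products"]))
    for product, other in permutations(products, 2):
        yield (product, other), 1


def reduce_entries(entries):
    """Reduce step: sum the counts of all entries that share the same pair key."""
    totals = Counter()
    for key, count in entries:
        totals[key] += count
    return totals


def build_index(orders):
    """Run map then reduce over all orders (hypothetical helper, runs in memory)."""
    return reduce_entries(entry for order in orders for entry in map_order(order))


def bought_together(index, product, top=3):
    """Query the reduced results: most frequent companions of a product."""
    matches = [(other, n) for (p, other), n in index.items() if p == product]
    # Highest count first; ties broken by product id for a stable order.
    matches.sort(key=lambda match: (-match[1], match[0]))
    return matches[:top]
```

The relationship "bought together" never exists as a link between two specific documents; it emerges from the aggregated counts, which is the distinction drawn above.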
I am aware of ETL to Elasticsearch; we could also export to Azure Search. Currently our product is focused on English only, so it shouldn't be an issue for us.
When I say full text search on Asian languages, I mean more of a specialized tokenizer, like Microsoft's tokenizers for different languages, e.g. the Azure Search tokenizer.
I know you can use NGramAnalyzer, but that's not optimal for Asian or Latin languages; it trades space for capability.
I remember I asked you about language-specific tokenizers, and you said I can create my own tokenizer and that it is not currently RavenDB's goal. That is understandable. Each language-specific tokenizer requires language experts, unless you integrate with another service that already has it.
Jason,
You can do that with RavenDB by using analyzers, for example, this one: https://lucenenet.apache.org/docs/3.0.3/d2/dab/_chinese_analyzer_8cs_source.html
You can add those analyzers to RavenDB and utilize them. The issue is that we aren't providing them OOTB, but they exist.
That's good to know. Thanks for info.
I had a use-case recently that I would have liked to use RavenDB for, but ended up going with Neo4j since it had a lot of built-in graph processing functions such as page-rank. I'm considering replicating data between RavenDB and Neo4j as I continue the project (Raven for main data and map/reduce, Neo4j for fancy calculations). A built-in replication/processing pipeline would be neat. This is the fancy part of my current usage:
https://github.com/ops-ai/PageRank-Crawler/blob/develop/PageRank-Crawler/Program.cs#L82