We designed RavenDB to be a server side database, to be used to run large scale business applications. Surprisingly for us, there is a large group of users that have taken RavenDB and actually run it as part of their deployed systems. In other words, instead of having a single large RavenDB cluster they will typically deploy many (hundreds in the small cases, tens of thousands to millions in the large cases) of RavenDB instances across a wide variety of locations.
Part of that is the fact that RavenDB can be embedded inside an application quite easily. That means that we don’t need complex setup or administration. You can just use RavenDB from your application and everything Will Just Work. Another factor is the fact that you can run RavenDB on very low end machines, including 32 bits machines, ARM SoC, etc.
One use case was a point of sales system that had to spec out their hardware a decade in advanced and had to deal with existing installations that were still running hardware from 10 years ago (with little desire to upgrade). Another use case was deploying RavenDB as part of an industrial robot package, with RavenDB installed on a 32 bits ARM system on chip that control the robot.
That kind of deployment pattern lead to interesting requests. For example, several of our customers need ad hoc replication in a location. So all the nodes in a particular physical location will join together to a full mesh of replicated nodes. This gives us high availability in a particular location with any node in the network being able to service any request across the entire location. Boot up a new machine, wait a bit for the rest of the network to update it and you are good to go. This also helps when you consider your machines to be unreliable (because they are old, beaten down and generally minimally maintained).
Another scenario with the need for dynamic topologies is the deployment of RavenDB as set of independent nodes that need to report to some sort of head quarters. This is easy to do by defining external replication or ETL on the node and have it send all the relevant data to a central location for processing. This way, you get a cheap “always available” local node but can still have a global view of your data. I posted about something similar in the past, if you care for the details.
We are now looking for additional features to serve this kind of deployment. In particular, we are interested in making it easy to share data and generate analytics across widely distributed and separated set of instances. One of things that we are currently considering is some form of integration with the cloud. For example, consider Amazon Athena, which allow you to run analytics queries on files residing in S3. We can define ETL processes that would upload the data from RavenDB as it is changed on each individual node. This way, you have each node pushing data to the cloud and a central location that can run live analytics on the data.
What are your thoughts on this? And what other features do you think will serve this kind of scenario?