Feature Design: ETL for Queues in RavenDB


RavenDB is rarely deployed in isolation; it is typically integrated into an existing system. One of the key ways we support this is the built-in ETL feature. RavenDB currently has ETL for Postgres, SQL Server, Oracle, MySQL, Elastic, OLAP / Data Lake, and other RavenDB instances.

We are looking into adding ETL support for queues (RabbitMQ, Kafka, SQS, AQS, etc.) to RavenDB. That support is the topic of this blog post. I want to summarize my thinking about the topic and, along the way, gather some insight from you about what shape this feature should have.

When talking about ETL to Queues, we have to deal with two distinct scenarios: receiving and sending. For the other ETL targets in RavenDB, we just send data, but for queues, given that there is a well defined interface for pulling the results, it makes sense to support receiving as well. Let’s consider what it means to be able to receive messages from a queue into RavenDB…

It means that RavenDB will listen to a queue and apply a script to it. That script will be able to insert or modify documents as a result of the message contents. For example, let’s assume that we have the queue defined as in the image on the right. We can write the following script to process messages from the queue.

The script above handles two message types: recording a new order and adding a line item to an existing order. RavenDB invokes it whenever it receives a message from the queue. In this way, you can have RavenDB build your domain model directly from the message traffic. Of course, this is a pretty simplistic scenario; there are many more interesting ones to explore here.
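To make the shape of such a handler concrete, here is a minimal sketch in Python rather than in RavenDB's own scripting language. The message types (`NewOrder`, `AddLineItem`), the field names, and the in-memory `documents` dict standing in for the database are all assumptions made for illustration; none of this is RavenDB's API.

```python
from typing import Any, Dict

# In-memory stand-in for a document store: document id -> document body.
Documents = Dict[str, Dict[str, Any]]


def handle_message(documents: Documents, message: Dict[str, Any]) -> None:
    """Apply one queue message to the documents (hypothetical message shapes)."""
    order_id = f"orders/{message['OrderId']}"
    if message["Type"] == "NewOrder":
        # Insert a new order document with an empty list of lines.
        documents[order_id] = {"Customer": message["Customer"], "Lines": []}
    elif message["Type"] == "AddLineItem":
        # Modify an existing order by appending the new line item.
        documents[order_id]["Lines"].append(
            {"Product": message["Product"], "Quantity": message["Quantity"]}
        )
    else:
        raise ValueError(f"Unknown message type: {message['Type']}")


documents: Documents = {}
handle_message(
    documents, {"Type": "NewOrder", "OrderId": 1, "Customer": "customers/7"}
)
handle_message(
    documents,
    {"Type": "AddLineItem", "OrderId": 1, "Product": "products/3", "Quantity": 2},
)
```

After these two messages, `documents` holds a single `orders/1` document with one line item, which is the "build the domain model from message traffic" idea in miniature.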

The second part is RavenDB sending messages to the queues. Those messages, naturally, would be generated from the documents in the database. How would that work? We can write a script that is applied to documents as they change and outputs the messages to write to the queue. That is how ETL works in RavenDB in general. For queues, however, the situation is a bit more complex.

When we use ETL to sync data from RavenDB to a relational database, any update to the document also updates the data in the relational database. When we send the data to a queue, what happens then? We can’t update a message that is already in the queue; that doesn’t make any sort of sense. So we need to consider which scenario we are actually dealing with. One option is to send a message each time, so every update of a document generates a new message. Alternatively, the author of the ETL script may decide to send it only once.
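The difference between the two policies can be sketched as follows. This is an illustration in Python, not RavenDB's scripting API: the function and class names, the `Status` field, and the in-memory set used to remember what was already sent are all hypothetical. A real implementation would need to persist that bookkeeping somewhere durable.

```python
from typing import Any, Dict, Optional, Set

Message = Dict[str, Any]


def message_on_every_update(doc_id: str, doc: Dict[str, Any]) -> Message:
    """Policy 1: every change to a document produces a new message."""
    return {"Id": doc_id, "Status": doc["Status"]}


class SendOnce:
    """Policy 2: produce a message only the first time a document is seen."""

    def __init__(self) -> None:
        # Assumption: kept in memory purely for illustration.
        self._already_sent: Set[str] = set()

    def __call__(self, doc_id: str, doc: Dict[str, Any]) -> Optional[Message]:
        if doc_id in self._already_sent:
            return None  # Later updates produce no message.
        self._already_sent.add(doc_id)
        return {"Id": doc_id, "Status": doc["Status"]}


# Simulate three successive updates to the same document.
updates = [("orders/1", {"Status": s}) for s in ("New", "Paid", "Shipped")]

every_update = [message_on_every_update(i, d) for i, d in updates]

send_once = SendOnce()
once = [m for m in (send_once(i, d) for i, d in updates) if m is not None]
```

With the first policy the queue sees three messages, one per update; with the second it sees only the message generated by the first version of the document.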

The scenario that I think is far more likely is to use RavenDB and ETL to Queues as part of a larger scheme. Consider the case where you want to use the outbox pattern. In other words, you have a transaction that needs to do a bunch of things, including sending messages to a queue. Instead of trying to create a distributed transaction or carefully coordinate things, you will use this feature. Your transaction will save a Message document alongside any other changes. This relies on RavenDB’s ACID nature to ensure that the message and the other changes are saved atomically.

Then you can use the ETL to Queues option to send that message to the actual queue in a reliable manner.
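The outbox flow described above can be sketched in a few lines. Everything here is a toy model in Python, assumed for illustration: `ToyStore` stands in for a database with atomic commits, `relay_outbox` stands in for the ETL process, and the `messages/` id prefix and `Body` field are hypothetical conventions, not anything RavenDB defines.

```python
import copy
from collections import deque
from typing import Any, Deque, Dict


class ToyStore:
    """In-memory stand-in for a database with all-or-nothing commits."""

    def __init__(self) -> None:
        self.documents: Dict[str, Dict[str, Any]] = {}

    def commit(self, changes: Dict[str, Dict[str, Any]]) -> None:
        # Build the new state on a copy, then swap it in: either every
        # document in `changes` becomes visible or none does.
        staged = dict(self.documents)
        staged.update(copy.deepcopy(changes))
        self.documents = staged


def relay_outbox(store: ToyStore, queue: Deque[Dict[str, Any]]) -> int:
    """Push every stored message document to the queue, then delete it."""
    sent = 0
    # sorted() returns a list, so deleting while looping is safe.
    for doc_id in sorted(store.documents):
        if doc_id.startswith("messages/"):
            queue.append(store.documents[doc_id]["Body"])
            del store.documents[doc_id]
            sent += 1
    return sent


store = ToyStore()
queue: Deque[Dict[str, Any]] = deque()

# One transaction: the business change and the outgoing message together.
store.commit(
    {
        "orders/1": {"Total": 42},
        "messages/1": {"Body": {"Type": "OrderPlaced", "OrderId": "orders/1"}},
    }
)
sent = relay_outbox(store, queue)
```

Note that in this sketch a crash between appending to the queue and deleting the message document would cause the message to be sent again on the next run, so consumers of such a scheme generally need to tolerate duplicates.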

Sending and receiving are the two most likely scenarios for this feature, but the point of this post is to get more feedback from you. What kind of use cases do you think this will enable? What would you like to be able to do?