Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL open source document database.

time to read 4 min | 636 words

Yesterday I posted about the Parler banning and the likely impact of that, both legally and in terms of the technical details. My expectation is that new actors will step in to fill the demand created by the current social network account suspensions. I have spent some time thinking about the likely effects of this, and I think that it will lead to some interesting results.

A new social network will very likely rise as a result of those actions. That network would have to be resilient to de-platforming. That means that it cannot assume that it can run on any of the cloud services, at least not as normally understood by today’s standards. That means that we are likely to see one of two options:

  • Fully distributed systems – independent nodes collaborating with one another to create a network. Each node may be hosted and operated independently, similar to how torrents and other fully distributed P2P systems work.
  • Distributed infrastructure – a set of servers running on behalf of a single entity, but spread over multiple vendors and locations. The idea is that the shutdown of a single vendor, or even several, will have little impact, because the effort is distributed.

The first option is probably something like Mastodon, but I would really like to see a return to blogs & RSS as the preferred social network. That has the advantage of a truly distributed model without a single controlling actor. It is also much lower cost in terms of technology and complexity. Discovery of new blogs can be handled via recommendations, search, etc.

The reason I prefer this option is that I like to blog :-). More seriously, owning your own content and distribution platform has just become quite important. A blog is about as simple a piece of software as you can imagine. Consuming blogs is an act that requires no publication of personal information, no single actor that can observe everything you do, etc.

I don’t know if this will be the direction, although it is my favorite one. It is possible that we’ll end up with a Mastodon empire, with many actors creating networks of servers which may or may not be interconnected. I can see a future where you’ll have a network of dog owners vs. cat owners, but the two aren’t federated, so the discussions on each stay isolated from the other.

Given that you could create links from one to the other, I don’t think we have to deal with total echo chambers. Consider a post in the cats social network: The dog owners are talking about the chore of having to go for walks at “dogs://social.media/walks-are-great”, that is so high maintenance, the silly buggers. 

That would create separate communities, with their own rules and moderation. Consider this something like subreddits, but without the single organization that can enforce global rules.

The other alternative is that a social network would rise with a truly distributed backend that is resilient to de-platforming issues. From an outside perspective, this would present as something similar to the existing social networks. That has the advantage of requiring the least from users, but it is a non-trivial technical challenge.

I prefer the first option, but I believe it is more likely we’ll end up with the second. The reason for that is monetization strategies. If you have many different actors cooperating to create a network, there is a question of how you pay for it. The typical revenue model for a social network is advertising, and that doesn’t work so well when there isn’t a single actor that can sell ads (and track users).

That said, it would be much faster and easier to get started with the first option and it may be that we’ll end up there with the force of inertia.

time to read 6 min | 1121 words

RavenDB is a distributed database. You can run it on a single node, in a cluster in a single data center, or as a geo-distributed cluster. Separately, you can also run RavenDB in a multi-master configuration. In this case, you don’t have a single cluster spanning the globe, but multiple cooperating clusters working together. The question is: when should you use a geo-distributed cluster and when should you set up a multi-master configuration with multiple coordinating clusters?

Here is an example of a global cluster:

[Image: a single global cluster with nodes spread across multiple regions, one of them the leader]

As you can see, we have nodes all over the world, all joined into a single cluster. In this mode, the nodes will select a leader (denoted by the crown) which will manage all the behavior of the cluster. To ensure that we can properly detect failures, we set up a timeout interval that is appropriate for the distances involved. Note that even in this mode, most of the actual writes to RavenDB will be done on a pure node-local basis and gossiped between the nodes. You can select the appropriate level of write assurance that you want (confirm after it was written to two additional locations, for example).
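
For instance, here is a minimal sketch of that write assurance from the .NET client; the URL, database name and Order class are illustrative, not from the original post:

```csharp
using System;
using Raven.Client.Documents;

// Illustrative setup - the URL, database name and Order class are assumptions.
using var store = new DocumentStore
{
    Urls = new[] { "https://a.ravendb.example.com" },
    Database = "Orders"
};
store.Initialize();

using (var session = store.OpenSession())
{
    session.Store(new Order { Customer = "customers/1-A" }, "orders/1-A");

    // Only report success once the write has been replicated to at least two
    // additional nodes (or throw if that doesn't happen within 30 seconds).
    session.Advanced.WaitForReplicationAfterSaveChanges(
        timeout: TimeSpan.FromSeconds(30),
        throwOnTimeout: true,
        replicas: 2);

    session.SaveChanges();
}

public class Order
{
    public string Customer { get; set; }
}
```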

Such a setup suffers from one issue: coordinating across that distance (and those latencies) means that we need to account for the inherent delay in the decision loop. For most operations, this doesn’t actually impact your system, since most of the work is done in the background and not user visible. It does mean that RavenDB is going to take longer to detect and recover from failures. In the case of the setup in the image, we are talking about the difference between detecting failure in less than 300ms (the default when running in a single data center) and detecting it in around 5 seconds or so.

Because RavenDB favors availability, it usually doesn’t matter. But there are cases where it does. Any time that you have to wait for a cluster operation, you’ll feel the additional latency. This applies not just to failure detection but when everything is running smoothly. A cluster operation in the above setup will require confirmation from two additional nodes aside from the leader. Ping time in this case would be 200 – 300ms between the nodes, in most cases. That means that at best, any such operation would be completed in 750ms or so.

What operations would this apply to? Creation of new databases and indexes is done as a cluster operation, but they are rarely latency sensitive. The primary issue for this sort of setup is if you are using:

  • Cluster wide transactions
  • Compare exchange operations

In those cases, you have to account for higher latency as part of your overall deployment. Such operations are inherently more expensive. If you are running in a single data center, with ping times that are usually < 1 ms, that is not very noticeable. When you are running in a geo distributed environment, that matters a lot more.
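
As a rough illustration, here is what such a cluster-wide operation looks like from the client side. This is a sketch: 'store' is assumed to be an initialized DocumentStore, and the User class and username key are made up for the example:

```csharp
using Raven.Client.Documents.Session;

using (var session = store.OpenSession(new SessionOptions
{
    // Cluster-wide transactions go through the cluster leader and require
    // consensus, which is where geo-distributed latency is felt.
    TransactionMode = TransactionMode.ClusterWide
}))
{
    session.Store(new User { Name = "John" }, "users/john");

    // Reserve a unique username across the whole cluster via a compare exchange value.
    session.Advanced.ClusterTransaction.CreateCompareExchangeValue(
        "usernames/john", "users/john");

    session.SaveChanges();
}
```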

One consideration that we haven’t yet taken into account is what happens during failure. Let’s assume that I have a web application deployed in Brazil, which is aimed at the local RavenDB instance. If the Brazilian RavenDB instance decides to visit the carnival and stops responding, what is going to happen? On the one hand, the other nodes in the cluster will simply pick up the slack. But for the web application in Brazil, that means that instead of using the local instance, we need to go wide to reach the alternative nodes. GitHub had a similar issue, except between the east and west coasts, and the additional latency inherent in such a setup took them down.

To be honest, beyond the additional latency that you have with cluster wide operations in such a setup, I think that this is the biggest disadvantage of such a system. You can avoid that by running multiple nodes in each location, all joined into a big cluster, of course. Then you can set things up so that each client will use the nearest nodes. That gives you local failover, but you still need to consider how to handle a total outage in one location.

There is another alternative, in which you have a separate cluster in each location (it may be a single instance, but I’m showing a cluster because you’ll want local high availability). Instead of having a single cluster, we set things up so there are multiple such clusters. Then we use RavenDB’s multi-master capabilities to tie them all together.

[Image: separate RavenDB clusters in each location, connected to each other with multi-master replication]

In this setup, the different clusters will gossip between themselves about the data, but that is the only thing that is truly shared. Each cluster will manage its own nodes and failover, and work is done only at the local cluster level, not globally.
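
Tying the clusters together is done with external replication tasks between them. Something along these lines, as a sketch; the names, URLs and the "orders" database are assumptions for the example:

```csharp
// Sketch: 'store' points at the local cluster; the names and URLs are illustrative.
store.Maintenance.Send(new PutConnectionStringOperation<RavenConnectionString>(
    new RavenConnectionString
    {
        Name = "eu-cluster",
        Database = "orders",
        TopologyDiscoveryUrls = new[]
        {
            "https://a.eu.example.com",
            "https://b.eu.example.com"
        }
    }));

// Continuously push (and gossip) documents to the EU cluster; a matching task
// is defined on the other side for the opposite direction.
store.Maintenance.Send(new UpdateExternalReplicationOperation(
    new ExternalReplication("orders", "eu-cluster")
    {
        Name = "push-to-eu"
    }));
```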

Other things (indexes, ETL, subscriptions, etc.) are all defined at the cluster level, and you’ll need to consider whether you’ll have them in each cluster (likely for indexes, for example) or only in a single location. Something like ETL would probably have a designated location that pushes the data to its destination, rather than being duplicated in each local cluster.

The most interesting question, however, is how do you handle cluster-wide transactions or compare exchange operations in such an environment?

A cluster wide transaction is… well, cluster wide. That means that if you have multiple clusters, your consistency scope is only within a single one. That is probably the major limit for breaking apart the cluster in such a system.

There are ways to handle that, of course. You can split your compare exchange values so that a particular cluster owns each of them; in this manner, you can direct certain operations to a particular cluster regardless of where the operation originated. In many environments, this is already something that will naturally happen. For example, if you are running in such an environment to serve local clients, it is natural to hold their compare exchange values in the cluster they are using, even if the data is globally replicated.
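
As a hedged sketch of what that might look like in practice (the key prefix convention and the 'brazilStore' name are illustrations, not a RavenDB feature):

```csharp
// Illustrative convention only: prefix compare exchange keys with the cluster
// that owns them, and direct those operations at that cluster's store.
// 'brazilStore' is assumed to be a DocumentStore pointing at the Brazilian cluster.
using (var session = brazilStore.OpenSession(new SessionOptions
{
    TransactionMode = TransactionMode.ClusterWide
}))
{
    // Keys under "brazil/" are owned by the Brazilian cluster; other clusters
    // send such operations here instead of handling them locally.
    session.Advanced.ClusterTransaction.CreateCompareExchangeValue(
        "brazil/usernames/maria", "users/maria");

    session.SaveChanges();
}
```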

Another factor to consider is that RavenDB replicates documents and their data, but compare exchange values aren’t part of that. They are global to the cluster, and as such aren’t sent via replication.

I’m afraid that I don’t have one answer to the question of how to geo-distribute your RavenDB-based system. You need to account for several factors about your application, your needs and the system as a whole. But I hope that you now have the appropriate background to make an informed decision.

time to read 2 min | 297 words

I mentioned in my previous post that I managed to lock myself out of the car by inputting the wrong pin code. I had to wait until the system reset before I could enter the right pin code. I got a comment on the post saying that it would be better to use a thumbprint scanner for the task, to avoid this issue.

I couldn’t disagree more.

Let’s leave aside the issue of biometrics, their security and the issue of using that for identity. I don’t want to talk about that subject. I’ll assume that biometrics cannot fail and can 100% identify a person with no mistakes and no false positives and negatives.

What is the problem with a thumbprint vs. a pin code as the locking mechanism on a car?

Well, what about when I need someone else to drive my car? The simplest examples may be valet parking, leaving the car at the shop or just loaning it to someone. I can give them the pin code over the phone; I’m hardly going to mail someone my thumb. There are many scenarios where I actually want to grant someone the ability to drive my car, and making that harder is a bad idea.

There is also the issue of what happens when my thumb isn’t usable. It might be raining and my hands are wet, or I just changed a tire and need half an hour at the sink to get cleaned up.

You can think up solutions to those issues, sure, but they are cases where the advanced solution makes anything out of the ordinary a whole lot more complex. You don’t want to go there.

time to read 3 min | 582 words

Complex systems are usually considered problematic, but I’m using the term deliberately here. For reference, see this wonderful treatise on the subject. Another way to phrase this post: how do you create robust systems, with multiple layers of defense against failure?

When something is important, you should prepare in advance for it. And you should prepare for failure. One of the typical (and effective) methods for handling failures is the good old retry. I managed to lock myself out of my car recently and had to wait 15 minutes before I could start it up. But a retry isn’t going to help you if the car has run out of fuel, or there is a flat tire. In some cases, a retry is just going to give you the same failing response.

Robust systems do not have a single way to handle a process; they have multiple, often overlapping, ways to do their task. There is definitely duplication between these methods, which tends to raise the hackles of many developers. Code duplication is bad, right? Not when it serves a (very important) purpose.

Let’s take a simple example: order processing. Consider an order made on a website, which needs to be processed in a backend system:

[Image: order flow from the website to the payment provider and on to the backend system via web hook]

The way it works, the application sends the order to a payment provider (such as PayPal), which processes the actual order and then submits the order information via a web hook to the backend system for processing.

In most such systems, if the payment provider is unable to contact the backend system, there will be some form of retry, usually with an exponential back-off strategy. That is sufficient to handle > 95% of the cases without issue. Things get more interesting when we have a scenario where this breaks. In the above case, assume that there is a network issue that prevents the payment provider from accessing the backend system. For example, a misconfigured DNS entry means that external access to the backend system is broken. Retrying in this case won’t help.

For scenarios like this, we have another component in the system that handles order processing. Every few minutes, the backend system queries the payment provider and checks for recent orders. It then processes each order as usual. Note that this means you have to handle scenarios such as an order arriving from the backend pull process concurrently with the web hook execution. But you need to handle that anyway (retrying a slow web hook can cause the same situation).
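
The important part is making the processing idempotent, so the web hook and the pull can race safely. A sketch of the pull side, assuming a RavenDB backend; the PaymentProviderClient, ProviderOrder and Order types are illustrative, not from the post:

```csharp
// Sketch of the periodic pull. PaymentProviderClient, providerOrder.Id/.Total
// and Order are illustrative; 'store' is an initialized IDocumentStore.
public async Task ProcessRecentOrdersAsync(PaymentProviderClient provider, IDocumentStore store)
{
    var recentOrders = await provider.GetOrdersSinceAsync(DateTime.UtcNow.AddMinutes(-15));

    foreach (var providerOrder in recentOrders)
    {
        using var session = store.OpenAsyncSession();

        // Derive the document id from the provider's order id, so processing the
        // same order from the web hook and from the pull lands on the same document.
        var id = "orders/" + providerOrder.Id;
        if (await session.Advanced.ExistsAsync(id))
            continue; // already handled by the web hook (or an earlier pull)

        await session.StoreAsync(new Order { Total = providerOrder.Total }, id);
        await session.SaveChangesAsync();
    }
}
```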

There is additional complexity and duplication in the system in this situation, because we don’t have a single way to do something.

On the other hand, this system is also robust in the other direction. Let’s assume that the backend’s credentials for the payment provider have expired. We aren’t stopped from processing orders; we still have the web hook to cover for us. In fact, each piece of the system is individually redundant. In practice, the web hook is used to speed up common order processing, with the periodic pull of recent orders serving as the backup and recovery mechanism.

In other words, it isn’t code duplication, it is building redundancy into the system.

Again, I would strongly recommend reading: Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New Understanding of Patient Safety.

time to read 5 min | 970 words

A customer reported that on their system, they suffered from frequent cluster elections in some cases. That is usually an indication that the system resources are hit in some manner. From experience, that usually means that the I/O on the machine is capped (very common in the cloud) or that there is some network issue.

The customer was able to rule these issues out. The latency to the storage was typically within a millisecond and the maximum latency never exceeded 5 ms. The network monitoring showed that everything was fine as well. The CPU was hovering around 7% and there was no apparent reason for the issue.

Looking at the logs, we saw very suspicious gaps in the servers’ activity, but with no good reason for them. Furthermore, the customer was able to narrow the issue down to a single scenario: resetting the indexes would cause the cluster state to become unstable. And it did so with very high frequency.

“That is flat out impossible”, I said. And I meant it. Indexing workload is something that we have a lot of experience managing and in RavenDB 4.0 we have made some major changes to our indexing structure to handle this scenario. In particular, this meant that indexing:

  • Runs in dedicated threads.
  • Is kept off certain cores, to leave the OS capacity to run other tasks.
  • Self-monitors and knows when to wind down to avoid impacting system performance.
  • Runs its threads with lower priority.
  • The cluster state, on the other hand, runs with high priority.

The entire thing didn’t make sense. However… the customer did a great job in setting up an environment where they could show us: click on the reset button, and the cluster becomes unstable. So it is impossible, but it happens.

We explored a lot of things around this issue. The machines are big and running multiple NUMA nodes; maybe it was some interaction with that? It was highly unlikely, and eventually didn’t pan out, but that is one example of the things that we tried.

We set up a similar cluster on our end and gave it 10x the customer’s load, on a set of machines that had a fraction of the customer’s power. The cluster and the system state remained absolutely stable.

I officially declared that we were in a state of perplexation.

When we ran the customer’s own scenario on our system, we saw something, but nothing like what we saw on their end. One of the things that we usually do when we investigate resource constraint issues is to give the machines under test a lot less capability. Less memory and slower disks, for example, means that it is much easier to surface many problems. But the worse we made the situation for the test cluster, the better the results became.

We changed things up. We gave the cluster machines with 128 GB of RAM and fast disks and tried it again. The situation immediately reproduced.

Cue facepalm sound here.

Why would giving more resources to the system cause instability in the cluster? Note that the other metrics also suffered, which made absolutely no sense.

We started digging deeper and we found the following index:
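
Roughly speaking, it was a trivial map-only index along these lines; the class and property names here are assumptions for the sketch:

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

// A sketch with assumed class and property names - a simple map-only index
// that also indexes the State field.
public class Items_ByNameAndState : AbstractIndexCreationTask<Item>
{
    public Items_ByNameAndState()
    {
        Map = items => from item in items
                       select new
                       {
                           item.Name,
                           item.State // in production, a large array of complex objects
                       };
    }
}

public class Item
{
    public string Name { get; set; }
    public object[] State { get; set; }
}
```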

It is about as simple an index as you can imagine, and it should cause absolutely no issue for RavenDB. So what was going on? We then looked at the documents…

[Image: a sample document, where the State field is a large array of complex objects]

I would expect the State field to be a simple enum property. But it is an array that describes the state changes in the system, and this array holds complex objects. The size of the array is about 450 items on average, and I saw it hit a max of 13,000 items.

That helped clarify things. During indexing, we have to process the State property, and because it is an array, we index each of the elements inside it. That means that for each document, we’ll index between 400 and 13,000 items for the State field. What is more, each element is a complex object. RavenDB will index that as a JSON string, so effectively the indexing generates a lot of strings. These strings are held in memory until the end of the indexing batch. So far, so good, but the problem in this case was that there were enough resources to allow a big batch of documents.

That means that we would generate over 64 million string objects in one of those batches.

Enter GC, stage left.

The GC will be invoked based on how many allocations you have (in this case, a lot) and how many live objects you have. In this case, also a lot, until the index batch is completed. However, because we run GC multiple times during the indexing batch, we had promoted significant numbers of objects to the next generation, and Gen1 or Gen2 collections are far more expensive.

Once we knew what the problem was, it was easy to find a fix. Don’t index the State field. Given that the values that were indexed were JSON strings, it is unlikely that the customer actually queried on them (later confirmed by talking to the customer).

On the RavenDB side, we added monitoring for the allocation frequency and will close the indexing batch early to prevent handing the GC too much work all at once.
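
Conceptually, the new behavior looks something like the following sketch. This is illustrative only, not RavenDB's internal code, and the budget value and helper names are placeholders:

```csharp
// Illustrative sketch only - not RavenDB's internal implementation.
// Stop the current indexing batch once managed allocations cross a budget, so
// the GC deals with several small batches instead of one enormous one.
long allocatedAtStart = GC.GetAllocatedBytesForCurrentThread();
const long allocationBudget = 2L * 1024 * 1024 * 1024; // hypothetical 2 GB budget

foreach (var document in documentsToIndex) // documentsToIndex / IndexDocument are placeholders
{
    IndexDocument(document);

    long allocatedInBatch = GC.GetAllocatedBytesForCurrentThread() - allocatedAtStart;
    if (allocatedInBatch > allocationBudget)
        break; // close the batch early; the rest is picked up by the next batch
}
```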

The reason we failed to reproduce this on the lower end machines was simple: RavenDB was already using enough memory that we closed the batch early, before we could gather enough objects to make the GC really work hard. When running on a big machine, it had time to get the ball rolling and hand the whole big mess to the GC for cleanup.

time to read 1 min | 65 words

I had the pleasure of talking with Vaishali Lambe in SoLeadSaturday this week.

We talked about various aspects of being in business: building a business around open source software, working in a distributed team, growing a company from 1 employee to 30+, and the non-technical details that are important to understand in order to run a company. We also discussed the importance of blogging.

You can check it here.

time to read 3 min | 541 words

One of the distinguishing features of RavenDB is its ability to process large aggregations very quickly. You can ask questions over very large data sets and get the results in milliseconds. This is interesting, because RavenDB isn’t an OLAP database and the kinds of questions that we ask can be quite complex.

For example, we have the Products/Recommendations index, which allows us to ask:

For any particular product, find me how many times it was sold, what other products were sold with it and in what frequency.

The index to manage this is here:
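
A sketch of what such a map/reduce index could look like, assuming Northwind-style Order documents with a Lines collection; this is not necessarily the exact index from the demo:

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents.Indexes;

// A Products/Recommendations style map/reduce index (sketch).
public class Products_Recommendations : AbstractIndexCreationTask<Order, Products_Recommendations.Result>
{
    public class Result
    {
        public string Product { get; set; }
        public int SalesCount { get; set; }
        public RelatedSale[] SoldWith { get; set; }
    }

    public class RelatedSale
    {
        public string Product { get; set; }
        public int Count { get; set; }
    }

    public Products_Recommendations()
    {
        // Map: one entry per product per order, carrying the other products on that order.
        Map = orders => from order in orders
                        from line in order.Lines
                        select new Result
                        {
                            Product = line.Product,
                            SalesCount = 1,
                            SoldWith = order.Lines
                                .Where(l => l.Product != line.Product)
                                .Select(l => new RelatedSale { Product = l.Product, Count = 1 })
                                .ToArray()
                        };

        // Reduce: group by product, summing sales and aggregating the co-sold products.
        Reduce = results => from result in results
                            group result by result.Product into g
                            select new Result
                            {
                                Product = g.Key,
                                SalesCount = g.Sum(x => x.SalesCount),
                                SoldWith = g.SelectMany(x => x.SoldWith)
                                            .GroupBy(x => x.Product)
                                            .Select(x => new RelatedSale
                                            {
                                                Product = x.Key,
                                                Count = x.Sum(y => y.Count)
                                            })
                                            .ToArray()
                            };
    }
}

public class Order
{
    public List<OrderLine> Lines { get; set; }
}

public class OrderLine
{
    public string Product { get; set; }
}
```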

The way it works, we map the orders and have a projection for each product, and then we add the other products that were sold with the current one. In the reduce, we group by the product and aggregate the related products together.

But I’m not here to talk about the recommendation engine. I wanted to explain how RavenDB process such indexes. All the information that I’m talking about can be seen in the Map/Reduce visualizer in the RavenDB Studio.

Here is a single entry for this index. You can see that products/11-A was sold 544 times and 108 times with products/69-A.

[Image: a single reduce entry for the index, showing products/11-A with its sales count and related products]

Because of the way RavenDB processes Map/Reduce indexes, when we query, we run over the already precomputed results and there is very little computation cost at query time.
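
For example, asking for the recommendations for a single product is just a lookup against the reduce results. This is a sketch, assuming the index definition above and an initialized 'store':

```csharp
using System;
using System.Linq;

// Sketch: 'store' is an initialized DocumentStore; the index is the sketch above.
using (var session = store.OpenSession())
{
    var entry = session
        .Query<Products_Recommendations.Result, Products_Recommendations>()
        .FirstOrDefault(x => x.Product == "products/11-A");

    // The aggregation was maintained at indexing time, so this is a cheap read.
    foreach (var related in entry.SoldWith.OrderByDescending(x => x.Count).Take(5))
        Console.WriteLine($"{related.Product}: {related.Count}");
}
```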

Let’s see how RavenDB builds the index. Here is a single order where three products were sold. You can see that each of them has a very interesting tree structure.

[Image: the reduce tree structure for each of the three products in a single order]

Here is how it looks when we zoom into a particular product. You can see how RavenDB aggregates the data, first in the bottom-most page on the right (#596). We aggregate that with the other 367 pages and get intermediate results at page #1410. We then aggregate that again with the intermediate results in page #105127 to get the final tally. In this case, you can see that products/11-A was sold 217,638 times, mostly with products/16-A (30,603 times) and products/72-A (20,603 times).

[Image: zooming into one product’s reduce tree: leaf page #596 aggregated with the other pages into page #1410, and then into the final result at page #105127]

When we have a new order, all we need to do is update a bottom-most page and then recurse upward in the tree. In the case we have here, there is a pretty big reduce value and we are dealing with tens of millions of orders. We have three levels in the tree, which means that we’ll need to do three update operations to account for new or updated data. That is cheap, because it means we have to do very little work to maintain the index.

At query time, of course, we don’t really have to do much, all the hard work was done.

I like this example because it showcases a non-trivial scenario and how RavenDB handles it with ease. This kind of non-trivial work tends to be very hard to get working properly, and with RavenDB it is part of my default “let’s do this on the fly” demo.

time to read 3 min | 452 words

It should come as no surprise that our entire internal infrastructure is running on RavenDB. I wholly believe in the concept of dogfooding and it has served us very well over the years.

I was speaking to a colleague just now and it occurred to me that it is surprising that we do certain things wrong, intentionally. It is fair to say that we know what the best practices for using RavenDB are, the things that you can do to get the most out of it.

In some of our internal systems, we are doing things in exactly the wrong way. We are doing things that are inefficient in RavenDB. We take the expedient route to implement things. A good example is a set of documents that can grow to multiple MB in size. They are also some of the most commonly changed documents in the system. Proper design would call for breaking them apart to make things easier for RavenDB.

We intentionally modeled things this way. Well, I gave the modeling task to an intern with no knowledge of RavenDB and then I made things worse for RavenDB in a few cases where he didn’t get it out of shape enough for my needs.

Huh?! I can hear you thinking. Why on earth would we do something like that?

We do this because it serves as an excellent proving ground for misuse of RavenDB. It shows us how the system behaves under non-ideal conditions. Not just when the user is able to match everything to the way RavenDB would like things to be, but when they build their system the way they are likely to: unaware of what is going on behind the scenes and of what the optimal solution would be. We want RavenDB to be able to handle that scenario well.

An example that pops to mind was having all the uploads in the system be attachments on a single document. That surfaced an O(N^2) algorithm very deep in the bowels of RavenDB for placing a new attachment. It was completely invisible in the normal case, because it was fast enough under any normal or abnormal situation we could think of. But when we started getting high latency from uploads, we realized that adding the 100,002nd attachment to a document required us to scan through the whole list… it was obvious that we needed a fix. (And please don’t put hundreds of thousands of attachments on a document; it will work (and it is fast now), but it isn’t nice.)

Doing the wrong thing on purpose means that we can be sure that when users are doing the wrong thing accidentally, they get good behavior.

time to read 3 min | 577 words

We got a feature request that we don’t intend to implement, but I thought the reasoning is interesting enough for a blog post. The feature request:

If there is a critical error or major issue with the current state of the database, for instance when the data is not replicated from Node C to Node A due to some errors in the database or network it should send out mail to the administrator to investigate on  the issue. Another example is, if the database not active due to some errors then it should send out mail as well.

On its face, the request is very reasonable. If there is an error, we want to let the administrator know about it, not hide it in some log file. Indeed, RavenDB has the concept of alerts just for that reason, to surface any issues directly to the admin ahead of time. We also have a mechanism in place to allow alerts for the admin without checking in with the RavenDB Studio manually: SNMP. The Simple Network Management Protocol is designed specifically to enable this kind of monitoring, and RavenDB exposes a lot of state via SNMP that you can act upon in your monitoring system.

Inside your monitoring system, you can define rules that will alert you. Send an SMS if the disk space is low, or email on an alert from RavenDB, etc. The idea of actively alerting the administrator is something that you absolutely want to have.

Having RavenDB send those emails, not so much. RavenDB exposes monitoring endpoints and alerts; it doesn’t act or report on them. That is the role of your actual monitoring system. You can set up Zabbix or talk to your ops team, which likely already has one installed.

Let’s talk about the reason that RavenDB isn’t a monitoring system.

Sending email is actually really hard. What sort of email provider do you use? What options are required to set up a connection? Do you need an X509 certificate or a user/pass combo? What happens if we can’t send the email? That is leaving aside the fact that actually getting the email delivered is hard enough: spam filtering, SPF, DKIM and DMARC are just where things start. In short, that is a lot of complication that we would have to deal with.

For that matter, what about SMS integration? Surely that would also help. But no one uses SMS today; we want WhatsApp integration, and Telegram, and… You get the point.

Then there are social issues. How will we decide whether to send an email or not? There would have to be some policy, and ways to configure it. Without that, we’ll end up sending either too many emails (which will get flagged or ignored) or too few (why aren’t you telling me about the XYZ issue?).

A monitoring system is built to handle those sorts of issues. It is able to aggregate reports and give you a single email with the current status, open issues for you to fix and do a whole lot more that is simply outside the purview of RavenDB. There is also the most critical alert of all: if RavenDB is down, it will not be able to report that it is down, because it is down.

The proper way to handle this is to setup integration with a monitoring system, so we’ll not be implementing this feature request.

time to read 6 min | 1084 words

In my previous post, I wrote about the case of a medical provider that has a cloud database to store its data, as well as a whole bunch of doctors making house calls. The doctors need to have (some) information on their machines, as well as push updates they make locally back to the cloud.

[Image: the cloud cluster and the doctors’ machines in the field]

However, given that their machines are in the field, and that we may encounter a malicious doctor, we aren’t going to fully trust these systems. We still want the system to function, though. The question is how will we do it?

Let’s try to state the problem in more technical terms:

  • The doctor needs to pull data from the cloud (list of patients to visit, patient records, pharmacies and drugs available, etc.).
  • The doctor needs to be able to create patient records (exams made, checkup results, prescriptions, recommendations, etc.).
  • The doctor’s records need to be pushed to the cloud.
  • The doctor should not be able to see any record that is not explicitly made available to them.
  • The same applies for documents, attachments, time series, counters, revisions, etc.

Enforcing Distributed Data Integrity

The requirements are quite clear, but they do bring up a bit of a bother. How are we going to enforce them?

One way to do that would be to add some metadata rule for the document, deciding if a doctor should or should not see that document. Something like that:

[Image: a patient document whose metadata contains a list of access tags]

In this model, a doctor will be able to get this document if they have any of the tags associated with it.

This can work, but it has a bunch of non-trivial problems and one huge problem that may not be obvious. Let’s start with the non-trivial issues:

  • How do you handle non-document data? Based on the owner document, probably. But that means that we have to have a parent document, and that isn’t always the case.
  • It is also a problem if the document was deleted, or is in a conflicted state.
  • What do you do with revisions if the access tags have changed? Which version do you follow?

There are other issues, but as you can imagine, they are all around managing the fact that this model allows you to change the tags on a document and expects us to handle that properly.

The huge problem, however, is what should happen when a tag is removed? Let’s assume that we have the following sequence of events:

  • patients/oren is created, with an access tag of “doctors/abc”
  • That access tag is then removed
  • Doctor ABC’s machine is then connected to the cloud and sets up replication.
  • We need to remove patients/oren from the machine, so we send a tombstone.

So far, so good. However, what about Doctor XYZ’s machine? At this point, we don’t know what the old tags were, and that machine may or may not have that document. It shouldn’t have it now, so should we send a tombstone there as well? That leads to an information leak, revealing document ids to machines that aren’t authorized to see them.

We need a better option.

Using the Document ID as the Basis for Data Replication

We can define that once created, the access tags are immutable, and that would help considerably.  But that is still fairly complex to manage and opens up issues regarding conflicts, deletion and re-creation of a document, etc.

Instead, we are going to use the document’s id as the source for the decision to replicate the document or not. In other words, when we register the doctor’s machine, we set it up so it will allow:

Incoming paths (pushed from the doctor’s machine to the cloud):
  • doctors/abc/visits/*
  • tasks/doctors/abc/*

Outgoing paths (sent from the cloud to the doctor’s machine):
  • patients/clinics/33-conventry-rd/*
  • pharmacies/*
  • tasks/doctors/abc/*
  • doctors/abc
  • laboratories/*
In this case, incoming and outgoing are defined from the point of view of the cloud cluster. So this setup allows the doctor’s machine to push updates to any document with an id that starts with “doctors/abc/visits/” or “tasks/doctors/abc/”. The cloud will send all pharmacies and laboratories data, all the patients for the doctor’s clinic, the tasks for this doctor and, finally, the doctor’s own record. Everything else will be filtered.
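
For reference, here is a rough sketch of how such a configuration might be expressed from the client. Since the post describes a then-upcoming feature, the operation and property names below are assumptions rather than confirmed signatures:

```csharp
// Rough sketch only - the operation and property names are assumptions about
// the filtered replication API, not confirmed signatures.
store.Maintenance.Send(new PutPullReplicationAsHubOperation(new PullReplicationDefinition
{
    Name = "doctors-hub",
    Mode = PullReplicationMode.HubToSink | PullReplicationMode.SinkToHub,
    WithFiltering = true
}));

store.Maintenance.Send(new RegisterReplicationHubAccessOperation("doctors-hub",
    new ReplicationHubAccess
    {
        Name = "doctors/abc",
        CertificateBase64 = doctorCertificateBase64, // the doctor's machine certificate
        // What the doctor's machine may push to the cloud:
        AllowedSinkToHubPaths = new[] { "doctors/abc/visits/*", "tasks/doctors/abc/*" },
        // What the cloud will send down to the doctor's machine:
        AllowedHubToSinkPaths = new[]
        {
            "patients/clinics/33-conventry-rd/*",
            "pharmacies/*",
            "tasks/doctors/abc/*",
            "doctors/abc",
            "laboratories/*"
        }
    }));
```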

This Model is Simple

This model is simple: it provides a list of outgoing and incoming paths for the data that will be replicated. It is also surprisingly powerful. Consider the implications of the configuration above.

The doctor’s machine will have a list of laboratories and pharmacies (public information) locally. It will have the doctor’s own document as well as records of the patients in the clinic. The doctor is able to create and push patient visit records. Most interestingly, the tasks for the doctor are defined to allow both push and pull. The doctor will receive updates from the office about new tasks (home visits) to make, and can mark them complete and have that show up in the cloud.

The doctor’s machine (and the doctor as well) is not trusted. So we limit the exposure of the data that they can see on a Need To Know basis. On the other hand, they are limited in what they can push back to the cloud. Even with these limitations, there is a lot of freedom in the system, because once you have this defined, you can write your application on the cloud side and on the laptop and just let RavenDB handle the synchronization between them. The doctor doesn’t need access to a network to be able to work, since they have a RavenDB instance running locally and the cloud instance will sync up once there is any connectivity.

We are left with one issue, though. Note that the doctor can get the patients’ files, but is unable to push updates to them. How is that going to work?

The reason that the doctor is unable to write to the patients’ files is that they are not trusted. Instead, they will send a visit record, which contains their findings and recommendations. On the cloud side, we’ll validate the data, merge it with the actual patient’s record, apply any business rules and then update the record. Once that is done, it will show up on the doctor’s machine magically. In other words, this setup is meant to handle untrusted input.

There are more details that we could get into, but I hope this outlines the concepts clearly. This is not a RavenDB 5.0 feature, but will be part of the next RavenDB release, due around September.
