Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 4 min | 622 words

This post is the conclusion of this series (unless I get some interesting questions). So far, I have outlined how to break apart the system, the data flow and data processing inside it, and a lot about the internal constraints and the business logic as it is applied. There hasn’t been a lot of code, because I wanted to keep things at the architecture level rather than do a low level dive.

Early in the series, I got a comment from Rafal that is pretty profound:

There's a general agreement among software creators and their customers that software replaces papers and going paperless is A Good Thing ™. And then after introducing an IT solution everyone starts complaining that the papers were so much better to work with and allowed for much greater flexibility. Especially for order handling workflow, where you could print copies of the order, hand it out to proper people and be sure they have everything they need to do the job. And you could always put some additional info on the papers when there was a need for special handling.

The rigidity of computer systems often means that we have to work around the system in the real world to get things done. In many cases, that actively hurts the people using the system. For example, if I get an inmate who has a specific constraint (danger to himself, isolated from a particular group, etc), I can take a red marker and write the message in big letters on the file, ensuring that everyone who deals with the file is aware of it. If this is not explicitly called out in the design of the system, there is really no good way to do that with a computer system. And that can be a great deterrent to adopting and using the system.

What is worse, if you have such a requirement, it will often show up as something like this:

image

A mandatory, annoying (and never read) message box that isn’t actually useful for the purpose.

One of the rules that we have as system architects is to explicitly anticipate and answer these kinds of situations, providing something that does at least as well as plain old paper.

The design of Macto, as outlined in this series of posts, attempts to do just that. To continue Rafal’s quote:

And your approach is the same idea applied to software design - make a digital piece of paper that almost physically follows the process, is always there and has everything necessary to do the work, then pass it around and just make sure it's not lost somewhere in between. No central registry, no central decision about where the papers go, just do your task and pass the message to the next station.

Doing something in the UI, like giving the user the ability to inject some elements, is trivial if the data format can handle it. So you have a way to record the information the user wants and display it in a way that makes sense to them, without their having to know more about UI design than Right Click > Add (Field / Note / Heading / Timer), etc. At the same time, you gain all the benefits of a computerized system (backups, search, recall, etc): the ability to avoid signing things in triplicate, access to the entire status of the system at once, and so on.

This is not a trivial thing to do, but it can make quite a difference to the system and its adoption.
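To make the “digital piece of paper” idea a bit more concrete, here is a minimal Python sketch of a file that users can annotate freely. The names (`InmateFile`, `Annotation`, `pinned`) are made up for illustration and are not part of Macto; the only point is that the data format accepts arbitrary user-added elements and that the “red marker” note is always shown first.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str             # "field", "note", "heading" or "timer"
    label: str
    value: str = ""
    pinned: bool = False  # the "red marker" case: always shown first


@dataclass
class InmateFile:
    inmate_id: str
    name: str
    annotations: list = field(default_factory=list)

    def add(self, kind, label, value="", pinned=False):
        self.annotations.append(Annotation(kind, label, value, pinned))

    def render(self):
        # sorted() is stable, so pinned items move to the top and
        # everything else keeps the order in which it was added.
        ordered = sorted(self.annotations, key=lambda a: not a.pinned)
        lines = [f"{self.name} ({self.inmate_id})"]
        for a in ordered:
            marker = "!!" if a.pinned else "  "
            lines.append(f"{marker} [{a.kind}] {a.label}: {a.value}")
        return lines


inmate_file = InmateFile("inmates/1234", "John Doe")
inmate_file.add("note", "Visitation", "Lawyer visit on Tuesday")
inmate_file.add("note", "WARNING", "Keep isolated from group X", pinned=True)
rendered = inmate_file.render()
```

Anyone who opens the file sees the pinned warning before anything else, which is roughly what the red marker on the paper file achieves.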

time to read 5 min | 942 words

In this series of blog posts, I have talked a lot about the way data flows, the notion of always talking to a local server and the difference between each location’s own data and the shared data that comes from the other parts of the system.

Here is what it looks like when modeling things:

Snapshot

To simplify things, we have the notion of the shared database (which is how I typically get data from the rest of the system) and my database. Data gets into the shared database using replication, which uses a gossip protocol, is resilient to network errors and can route around them, etc. The application will only ever write data to its own database, never directly to the shared one. ETL processes will write the data from the application database to the local copy of the shared database, and from there it will be sent across to the other parties.

In terms of input/output, the process is quite simple once you have finished the setup: write locally to the app DB, let the ETL process copy the data to the local shared DB, and have it automatically disseminated to the rest of the world. It means that you don’t really have to think about the way you publish information, but can still do that in such a way that you are not constrained in the development of the application (no shared database blues here, thank you!). However, that only deals with the outgoing side of things. How are we going to handle incoming data?
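The outgoing flow can be sketched with a tiny in-memory model. This is not RavenDB code: `Node`, `run_etl` and `replicate` are hypothetical stand-ins for the application database, the ETL task and replication. In particular, `replicate` only copies documents the target has not seen yet, while real replication also deals with updates, conflicts and routing.

```python
class Node:
    """One location (a block, the Registration Office, ...)."""

    def __init__(self, name):
        self.name = name
        self.app_db = {}     # private: only this application writes here
        self.shared_db = {}  # local copy of the shared database

    def write(self, doc_id, doc):
        # The application only ever writes to its own database.
        self.app_db[doc_id] = doc

    def run_etl(self, to_shared):
        # to_shared returns the published shape of a document,
        # or None if the document should stay private.
        for doc_id, doc in self.app_db.items():
            published = to_shared(doc)
            if published is not None:
                self.shared_db[doc_id] = published


def replicate(source, target):
    # Stand-in for replication between the shared databases.
    for doc_id, doc in source.shared_db.items():
        target.shared_db.setdefault(doc_id, doc)


def publish_release(doc):
    # Publish release workflows only, without the internal notes.
    if doc.get("type") != "ReleaseWorkflow":
        return None
    return {k: doc[k] for k in ("type", "inmate", "block", "state")}


registration = Node("registration-office")
block_b = Node("block-b")
registration.write("workflows/release/1", {
    "type": "ReleaseWorkflow", "inmate": "inmates/1234",
    "block": "B", "state": "Open", "internal_notes": "call the court",
})
registration.write("notes/17", {"type": "Note", "text": "private"})
registration.run_etl(publish_release)
replicate(registration, block_b)
```

Block B ends up with the release workflow in its shared database, minus the internal notes, and nothing ever touches Block B’s own application database.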

We need to remember that a core part of the design is that we aren’t just blindly copying data from the shared database. Even though this is trusted, we still need to process the data and reconcile it with what we have in our own database.

A good example of that might be the release inmate workflow that we already discussed. This is initiated by the Registration Office, and it is sent to all the parties in the prison. Let’s see how a particular block is going to handle the processing of such a core scenario.

The actual workflow for releasing an inmate needs to be handled by many parties. From the block’s perspective, this means getting the inmate physically to the release party and handing over responsibility for that inmate. When the workflow document for the inmate release reaches the block’s shared database, we need to start the internal process inside the block to handle that. We can use RavenDB Subscriptions for this purpose. A subscription is a persistent query, and any time a match is found on the subscription query, we’ll get the matching documents and can operate on them.

Here is what the subscription looks like:

image

Basically, it says “gimme all the release workflows for block B”. The idea of a persistent query is that whenever a new document arrives, if it matches the query, we’ll send it to the process that has this subscription opened. This means that we have a typical latency of a few milliseconds before we process the document in the worker process.

Now, let’s consider what we’ll need to do whenever we get a new release workflow. This can look like this:

I’m obviously skipping stuff here, but you should get the gist (pun intended) of what is going on.

There are a couple of interesting things in here. First, you can see that I’m writing the code here in Python. I could have also used Ruby, node.js, etc.

The idea is that this is an internal ingestion pipeline for a particular workflow, independent of anything else that happens in the system. Basically, the idea is to have an Open for Extension, Closed for Modification kind of system. Integration with the outside world is done through subscriptions that filter the data that we care about and integration scripts that operate over the stream of data. I’m using a Python script in this manner because it is easy to show how fluid this can be. I could have used a compiled application using C# or Java just as easily. But the idea in this architecture is that it is possible and easy to modify and manage things on the fly.

The subscription workers ingesting the documents from the subscriptions take the data from the shared database, process and validate it and then make the decisions on what should be done further. On any batch of workflow documents for releasing inmates, we’ll alert the sergeant (either way, we need to release the inmate or we need to figure out why the warrant is on us while the inmate is not in our hands).
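The decision described above can be sketched as a plain Python function. This is not the original script, and it deliberately leaves out the wiring to the subscription worker. `process_release_batch`, the document fields and the `alert_sergeant` callback are assumptions made for illustration.

```python
def process_release_batch(batch, local_inmates, alert_sergeant):
    """Split a batch of release workflows by what the block has to do.

    batch: release workflow documents (dicts) from the shared database.
    local_inmates: ids of inmates that the block's own database says
                   are currently held in this block.
    alert_sergeant: callback taking (to_release, to_investigate).
    """
    to_release, to_investigate = [], []
    for workflow in batch:
        if workflow["inmate"] in local_inmates:
            to_release.append(workflow)
        else:
            # The warrant is on us, but the inmate is not in our hands.
            to_investigate.append(workflow)

    # Either way, the sergeant needs to know about it.
    if to_release or to_investigate:
        alert_sergeant(to_release, to_investigate)
    return to_release, to_investigate


alerts = []
batch = [
    {"id": "workflows/release/1", "inmate": "inmates/1234", "block": "B"},
    {"id": "workflows/release/2", "inmate": "inmates/9999", "block": "B"},
]
released, missing = process_release_batch(
    batch,
    local_inmates={"inmates/1234"},
    alert_sergeant=lambda ok, odd: alerts.append((len(ok), len(odd))),
)
```

Note that the script reconciles the incoming document with the block’s own data instead of trusting the shared database blindly.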

A more complex script may check all the release workflows, in case the block that the Registration Office thinks the inmate is on is out of date, for example. We can also use these scripts to glue in additional players (preparing the release party to take the inmate, scheduling this in advance, etc), but we might want to do that in our main app instead of in the scripts, to make it more visible what is going on.

The underlying architecture and shape of the application is quite explicit on the notion of data transfer, though, so it might be a good idea to do it in the scripts. A lot of that depends on whether this is shared functionality, something that is customized per block, etc.

time to read 5 min | 821 words


For a long time, whenever I was talking to customers about the business domain, I would explicitly avoid using the term “business logic”. Primarily because I never found such things to be logical in any way, shape or form. A lot of the business decisions and policies are driven by a host of legacy reasons, “this is how everyone does it” and behaviors that have become accepted and then entrenched.

Take what is supposed to be a pretty simple rule. Given a warrant, when should an inmate be released? On the face of it, that seems like a pretty obvious and straightforward answer, right?

Depending on the type of warrant (can be for 48 hours, 5 days, 30 months, 10 years, life) the answer is quite different. For example, if someone was arrested at 3 PM on Thursday on a 48 hour hold, he must be released on Saturday at 3 PM. But that is actually a problem, because the prison does not release inmates on Saturday. So the release date is moved back (or forward(!), depending on a lot of stuff).
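The 48 hour example can be worked through in a few lines of Python. This is only a sketch: which direction the date moves is a policy decision that the post does not spell out, so `direction` is left as a parameter, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

SATURDAY = 5  # datetime.weekday(): Monday is 0, so Saturday is 5


def nominal_release(arrested_at, hold_hours):
    # The date you get from simply adding the hold to the arrest time.
    return arrested_at + timedelta(hours=hold_hours)


def adjust_for_no_release_days(release_at, no_release_days, direction):
    """Move the release off any no-release day.

    direction is -1 (release earlier) or +1 (release later).
    Assumes at least one weekday allows releases, otherwise this
    would loop forever.
    """
    step = timedelta(days=direction)
    while release_at.weekday() in no_release_days:
        release_at += step
    return release_at


arrested = datetime(2017, 1, 5, 15, 0)  # a Thursday, 3 PM
nominal = nominal_release(arrested, 48)  # Saturday, 3 PM
earlier = adjust_for_no_release_days(nominal, {SATURDAY}, -1)
later = adjust_for_no_release_days(nominal, {SATURDAY}, +1)
```

Here `earlier` lands on Friday and `later` on Sunday, both at 3 PM. Even this trivial case already has two defensible answers.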

If an inmate is sentenced to life, that might mean that he is expected to die in prison, be released in 10 years, 25 years or be eligible to go on parole in 14 years and be effectively free. I did some research around sentencing rules around the world and I must say that this is confusing and quite sad. Even within a single legal system the amount of complexity, precedent, custom and variance is staggering.

At any rate, we need to figure out something that seems to be quite simple. Given an inmate and the warrants we have on file, what should be the release date? This can be as simple as having a single warrant, or a series of sequential warrants (arrest, held until trial, sentencing, etc). That is simple and pretty obvious. Go to the latest warrant, get the duration from that and then start computing the release date. Here we have another problem: from what date do we start counting? If the inmate has been under arrest for the entire duration, then we start from that point. If the inmate has been free (bail, etc), then we start from the point he got put back into prison. Sometimes the inmate was held for a while (several months, before getting bail, for example), so that will be counted against the sentencing period (or not, depending on a bunch of stuff). In short, being told “you are hereby sentenced to 10 years” can mean several different release dates, even assuming nothing changes.

So this is complex, and hard, and in many cases very much situation dependent. How do you approach handling this?

To start with, this is one of those cases where you can, should and must get a specification, complete with examples, a test suite, samples, etc. It may sound silly, because all we are doing is computing a date, but the implications are… important, especially for the people who are being held.

The specification will have a few straightforward cases, but also a lot of convoluted mess that, with luck, you can get a lawyer to decipher, but most likely not. The way to handle that is to recognize the patterns that you know that you can reliably figure out and provide answers to those. If it were up to me, I would produce a longhand report, like so:

Snapshot

Note that this computation sheet is not the final say. Instead, the Registration Office officer is going to sign off on it, after having validated the dates independently.

For patterns that aren’t so easy to compute, a good way to handle that is to show the information you have and not give any answer, making the officer who will sign off on the correctness of this result do all the work.
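A sketch of such a computation sheet in Python follows. It recognizes exactly one pattern (continuous custody, with a fixed duration on the latest warrant) and refuses to answer for anything else. The field names (`kind`, `issued`, `days`, `released_on_bail`) and the single rule are assumptions made for illustration, not actual sentencing rules, and the function assumes at least one warrant is on file.

```python
from datetime import date, timedelta


def release_report(inmate, warrants):
    """Return (release_date_or_None, report_lines).

    None means: show the facts and let the officer work it out.
    """
    lines = [f"Release date computation for {inmate}"]
    ordered = sorted(warrants, key=lambda w: w["issued"])
    for w in ordered:
        lines.append(f"- {w['issued']}: {w['kind']} warrant, "
                     f"{w.get('days', '?')} days")

    latest = ordered[-1]
    on_bail = any(w.get("released_on_bail") for w in ordered)
    if latest.get("days") is None or on_bail:
        lines.append("No computed date: pattern not recognized, "
                     "officer to compute manually")
        return None, lines

    start = ordered[0]["issued"]
    release = start + timedelta(days=latest["days"])
    lines.append(f"Held continuously since {start}, "
                 f"counting {latest['days']} days from that date")
    lines.append(f"Computed release date: {release} "
                 "(pending Registration Office sign-off)")
    return release, lines


simple = [
    {"kind": "arrest", "issued": date(2017, 1, 10), "days": 5},
    {"kind": "sentencing", "issued": date(2017, 1, 14), "days": 30},
]
computed, sheet = release_report("inmates/1234", simple)

with_bail = [dict(simple[0], released_on_bail=True), simple[1]]
unknown, facts_only = release_report("inmates/1234", with_bail)
```

The first case produces a date along with every step used to reach it. The second lists the warrants and stops, leaving the answer to the officer.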

As an aside, printing this report, including the computation and how it was arrived at, is a really good idea because it can be handed to the inmate as well as to their attorney. At this point, presumably they’ll double check the dates as well. This is important since a mistake about a release that hasn’t happened yet is free. After all, if an inmate is supposed to walk out in 2022 but we computed the sentence until 2025 and it was discovered in 2018, no harm was done (except maybe to someone’s nerves).

The human in the loop model is quite important in this regard, because of the notion of the single responsibility that I previously mentioned. Someone, a person, is actually responsible for the release date computation, and that should probably be a human that isn’t the system developer from a decade ago.

time to read 7 min | 1388 words

The inmate population in any typical prison is divided according to many and varying rules. It can be the length of expected stay, it can be the level of security required, the kind of facilities required, the crimes committed, etc.

For simplicity, we’ll talk about Block A (minimum security, good behavior, low risk) and Block C (bitter lifers, bad apples, violent inmates, etc) as our two examples. These differences can create very different environments. Things that would never pass muster in block A are routine in block C and vice versa.

A good example would be the acceptance criteria. In order to be accepted to block A you have to meet certain standards (non-violent offenders or 8 years inside with no spots on your record or a strong recommendation from an officer). In order to be sent to block C you need to be in a particular kind of trouble (violent crime, recent behavioral issues, high risk from intelligence, etc).

Being written up by a guard in block A will result in loss of privileges like not being sent to work, reduction in visitation, etc. You don’t get written up in block C, you get sent to disciplinary action with the block’s officer and can be confined to the cell, isolation, lose cafeteria privileges, etc.

From the outside, both of these blocks are roughly the same, but from the inside, they have very different populations, behavior and accepted practices.

This means that when we need to write a system that would serve both blocks (and there is also Block B, Isolation and the Medical Ward, all slightly different) we are in somewhat of a pickle. How do we account for all of these differences? One way to handle that would be to just deal with the common stuff (the counts, the legal dealing, etc) and let each block dictate policy outside of the system. We can also provide some “note keeping” functionality, such as the ability to assign tasks, keep notes and records on inmates and hope that the blocks would use that so we’ll at least have a record of all these policy decisions.

Alternatively, we can map what each block wants to do and customize the application for each of them. The problem here is that these things change, and when talking about a large enough base, they change quite often. Given a typical tenure for a block’s officer of about 3 – 5 years (it really depends on the type of prison; in some cases, you’ll have tenures as short as a year or two), the tendency of each new officer to want to make some changes (otherwise, why are they there?), and the fact that a typical prison has 3 – 6 blocks and about 10 high level officers who each want to leave their mark (each with independent tenures), you end up with a fairly constant rate of low level changes.

If this makes you cringe at the expected number of changes that will be required to always adapt the system, I hear you, that isn’t a fun place to be in.

There are typically two major ways to handle this. Either you’ll ensure that no such changes are accepted, by not making the changes and having the prison work around the different practices while still using the system or you plan to adapt things from the get go. The first option is very common in a top down organization, where the HQ wants to “lay down the law” about how things “should be done”. The other option is typically more expensive, sometimes ridiculously so, depending on how far you want to push it.

Dynamic data, forms and behaviors, oh my! Let the prison guard completely re-design the system in his free time. To be fair, I was a prison guard and I would enjoy that, but I haven’t found many people in my current career that can say that they have prison experience (from either side of the bars). In practical terms, I would say that the technical level of prison guards is at or below the population norm and not at a level sufficient to actually do anything mission critical such as dealing with people’s freedom.

It is actually usually quite easy to convince the HQ people to avoid any flexibility in the system. They like ensuring that things are done “right”, even if that is quite different from how things are actually working (or even can possibly work). But we’ll avoid such power plays. Instead, let’s talk about how we can limit the scope of the work that is required and still gain enough flexibility for most things.

With RavenDB, defining dynamic data is both easy and obvious. Each block can define additional fields that they can tack onto documents and just have them there. The auto indexing features will also ensure that searches on such fields are fast and efficient. I’m not going to touch on any UI elements, that is someone else’s job :-).

Let us talk about policy decisions. For example, we might need to decide whether an inmate is acceptable or not for a block. That means that we need to have some way to decide policy. Now, I have literally written a book about building DSLs for just such a scenario. You can very quickly build something simple and elegant that would give the user the chance to define their own policy and behavior.

In fact, given the target audience, this is not a good idea. We don’t expect the prison guard to make such decisions, so we don’t need to cater to them. Instead, we’ll cater to developers, probably the same developers who are in charge of actually building and maintaining the system. This gives us a very different flavor to deal with. For example, instead of building a DSL, we can just use a programming language.

For example, we can use JavaScript to shell out at critical parts of the pipeline. A good example would be at the validation stage of processing an incoming inmate. We’ll pass the inmate document to a JavaScript function and that can emit validation warnings and actions that are supposed to take place. Here is a small sample:


The real world would probably have several pages of various business logic around what should and shouldn’t happen here. Including things like assigning to specific cell because of the inmate’s affiliation, etc. The idea here is that we’ll give the developers an easy way to go and modify the behavior of the system for each location this is deployed in.

As an aside, this kind of thing needs to be logged and audited. That means that you can store these scripts in something like a git repository and record the commit hash for the version you are using when you are making decisions. In 99.9% of the cases it will not matter, but if you need to show a court why the “computer told us” that a certain inmate had to be dealt with in a certain way, you want to be able to know what happened and produce the right script that helped make that decision.

You might also note from the script that the output of the function is a set of warnings, not errors or exceptions. Why is that? Because there is an explicit place here for the human element. That means that if we have warnings for an inmate, we can still actually accept the inmate, despite the warnings. We might require the sergeant to note why the inmate is accepted despite the warnings (and answers may be things such as “they ran out of room in B” and “he was overheard saying he would stab someone”). This is because, quite explicitly, we don’t treat the system as the source of truth.
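The shape of such a policy hook can be sketched as follows. To keep the examples on this page in one language, this sketch is in Python rather than the JavaScript discussed above, and it is not the original script. The field names, the `accept_inmate` function and the `"abc1234"` commit hash are all hypothetical. The Block A rule is taken from the acceptance criteria described earlier.

```python
def validate_intake_block_a(inmate):
    """Hypothetical Block A policy: returns warnings, never raises."""
    warnings = []
    meets_criteria = (
        not inmate.get("violent_offense", False)
        or (inmate.get("years_inside", 0) >= 8
            and inmate.get("clean_record", False))
        or inmate.get("officer_recommendation", False)
    )
    if not meets_criteria:
        warnings.append("Does not meet Block A acceptance criteria")
    if inmate.get("recent_incidents", 0) > 0:
        warnings.append(
            f"{inmate['recent_incidents']} recent behavioral incident(s)")
    return warnings


def accept_inmate(inmate, policy, script_version, override_reason=None):
    # Warnings do not block the intake, but accepting an inmate despite
    # them requires the sergeant's note. The script version is stored
    # with the decision so it can be reproduced later.
    warnings = policy(inmate)
    needs_note = bool(warnings) and not override_reason
    return {
        "inmate": inmate["id"],
        "status": "needs_sergeant_note" if needs_note else "accepted",
        "warnings": warnings,
        "override_reason": override_reason,
        "script_version": script_version,
    }


newcomer = {"id": "inmates/77", "violent_offense": True, "years_inside": 2}
pending = accept_inmate(newcomer, validate_intake_block_a, "abc1234")
accepted = accept_inmate(newcomer, validate_intake_block_a, "abc1234",
                         override_reason="They ran out of room in B")
```

The record keeps the warnings, the human’s reason and the script version together, which is the audit trail described above.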

This system is the system of record, it holds the information about what is going on, but it isn’t meant to be rigid, it has to be flexible, because we are dealing with people and there is no way that we can cover all situations. So we try to ensure that there is a place for the human element throughout the system design.

time to read 4 min | 652 words

An interesting problem in distributed work shows up quite frequently in the prison space: duplicated, delayed and repeated warrants (um… I mean packets). This can lead to some issues, especially since the receiving parties may or may not be in communication with one another.

Consider the case of a warrant to release a particular inmate. It is quite common for the process of actually getting the warrant back to the prison from the court to take a few hours. In that time frame, the inmate’s lawyer has already arrived at the gates of the prison and handed the release warrant to the block’s sergeant (it’s a bit more complex than that, but I’m skipping details here because I want to talk about the technical details of building a system to run a prison rather than the actual details of running a prison).

At this point, the block’s sergeant can pass the warrant off to the Registration Office, which will also accept the warrant from the court at some time in the future, or they can initiate the release process directly. What actually happens depends on a lot of factors, but let’s say that they start the release process directly. We already talked about what that would mean, so let’s focus on another aspect of that. How do we deal with the arrival of the warrant at the Registration Office when there is already an open workflow to release the inmate?

For fun, here is a brief example of a warrant:

image

Yes, this is faxed, often unreadable and sometimes coffee stained. There is no such thing as a warrant id that you can use to check if the warrant has already been seen. There is supposed to be, at least per court / judge, but there often just isn’t.

Side note regarding the issue of faxing warrants. Yes, there have been cases where people just sent a release warrant and people got out. Part of the process of actually processing a release warrant is talking with the court to validate it, but that isn’t always done.

Another fun fact is that what one warrant may do another warrant may undo. So you may have a release warrant on hand, but the court has already issued a stay of 48 hours for that warrant so the police can appeal that, for example. If the second warrant doesn’t arrive in time…

At any rate, the fact that a warrant may show up multiple times and that there may be conflicting warrants being processed at the same time means that there is the need to handle some level of de-duplication. We can usually detect that using the inmate’s details and the date on which the warrant was issued (it is rare that multiple warrants for the same person are issued on the same date, so that is enough to flag things). If the result of two warrants on the same date is the same, we can assume that they are the same.

If there are conflicts, this will raise a flag and require human involvement to resolve it. A conflict will be raised for any non-identical warrants for the same day for the same inmate, because any such activity is suspicious and requires additional attention.

Following the Single Responsibility Principle as applied to prison (there must be a single responsible party so we can put them in jail if they mess this up), the validation of warrants is in the hands of the Registration Office and they take care of handling all such warrants. Even if the warrant was served directly to the block’s sergeant, the final validation (and responsibility) is on the Registration Office personnel actually signing the release form.
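The de-duplication rule can be sketched in a few lines of Python. The document shape (`inmate`, `issued`, `result`) and the `classify_warrant` function are assumptions made for illustration. The rule itself is the one described above: same inmate and same issue date with the same result is a duplicate, anything else on that date is a conflict for a human to resolve.

```python
from datetime import date


def classify_warrant(new, seen):
    """Classify an incoming warrant as "new", "duplicate" or "conflict".

    seen maps (inmate, issue date) to the warrants already received
    for that key, and is updated in place.
    """
    key = (new["inmate"], new["issued"])
    previous = seen.setdefault(key, [])
    if not previous:
        verdict = "new"
    elif all(w["result"] == new["result"] for w in previous):
        verdict = "duplicate"
    else:
        verdict = "conflict"  # flag for a human, never resolve silently
    previous.append(new)
    return verdict


seen = {}
from_lawyer = {"inmate": "inmates/1234", "issued": date(2017, 3, 1),
               "result": "release"}
from_court = dict(from_lawyer)  # the same warrant, arriving hours later
stay = {"inmate": "inmates/1234", "issued": date(2017, 3, 1),
        "result": "stay 48 hours"}

verdicts = [classify_warrant(w, seen)
            for w in (from_lawyer, from_court, stay, from_court)]
```

Once a conflict exists for an inmate and a date, every later warrant for that key is also flagged, which fits the idea that any such activity is suspicious.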

time to read 6 min | 1025 words

The whole point of having a prison is ensuring that the inmates are staying in. As such, the process of actually getting an inmate out of the prison is quite involved. I’m using the release workflow as an obvious example, but there are many such workflows inside the prison that are similar: the inmate’s intake workflow, disciplinary workflows or even just inmate transfers, which can be quite complex (and typically also happen in big batches).

In the previous post in this series, I talked briefly about the idea of workflows as an event publication with multiple signatories to the workflow operation. If this makes no sense, let me try to explain.

The trigger for starting an inmate’s release can be the acceptance of a warrant for immediate release or the expiration of the warrant to hold the inmate. There are several other variants (transfer to another facility, death of inmate, escape of inmate, etc) that are related but are not too relevant for the discussion, so I’ll just talk about the immediate release and the warrant expiration.

For warrant expiration, the workflow starts from the Registration Office publishing the list of inmates that are supposed to be released today. In technical terms, they create a series of workflow documents that are in the Open state and this gets disseminated to the rest of the system. An immediate release warrant is usually served to the Registration Office, but may also be served directly to the block’s sergeant.

If the Registration Office got the immediate release warrant, things are identical to the usual scheduled release (except that this might be served at any time). However, if this is served to the block’s sergeant, things are more interesting. At this point, it is the block that will initiate the workflow, but the overall responsibility for verifying the warrant and ensuring the actual release is still on the Registration Office.

As an aside, the notion of “who owns this” is quite important in the prison. Mostly because if you mess up, there might be consequences. This can range from holding an inmate past the due date (bad, can result in damages paid to the inmate and career consequences to the person who messed up) to releasing an inmate that wasn’t supposed to be released (really bad, sometimes it is not possible / feasible to hold them again, requires court approval, may end up in jail time for the person who messed up and also result in a possibly dangerous inmate being released). So the idea is that at any time, there is an owner for this process and a clear finger pointing at that person, “You’re to blame”.

Because of this, there are typically multiple steps in the release process. Consider the simple scenario of a scheduled release, we have:

  • Registering the inmate on the “to be released” list.
  • Notifying external and internal parties about this inmate’s pending release. For example, an inmate might have to go through a parole officer before actual release. This is done by sending the “to be released” list several days in advance and getting at least an implicit agreement (by not vetoing / modifying the process) from external parties.
  • Verification that there are no explicit holds on the inmate. In the case of an inmate that is supposed to be deported, the inmate’s file will have an explicit “deport on release”, which typically requires coordinating with the border police to handle that. So the inmate can’t just be shoved out the door but must be handed off to someone.
  • Identification of the inmate at the block level. This is typically done at the sergeant’s level and then by an officer (preferably from the same block) who is familiar with the inmate and can validate that this is indeed the one to be released.
  • Checking out the inmate from the block level to the prison’s level. This explicitly removes the inmate from the block’s responsibility once the inmate has been handed off to the release queue.
  • Identification of the inmate by the Registration Office’s officer. This is a second verification that is done to ensure that there hasn’t been a mix up.
  • Verification of the warrant to release the inmate and that there are no newer warrants that are in effect.
  • Returning of personal effects and getting the inmate’s signature that everything was properly returned.
  • Checking the inmate out of the prison, this step explicitly ends the period in which the now ex-inmate is held in prison.
  • Actually getting the newly released ex-inmate out of the prison. This can be to a family member at the gate or to a bus to the nearest city, etc.

I’m probably forgetting a few details in the middle, and there are branches for each and every one of these steps.

In terms of the technical details of how this works, the workflow document is distributed throughout the system, and then various parties are in charge of actually completing the various tasks in the workflow. For example, this may mean checking out the inmate’s personal effects from the safe in the morning, preparing for the release. So the steps aren’t necessarily in sequence or ordered.
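A workflow document with multiple signatories might be modeled along these lines. The structure is an assumption made for illustration: only the Open state comes from the description above, while the task names, the `Completed` state and the helper functions are hypothetical.

```python
def new_release_workflow(inmate, block, tasks):
    # Every task starts unsigned. The document itself is what gets
    # published and tracked by the rest of the system.
    return {
        "type": "ReleaseWorkflow",
        "inmate": inmate,
        "block": block,
        "state": "Open",
        "tasks": {task: None for task in tasks},
    }


def sign_task(workflow, task, signed_by):
    """Record who completed a task. Tasks can be signed in any order."""
    if task not in workflow["tasks"]:
        raise KeyError(f"unknown task: {task}")
    workflow["tasks"][task] = signed_by
    if all(workflow["tasks"].values()):
        workflow["state"] = "Completed"


def pending_tasks(workflow):
    return [t for t, who in workflow["tasks"].items() if who is None]


workflow = new_release_workflow("inmates/1234", "B", [
    "identify at block",
    "check out of block",
    "identify at registration",
    "return personal effects",
])
# Personal effects are pulled from the safe in the morning,
# before the block has even identified the inmate.
sign_task(workflow, "return personal effects", "registration/clerk-3")
sign_task(workflow, "identify at block", "block-b/sergeant")
still_pending = pending_tasks(workflow)
```

Because each task carries the name of whoever signed it, the document also answers the question of who owns each step.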

The important thing is that the workflow document is published, and is now tracked. Typically, the release process is tracked by the Registration Office and the Command & Control Center. The Command & Control Center will typically be involved early in the process if an inmate isn’t released by the typical time, and the close of day process for the Registration Office includes verification that there aren’t any still pending inmate releases.

At any given point, we can track the progress of the workflow from any point in the system (remember, changes are made locally and then distributed to the rest of the system). If there is a communication issue between different parts of the prison, that will typically show up as an alert that a particular workflow hasn’t been properly completed in the allotted time. At this point, an external channel (walking & talking, usually) is used to verify the status of this particular inmate release.

time to read 5 min | 857 words

The design of Macto (that is, the prison management application I’m discussing) so far seems pretty strange. Some of the constraints are common: the desire for resiliency and being able to have each independent portion of the system work in isolation. However, the way we go about handling this is strange.

I looked up the definition of a microservices architecture and I got to Martin Fowler, saying:

..the microservice architectural style [1] is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.

This wouldn’t work for our prison system, because we are required to continue normal operations when we can’t communicate with the other parts of the system. A prison block may not be able to talk to the Registration Office, but that doesn’t mean that it can error. Operations must continue normally.

So RPC over HTTPS is out, but what about messaging? We can use a queue to dispatch commands / events to the other parties in the system, no?

The answer is that we can do that, but let’s consider how that will work. For events, that would work quite nicely, and have roughly the same model that we have now. In fact, with the RavenDB ETL process and the dissemination protocol in place, that is pretty much what we have. Changes to my local state are pushed via ETL to a shared known location and then distributed to the rest of the system.

But what about commands? A command in the system can be things like “we got a warrant for releasing an inmate”. That is something that must proceed successfully. If there is a communication breakdown, generating an error message is not acceptable. In this case, someone will hand deliver the warrant to the block and get this inmate out of the prison.

In other words, we have the notion of commands flowing in the system, but the same command can also come from the Registration Office or from a phone call or from a lawyer physically showing up at the block and shoving a bunch of papers at the sergeant on duty demanding an immediate release.

All of this leads me to create a data architecture that is based on local data interaction with a backbone to share it, while relying on external channels to route around issues between parts of the system. Let’s consider the case of releasing an inmate. The normal way it works is that the Registration Office prepares a list of inmates that need to be released today.

Here is one such document, which is created on the Registration Office server.

image

This then flows via ETL to the rest of the prison. Each block will get the list of inmates that they need to process for release, and the Command & Control Center is in charge of ensuring that they are processed and released properly. I’ll have a separate post to talk about such workflows, because they are interesting, but the key here is that we don’t actually have a command being sent.

Instead, we create a document starting the workflow for release. This is exactly the same as an event being published, and it will be received by the other parties in the prison. At this point, each party is going to do their own thing. For example, the Command & Control Center will verify that the bus to the nearest town is ordered, and each block’s sergeant is in charge of getting the inmates processed out of the block and into the hands of the Registration Office, which will handle things such as returning personal effects, a second verification that the right inmate is released, etc.

Unlike a command, which typically has a success / error code, we use this notion of a workflow document, as well as timed alerts, to verify that things happen. In other words, the Command & Control Center will be alerted if the inmate isn’t out of the prison by 10:00 AM and will take steps to check why that has happened. The Registration Office will also do its own checks at the close of the day, to ensure that no inmates are still in the prison when they shouldn’t be.
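To make the timed alert idea a bit more concrete, here is a minimal sketch in Python. It is not Macto code; the `ReleaseTask` shape, the field names and the dates are all made up for illustration. The point is only that the check runs against local data and looks for work that should have been recorded by now.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ReleaseTask:
    """One inmate's entry in a release workflow document (hypothetical shape)."""
    inmate_id: str
    deadline: datetime                      # when the inmate should be out of the prison
    released_at: Optional[datetime] = None  # filled in once the release is recorded


def overdue_releases(tasks: List[ReleaseTask], now: datetime) -> List[str]:
    """Return the ids of inmates whose release was not recorded by the deadline."""
    return [
        task.inmate_id
        for task in tasks
        if task.released_at is None and now > task.deadline
    ]


# Illustrative data only: two inmates due out by 10:00, one already processed.
deadline = datetime(2017, 8, 1, 10, 0)
tasks = [
    ReleaseTask("inmates/1", deadline, released_at=datetime(2017, 8, 1, 9, 30)),
    ReleaseTask("inmates/2", deadline),
]
alerts = overdue_releases(tasks, now=datetime(2017, 8, 1, 10, 15))
print(alerts)  # ['inmates/2']
```

Note that nothing here distinguishes between "the inmate wasn't released" and "the update hasn't reached us yet". Both produce the same alert, and both are resolved the same way, by someone going to check.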

Note that this involves multiple parties cooperating with each other, but none of them explicitly relies on request / response style of communication.

This is message passing, and also event publication, but I find that this is a much more passive manner in which you’ll work. Instead of explicitly interacting with the outside world, you are always operating on your own data, and additional work and tasks to do show up as they are replicated from the other parts of the system.

Another benefit of this approach is that it also ensures that there are multiple and independent verification steps for most processes, which is a good way to avoid making mistakes with people’s freedom.

time to read 4 min | 669 words

In the previous post, I talked about the flow of data in the system and introduced the notion of sharing data via explicit ETL processes. The ETL process is an explicit contract of each part of the system that shares some of the data to the outside world. By the way, in the context of the prison system, the outside world is a bit misleading. It might be more accurate to say that we share the data with the other parties inside the prison. Nothing here is actually available outside the prison itself (integration with other systems, such as the Department of Justice, other prisons, etc. is an even more complex topic that I don’t know if I’ll be covering. For now, assume that the method of integration is a fax machine).

Here is the simplest topology that we can build using this approach.

Snapshot

The Registration Office has an ETL process that outputs the data it wants to share to a dedicated database, specifically for this purpose. From this public database, we set up replication to dedicated database instances on each of the blocks.

In other words, when the application inside the block wants to access some shared data, it isn’t going to reach over the network and hit the public registration database but use a local instance that the public registration database is replicating to.

Why do we have this behavior? Because we don’t trust the network and must always rely on our own local resources. When the network is available, we’ll get continuous updates via replication, and when it isn’t available, we’ll have the latest information that we possibly could and can still act on that (I’ll have a separate post talking about workflow processes that mitigate eventual consistency and concurrency handling in such a system).

This means that the application can always continue running, regardless of the state of the outside world.

Another option, which is somewhat simpler, is to not have a public registration database, but a database that is there to share the data explicitly between all the different parties. This will look like this:

Snapshot

In this case, each party’s setup includes both the internal data that is required to run the block / office / department in question and a shared database that is used to hold all the data that is shared by all the parties in the prison.

Note that the topology of the shared data is a full mesh. In other words, data that is sent to the shared database from the Registration Office using RavenDB ETL will be sent to all the other parties. This is the same as we had before. However, because we now have a shared database, if the Registration Office cannot talk to Block B, that block will still get all the updates from Block A, including the updates that originated from the Registration Office. This is because RavenDB replication uses the gossip model and can bridge such gaps in the network without issue.
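Here is a toy model of that behavior in Python. To be clear, this is not how RavenDB replication is implemented; it is just a small simulation, with made-up node and document names, that shows why a broken link between two nodes doesn’t matter as long as there is some other path between them.

```python
from typing import Dict, Set, Tuple


def gossip_round(state: Dict[str, Set[str]], links: Set[Tuple[str, str]]) -> bool:
    """Exchange documents once across every working link.

    Returns True if any node learned about a new document.
    """
    changed = False
    for a, b in links:
        merged = state[a] | state[b]
        if merged != state[a] or merged != state[b]:
            state[a] = set(merged)
            state[b] = set(merged)
            changed = True
    return changed


def replicate(state: Dict[str, Set[str]], links: Set[Tuple[str, str]]) -> None:
    """Keep gossiping until no node learns anything new."""
    while gossip_round(state, links):
        pass


# Hypothetical setup: the Registration Office <-> Block B link is down,
# so it simply does not appear in the set of working links.
state = {
    "Registration Office": {"release-list/today"},
    "Block A": set(),
    "Block B": set(),
}
working_links = {("Registration Office", "Block A"), ("Block A", "Block B")}

replicate(state, working_links)
print("release-list/today" in state["Block B"])  # True
```

Block B never talks to the Registration Office here, yet it still ends up with the release list, because Block A passes it along.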

This might be a simpler model, because the process of each party publishing information for the consumption of the rest of the prison is simplified, we simply define an ETL process to the shared database and the data will be distributed far and wide, made available to anyone that wants it.

This has the advantage that most of the details of managing services can be deferred to RavenDB. You need to make sure that your ETL processes are contractual, that is, they don’t change the shape or meaning of the data, and that is about it. All data access from the application is made to the local database and there is little need to worry about integration between the various parties, error handling of remote calls, etc.

time to read 8 min | 1444 words

In our prison system, we have a lot of independent parts, by design. Each of them is expected to work independently of the rest of the system, but also cooperate with them. That typically requires a particular mindset when designing the application.

Let’s lay out the different aspects of integration between the various pieces, shall we?

  1. Each part of the system should be able to run completely independently from the other pieces.
  2. There are offline methods for communications (literally a guy walking down to the block with a piece of paper) that can be a backup communication plane but can also bypass the system entirely (a warrant being served directly to the block’s officers).
  3. There are two distinct options for communication: Commands (release this inmate, ensure inmate is ready to go to court at date) and notifications (inmates count, status, etc)
  4. We trust, but verify, each piece of information that we receive.

The first point seems like a pretty standard requirement, but in combination with the second point, we get into a particularly interesting issue. We may have the same information entered into the system by multiple parties at different times.

For example, a warrant to release an inmate might be served directly to the block’s officer. The release is processed, and then the warrant arrives at the Registration Office, which will also enter it. At some later time, this data is merged and we need to ensure that it makes sense to the people reading it.
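One way to think about such a merge is sketched below in Python. This is an illustration under a big assumption: that each warrant carries some identifier (here a hypothetical `warrant_no`) that lets us recognize the same warrant when it was entered twice. The field names and values are invented for the example.

```python
from typing import Any, Dict, List


def merge_warrant_entries(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse duplicate entries of the same warrant into a single record.

    Each entry is assumed to have 'warrant_no', 'source' and 'entered_at'
    (an ISO 8601 timestamp string, so plain string ordering is chronological).
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for entry in sorted(entries, key=lambda e: e["entered_at"]):
        record = merged.setdefault(
            entry["warrant_no"],
            {
                "warrant_no": entry["warrant_no"],
                "first_entered_at": entry["entered_at"],
                "entered_by": [],
            },
        )
        # Keep track of everyone who entered it, so a reader can see that
        # this is one warrant seen twice, not two separate warrants.
        if entry["source"] not in record["entered_by"]:
            record["entered_by"].append(entry["source"])
    return list(merged.values())


entries = [
    {"warrant_no": "W-17", "source": "Registration Office", "entered_at": "2017-08-01T14:00"},
    {"warrant_no": "W-17", "source": "Block A", "entered_at": "2017-08-01T09:00"},
]
result = merge_warrant_entries(entries)
print(result[0]["entered_by"])  # ['Block A', 'Registration Office']
```

The merged record keeps the earliest entry time and the list of parties that entered it, which is the sort of thing a person reading the file later would want to see.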

The offline communication plane is also a very important design consideration for a system that reflects the real world. It means that we don’t have to provide too complex an infrastructure for surviving production. In fact, given that a prison is hardly going to have a top notch technical operations team (they might have a top notch operations team, but they refer to something quite different), we don’t want to build something that relies on good communications.

To make sense of such a system, we need to define data ownership and data flow between the various systems. Because this is a really complex system, we’ll take a few examples and analyze them properly.

  • The legal status of an inmate.
  • The location of an inmate.

What is the meaning of legal status? It means under what warrant the inmate is in the prison (held until trial, 48 hours hold, got a final judgement). At its simplest, it is the date on which this person should be released. But in practical terms, this can be much more complex and may have conditions on where this inmate can be held, what they can do, etc.

Everything about the legal status of an inmate is the responsibility of the Registration Office. Any movements of inmates into or out of the prison must go through the Registration Office. Actually, this isn’t quite true. Any movement of an inmate from the responsibility of the prison must go through the Registration Office. But the physical location of the inmate is the responsibility of the block into which the inmate was assigned.

A good example of this would be an inmate that has been hospitalized. They are not physically inside the prison, but the prison is still responsible for them. The Registration Office doesn’t usually care for such details (but sometimes they do, for example, if the inmate has a court date that they’ll miss, they need to notify the court) because there isn’t a change in who is in charge of the inmate.

This is complex, but this is also the real world, and we need to manage this complexity. So, let’s define the ownership rules and data flow behavior:

  • Legal Status is owned by Registration Office and is being disseminated from there to all interested parties.
  • The location of an inmate and its current physical status are owned by the block it is assigned to and disseminated from there to all interested parties.
  • The assignment of an inmate to a particular block is also interesting. This piece of information is owned by the Registration Office, but it is not their (sole) decision. This may take a bit of explaining.

The block an inmate is assigned to is determined by a bunch of stuff. What is the legal status of the inmate, what is the previous / expected behavior of this inmate, whether the inmate needs to be isolated from / together with certain people, information / recommendation from the intelligence office, court decisions, the free space available on each block, the inmate’s medical status and many other details that are not quite important.

The Registration Office will usually make the initial placement of where an inmate is going to go, but this is not their decision, there is a workflow involved that has input from way too many parties. The official decision is at the hands of the prison commander, but recording this decision (and the data ownership of it) is at the hands of the Registration Office.

Okay, enough domain knowledge, let’s talk about the technical details, shall we? I’m sorry that I have to do such an info dump, and I’m trying to contain it to relevant pieces, but if I don’t include the semantics of what we are doing, it will make very little sense or be extremely artificial.

The legal status of inmates in the Registration Office needs to be sent to other parties in the prison. In particular, all the blocks and the Command & Control Center.

We can deal with this by defining the following RavenDB ETL process from the Registration Office:

image

What this does is simply define the data that we’ll share to the outside world. If I was building this for real, this will probably be a lot bigger, because an inmate is a really complex topic. What is important here is that we define this as an explicit process. In other words, this is part of the service contract that we have with the outside world. The shape of the data and the way we publish it may only be changed if we contact all parties involved. Practically speaking, this usually means that we can only add data, but never remove any fields from the system.

Note that in this case, we simplify the model as we send it out. The warrants for this inmate aren’t going out, and we just pull the latest status and release dates from the most up to date warrant. This is a good example of how we avoid exposing our internal state to the outside world, giving us the flexibility to change things later.
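Real RavenDB ETL transformation scripts are written in JavaScript and run inside the database, so the following Python function is only a sketch of the shape of such a transformation. The document fields (`warrants`, `issued`, `status`, `release_date`) are assumptions made for the example, not the actual Macto model.

```python
from typing import Any, Dict


def to_public_inmate(inmate: Dict[str, Any]) -> Dict[str, Any]:
    """Build the simplified, published view of an internal inmate document.

    The internal warrants are not exposed. Only the status and release date
    of the most recently issued warrant go out.
    """
    warrants = inmate.get("warrants", [])
    # 'issued' is assumed to be an ISO 8601 date string, so max() picks the latest.
    latest = max(warrants, key=lambda w: w["issued"]) if warrants else None
    return {
        "id": inmate["id"],
        "name": inmate["name"],
        "status": latest["status"] if latest else None,
        "release_date": latest["release_date"] if latest else None,
    }


internal = {
    "id": "inmates/1",
    "name": "John Doe",
    "warrants": [
        {"issued": "2017-01-10", "status": "Held until trial", "release_date": None},
        {"issued": "2017-06-02", "status": "Sentenced", "release_date": "2019-06-02"},
    ],
}
public = to_public_inmate(internal)
print(public["status"], public["release_date"])  # Sentenced 2019-06-02
```

Because consumers only ever see the output of this function, the Registration Office can restructure how it stores warrants internally without breaking anyone, as long as the published fields keep their shape and meaning.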

The question now is where does this data go? RavenDB ETL will write the data to an external database, and here we have a few options. First, we can define an ETL target for each of the known parties that want this data (each of the blocks and the Command & Control Center, at this time). But while that would work, it isn’t such a great idea. We’ll have to duplicate the ETL definition for each of those.

A better option is to send the (transformed) data to a dedicated database that will be our integration source. Consider the following example:

image

In this case, we can have this dedicated public database that exposes all the data that the Registration Office shares with the rest of the world. Any party that wants this information can set up external replication from this database to their own. In this manner, when the Intelligence Office decides to make it known that they also need to access the inmate registration data, we can just add them as a replication destination for this database.

Another option is not to have each individual party in the prison share its own status, but have a single shared database that each of them writes to. This can look like this:

image

In this case, any party that wants to share data will be writing it to the shared database, and anyone who reads it will have access to it through replication from there. This way, we define a data pipeline of all the shared data in the prison that anyone can hook up to.

This post is getting long enough that I’ll separate the discussion of the actual topology of the data and handling the incoming data to separate posts, stay tuned.

time to read 5 min | 806 words

When you need to write a form, one of the first things that pops to mind is what kind of validations are required. A good example would be the new user registration form:

Snapshot

This is such an ingrained instinct that we typically seek to stop such invalid states as soon as possible. If the business rules say that you cannot have a certain state, it is easiest to just ensure that you can never get into such a state.

If you are working in a system that exists purely within the scope of your application, that might actually be a useful property to have. This is certainly something that you want in your code, because it means that you can make assumptions on invariants.

If you have a system that reflects the real world, however, you need to take into account several really annoying things:

  • Your model of the real world is not accurate, and will never be accurate.
  • The real world changes, sometimes quite rapidly and in a very annoying fashion.
  • Sometimes the invariant is real, but you gotta violate it anyway.
  • If the system doesn’t let the users do their job, they will do the job in spite of your system. That will lead to a system of workarounds. Eventually, you’ll have to support these workarounds. This may result in hair loss and significant amount of aggravation.

Consider the case that the prison in question is a minimum security prison, expected to have white collar inmates that have already been tried. Now you get an inmate that is accused of murder and is currently on trial. (Side note, prisons have a very different structure for people who are still undergoing trial, because they aren’t convicted yet and because the amount of pressure that they are under is very high. They’ll typically be held in different facilities and treated differently from inmates that have already been tried and convicted).

What do you think will happen if the system refuses to accept such an inmate into the prison? Well, the guy is right there, and the prison commander has already authorized letting him in. So telling him that he should go back home because the “computer won’t let you in” is not going to be a workable solution.

Instead, we use a very different model for validation and acceptance of data. Instead of stopping the bad input as soon as possible, we raise a flag about it during processing, but we do allow the user to proceed, assuming they explicitly authorize this. It will look like this:

Snapshot

At the same time, we flag such records and require additional review of the data.
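In code, this “warn, but allow with explicit authorization” behavior might look something like the Python sketch below. The rule, the field names and the `AdmissionResult` type are all hypothetical; what matters is that a rule violation produces warnings and a review flag rather than a hard rejection.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AdmissionResult:
    accepted: bool
    warnings: List[str] = field(default_factory=list)
    needs_review: bool = False           # flagged for additional review later
    authorized_by: Optional[str] = None  # who explicitly approved the exception


def admit(inmate: dict, prison_security: str,
          authorized_by: Optional[str] = None) -> AdmissionResult:
    """Check an admission against business rules without hard-blocking it."""
    warnings = []
    if prison_security == "minimum" and inmate.get("awaiting_trial"):
        warnings.append("Inmate is still on trial; this prison holds tried inmates.")

    if not warnings:
        return AdmissionResult(accepted=True)
    if authorized_by is None:
        # Not a rejection: the user is shown the warnings and asked to confirm.
        return AdmissionResult(accepted=False, warnings=warnings)
    # Explicitly authorized: accept, but keep the record flagged for review.
    return AdmissionResult(True, warnings, needs_review=True,
                           authorized_by=authorized_by)


inmate = {"name": "John Doe", "awaiting_trial": True}
first_try = admit(inmate, "minimum")
confirmed = admit(inmate, "minimum", authorized_by="Prison Commander")
print(first_try.accepted, confirmed.accepted, confirmed.needs_review)  # False True True
```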

In most cases, by the way, the fact that the inmate is in residence is not something that can be ignored and will be in all reports on the state of the prison until the state changes (transferred, verdict given, etc).

This kind of thinking is important, because it means that the software intrinsically does not trust the data, but continuously runs validation on it. This is a good practice to be in when dealing with systems that reflect the real world.

This does lead to an interesting question, where do we run these validations, and when?

The ground rule for a good application is that it is like an amoeba. It may have multiple locations where it accepts input, but those well defined channels are the only way to push data in.

This is another way of saying that we don’t allow someone else to go and poke around in our database behind our back.

Any time that we accept new data, regardless of the channel it arrives in, we run it through the validation pipeline. And this validation pipeline can add (or mark as obsolete) validation issues that should be brought to the attention of the people in charge.

Note that these sorts of validations are almost always very much business rules, not type issues. If someone’s birthday is in the future, you can feel free to very easily reject that data as purely invalid. But if someone’s release date is in the past, you might still need to accept them as an inmate (the paperwork is really in the mail, sometimes).
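As a rough illustration of the difference, here is a small Python sketch of such a pipeline. The rule names and document fields are invented. Type-level problems are rejected outright, while business rules are re-evaluated on every incoming change and produce a set of issues to add and a set of previously raised issues that are now obsolete.

```python
from datetime import date
from typing import Callable, Dict, Set, Tuple


class InvalidData(ValueError):
    """Raised for data that cannot be true in any world (a type-level problem)."""


# Business rules: name -> predicate that returns True when the rule is violated.
RULES: Dict[str, Callable[[dict, date], bool]] = {
    "release-date-passed": lambda inmate, today: inmate["release_date"] < today,
}


def revalidate(inmate: dict, open_issues: Set[str],
               today: date) -> Tuple[Set[str], Set[str]]:
    """Return (issues to add, issues to mark as obsolete) for this inmate."""
    if inmate["birthday"] > today:
        raise InvalidData("birthday is in the future")
    current = {name for name, violated in RULES.items() if violated(inmate, today)}
    return current - open_issues, open_issues - current


today = date(2017, 8, 1)
inmate = {"birthday": date(1980, 5, 1), "release_date": date(2017, 7, 1)}

# First pass: the release date is in the past, so an issue is raised.
added, obsolete = revalidate(inmate, set(), today)

# Later the corrected paperwork arrives and the existing issue becomes obsolete.
inmate["release_date"] = date(2018, 7, 1)
added_later, obsolete_later = revalidate(inmate, added, today)
print(added, obsolete_later)  # {'release-date-passed'} {'release-date-passed'}
```

The same function is called no matter which channel the data arrived through, which is what lets issues appear and disappear as the record changes over time.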

I think I’ll need another post just to talk about how to implement these validation rules and behaviors, but that is still down the line. The next post topic is going to be data flow between the different systems that we talked about so far.
