Ayende @ Rahien


User experience on the main path–get it or get lost


The background for this post:

Recently I got an email from a startup founder about a service that they are offering. It so happened that this service matched something that I was actually considering doing, so I was very happy to try it out. I registered, and on two separate occasions I attempted to use the service for its intended purpose. I wasn’t able to do that. I got some errors, and I’m not sure if it was data validation or plain errors.

After getting in touch with the founder in question, he pointed me to the guide, which (presumably) had step-by-step instructions on how to get things working. I commented that it shouldn't be this hard, and got a reply that this is just the way it is, anywhere, not just with this service. It was at that point that I gave up on the service entirely.

A few things to note: this is a web-based service (I'm intentionally not revealing which one) that is supposed to allow collaboration in one-to-many scenarios. I was trying to use it as a publisher.

This experience jarred me, for a very simple reason: it shouldn't be this hard. And sending a user who just wants to get their feet wet to the guide is a really bad mistake. Picture the classic customer acquisition funnel; anyone familiar with customer acquisition will recognize the problem.

Basically, the process of getting a new user begins when they are just looking at your software, and you want to lead them in without anything jarring. Any hurdle in their path is going to cause some of them to leave, probably never to return. So you want to make damn sure that there is as little friction in the process as possible.

In this case, I was trying to use the service for the purpose it was intended for, and I was just bogged down with a lot of details that had to be completed before I could even test it out.

In days of old, games used to come with a guide that told you how to actually play the game. I remember going over the Red Alert and Diablo guides with great excitement while the games were installing.

Then the game makers noticed that no one was reading those guides, and new gamers ran into problems playing the games even though the answers were clearly documented in the guide.

The solution was to stop using guides. Instead, games started incorporating the initial levels as a tutorial, to make sure that gamers actually learned all about the game while playing it.

It is a great way to reduce the friction of playing, and it ensures a smooth transition from having no idea how to play the game to being ready to spend a lot of time blowing up pixels.

Taking this back to the realm of non-gaming software, you really have to identify the few core paths that users are likely to walk down when using your software, and then streamline pretty much everything along those paths to make sure that they don't hit any friction.

It means asking the user to do as little work as possible, choosing defaults for them (until they care enough to change those), and having dedicated UI to lead them to the "Wow, this is amazing!" moments. Only after you have built up enough positive experience with the user can you actually require them to do some work.

And note that by "do some work", I'm talking about anything from setting up the publisher's logo to selecting the categories the data will go into. By work, I mean anything that holds the user back from doing the main thing they came to your site for.

If you don't do that, then you are going to end up with a narrower funnel, and the end result is that you'll have fewer users. Not because your service is inferior, or your software sucks. Simply because you failed to prove to the user that you are actually worth the time investment of giving you a fair shot.

API Design: We’ll let the users sort it out


In my previous post, I explained an API design that gives the user the option to perform an immediate operation, use the default fire-and-forget mode, or use an explicit bulk mechanism. The idea is that most operations are small, and that the cost of actually going over the network is going to dominate the cost of the entire operation. In this case, we want to give the user the choice between the "I want to know the operation completed" mode and the "just make a best effort, I'm fine if there is a failure" mode.

Eli asks:

Trying to understand why this isn't just a scripting case. In your example you loop over CSV and propose an increment call per part which gets batched up and committed outside the user's control. Why not define a JavaScript on the server that takes an array ("a batch") of parts sent over the wire and iterates over it on the server instead? This way the user gets fine grained control over the batch. I'm guessing the answer has something to do with your distributed counter architecture...

If I understand Eli properly, the idea is that we’ll just expose an API like this:

Increment("users/1/visits", 1);

And provide an endpoint where a user can POST JavaScript code that will call this. The idea is that the user will be able to decide whether to call this once or send a whole batch of updates in one go.

This is certainly an option, but in my considered opinion, it is a pretty bad one. It has nothing to do with the distributed architecture; it has to do with the burden we put on the user. The actual semantics of "go all the way to the server and confirm the operation" vs. "do a bulk insert kind of thing" are pretty simple. Each of them has a well-defined behavior.

But what happens when you want to do an operation per page view? From the point of view of your code, you are making a single operation (incrementing the counters for a particular entity). From the point of view of the system as a whole, you are generating a whole lot of individual requests that would be much better off as a single bulk request.

Having a scripting endpoint gives the user the option of doing that, sure, but then they need to handle:

  • Error recovery
  • Multi-threading
  • Flushing on batch size / time
  • Failover

And many more that I’m probably forgetting. By providing the users with the option of making an informed choice about speed vs. safety, we avoid putting the onus of the actual implementation on them.
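
To make that burden concrete, here is a rough sketch (hypothetical, not actual client code) of the kind of batching helper every user would otherwise have to write and maintain themselves. It covers thread safety and flushing on batch size/time, and it still says nothing about error recovery, partial failures or failover:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical client-side batcher that a user of a raw scripting endpoint would need.
public class CounterBatcher : IDisposable
{
	private readonly ConcurrentQueue<KeyValuePair<string, long>> pending =
		new ConcurrentQueue<KeyValuePair<string, long>>();
	private readonly Timer flushTimer;
	private readonly int maxBatchSize;

	public CounterBatcher(int maxBatchSize, TimeSpan flushInterval)
	{
		this.maxBatchSize = maxBatchSize;
		flushTimer = new Timer(_ => Flush(), null, flushInterval, flushInterval);
	}

	public void Increment(string name, long delta)
	{
		pending.Enqueue(new KeyValuePair<string, long>(name, delta));
		if (pending.Count >= maxBatchSize)
			Flush(); // flush on size
	}

	private void Flush()
	{
		var batch = new List<KeyValuePair<string, long>>();
		KeyValuePair<string, long> item;
		while (batch.Count < maxBatchSize && pending.TryDequeue(out item))
			batch.Add(item);

		if (batch.Count > 0)
			SendToServer(batch); // placeholder: POST the batch to the scripting endpoint
	}

	private void SendToServer(List<KeyValuePair<string, long>> batch)
	{
		// Real code would also need retries, failover between nodes and error reporting.
	}

	public void Dispose()
	{
		flushTimer.Dispose();
		Flush();
	}
}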

API Design: Small modifications over a network


In RavenDB 4.0 (yes, that is quite a bit away), we are working on additional data types and storage engines. One of the things that we’ll add, for example, is the notion of gossiping distributed counters. That doesn’t actually matter for our purposes here, however.

What I wanted to talk about today is the problem in making small updates over a network. Consider the counters example. By far the most common operations are some variant of:

counters.Increment(name, 1);

Since the database is remote, we need to send that over the network. But that is a very small operation. Going all the way to the remote database just for that is a big waste of time. Consider that in most systems, we don’t have just a single thread of execution, we have many. Each of them performing their own operations. Allowing each operation to go on its own is a big waste, but what other options can we offer?

There are other considerations as well. While most of the time we'll be making small operations, it is very common to need to do bulk operations as well. Maybe you are setting up the system for the first time, or importing data from a file, etc. Making a large number of individual requests will kill any hope of doing this fast. The obvious answer is to use batching: send a lot of values to the server all at once. This reduces the network overhead significantly.

The API for that would be something like:

using(var batch = counters.Advanced.NewBatch())
{
	foreach(var line in File.ReadLines("counters.csv"))
	{
		var parts = line.Split(',');
		batch.Increment(parts[0], long.Parse(parts[1]));
	}
}

So that is great for big things, but not very helpful for individual operations. We still have to go to the server for each of those.

Of course, we could also use the batch API, but making use of that for an individual operation is… a big waste.

What to do?

The end result we arrived at was that we have three levels of API:

  • counters.Increment(name, 1); – a single operation, executed immediately over the network. Guaranteed to succeed or fail, and gives you the result.
  • counters.Advanced.NewBatch() – a batch operation, executed over all of the items in the batch (but not as a single transaction); it will let you know if the whole operation succeeded, or if there was an issue with something.
  • counters.Batch.Increment() – the default batch: thread safe, can be used by individual requests. This is a fire & forget operation. We increment, and behind the scenes we'll merge all the increments from all the threads and send them in batches to the server.

Note that the last option means that we only send batches, so only when enough time has elapsed or we have enough items to send will we send the data to the server. The idea is that you get to choose, based on the importance of what you are doing.

If you need confirmation that something was successful, use the individual operation. If you just want us to make a best effort, and you don't care if something bad really happens, use the batch option.
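
To make the distinction concrete, here is a short sketch of the three levels side by side, using the method names from this post (the surrounding setup is assumed):

// 1. Immediate operation: goes to the server right now, succeeds or throws.
counters.Increment("users/1/visits", 1);

// 2. Explicit batch: best for bulk work; reports whether the batch as a whole went through.
using (var batch = counters.Advanced.NewBatch())
{
	batch.Increment("users/1/visits", 1);
	batch.Increment("users/2/visits", 3);
}

// 3. Default fire & forget batch: thread safe, merged and flushed in the background
//    by size or time, with no per-operation confirmation.
counters.Batch.Increment("users/1/visits", 1);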

Note that I’m using the counters example here because it is simple, but the same applies for things like time series, and other stuff that we are currently building.

The business process of comparing the price of milk


A law recently came into effect in Israel that requires all supermarket chains to publish their full pricing data and keep it up to date. The idea is that doing so will allow consumers to easily compare prices and lower costs. That has led to some interesting complaints in the media.

“Evil corporations are publishing the data in an unreadable format instead of accessible files”.

I was really amused by the unreadable format (pretty horrible XML, to tell you the truth) and the accessible files (Excel spreadsheets) definitions. But expecting the media to actually get something right is… flawed, probably.

At any rate, one of the guys in the office played around with the data, toying with the notion of creating a small app for that purpose. (Side note: there are currently at least three major efforts to build a price comparison app/site for this. One of them is by a client of ours, but beside some modeling advice, we have no involvement in it.)

So we sat down and talked about the kind of features that you would want from this kind of app. The first option, of course, is to define a cart of the products I want to buy, and then compare this cart across various chains and branches. (Each chain has a different price list, and prices also differ on a per-branch basis.)

So far, that is easy, but that isn't really what you want to do. The next obvious step would be to take into account sales and promotions. The easiest ones are sales such as 1+1 or 2+1. Those require an optimizing engine that can suggest adding another bottle of Coke to the two you already have (same price), or recommend buying 10 packages of diapers (a non-perishable product currently on significant sale).
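
As a toy illustration of the kind of rule such an engine has to evaluate (the class and message here are made up), consider a 2+1 promotion applied to a single cart line:

// Hypothetical "buy 2, get 1 free" rule: under a 2+1 sale, every full group of 3
// costs the price of 2, so topping up from 2 to 3 units is effectively free.
public class TwoPlusOnePromotion
{
	// Price actually paid under the 2+1 sale for a given quantity.
	private static decimal PriceFor(int quantity, decimal unitPrice)
	{
		return unitPrice * (quantity - quantity / 3);
	}

	public string Suggest(string product, int quantity, decimal unitPrice)
	{
		if (PriceFor(quantity + 1, unitPrice) > PriceFor(quantity, unitPrice))
			return null; // adding one more unit actually costs more, so no suggestion

		return string.Format("Add one more {0} - with the 2+1 sale it is free ({1:C} either way).",
			product, PriceFor(quantity, unitPrice));
	}
}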

But this is really just the start. Sales and promotions are complex; a typical condition for a sale would be a whole chicken for 1 NIS (if you purchase more than 150 NIS in total). Or maybe you get complimentary products, or… you get the point. You need to take into account all the various discounts you can get.

Speaking of discounts, that means that you now also need to consider not just static rules, but dynamic, per-user ones. Maybe the user has a set of coupons they can use, or maybe they have a club membership in a particular chain that comes with certain benefits (for example, a 3% discount on all of the chain's branded products and another 5% on any 5 other products preselected when you joined the club).

So you now need a recommendation engine that can run those kinds of projections and make suggestions based on them.

Oh, and wait! We also need to consider substitutions. If I purchased Bamba (a common peanut butter snack in Israel, and one of the only common uses of peanut butter there), I might actually want to get Shush, which is pretty much the same thing, only slightly less costly. Or maybe get Parpar, another replacement, which is even cheaper.

To make it easier for people not well versed in Israeli snacks: that means we want to offer RC Cola or Pepsi instead of Coca-Cola, because they are cheaper.

Of course, some people swear that the taste is totally different, so they can't be replaced. But does anyone really care what brand of cleaning product they use to clean the floor? The answer, by the way, is yes, people really do care about that kind of stuff. But even when you have a specific brand, maybe you can get the 2 gallon bottle of cleaning solution instead of the 1 gallon, because it ends up being cheaper.

This is just to outline the complexity inherent in this. To do it well, you need to do quite a lot. This is actually really interesting from a technical perspective, but this isn't a post about the technical side.

Instead, let us talk about money. In particular, how to make money here. You can assume that selling the app wouldn't work, not even with a freemium model. But an app that guides people on which store to buy from? There are quite a lot of ways to make money with that.

We'll start with an interesting observation. If you are the gateway to Joe making money, that usually means that Joe is going to pay you. An obvious example is club membership referral fees.

If I run your cart through my engine and I can tell you "Dude, if you join chain Xyz's club you can save 100 NIS today and 15,000 NIS this year, just click this button to join", that is a pretty compelling argument. And the chain would pay a certain fee for everyone we register for them. Another way to make money is to take the aggregated data and do stuff with it. Coming up with "chain Xyz is the cheapest" so they can use it in their ads is something that would be worth money.

Another way to do this is to run interference. Instead of you going to the supermarket, we'll just make the order for you, and it will be at your door in a few hours… and the app will make sure to always get it from the cheapest location.

There are also holidays, which typically involve big purchases for the dinners, etc. That means we can build a cart for the holiday, check it against the various chains, then suggest buying it at chain Xyz. Of course, that was pre-negotiated so they would get the big sales (presumably we are ethical and make sure that that chain really is the cheapest).

Of course, all of this makes an interesting assumption: that the chains care about the price you pay them. They don't really care about that; they care about their margins. That means we can play on that. We can suggest a cheaper cart overall, but one that is more profitable to the chain, because the products we suggest have a lower price for the customer but a higher margin for the chain.

And we totally left out the notion of coupons and discounts for actually using the app. "Scan this screen to get a 7% discount on Coca-Cola", for example. This way, the people paying us aren't just the chains, but the manufacturers as well.

And coming back to the technical side of things, consider targeted discounts. If I know what kind of products you want to buy, we can get all the way to customer-specific price listings. (I'll sell you diapers at cost, and you'll buy the store brand of toilet paper at list price instead of the usual discounted price, so the store's profit is higher.)

Nitpicker corner: I have zero knowledge of retail beyond what is needed to actually purchase food. I have no idea if those options are valid, and this is purely a mental exercise. That is what makes it interesting. There are some technical problems here (in the recommendations), but a lot more business problems (in the kinds of partnerships and deals that you can make for the customers).

I didn’t spend a lot of time considering this, and I’m pretty sure that I missed quite a few options. But it is a good way to remember that most business problems have depth behind them, not just technical solutions.

The state of a failure condition


I'm looking over a bunch of distributed algorithm discussion groups, and I recently saw several people make the same bad assumption. The issue is that in a distributed system, you have to assume that any communication between systems can fail.

Because that is taken into account in any distributed algorithm, there is a school of thought that believes that errors shouldn't generate replies. That is horrifying to me.

Let me give a concrete example. In the Raft algorithm, nodes will participate in an election in order to decide who is the leader. A node can decide to vote for a certain candidate, to reject a candidate, or it may be down and unresponsive. Since we have to handle the unresponsive node anyway, it is easy to assume that we only need to reply to the candidate when we actually vote for it. After all, no reply is a negative reply already, no?

The issue with this design decision is that it is indeed correct, but it is also boneheaded*. There are two reasons here. The minor one is that a non-reply forces us to wait until a pre-configured timeout passes, after which we can go into failure handling. Actually sending a reply when we know that we refuse to vote for a node gives that node more information, and cuts down the time it takes for it to react to negative replies.

As important as that is, this isn't really my main concern. My main concern here is that not sending a reply leaves the administrator trying to figure out what is going on with essentially zero data. On the other hand, if the node sends a "you are missing X, Y and Z for me to consider you eligible" reply, that is something that can be traced, shown and acted upon.
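
A minimal sketch of what such a reply might carry (this is an illustration, not Raft's actual message format, which only returns the term and a granted flag):

// Illustration only: a vote reply that explains *why* the vote was denied,
// instead of staying silent. The Reason field is what operations people need.
public class RequestVoteResponse
{
	public long Term { get; set; }       // the responder's current term
	public bool VoteGranted { get; set; }
	public string Reason { get; set; }   // e.g. "your log is behind: my last index is 1045, yours is 987"
}

// On the candidate side, a denied vote becomes something we can log and act upon:
//   log.Info("Vote denied by {0}: {1}", nodeId, response.Reason);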

It may seem like a small thing overall, but it is crucially important for operations. Operations are hard enough when you have a single node; when you have a distributed system, you have to plan for them explicitly.

* I am using this terminology intentionally. Anyone who doesn't consider production support and monitoring for their software from the get-go has never had to support complex production systems, where every nugget of information can be crucial.

That ain’t going to take you anywhere


As part of our usual work routine, we field customer questions and inquiries. A pretty common one is to take a look at their system to make sure that they are making good use of RavenDB.

Usually, this involves going over the code with the team, and making some specific recommendations. Merge those indexes, re-model this bit to allow for this widget to operate more cleanly, etc.

Recently we had such a review in which what I ended up saying was: "Buy a bigger server, hope this works, and rewrite this from scratch as fast as possible".

The really annoying thing is that someone who was quite talented has obviously spent a lot of time doing a lot of really complex things to end up where they are now. It strongly reminded me of this image:

(image)

At this point, you can have the best horse in the world, but the only thing that will happen if it runs is that you are going to be messed up.

What was so bad? Well, to start with, the application was designed to work with a dynamic data model. That is probably also why RavenDB was selected, since that is a great choice for dynamic data.

Then the designers sat down and created the following system of classes:

public class Table
{
	public Guid TableId {get;set;}
	public List<FieldInformation> Fields {get;set;}
	public List<Reference> References {get;set;}
	public List<Constraint> Constraints {get;set;}
}

public class FieldInformation
{
	public Guid FieldId {get;set;}
	public string Name {get;set;}
	public string Type {get;set;}
	public bool Required {get;set;}
}

public class Reference
{
	public Guid ReferenceId {get;set;}
	public string Field {get;set;}
	public Guid ReferencedTableId {get;set;}
}

public class Instance
{
	public Guid InstanceId {get;set;}
	public Guid TableId {get;set;}
	public List<Guid> References {get;set;}
	public List<FieldValue> Values {get;set;}
}

public class FieldValue
{
	public Guid FieldId {get;set;}
	public string Value {get;set;}
}

I’ll let you draw your own conclusions about what the documents looked like, or just how many calls were needed to load a single entity instance.

For that matter, it wasn’t possible to query such a system directly, obviously, so they created a set of multi-map/reduce indexes that took this data and translated that into something resembling a real entity, then queried that.

But the number of documents, the number of indexes, and the sheer travesty going on meant that:

  • Saving something to RavenDB took a long time.
  • Querying was really complex.
  • The number of indexes was high.
  • Just figuring out what was going on in the system was nigh impossible without a map, a guide and a lot of good luck.

Just to cap things off, this is a .NET project, and in order to connect to RavenDB they used direct REST calls via HttpClient, blithely ignoring all the man-decades that were spent creating a good client-side experience and integration. For example, they made no use of ETags or If-Modified-Since, so a lot of the things that RavenDB can do (even under such… hardship) to make things better weren't available, because the client code wouldn't cooperate.
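
For contrast, here is a minimal sketch (the entity class, database name and URL are made up) of what storing and loading an entity looks like with the typed RavenDB client, which gives you caching, ETag handling and unit-of-work semantics for free:

using System;
using Raven.Client;
using Raven.Client.Document;

public class Customer
{
	public string Id { get; set; }
	public string Name { get; set; }
	public string Email { get; set; }
}

public class Example
{
	public static void Run()
	{
		using (IDocumentStore store = new DocumentStore { Url = "http://localhost:8080", DefaultDatabase = "Shop" }.Initialize())
		{
			using (var session = store.OpenSession())
			{
				session.Store(new Customer { Name = "Arava", Email = "arava@example.com" });
				session.SaveChanges(); // one round trip; the client handles ETags and caching
			}

			using (var session = store.OpenSession())
			{
				var customer = session.Load<Customer>("customers/1");
				Console.WriteLine(customer.Name); // one document per entity, one call to load it
			}
		}
	}
}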

I don’t generally say things like “throw this all away”, but there is no mid or long term approach that could possibly work here.

Linux, Debts and Out Of Memory Killer


Imagine that you go to the bank and ask for a $100,000 mortgage. The nice guy at the bank agrees to lend you the money, and since you need to pay the contractor in 5 installments, you take $15,000 to the contractor and leave the rest in the bank until it is needed. The bank is doing brisk business and promises a lot of customers that they can get their mortgages there. Since most of the mortgages are also taken in installments, the bank never actually has enough money on hand for all of its borrowers. But it makes do.

Until one dark day when you come to the bank and ask for the rest of the money, because it is time to install the kitchen cabinets, and you need to pay for that. The nice man at the bank tells you to wait a bit, and goes to see if they have any money. At this point, it would be embarrassing to admit that they don't have any money to give you, because they overcommitted themselves. So the nice man from the bank murders you and buries your body in the desert, to avoid you complaining that you didn't get the money you were promised. Or, actually, the nice man might go ahead and kill someone else (robbing them in the process), and then give you their money. You go home happy to your blood-stained kitchen cabinets.

That is how memory management works in Linux.

After this dramatic opening, let us get down to what is really going on. Linux has a major problem. Its process model means that it is stuck up a tree and the only way down is via free fall. Whenever a process wants to create another process, the standard method in Linux is to call fork() and then call execv() to execute the new binary. The problem here is what fork() does: it needs to copy the entire process state to the new process. That includes all memory, handles, registers, etc.

Let us assume that we have a process that allocated 1 GB of memory for reading and writing, and then called fork(). The way things are set up, it is pretty cheap to create the new process; all we need to do is duplicate the kernel data structures and we are done. However, what happens to the memory that the process allocated? The fork() call requires that both processes have access to that memory, and that both of them may modify it. That means we have a copy-on-write situation. Whenever one of the processes modifies the memory, the OS has to copy that piece of memory to another physical location and remap the virtual addresses.

This allows developers to do some really cool stuff. Redis implemented its backup strategy via the fork() call: by forking and then dumping the in-memory process state to disk, it can get a consistent snapshot of the system with almost no code. It is the OS that is responsible for maintaining that invariant.

Speaking of invariants, it also means that there is absolutely no way that Linux can manage memory properly. If we have 2 GB of RAM on the machine, and we have a 1 GB process that called fork(), what is going to happen? Well, the process was promised 1 GB of RAM, and it got it. And fork() also promised that both processes would be able to modify the full 1 GB of RAM. If we also have some other processes taking memory (and assuming no swap for the moment), that pretty much means that someone is going to end up holding the dirty end of the stick.

Now, Linux has a configuration option that would prevent this (vm.overcommit_memory = 2, together with the overcommit ratio, but that isn't really important; I'm including it here for the nitpickers, and yes, I'm aware that you can set oom_adj = -17 to protect a process from this issue, that's not the point). This tells Linux that it shouldn't overcommit. In such a setup, the fork() call would simply fail, and you'd be left with an effectively crippled system. So we have the potential for a broken invariant. What is going to happen now?

Well, Linux promised you memory, and after exhausting all of the physical memory, it will start paging to the swap file. But that can be exhausted too. That is when the Out Of Memory Killer comes out to play: it takes an axe and starts choosing a likely candidate to be mercilessly murdered. The "nice" thing about this is that you have no real control over it, and you might be a perfectly well-behaved process that the OOM killer just doesn't like this Monday, so buh-bye!

Looking around, it seems that we aren't the only ones to have run head first into this issue. The Oracle recommendation is to set things up to panic and reboot the entire machine when this happens, which seems… unproductive.

The problem is that as a database, we aren't really in control of how much we allocate, and we rely on the system to tell us when we are doing too much. Linux has no facility to warn applications that memory is low, or even to let us know by refusing to allocate more memory. Both are things that we already know how to handle, and either would be very helpful.

That is quite annoying.

.NET Packaging mess


In the past few years, we had:

  • .NET Full
  • .NET Micro
  • .NET Client Profile
  • .NET Silverlight
  • .NET Portable Class Library
  • .NET WinRT
  • Core CLR
  • Core CLR (Cloud Optimized)*
  • MessingWithYa CLR

* Can’t care enough to figure out if this is the same as the previous one or not.

In each of those cases, they offered similar, but not identical, APIs and options. That is completely ignoring the versioning side of things, where we have .NET 2.0 (1.0 finally died a while ago), .NET 3.5, .NET 4.0 and .NET 4.5. I don't think that much can be done about versioning, but the packaging issue is painful.

Here is a small example why:

(image)

In each case, we need to subtly tweak the system to accommodate the new packaging option. This is pure additional cost to the system, with zero net benefit. Each time that we have to do that, we add a whole new dimension to the testing and support matrix, leaving aside the fact that the complexity of the solution is increasing.

I wouldn't mind it so much if it weren't for the fact that a lot of those feel like drive-bys. Silverlight took a lot of effort, and it is dead. WinRT took a lot of effort, and it is effectively dead.

This adds a real cost in time and effort, and it is hurting the platform as a whole.

Now users are running into issues with the Core CLR not supporting stuff that we use. So we need to rip MEF out of some of our code and implement the functionality ourselves, just to get back to the same place we were before.

Excerpts from the RavenDB Performance team report: Expensive headers, and cache effects


This one ended up being pretty obvious, in retrospect. We noticed in the profiler that we were spending a lot of time working with headers. Now, RavenDB uses REST as its communication layer, so it does a lot of work with headers, but we should be able to do better.

Then Tal dug into the actual implementation and found:

public string GetHeader(string key)
{
	if (InnerHeaders.Contains(key) == false)
		return null;
	return InnerHeaders.GetValues(key).FirstOrDefault();
}

public List<string> GetHeaders(string key)
{
	if (InnerHeaders.Contains(key) == false)
		return null;
	return InnerHeaders.GetValues(key).ToList();
}


public HttpHeaders InnerHeaders
{
	get
	{
		var headers = new Headers();
		foreach (var header in InnerRequest.Headers)
		{
			if (header.Value.Count() == 1)
				headers.Add(header.Key, header.Value.First());
			else
				headers.Add(header.Key, header.Value.ToList());
		}

		if (InnerRequest.Content == null)
			return headers;

		foreach (var header in InnerRequest.Content.Headers)
		{
			if (header.Value.Count() == 1)
				headers.Add(header.Key, header.Value.First());
			else
				headers.Add(header.Key, header.Value.ToList());
		}

		return headers;
	}
}

To be fair, this implementation was created very early on, and no one ever actually spent any time looking at it since (why would they? it worked, and quite well). The problem is the number of copies we make, and the fact that to pull a single header we have to copy all the headers, sometimes multiple times. We replaced this with code that wasn't doing stupid stuff, and we couldn't even find the cost of working with headers in the profiler any longer.
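
For illustration, here is a minimal sketch of the non-copying approach (not the actual replacement code): read a single header straight from the request, using HttpHeaders.TryGetValues, without ever materializing the full collection:

public string GetHeader(string key)
{
	IEnumerable<string> values;

	// check the request headers first, then the content headers, copying nothing
	if (InnerRequest.Headers.TryGetValues(key, out values))
		return values.FirstOrDefault();

	if (InnerRequest.Content != null &&
		InnerRequest.Content.Headers.TryGetValues(key, out values))
		return values.FirstOrDefault();

	return null;
}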

But that brings up a really interesting question. How could we not know about this sort of thing? I mean, this isn’t the first time that we are doing a performance pass on the system. So how come we missed this?

The answer is that in this performance pass, we are doing something different. Usually we perf-test RavenDB the way you would use it on your own systems. But for this suite of tests, and in order to find more stuff that we can optimize, we are working with a stripped-down client: no caching, no attempt to optimize things across the board. In fact, we have put RavenDB in the worst possible situation, all new work and no chance to do any sort of optimization, and then all of those code paths that were rarely hit started to light up quite nicely.

Gossip much? Use cases and bad practices for gossip protocols


My previous few posts have talked about specific algorithms for gossip protocols, namely HyParView and Plumtree. They dealt with the technical behavior of the system, the process by which we send data across the cluster to all the nodes. In this post, I want to talk a bit about what kinds of messages we would send in such a system.

The obvious one is to try to keep the entire state of the system up to date using gossip. Whenever we make a change, we gossip about it to the entire network, and we get an eventually consistent system in which all nodes have roughly the same data. There is one problem with that: you now have a lot of nodes with the same data on them. At some point, that stops making sense. Gossip is usually used when you have a large group of servers, and keeping all the data on all the nodes is not a good idea unless your data set is very small.

So you don’t do that.  Gossip is usually used to disseminate a small data set, one that can fit comfortably inside a single machine (usually it is a very small data set, a few hundred MB at most). Let us consider a few types of messages that would fit in a gossip setting.

The obvious example is the actual topology of the whole network. A node joining the cluster will announce its presence, and that will eventually percolate through the entire cluster. That gives you an idea (note: this isn't a certainty) of the structure of the cluster, and maybe lets you make decisions based on it.

System-wide configuration data is also a good candidate for gossip; for example, you can use gossip as a distributed service locator in the cluster. Whenever a new SMTP server comes online, it announces itself via gossip to the cluster. It is added to the list of SMTP servers on all the nodes that heard about it, and then it gets used. In this kind of system, you have to take into account that servers can be down for long periods of time and miss messages. Gossip does not guarantee that messages will arrive, after all. It will do its best, but you also need to build an anti-entropy mechanism: if a server finds that it has missed too much, it can ask one of its peers to send it a full snapshot of the current global state as that peer knows it.

In the same vein, nodes can gossip about the health state of the network. If I'm trying to send an email via an SMTP server, and it is down, I'm going to try another server, and let the network know that I've failed to talk to that particular server. If enough nodes fail to communicate with the server, that becomes part of the state of the system, and nodes that learn about it can avoid that server for a period of time.

Moving in a different direction, you can also do gossip queries, by sending a gossip message across the cluster with a specific query in it. A typical example might be "which node has 10GB free that I can use?". Such queries typically carry a timeout: you send the query, any matches are sent back (either directly or via gossip), and after a predefined timeout you can assume that you have received all the replies you are going to get, and operate on that. More interesting is when you want to query the actual data held on each node, for example to find all the users who logged in today.

The problem with doing something like that is that you might have a large result set, and you'll need some way to work with it. You don't want to send it all to a single destination, and what would you do with it, anyway? For that reason, most of the time gossip queries are actually aggregations. We can use them to get an estimate of certain things in our cluster; the number of users per country would be a good candidate, for example. Note that you won't necessarily get accurate results if there are failures, so there are aggregation methods that give a good probabilistic approximation of the actual value.

For fun, here is an interesting exercise: tracking trending topics across a large number of conversations. Whenever you get a new message, you analyze its topics, and periodically (every second, let us say) you gossip to your peers about them. In this case, we don't just blindly pass the gossip between nodes. Instead, we use a slightly different method. Each second, every node contacts its peers and sends them that node's current trending topics. Each time the trending topics change, a version number is incremented. In addition, the node also sends its peer the node ids and versions of the updates it has received from other nodes. The peer, in reply, sends back the node ids and versions that it has. The origin node can then fill in any new information the peer is missing, or ask for updates on information that it doesn't have itself.

This reduces the number of updates that flow through the cluster, while still maintaining an eventually consistent model. From each node, we'll be able to tell what the current trending topics are globally.
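
A rough sketch of the messages involved in that exchange (the types and names are made up for illustration):

using System.Collections.Generic;

// Hypothetical message types for the push-pull digest exchange described above.
public class TrendingTopicsUpdate
{
	public string NodeId { get; set; }
	public long Version { get; set; }        // bumped whenever this node's topics change
	public List<string> Topics { get; set; }
}

public class GossipDigest
{
	// What the sender currently knows: node id -> highest version it has seen.
	public Dictionary<string, long> KnownVersions { get; set; }

	// The sender's own latest topics, pushed proactively every second.
	public TrendingTopicsUpdate MyTopics { get; set; }
}

// A peer replies with its own KnownVersions; comparing the two maps tells each side
// exactly which updates to send, so only missing versions travel over the network.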
