Assuming that the laws of physics no longer apply, we can build this

Jun 08 2012

Assuming that the laws of physics no longer apply, we can build this

time to read 6 min | 1101 words

This is a reply to a post by Frans Bouma, in which he asks for:

…loud vendors should offer simply one VM to me. On that VM I run the websites, store my DB and my files. As it's a virtual machine, how this machine is actually ran on physical hardware (e.g. partitioned), I don't care, as that's the problem for the cloud vendor to solve. If I need more resources, e.g. I have more traffic to my server, way more visitors per day, the VM stretches, like I bought a bigger box. This frees me from the problem which comes with multiple VMs: I don't have any refactoring to do at all: I can simply build my website as if it runs on my local hardware server, upload it to the VM offered by the cloud vendor, install it on the VM and I'm done.

Um… no.

Go ahead and read the whole post, it is interesting. But the underlying premise that is rely on is flawed. It is like starting out with assuming that since TCP/IP contains no built in prohibition to send data faster than light, the cloud providers can and should create networks that can send data faster than light. After all, I can show a clear business case for the reduced ping time, and that is certainly something that can be abstracted from my application.

What aren’t those bozos doing that?

Well, the answer to that is that it just ain’t possible. There are several minor problems along the way. The CAP theorem, to start with, but even if we ignore that aspect of the problem, there are also the fallacies of distributed computing.

According to Frans’ premise, we can have a single VM that can scale up to as many machines as is needed, without any change required to the system. Let us start with Frans’ answer to the actual scope of the problem:

But what about memory replication and other problems?

This environment isn't simple, at least not for the cloud vendor. But it is simple for the customer who wants to run his sites in that cloud: no work needed. No refactoring needed of existing code. Upload it, run it.

Um.. no.

Let us take a look at a few pieces of code, and see what is going to happen to then in Frans’ cloud environment. For example, let us take a look at this:

var tax = 0;
foreach(var item in order.Items)
{
  tax += item.CalculateTax();   
}
order.Tax = tax;

Problem, because of the elasticity of the VM, we actually spread things around so each of the items in the order collection is located in another physical machine. This is, of course, completely transparent to the code. But that means that each loop iteration is actually doing a network call behind the scene.

OR/M users are familiar with this as the SELECT N+1 problem, but in this case, you have a potential problem on every memory access. Network attached memory isn’t new, you can read about it in OS books and it is a nice theoretical idea, but it is just isn’t going to work, because you actually care about the speed of accessing the data.

In fact, we have many algorithms that were changed specifically to be able to take advantage of cache lines, L1 & L2 cache, etc. Because that has a major increase in the system performance, and that is only on a single machine. Trying to imagine a transparent network memory is futile, you actually care about memory access speed, a lot.

But let us talk about another aspect, I want to make have an always incrementing order id number. So I do:

Interlocked.Increment(ref lastOrderId);

All well and good when running on a single machine, but how should the VM make it work when running on multiple machines?

And remember, this call actually translate to a purpose built assembly instruction (XADD or one of its friends). In this case, you need to do this across the network, and touch as many machines as your system currently runs on.

But the whole point here is to allow us to rapidly generate a new number. This has now turned into a total mess in terms of performance.

What about parallel computing, for that matter?

var results = new Result[items.Length];
Parallel.For(items, (item, i) => 
{
    results[i] = item.Calculate();
});

I have many items, and I want to be able to compute the result in parallel, so I run this fairly standard code. But we are actually going to execute this on multiple threads, so this get scheduled on several different machines. But now you have to copy the results buffer to all of those machines, as well as any related state that they have, then copy it back out when it is done, then somehow merge the different changes made by different systems into a coherent whole.

Good luck with that.

I could go on, but I think that you get the point by now.

And we haven’t talked about the error condition yet. What happen if my system is running on 3 machines, and one of them goes down (power outage, hardware failure, etc)? 3rd of my memory, ongoing work and a lot of stuff just got lost. For that matter, I might have (actually, probably have) dangling references to memory that used to be on the failed machines, so the other two systems are likely to hit this inaccessible memory and fail themselves.

So.. no, this idea is a pipe dream, it isn’t going to work, not because of some evil plot by dastardly folks conspiring to make your life harder, but for the simple reason that it is easier to fly by flapping your arms.

0 comments

Tags:

architecture

Oren Eini

Oren Eini

CEO of RavenDB