Ayende @ Rahien

It's a girl

Assuming that the laws of physics no longer apply, we can build this

This is a reply to a post by Frans Bouma, in which he asks for:

…loud vendors should offer simply one VM to me. On that VM I run the websites, store my DB and my files. As it's a virtual machine, how this machine is actually ran on physical hardware (e.g. partitioned), I don't care, as that's the problem for the cloud vendor to solve. If I need more resources, e.g. I have more traffic to my server, way more visitors per day, the VM stretches, like I bought a bigger box. This frees me from the problem which comes with multiple VMs: I don't have any refactoring to do at all: I can simply build my website as if it runs on my local hardware server, upload it to the VM offered by the cloud vendor, install it on the VM and I'm done.

Um… no.

Go ahead and read the whole post, it is interesting. But the underlying premise that is rely on is flawed. It is like starting out with assuming that since TCP/IP contains no built in prohibition to send data faster than light, the cloud providers can and should create networks that can send data faster than light. After all, I can show a clear business case for the reduced ping time, and that is certainly something that can be abstracted from my application.

What aren’t those bozos doing that?

Well, the answer to that is that it just ain’t possible. There are several minor problems along the way. The CAP theorem, to start with, but even if we ignore that aspect of the problem, there are also the fallacies of distributed computing.

According to Frans’ premise, we can have a single VM that can scale up to as many machines as is needed, without any change required to the system. Let us start with Frans’ answer to the actual scope of the problem:

But what about memory replication and other problems?

This environment isn't simple, at least not for the cloud vendor. But it is simple for the customer who wants to run his sites in that cloud: no work needed. No refactoring needed of existing code. Upload it, run it.

Um.. no.

Let us take a look at a few pieces of code, and see what is going to happen to then in Frans’ cloud environment. For example, let us take a look at this:

var tax = 0;
foreach(var item in order.Items)
{
  tax += item.CalculateTax();   
}
order.Tax = tax;

Problem, because of the elasticity of the VM, we actually spread things around so each of the items in the order collection is located in another physical machine. This is, of course, completely transparent to the code. But that means that each loop iteration is actually doing a network call behind the scene.

OR/M users are familiar with this as the SELECT N+1 problem, but in this case, you have a potential problem on every memory access. Network attached memory isn’t new, you can read about it in OS books and it is a nice theoretical idea, but it is just isn’t going to work, because you actually care about the speed of accessing the data.

In fact, we have many algorithms that were changed specifically to be able to take advantage of cache lines, L1 & L2 cache, etc. Because that has a major increase in the system performance, and that is only on a single machine. Trying to imagine a transparent network memory is futile, you actually care about memory access speed, a lot.

But let us talk about another aspect, I want to make have an always incrementing order id number. So I do:

Interlocked.Increment(ref lastOrderId);

All well and good when running on a single machine, but how should the VM make it work when running on multiple machines?

And remember, this call actually translate to a purpose built assembly instruction (XADD or one of its friends). In this case, you need to do this across the network, and touch as many machines as your system currently runs on.

But the whole point here is to allow us to rapidly generate a new number. This has now turned into a total mess in terms of performance.

What about parallel computing, for that matter?

var results = new Result[items.Length];
Parallel.For(items, (item, i) => 
{
    results[i] = item.Calculate();
});

I have many items, and I want to be able to compute the result in parallel, so I run this fairly standard code. But we are actually going to execute this on multiple threads, so this get scheduled on several different machines. But now you have to copy the results buffer to all of those machines, as well as any related state that they have, then copy it back out when it is done, then somehow merge the different changes made by different systems into a coherent whole.

Good luck with that.

I could go on, but I think that you get the point by now.

And we haven’t talked about the error condition yet. What happen if my system is running on 3 machines, and one of them goes down (power outage, hardware failure, etc)? 3rd of my memory, ongoing work and a lot of stuff just got lost. For that matter, I might have (actually, probably have) dangling references to memory that used to be on the failed machines, so the other two systems are likely to hit this inaccessible memory and fail themselves.

So.. no, this idea is a pipe dream, it isn’t going to work, not because of some evil plot by dastardly folks conspiring to make your life harder, but for the simple reason that it is easier to fly by flapping your arms.

Comments

Frans Bouma
06/08/2012 11:30 AM by
Frans Bouma

If I have a VM with 2GB of ram, and I need 2 VMs to have 4GB of ram, why can't I have 1 VM with 4GB of ram? Mind you, the core point of my post is at the bottom: the cost of going from 1 VM to 2 VMs to run your site is exceptionally high, after that it's trivial, the first costs however are the biggest. Add to that that most people barely need 3 VMs or more for their websites, why not make it simpler?

Also, if it wouldn't work, memcache and other systems which effectively virtualize 'in-memory state' wouldn't work as well. It's not for every application type, but the vast majority of websites will work: in-memory state won't be massive, but it's there. You need a system which simply makes it replicated across systems.

Pipe dream? Not really. VMWare already can do this: they can move a live VM from one physical machine to another, no downtime. You won't notice.

tobi
06/08/2012 11:39 AM by
tobi

It actually helps to view CPUs and multi-threading as a distributed sytem. Think about how writes to memory need to be sent out as messages to RAM and to other CPUs. This kind of thinking alows you to understand why memory coherency traffic is terrible for performance. It is equivalent to distributed message sending.

Ayende Rahien
06/08/2012 11:53 AM by
Ayende Rahien

Frans, When you are talking about auto sizing of a single machine, that is fairly easy to do. As you noted, VMWare can do that, and I am pretty sure that HyperV can do so as well. There is absolutely no need to change the OS, .NET or any such thing to get it to work. But that isn't what this is about. Consider the case when you are talking about more than a single physical machine that is required. That is where what you are suggesting just falls down.

And as for "people don't need more than 640Kbytes" - most of the people who go to the cloud are actually doing that because they want to be able to consume more resources easily as needed.

Khalid Abuhakmeh
06/08/2012 12:07 PM by
Khalid Abuhakmeh

You know there are probably problems with Frans' idea, but I think the vision isn't completely crazy which I gather to be: "Give me more without making me think about it."

I think the vision is close to the Windows Azure Web Sites feature. The slider says instances, but it could easily say "Power" or "Umph". I need more Umph in my app please. Sure here it is.

Scott Gu has essentially split Azure into two factions: Those who don't want to think about hardware, and those who do.

It definitely is exciting to see what they've turned Azure into; from a second rate cloud offering to a top of the class offering. Excited to try it out once my preview features get approved.

Tim Gebhardt
06/08/2012 12:16 PM by
Tim Gebhardt

The LMAX guys call this "mechanical sympathy". You don't necessarily need to know how gates and transistors work, but you do need to have an appreciation for how the underlying machine works so that you can work with it and play to its strengths.

Frans Bouma
06/08/2012 12:19 PM by
Frans Bouma

@ayende: I know what it takes to communicate with memory directly etc. I've done assembler programming (gfx) for years ;).

The main point is that the situation as it is today forces you to live with the same drawbacks as you'd run into if the hypothetical system I propose is realized: state has to replicated somewhere, so you e.g. store it in a blob / simple DB. As the .NET CLR knows the state it can do that for you, instead of you doing that manually.

Like I said, it's perhaps not for every application, but for websites, it is, as the state is already located in a known location. What matters is that instead of you implementing systems to replicate state (a stateless system still requires you to do so, as at some point you might need state of a previous request) this is done for you. Which solves the high cost of refactoring your application for stateless behavior.

Ayende Rahien
06/08/2012 12:23 PM by
Ayende Rahien

Frans, Where WOULD you replicate something? And what would you replicate? Memory? That would be too expensive. A web site is already a stateless system, most of the state is either in the db, in a cache or in a state service. All of which can be scale up easily. So I don't see why you would event want to do something like this.

nick
06/08/2012 12:26 PM by
nick

I think the idea is fantastic. But unfortunately, everything I have learned has told me not to abstract the network for pretty much the reasons Ayende says.

if you would like a more in-depth discussion then you could attend Udi Dahan's SOA / Distributed systems / CQRS course. It is also available online. This particular topic is discussed in the initial section on fallacies of distributed computing.

+1 for what Tim says too. I am a big fan of the LMAX / Mechanical sympathy crowd and would recommend everyone go and check out their blog.

Let's keep Frans' suggestion as a nice dream for when abstracting the network has no cost, or we can create single machines with more power than than the current internet.

Gene Hughson
06/08/2012 12:53 PM by
Gene Hughson

"This is, of course, completely transparent to the code. But that means that each loop iteration is actually doing a network call behind the scene."

Never forget, that which does something for you is likely also doing something to you. Cede too much control in the name of simplicity and you trip over the rule above.

Frans Bouma
06/08/2012 12:57 PM by
Frans Bouma

@Ayende: I'd replicate state that's kept now in-memory either in cache or in user state and application state. If you already use a state service, you already have outsourced it to a system outside the appdomain, but not everyone does that, as it's often not needed.

What I'm after is that 'to develop for Azure' is no longer needed. that it's difficult to solve, is a given. But today, the developer is the one to solve it, which is IMHO the wrong place as it makes it more expensive to move to the cloud with a website.

It's likely that if you need a 32CPU VM with 64GB memory to run your website, you might need to invest time to make it more efficient than the 'system' will do it for you, e.g. by doing the statelessness assurance yourself. However who will reach that situation? not many websites, even though most website owners think they will.

Ayende Rahien
06/08/2012 01:04 PM by
Ayende Rahien

Frans, It is far cheaper to go with 2 - 4 smaller machine than with one big machine. And all of the things that you are talking about would work only in the context of a very small application setup, stateless web sites. The problem is that there are already very simple scenarios to solve this easily and painlessly.

Steve
06/08/2012 03:05 PM by
Steve

Seems to me the simplest solution to the problem that maintaining state on the web server becomes a problem as you scale is to build a PaaS framework that solves that problem.

By not allowing you to store state on the web server.

Once you've done that, and you start designing your web app in a stateless manner, the scalability problem becomes easy.

wekempf
06/08/2012 03:12 PM by
wekempf

We're struggling to solve the threading problem now that multiple core machines are the norm and are the only real way to increase computer power. Every single abstraction that has been applied to this domain fails to "make it easy" for every usage pattern, and none of them are entirely transparent. Where we come closest is with models that force restrictions that don't exist in most existing applications (such as requiring no shared state).

What you're suggesting is like that, only far far worse. Because now each "cpu" that code can run on can't be relied on. Not only is it much more likely for one of them to fail without all of them failing, but you've got the added complication that it might not actually fail, it's just that other nodes perceive it to fail due to the nature of network communication. All that before we even start to discuss other resources, such as memory, which will have similar issues when distributed.

If this is a problem that can be solved, it won't be solved any time soon. What you're talking about is nothing more than science fiction today, and will probably remain that way in our life times. If the blog post had been posed in a "wouldn't it be cool" fashion it would have been interesting. Posed as a complaint about what exists today it's just silly. It's like complaining about the Xbox/PS3/Wii because what they "should be" giving you is a holodeck experience.

James Manning
06/08/2012 04:30 PM by
James Manning

Following the windows server blog, it's interesting to see how much they're trying to tackle this with what amounts to 'dynamic NUMA' - while the discussion tended to focus around the Hyper-V aspects, the direction seems to be (at least, IMHO) to treat a collection of machines as being managed by a 'single' hypervisor that's aware of both the machines and (more importantly) their interconnections.

As a given VM needs more resources, if it can't be moved to a single available physical machine that meets those demands, then it can look at available machines that have the best (lowest latency most likely, but bandwidth is certainly a factor as well) interconnects. The hypervisor offers up this NUMA (or NUMA-like) physical configuration to the VM, which starts (or continues) to run as a single machine. Importantly, the hypervisor is still managing the resource allocation across the machines (CPU, memory, etc), and can do so (especially when given help and more info from the guest) pretty effectively.

Most workloads in the cloud (for instance, web sites) end up being very parallel to start with (N different HTTP requests don't have to coordinate with each other to be processed, for instance), so for such workloads, the goal of having things appear as one machine seems at least feasible.

Certainly this kind of approach works better for small numbers of physical machines being mapped to, but NUMA's been around a LONG time and works well.

It'll be really interesting to see which 'network/cluster hypervisors' end up working with (or building on) OpenFlow to potentially help improve performance and management with more commodity networking rather than more customized things (like InfiniBand, I'd imagine)

James Manning
06/08/2012 04:54 PM by
James Manning

sorry, forgot to include this link to the blog post I was most referring to:

http://blogs.technet.com/b/windowsserver/archive/2012/04/05/windows-server-8-beta-hyper-v-amp-scale-up-virtual-machines-part-1.aspx

Chris Bednarski
06/10/2012 01:22 AM by
Chris Bednarski

Times are slowly changing ... looks like network latency is going down to the point where good results are achievable by distributing data http://developers.slashdot.org/story/12/05/23/1640201/microsoft-research-introduces-record-beating-minutesort-tech

Marcel Popescu
06/10/2012 07:02 AM by
Marcel Popescu

I must say I agree with Frans in that as customers, we're supposed to ask the impossible from vendors :) Yes, as a non-VM specialist it's hard to see how they could possibly do what Frans wants, but on the other hand it's hard for me to understand how they manage a lot of other things I'm not a specialist in. I would have no idea how to do 1% of what RavenDB does.

What I'm getting to is that this sounds like those four stages of new inventions... the ones that start with "it's impossible" and end with "it was obvious all along". Maybe the next Google will be the one with the elastic VM?

Marcel Popescu
06/10/2012 07:06 AM by
Marcel Popescu

LOL... just after posting that I read the next article, http://ayende.com/blog/153633/why-i-love-resharper

Let me quote:

"I mean, just look at this. This is magic.

More to the point, think about what this means to try to implement something like that. I wouldn’t know where to even start."

:D

Bonder
06/24/2012 04:37 PM by
Bonder

That's the funny thing about the "it's impossible" to "duh, obvious" spectrum. Even when we are on both sides of the fence, it is very challenging indeed to not fall victim to this trap over and over again.

Let's remember that before Marconi transmitted the first radio signal, the naysayers had thousands of years to say "it's impossible." I for one am glad that he didn't rlisten to those voices.

All of the technical hurdles listed above will be regarded by someone else not as problems but as opportunities.

Comments have been closed on this topic.