Monday, May 16, 2011

Compute and Data Clouds

I've written several times about the need for both public and private clouds. Unfortunately it's still possible to find people and vendors who believe that only public clouds are "true" clouds, whatever that means. From a purely technical perspective I am disappointed, because the scientist in me always tries to take an objective stance. Factor in the business side and I can understand why they make these statements. It doesn't make it right in my book though.

However, I want to move the debate on because I think some stances are unlikely to change any time soon. So I began to think about the real reasons behind cloud: what brought us to where we are today, socially as well as technically. In some ways this is related to the posting I'll make later on the vision behind JBossEverywhere, but even that goes back further, to the roots of cloud: grid computing and ubiquitous computing.

I was involved in some of the grid efforts in the early 2000s, particularly around the area of transactions and, independently, Web Services. In fact it was friends and colleagues of mine, several of whom were working for me at Arjuna, who wrote a seminal paper on the subject that really did stir up a hornets' nest at the time.

However, I digress. If you look back at the grid work of the time, it had really coalesced into two different, though related, use cases: the compute grid, where the networked resources work in parallel on a (typically) small set of data that is farmed out to them by some central server(s), and the data grid, where the resources cooperate on processing a (typically) large data set that may have taken hours or days to download to them and which they may all share. The software and architecture of the two types of grid could be markedly different for very good reasons, including fault tolerance and security.

Well, I think those same reasons need to be applied to the cloud. Why would someone go to the expense of setting up a private cloud, for instance, rather than using a public cloud? Why would you use a hybrid approach? Ultimately I believe it is because of the data, and I have said so before: the cloud will go to the data and not the other way around. But why? For the same reasons as the grid. If you have lots of data that would take hours or days to upload to a cloud, then you probably don't want to go the public route, as failures (e.g., the AWS outage) could cost you valuable time and money, not forgetting the headache of ensuring that so much data remains secure! But if you can split the data into logical quanta that can be processed relatively independently, then you could gain from the public cloud, even if just as an "overflow" mechanism when your private cloud runs out of capacity, as smaller amounts of data can be easier to secure and a failure should be more self-contained.
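To make that reasoning a little more concrete, here is a rough sketch in Python of how the "how long would the upload take?" question could drive the choice. Everything in it is an assumption made for illustration: the `Workload` type, the `upload_hours` and `suggest_placement` functions, the one-hour threshold and the idea that a single uplink figure captures the transfer cost are all hypothetical, and a real decision would also weigh security, cost and failure handling.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a job, for illustration only."""
    name: str
    data_bytes: int    # total size of the data the job needs
    splittable: bool   # can the data be cut into independent "quanta"?


def upload_hours(data_bytes: int, uplink_mbit_per_s: float) -> float:
    """Estimate how long it takes to push the data over the given uplink."""
    bits = data_bytes * 8
    seconds = bits / (uplink_mbit_per_s * 1_000_000)
    return seconds / 3600


def suggest_placement(workload: Workload,
                      uplink_mbit_per_s: float,
                      max_upload_hours: float = 1.0) -> str:
    """Suggest where to run a workload.

    The threshold is an arbitrary assumption: data that moves quickly can
    follow the compute, otherwise the compute should go to the data.
    """
    hours = upload_hours(workload.data_bytes, uplink_mbit_per_s)
    if hours <= max_upload_hours:
        return "public"    # small data: a compute-cloud style job
    if workload.splittable:
        return "hybrid"    # keep the bulk private, overflow chunks to public
    return "private"       # large, indivisible data: a data-cloud style job


if __name__ == "__main__":
    jobs = [
        Workload("render-frames", 10**9, splittable=True),
        Workload("log-analysis", 10**12, splittable=True),
        Workload("genome-archive", 10**12, splittable=False),
    ]
    for job in jobs:
        print(job.name, suggest_placement(job, uplink_mbit_per_s=100))
```

With these made-up numbers, a terabyte over a 100 Mbit/s uplink takes roughly 22 hours, which is exactly the sort of figure that pushes a workload towards staying where the data already lives.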

Hmmm, I wonder if there's also a suitable analogy to make here with traditional ACID transactions and long-running (extended) transactions?

Anyway, I believe that the public, private or hybrid debates should move on and we need to be talking about compute and data clouds. At that level the choice of *where* to host the work becomes more obvious and should be taken out of the hands of subjective vendor debates.
