Over the past year or so I've been doing a lot of thinking about various aspects of cloud. One of the ones I keep coming back to is the issue of multi-tenancy. Many years ago, when I was formulating the categorization of transactional replication strategies, we found that splitting up the object server (methods) from the state and allowing each to be replicated independently, gave the ability to express everything necessary to cover the range of replica consistency protocols from passive through active. Sharing of the executable versus sharing of the state is based on aspects such as determinism of execution (passive is the only option in some cases) and speed of fail-over (active tends to do much better).
So what has this to do with multitenancy? Well I've been thinking about what it means to have a true multitenant application and hence PaaS. Some people seem to suggest that multitenancy is somehow new, complex or the domain of a new breed of engineers and vendors. But if you think about it, we've been using multitenant environments for years. Your operating system is multitenant, with multiple applications resident and often running concurrently, perhaps modifying shared data structures as a result. Even your modest Web server can be considered multitenant. And there are a range of strategies for achieving multitenancy based on a similar server/state split. Therefore, whether you're a SaaS implementer or a PaaS architect, the same factors come into play. And critically for SaaS implementers, if your PaaS doesn't support these things then you're going to have to!
In terms of PaaS, there are really 3 components that can be played with in terms of multitenancy and application and data deployment. These are the VM, the application server (container of business logic) and the data (database, file system etc.) For instance, you could have each tenant in a separate application server but running on the same VM, with data split between different database instances. Or you could have each tenant in the same application server on the same VM, with data in the same database, perhaps split on different tables. (Other options and configurations exist, of course.)
Now all of these various combinations have more or less been around before, as I said earlier. The big problem though has been around ease of use. I've written enough Unix kernel services before, for instance, to know most of the ins-and-outs of sharing state without causing the operating system to crash, and the best way to use pthreads or memory mapped files in a shared environment, but it's not always intuitive or based on well documented processes. This is precisely where a PaaS fits in: it must support the widest range of approaches and in a way that does not require the SaaS developer to have to worry about them. We can learn a lot from what's been done in the past around the intricacies of achieving multitenancy, but we do need to make it far more consumable than has perhaps been the case so far.
One thing I would point out as well, is that with PAAS the applications in practicality need to understand that the environment is virtual. Some attempts to write to IO for instance may require a lot more error checking and retry logic to ensure items are written to a storage layer properly.
ReplyDeleteThe problems are more geared toward an AWS type environment than something like Sales Force. The more distant you are from the actual hardware, the more you need to put in failsafes and understand the impact of how to prevent or ensure data replication, most of the time through duplication across instances or storage mechanisms.
Been dealing with a bit of that lately so its pretty fresh. :)
Yes, I agree.
ReplyDelete