Showing posts with label replication. Show all posts

Sunday, December 17, 2017

Multicasts and Replication


In the previous entries we have discussed what delivery properties can be provided by the communication subsystem, and how they can be implemented. We shall now discuss how such communications primitives can be used in a replication scheme to aid in maintaining replica consistency.

Group invocations can be implemented as replicated RPCs by replacing the one-to-one communication of send_request and send_result in the figure below (which we saw in an earlier entry too) with one-to-many communication. 



Every non-faulty member of a client group sends the request message to every non-faulty member of the server group, and each server member in turn sends a reply back. If multiple client groups invoke methods on the same replicated server group, then it must be ensured that concurrent invocations are executed in an identical order at all of the correctly functioning replicas; otherwise the states of the replicas may diverge. To ensure this property, not only must the objects be deterministic in nature, but all correctly functioning replicas must receive the same sets of messages in the same order, i.e., a totally ordered multicast must be employed.
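As a minimal sketch of why total ordering prevents divergence, here is a sequencer-based scheme (all names are hypothetical, not from any particular system): a sequencer stamps each request with a global sequence number, and each replica holds back out-of-order messages, executing strictly in sequence order.

```python
# Sketch: sequencer-based total ordering. Deterministic replicas that
# execute the same messages in the same sequence order cannot diverge,
# even if the network delivers the messages in different orders.

class Sequencer:
    def __init__(self):
        self.next_seq = 0

    def order(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        return (seq, msg)

class Replica:
    def __init__(self):
        self.state = 0
        self.expected = 0
        self.pending = {}          # out-of-order messages held back

    def deliver(self, seq, msg):
        self.pending[seq] = msg
        # execute only in strict sequence order
        while self.expected in self.pending:
            op, arg = self.pending.pop(self.expected)
            if op == "add":
                self.state += arg
            elif op == "mul":
                self.state *= arg
            self.expected += 1

sequencer = Sequencer()
r1, r2 = Replica(), Replica()
ordered = [sequencer.order(m) for m in [("add", 5), ("mul", 3), ("add", 1)]]
for seq, msg in ordered:               # r1 receives in network order...
    r1.deliver(seq, msg)
for seq, msg in reversed(ordered):     # ...r2 in the reverse order...
    r2.deliver(seq, msg)
# ...but both execute in sequence order: (0 + 5) * 3 + 1 == 16
```

Without the sequence numbers, r2 would have executed the multiply before the add and ended in a different state.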

The State Machine conditions C1 and C2 can both be met by making use of totally ordered multicasts to deliver all messages transmitted by clients and servers. However, such total ordering of messages may be unnecessary for all interactions: if two non-conflicting, unrelated messages are received at members of the same replica group (e.g., two unrelated electronic mail messages from different users), then they need not be ordered consistently at these destinations. If they were related in some manner (e.g., from the same user) then they would need to be ordered consistently. Application-level ordering can be achieved more efficiently, as cheaper reliable broadcast protocols can be used to deliver messages for which ordering is unimportant, resorting to the more complex order-preserving protocols only where necessary. Since such protocols typically need more rounds of messages to be transmitted, the reduction in their use can be beneficial to the system as a whole, whilst maintaining overall replica consistency.
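Using the electronic-mail example above, here is a small sketch (hypothetical names) of ordering only related messages: messages from the same sender carry per-sender sequence numbers and are held back until their predecessors arrive, while messages from different senders are delivered immediately in whatever order the cheap broadcast happens to produce.

```python
# Sketch: per-sender (FIFO) ordering imposed at the application level
# on top of an unordered reliable broadcast. Only related messages
# (same sender) are ordered; unrelated traffic is never delayed.

class FifoInbox:
    def __init__(self):
        self.next_seq = {}     # per-sender expected sequence number
        self.held = {}         # (sender, seq) -> message held back
        self.delivered = []

    def receive(self, sender, seq, msg):
        self.held[(sender, seq)] = msg
        expected = self.next_seq.get(sender, 0)
        # release any messages from this sender that are now in order
        while (sender, expected) in self.held:
            self.delivered.append(self.held.pop((sender, expected)))
            expected += 1
        self.next_seq[sender] = expected

inbox = FifoInbox()
inbox.receive("alice", 1, "alice-2")   # held: predecessor missing
inbox.receive("bob", 0, "bob-1")       # unrelated: delivered at once
inbox.receive("alice", 0, "alice-1")   # releases alice-1, then alice-2
# delivered order: ["bob-1", "alice-1", "alice-2"]
```

Note that nothing here constrains the order between alice's and bob's messages at different replicas; that cross-sender freedom is exactly what makes this cheaper than a totally ordered multicast.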

One method of achieving such application-level ordering would be to transmit messages using unordered atomic multicasts (described in a previous entry), which guarantee delivery to all functioning replicas but make no guarantee about order (it is still important that the messages are received by all functioning replicas), and then to impose ordering on top of this, i.e., at a level above the communication layer. If atomic actions are used in the system then we can make use of their properties to impose the required ordering on message execution, i.e., the order shall be equivalent to the serialisation order imposed by atomic actions. This has the advantage that operations from different clients which do not conflict (e.g., multiple read operations) can be executed in a different order at each replica. Atomic actions will ensure that multiple accesses from different clients to the same resource are allowed only if such interaction is serialisable.
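A tiny worked example (not from the original system) of why non-conflicting operations need no common order: operations that touch different objects commute, so applying them in different orders at two replicas still yields identical states; only conflicting operations on the same object must share a serialisation order.

```python
# Sketch: commuting operations may execute in different orders at each
# replica without divergence. Each op is (object, delta); ops on
# different objects commute, so only same-object ops conflict.

def apply_ops(state, ops):
    for obj, delta in ops:
        state[obj] = state.get(obj, 0) + delta
    return state

ops = [("x", 1), ("y", 2)]            # different objects: commute
r1 = apply_ops({}, ops)               # one replica's execution order
r2 = apply_ops({}, list(reversed(ops)))  # the other replica's order
# both replicas reach {"x": 1, "y": 2} despite the different orders
```

Two writes to the same object would not commute in general (e.g., an overwrite rather than an increment), which is precisely the case the atomic action's serialisation order must cover.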

Monday, April 06, 2015

Microservices and state

In a previous entry I was talking about the natural unit of failure for microservices being the container (Docker or some other implementation). I touched briefly on state, but only to say that we should assume stateless instances for the purpose of that article and that we'd come back to state later. Well, it's later and I've a few thoughts on the topic. First it's worth noting that if the container is the unit of failure, such that all services within an image fail together, then it can also be the unit of replication. Again, let's ignore durable state for now; it still makes sense to spin up multiple instances of the same image to handle load and provide fault tolerance (increased availability). In fact this is how cluster technologies such as Kubernetes work.

I've spent a lot of time over the years working in the areas of fault tolerance, replication and transactions; it doesn't matter whether we're talking about replicating objects, services, or containers, the principles are the same. This got me thinking that something I wrote over 22 years ago might have some applicability today. Back then we were looking at distributed objects and strongly consistent replication using transactions. Work on weakly consistent replication protocols was in its infancy, and despite the fact that today everyone seems to be offering them in one form or another and trying to use them within their applications, their general applicability is not as wide as you might believe; pushing the problem of resolving replica inconsistencies up to the application developer isn't the best thing to do! However, this is getting off topic a bit, and is perhaps something I'll come back to in a different article. For now let's assume that if there are replicas then their states will be kept in sync (that doesn't require transactions, but they're a good approach).

In order to support a wide range of replication strategies, ranging from primary-copy (passive) through to available copies (active), we created a system whereby the object methods (code) were replicated separately from the object state. In this way the binaries representing the code were immutable, and when activated they'd read the state from elsewhere; in fact, because the methods could be replicated to different degrees from the state, it was possible for multiple replicated methods (servers) to read their state from an unreplicated object state (store). I won't spoil the plot too much, so the interested reader can take a look at the original material. However, I will add that there was a mechanism for naming and locating these various server and state replicas; we also investigated how you could place the replicas (dynamically) on various machines to obtain a desired level of availability, since availability isn't necessarily proportional to the number of replicas you've got.
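To see why availability isn't simply proportional to replica count, here's a back-of-the-envelope sketch (the 0.9 machine availability is an illustrative number, not from the original work). If the service is up while at least one replica is up, independently placed replicas multiply down the failure probability, but two replicas co-located on the same machine share its failures and add nothing.

```python
# Sketch: availability of a service that survives while at least one
# replica survives. Placement matters: co-located replicas fail together.

def avail_independent(machine_avail, n):
    # service is down only if all n independently placed replicas are down
    return 1 - (1 - machine_avail) ** n

a = 0.9                                   # assumed per-machine availability
one = avail_independent(a, 1)             # 0.9
two_spread = avail_independent(a, 2)      # 0.99: spread across machines
two_same_machine = a                      # co-located pair: still only 0.9
```

So doubling the replica count buys you nothing unless the placement keeps their failure modes independent, which is exactly why dynamic placement was worth investigating.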

If you're familiar with Kubernetes (other clustering implementations are possible) then hopefully this sounds familiar. There's a strong equivalence between the components and approaches that Kubernetes uses and what we had in 1993; of course other groups and systems are also similar, and for good reason - there are some fundamental requirements that must be met. Let's get back to how this all fits in with microservices, if that wasn't already obvious. As before, I'll talk about Kubernetes, but if you're looking at some other implementation it should be possible to do a mental substitution.

Kubernetes assumes that the images it can spin up as replicas are immutable and identical, so it can pull an instance from any repository and place it on any node (machine) without having to worry about inconsistencies between replicas. Docker doesn't prevent you from making changes to the state within a specific image, but doing so results in a different image instance. Therefore, if your microservices within an image maintain their state locally (within the image), you would have to ensure that this new image instance was replicated to the repositories that something like Kubernetes has access to when it creates the clusters of your microservices. That's not an impossible task, of course, but it does present some challenges. First, how do you distribute the updated image amongst the repositories in a timely manner? You wouldn't want a strongly consistent cluster to be created with different versions of the image, because different versions mean different states and hence no consistency. Second, how do you ensure that the state changes which happen at each Docker instance, and which result in a new image being created, occur in lock-step? One of the downsides of active replication is that it assumes determinism for the replica: given the same start state and the same set of messages in the same order, the same end state will result. That's not always possible if you have non-deterministic elements in your code, such as reading the time of day. There are a number of ways in which you can ensure consistency of state, but now we're not just talking about the state of your service; it's also got to include the entire Docker image.
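The determinism problem above can be illustrated with a small sketch (hypothetical handler names). A replica that reads its local clock produces a different result at each replica; the standard remedy is for the sender (or one distinguished node) to sample the non-deterministic value once and carry it inside the ordered message, so every replica executes with the same input.

```python
# Sketch: active replication assumes deterministic handlers. Reading the
# local clock inside the handler breaks lock-step; sampling it once at
# the sender and shipping it in the message restores determinism.

import time

def handle_nondeterministic(state, _msg):
    # BAD for active replication: each replica sees a different timestamp
    return state + [time.time()]

def handle_deterministic(state, msg):
    # GOOD: timestamp was sampled once, by the sender, and is in the message
    return state + [msg["timestamp"]]

msg = {"timestamp": 1700000000.0}   # sampled by the sender, not the replica
r1 = handle_deterministic([], msg)  # replica 1
r2 = handle_deterministic([], msg)  # replica 2: identical end state
```

The same trick applies to random numbers, thread scheduling decisions, and any other local source of non-determinism: turn them into inputs that travel with the totally ordered request.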

Therefore, overall it can be a lot simpler to factor the binary that implements the algorithms for your microservices (aka the Docker image, or the 1993 object) apart from the state, and to consider the images within which the microservices reside to be immutable; any state changes that do occur must be saved (made durable) "off image", or be lost when the image is passivated - which could be fine for your services, of course, if there's no need for state durability. If you're using active replication then you still have to worry about determinism, but now we're only considering the state of the service and not the entire Docker image binary too. The ways in which the states are kept consistent are covered by a range of protocols, which are well documented in the literature. Where the state is actually saved (the state store, object store, or whatever you want to call it) will depend upon your requirements for the microservice. There are the usual suspects, such as an RDBMS, a file system, a NoSQL store, or even highly available (replicated) in-memory data stores which have no persistent backup and rely upon the slim chance that a catastrophic failure will wipe out all of the replicas (even persistent stores have a finite probability of failing and losing your data). And of course the RDBMS, file system, etc. should be replicated, or you'll be putting all of your eggs in one basket!
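The factoring described above can be sketched in a few lines (the store and service names are hypothetical, standing in for whatever RDBMS, file system or NoSQL store you'd actually use): the service instance holds no durable state of its own, reading and writing an external store, so any instance spun up from the same immutable image sees the same state.

```python
# Sketch: immutable code, "off image" state. Instances are stateless;
# all durable state lives in an external store, so a replacement
# instance (e.g. after a failure) picks up where the last one left off.

class StateStore:
    """Stands in for an external, ideally replicated, state store."""
    def __init__(self):
        self._data = {}

    def load(self, key):
        return self._data.get(key, 0)

    def save(self, key, value):
        self._data[key] = value

class CounterService:
    """Stateless service: nothing durable is kept inside the 'image'."""
    def __init__(self, store):
        self.store = store

    def increment(self, key):
        self.store.save(key, self.store.load(key) + 1)
        return self.store.load(key)

store = StateStore()              # the "off image" state
a = CounterService(store)         # instance 1 from the immutable image
b = CounterService(store)         # instance 2, e.g. spun up on failover
a.increment("hits")
result = b.increment("hits")      # sees instance 1's update: returns 2
```

If the instances held the counter locally instead, instance b would have restarted from zero, which is exactly the image-mutation problem described earlier.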

One final note (for now): so far we've been making the implicit assumption that each Docker image that contains your microservices in a cluster is identical and immutable. What if we relaxed the identical aspect slightly and allowed different implementations of the same service, written by different teams and potentially in different languages? For simplicity we should assume that these implementations can all read and write the same state (though even that limitation could be relaxed with sufficient thought). Each microservice in an image could be performing the same task, written against the same algorithm, but in the hope that bugs or inaccuracies produced by one team were not replicated by the others; i.e., this is N-version programming. Because these Docker images contain microservices that can deal with each other's states, all we have to do is ensure that Kubernetes (for instance) spins up sufficient versions of these heterogeneous images to give us a desired level of availability in the event of coding errors. That shouldn't be too hard to do, since it's something research groups were doing back in the 1990s.
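As a toy illustration of N-version programming (the three "versions" here are hypothetical, with one deliberately seeded fault): independently written implementations of the same specification run on the same input, and a majority vote masks a bug in any single version.

```python
# Sketch: N-version programming with majority voting. Three independent
# implementations of "sum a list"; version_c has a deliberate bug, and
# the voter masks it as long as a majority of versions agree.

from collections import Counter

def version_a(xs):
    return sum(xs)

def version_b(xs):
    return sum(xs)            # independently written, same specification

def version_c(xs):
    return sum(xs) + 1        # deliberately buggy version

def vote(versions, xs):
    results = Counter(v(xs) for v in versions)
    answer, _count = results.most_common(1)[0]
    return answer

out = vote([version_a, version_b, version_c], [1, 2, 3])  # majority: 6
```

In the clustered setting, each "version" would live in its own Docker image, with Kubernetes keeping enough heterogeneous replicas alive for a majority to remain available.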

Sunday, March 17, 2013

Travelling is a PITA at times!

I spent pretty much all of the week before last in New York and Boston visiting many of our financial services customers. It's a great opportunity to hear what they like, what they don't like and work together to make things better. I'd do this much more often though if it didn't involve flying! It seems that every time I fly I end up getting ill; usually a cold or (man) flu. Unfortunately this time was no different and when I got home at the weekend I could feel something coming. Sure enough, it was a heavy cold and it laid me up for days (I'm still not recovered completely yet).

Then, while recovering, I remembered that I'd missed QCon London. I vaguely recall noticing, many months ago while planning, that my trip conflicted with QCon, but it had slipped from my mind until last week. It's a shame, because I love the QCon events. However, what made this one worse was that I appear to have completely missed the fact that Barbara Liskov was presenting! It's been the best part of 20 years since I last saw Barbara, so it would have been good to hear her talk and try to catch up. Back in the 1980s I visited her group for a while, due to the similarities between what they were doing around Argus, replication (both strong and gossip-based) and transactions, and of course what we were doing in Arjuna. She was a great host and I remember that visit very fondly. Oh well, maybe in another 20 years we'll get a chance to meet up again!