Friday, December 22, 2017

Available Copies


In the Available Copies [Bernstein 87] replication protocol, a user of a replicated service reads from one replica and writes to all available replicas. Before executing an action, each client determines how many replicas of the service are available and where they are (this information may be stored in a naming service and is accessed before each atomic action is performed). Whenever a client detects the failure of a replica, it must update the naming service's (name-server's) view of the replicated object by performing a delete operation for the failed copy. If the name-server is itself replicated, all of its copies must be updated atomically.
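As a rough illustration of the read-one, write-all-available pattern described above, here is a minimal in-memory sketch. The names Replica, NameServer, read_one and write_all_available are hypothetical, a crash is simulated with an `up` flag, and locking, timeouts and atomic name-server updates are deliberately left out.

```python
class Replica:
    """Hypothetical in-memory replica; `up` simulates a crash failure."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.value = None

    def read(self):
        if not self.up:
            raise ConnectionError(self.name)
        return self.value

    def write(self, value):
        if not self.up:
            raise ConnectionError(self.name)
        self.value = value


class NameServer:
    """Hypothetical naming service holding the view of available copies."""

    def __init__(self, replicas):
        self._view = list(replicas)

    def lookup(self):
        # Return a copy so callers can iterate while the view is changing.
        return list(self._view)

    def delete(self, replica):
        # Remove a copy that a client has detected as failed.
        if replica in self._view:
            self._view.remove(replica)


def read_one(ns):
    """Read from the first copy that answers, deleting failed ones."""
    for replica in ns.lookup():
        try:
            return replica.read()
        except ConnectionError:
            ns.delete(replica)
    raise RuntimeError("no available copies")


def write_all_available(ns, value):
    """Write to every copy in the current view, deleting failed ones."""
    written = []
    for replica in ns.lookup():
        try:
            replica.write(value)
            written.append(replica)
        except ConnectionError:
            ns.delete(replica)
    if not written:
        raise RuntimeError("no available copies")
    return written
```

In this sketch a failed copy simply keeps its stale value, which is why a recovering copy has to be brought up to date before it rejoins the view.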


When a write operation is performed, all copies are written to and they must all reply to this request within a specified time (it is assumed that it is always possible to communicate with non-faulty replicas). Locks must be acquired on all of the functioning replicas before the operations can be performed. If clients conflict, some replicas will not be locked on behalf of a client; that client is informed, at which point the calling action is aborted. Using this locking policy and the serialisability property of the actions within which operations occur, it is possible to ensure that all replicas execute the operations in identical order.
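The locking rule can be sketched as an all-or-nothing acquisition: a client either holds the lock on every functioning replica or aborts. LockTable and lock_all_or_abort are hypothetical names, the single in-process table stands in for per-replica lock managers, and releasing any locks already held when the action aborts is an assumption rather than something the text above spells out.

```python
class LockTable:
    """Hypothetical stand-in for the lock managers of all replicas."""

    def __init__(self):
        self._owner = {}  # replica name -> client holding its lock

    def try_lock(self, replica, client):
        # Grant the lock if it is free or already held by this client.
        holder = self._owner.setdefault(replica, client)
        return holder == client

    def release_all(self, client):
        held = [r for r, c in self._owner.items() if c == client]
        for replica in held:
            del self._owner[replica]


def lock_all_or_abort(locks, replicas, client):
    """Return True if every replica was locked, False if the action aborts."""
    for replica in replicas:
        if not locks.try_lock(replica, client):
            # Conflict with another client: give up and abort the action.
            # Assumption: locks acquired so far are released on abort.
            locks.release_all(client)
            return False
    return True
```

A client that gets False back would abort its action and may retry later.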


If all replicas reply to a write operation then the action may continue. However, if only a subset reply, the action must ensure that the silent members have in fact failed. If the silent replicas subsequently reply then the action must abort and try again (this is because the states of the replicas may have diverged). However, if the silent copies have actually failed then the action can still commit, since all available copies are in a consistent state.
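The decision rule for a write can be written down as a small function. This is only a sketch: Outcome and write_outcome are hypothetical names, the four arguments are assumed to be collections of replica names, and treating a silent copy that is not yet confirmed as failed the same as a late reply (abort and retry) is a conservative assumption, not part of the protocol description.

```python
from enum import Enum


class Outcome(Enum):
    CONTINUE = "continue"
    ABORT_AND_RETRY = "abort and retry"


def write_outcome(view, replied_in_time, replied_late, confirmed_failed):
    """Decide what the action does after a write to every copy in `view`."""
    silent = set(view) - set(replied_in_time)
    if not silent:
        # Every copy replied within the time limit.
        return Outcome.CONTINUE
    if silent & set(replied_late):
        # A silent copy turned out to be alive: states may have diverged.
        return Outcome.ABORT_AND_RETRY
    if silent <= set(confirmed_failed):
        # All silent copies really failed; the available copies agree.
        return Outcome.CONTINUE
    # Assumption: an unconfirmed silent copy is treated conservatively.
    return Outcome.ABORT_AND_RETRY
```

For example, with view x1, x2 where only x1 replies, the action continues if x2 is confirmed failed and aborts if x2 answers late.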

Whenever a new copy is created (or recovers from a failure) it must be brought up-to-date before the name-server is informed of the recovery (before a client can make use of the replica). When this is done the copy can take requests from clients along with the other members of the group. The updating of recovered replicas can be done automatically if an out-of-date replica intercepts/receives a write request from a current transaction, as has been mentioned previously.

Consider the history of events shown in the diagram below, where T1 and T2 are different transactions operating on two replica groups whose members are x1, x2 and y1, y2. Assume that T1 and T2 are using a "read-one copy, write-all-available copies" scheme and that there are initially two copies of objects x and y which they both wish to access. The execution of events is as shown, with time increasing down the y-axis.


If we examine the above history, it is clearly not 1SR, i.e., neither the serial execution T1;T2 nor T2;T1 is consistent with it. Thus, the idea of "read-one copy, write-all-available copies" by itself cannot guarantee 1SR. It is necessary to execute a validation protocol before the transaction can commit to ensure correctness. In Available Copies this takes the form of ensuring that every copy that was accessed is still available at commit time, and every replica that was unavailable is still unavailable; otherwise the action must abort.
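The commit-time validation amounts to two set checks. The sketch below assumes the transaction has recorded which copies it accessed and which it found unavailable, and that it can obtain the set of copies available at commit time; can_commit and its parameter names are hypothetical.

```python
def can_commit(accessed, seen_unavailable, available_now):
    """Return True if the transaction passes validation and may commit."""
    available_now = set(available_now)
    # Every copy the transaction accessed must still be available.
    accessed_still_up = set(accessed) <= available_now
    # Every copy it found unavailable must still be unavailable.
    failed_still_down = not (set(seen_unavailable) & available_now)
    return accessed_still_up and failed_still_down
```

If either check fails, the action aborts.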



Because Available Copies assumes that all functional replicas can always be contacted, the protocol cannot be used in the presence of network partitions. A node which is partitioned cannot be distinguished from a failed node until it has been reconnected. If the replication protocol assumes that all nodes which are unavailable have failed when in fact some have only been partitioned, the replicas can become inconsistent. As such, if partitions can occur then the replication protocol must be sufficiently sophisticated that it can ensure consistent behaviour despite such failures.
