Showing posts with label transactions.

Sunday, December 17, 2017

Multicasts and Replication


In the previous entries we have discussed what delivery properties can be provided by the communication subsystem, and how they can be implemented. We shall now discuss how such communication primitives can be used in a replication scheme to aid in maintaining replica consistency.

Group invocations can be implemented as replicated RPCs by replacing the one-to-one communication of send_request and send_result in the figure below (which we saw in an earlier entry too) with one-to-many communication. 

[Figure (from an earlier entry): the send_request / send_result interactions between client and server.]

Every non-faulty member of a client group sends the request message to every non-faulty member of the server group, each of which in turn sends a reply back. If multiple client groups invoke methods on the same replicated server group then it must be ensured that concurrent invocations are executed in an identical order at all of the correctly functioning replicas, otherwise the states of the replicas may diverge. In order to ensure this property, not only must the objects be deterministic in nature, but all correctly functioning replicas must receive the same sets of messages in the same order, i.e., a totally ordered multicast must be employed.
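
To make that concrete, here's a minimal sketch in Java of what such a replicated invocation might look like. The GroupChannel interface, its methods and the replica names are invented placeholders for whatever group-communication layer is actually in use; the delivery and ordering guarantees discussed above come from that layer, not from this code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical group-communication layer: reliability and (total) ordering
// guarantees are provided by its implementation, not by this sketch.
interface GroupChannel {
    void send(String replica, byte[] request);           // one-to-one send
    byte[] awaitReply(String replica, long timeoutMs);    // blocks for a reply, null on timeout
}

class ReplicatedInvoker {
    private final GroupChannel channel;
    private final List<String> serverReplicas;            // members of the server group

    ReplicatedInvoker(GroupChannel channel, List<String> serverReplicas) {
        this.channel = channel;
        this.serverReplicas = serverReplicas;
    }

    // send_request/send_result become one-to-many: the same request goes to
    // every server replica and each non-faulty replica sends a reply back.
    List<byte[]> invoke(byte[] request) {
        for (String replica : serverReplicas) {
            channel.send(replica, request);
        }
        List<byte[]> replies = new ArrayList<>();
        for (String replica : serverReplicas) {
            byte[] reply = channel.awaitReply(replica, 5_000);
            if (reply != null) {                           // missing reply = suspected failure
                replies.add(reply);
            }
        }
        return replies;                                    // caller compares or picks a result
    }
}
```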

The State Machine conditions C1 and C2 can both be met by using totally ordered multicasts to deliver all messages transmitted by clients and servers. However, such total ordering of messages may be unnecessary for all interactions: if two non-conflicting, unrelated messages are received at members of the same replica group (e.g., two unrelated electronic mail messages from different users) then they need not be ordered consistently at these destinations; only if they were related in some manner (e.g., from the same user) would they need to be ordered consistently. Application-level ordering can be achieved more efficiently, as cheaper, reliable broadcast protocols can be used to deliver messages for which ordering is unimportant, resorting to the more complex order-preserving protocols only where necessary. Since order-preserving protocols typically need more rounds of messages to be transmitted, reducing their use can benefit the system as a whole whilst maintaining overall replica consistency.

One method of achieving such application-level ordering would be to transmit messages using unordered atomic multicasts (described in a previous entry), which guarantee delivery to all functioning replicas - it is still important that every correctly functioning replica receives each message - but make no guarantee about the order of delivery, and then to impose ordering on top of this, i.e., at a level above the communication layer. If atomic actions are used in the system then we can make use of their properties to impose the ordering on message execution that we require, i.e., the order shall be equivalent to the serialisation order imposed by the atomic actions. This has the advantage that operations from different clients which do not conflict (e.g., multiple read operations) can be executed in a different order at each replica. Atomic actions will ensure that multiple accesses from different clients to the same resource are allowed only if such interaction is serialisable.
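
As a rough illustration of that last point (a sketch only, not the implementation of any particular system), the conflict rule at a single replica can be expressed with a read/write lock: reads from different clients do not conflict and may interleave in any order, while conflicting writes are serialised. In the full scheme the atomic action holds its write locks at every replica until it commits, which is what makes the serialisation order identical at all correctly functioning replicas.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Conflict rule at one replica: reads share, writes exclude. The atomic
// action (not shown) acquires the write lock at every replica and holds it
// until commit, so all replicas apply conflicting operations in the same
// serialisation order.
class ReplicaObject {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long state;

    long read() {                       // non-conflicting: any interleaving is fine
        lock.readLock().lock();
        try {
            return state;
        } finally {
            lock.readLock().unlock();
        }
    }

    void write(long newValue) {         // conflicting: serialised by the lock
        lock.writeLock().lock();
        try {
            state = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```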

Tuesday, September 22, 2015

Heisenberg's back!

A long time ago (longer than I care to remember), I made the analogy between Heisenberg's Uncertainty Principle and large-scale data consistency (weak/eventual consistency). It got reported by InfoQ too. Over the weekend I came across a paper from friend/colleague Pat Helland where he made a similar analogy, so I figured I'd mention it here. What's that they say about "great minds" ;) ?

Saturday, May 16, 2015

A little more on transactions and microservices

I had some interesting discussions this week which made me want to write something more on transactions (really XA) and microservices.

Thursday, May 07, 2015

Slightly frustrating article ...

I came across an article the other day from last year which tried to poke holes in transactions. I kept putting off writing about it but eventually decided I had to say something. So ... here it is.

Saturday, April 11, 2015

Transactions and microservices

Sometimes I think I have too many blogs to cover with articles. However, this time what I was writing about really did belong more in the JBoss Transactions team blog, so if you're interested in my thoughts on where transactions (ACID, compensation etc.) fit into the world of microservices, then check it out.

Monday, April 06, 2015

Microservices and state

In a previous entry I was talking about the natural unit of failure for a microservice(s) being the container (Docker or some other implementation). I touched briefly on state but only to say that we should assume stateless instances for the purpose of that article and we'd come back to state later. Well it's later and I've a few thoughts on the topic. First it's worth noting that if the container is the unit of failure, such that all servers within an image fail together, then it can also be the unit of replication. Again let's ignore durable state for now but it still makes sense to spin up multiple instances of the same image to handle load and provide fault tolerance (increased availability). In fact this is how cluster technologies such as Kubernetes work.

I've spent a lot of time over the years working in the areas of fault tolerance, replication and transactions; it doesn't matter whether we're talking about replicating objects, services, or containers, the principles are the same. This got me thinking that something I wrote over 22 years ago might have some applicability today. Back then we were looking at distributed objects and strongly consistent replication using transactions. Work on weakly consistent replication protocols was in its infancy, and despite the fact that today everyone seems to be offering them in one form or another and trying to use them within their applications, their general applicability is not as wide as you might believe; pushing the problem of resolving replica inconsistencies up to the application developer isn't the best thing to do! However, once again this is getting a bit off topic, and is perhaps something I'll come back to in a different article. For now let's assume that if there are replicas then their states will be kept in sync (that doesn't require transactions, but they're a good approach).

In order to support a wide range of replication strategies, ranging from primary-copy (passive) through to available copies (active), we created a system whereby the object methods (code) were replicated separately from the object state. In this way the binaries representing the code were immutable, and when activated they'd read the state from elsewhere; in fact, because the methods could be replicated to a different degree than the state, it was possible for multiple replicated methods (servers) to read their state from an unreplicated object state (store). I won't spoil the plot too much, so the interested reader can take a look at the original material. However, I will add that there was a mechanism for naming and locating these various server and state replicas; we also investigated how you could place the replicas (dynamically) on various machines to obtain a desired level of availability, since availability isn't necessarily proportional to the number of replicas you've got.
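
A minimal sketch of that separation, with invented names: the server binary is immutable and replicated, and only acquires its state, from a possibly unreplicated store, when it is activated.

```java
// Sketch only: the (immutable, replicated) server code is kept apart from the
// object state, which lives in a store that may be replicated to a different
// degree than the servers, or not replicated at all.
interface StateStore {
    byte[] load(String objectId);
    void store(String objectId, byte[] state);
}

class ReplicatedServer {
    private final StateStore stateStore;   // located via the naming mechanism mentioned above
    private final String objectId;
    private byte[] state;

    ReplicatedServer(StateStore stateStore, String objectId) {
        this.stateStore = stateStore;
        this.objectId = objectId;
    }

    void activate() {                      // on activation, read the state from elsewhere
        state = stateStore.load(objectId);
    }

    void deactivate() {                    // make any changes durable before shutting down
        stateStore.store(objectId, state);
    }
}
```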

If you're familiar with Kubernetes (other clustering implementations exist) then hopefully this sounds familiar. There's a strong equivalence between the components and approaches that Kubernetes uses and what we had in 1993; of course other groups and systems are also similar, and for good reasons - there are some fundamental requirements that must be met. Let's get back to how this all fits in with microservices, if that wasn't already obvious. As before, I'll talk about Kubernetes, but if you're looking at some other implementation it should be possible to do a mental substitution.

Kubernetes assumes that the images it spins up as replicas are immutable and identical, so it can pull an instance from any repository and place it on any node (machine) without having to worry about inconsistencies between replicas. Docker doesn't prevent you from making changes to the state within a specific image, but doing so results in a different image instance. Therefore, if the microservices within an image maintain their state locally (within the image), you would have to ensure that each new image instance was replicated to the repositories that something like Kubernetes has access to when it creates the clusters of your microservices. That's not an impossible task, of course, but it does present some challenges. First, the updated image has to be distributed amongst the repositories in a timely manner: you wouldn't want a strongly consistent cluster to be created from different versions of the image, because different versions mean different states and hence no consistency. Second, the state changes that happen at each Docker instance, each of which results in a new image, have to stay in lock-step. One of the downsides of active replication is that it assumes the replica is deterministic, i.e., given the same start state and the same set of messages in the same order, the same end state will result; that isn't always possible if you have non-deterministic elements in your code, such as reading the time of day. There are a number of ways in which you can ensure consistency of state, but now we're not just talking about the state of your service: it also has to include the entire Docker image.
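
A tiny example of the determinism problem mentioned above: if each replica reads its own clock, identically ordered requests can still leave the replicas in different states. One common remedy, sketched here as an assumption rather than a description of any particular system, is to resolve the non-deterministic value once before the multicast and carry it inside the request, so every replica applies the same value.

```java
import java.time.Instant;

// Non-deterministic: each replica reads its own clock, so two replicas given
// the same requests in the same order can still end up with different states.
class NonDeterministicReplica {
    String lastUpdated;

    void handle(String request) {
        lastUpdated = Instant.now().toString();    // differs at every replica
    }
}

// Deterministic variant: the sender chooses the timestamp once and ships it
// with the request, so every replica records exactly the same value.
class DeterministicReplica {
    String lastUpdated;

    void handle(String request, String timestampChosenBySender) {
        lastUpdated = timestampChosenBySender;     // identical at every replica
    }
}
```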

Therefore, overall it can be a lot simpler to separate the binary that implements your microservices (aka the Docker image, or the 1993 object) from the state, and to consider the images within which the microservices reside to be immutable; any state changes that do occur must be saved (made durable) "off image" or be lost when the image is passivated, which could be fine for your services, of course, if there's no need for state durability. If you're using active replication then you still have to worry about determinism, but now we're only considering the state and not the entire Docker image binary as well. The ways in which the states are kept consistent are covered by a range of protocols, which are well documented in the literature. Where the state is actually saved (the state store, object store, or whatever you want to call it) will depend upon your requirements for the microservice. There are the usual suspects, such as an RDBMS, the file system, a NoSQL store, or even a highly available (replicated) in-memory data store which has no persistent backup and relies upon the slim chance of a catastrophic failure wiping out all of the replicas at once (even persistent stores have a finite probability of failing and losing your data). And of course the RDBMS, file system etc. should themselves be replicated, or you'll be putting all of your eggs in one basket!
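
Building on the hypothetical StateStore interface sketched earlier, here's what one possible backing store might look like; an RDBMS, NoSQL or replicated in-memory implementation would sit behind the same interface, and the choice affects only durability and availability.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// File-system-backed state store: one of the "usual suspects". Swapping this
// for a database or an in-memory replicated store doesn't change the service,
// only where (and how safely) the "off image" state lives.
class FileStateStore implements StateStore {
    private final Path directory;

    FileStateStore(Path directory) {
        this.directory = directory;
    }

    public byte[] load(String objectId) {
        try {
            return Files.readAllBytes(directory.resolve(objectId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void store(String objectId, byte[] state) {
        try {
            Files.write(directory.resolve(objectId), state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```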

One final note (for now): so far we've been making the implicit assumption that each Docker image containing your microservices in a cluster is identical and immutable. What if we relaxed the identical aspect slightly and allowed different implementations of the same service, written by different teams and potentially in different languages? For simplicity we should assume that these implementations can all read and write the same state (though even that limitation could be relaxed with sufficient thought). Each microservice in an image would be performing the same task, written against the same algorithm, in the hope that bugs or inaccuracies produced by one team are not replicated by the others, i.e., this is n-version programming. Because these Docker images contain microservices that can deal with each other's states, all we have to do is ensure that Kubernetes (for instance) spins up sufficient versions of these heterogeneous images to give us the desired level of availability in the event of coding errors. That shouldn't be too hard to do, since it's something research groups were doing back in the 1990s.
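
A minimal sketch of the voting step such an n-version scheme needs (invented names, and the simplest possible rule): run the independently written implementations and accept the result that a majority of them agree on.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simple majority voter over the results returned by the heterogeneous
// implementations of the same service; a disagreement that leaves no majority
// is surfaced rather than silently resolved.
class MajorityVoter<T> {
    T vote(List<T> results) {
        Map<T, Integer> counts = new HashMap<>();
        for (T result : results) {
            counts.merge(result, 1, Integer::sum);
        }
        for (Map.Entry<T, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > results.size() / 2) {
                return entry.getKey();                    // the majority answer
            }
        }
        throw new IllegalStateException("no majority: the versions disagree");
    }
}
```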

Monday, May 26, 2014

Microservices and transactions - turning back the clock?!

A cross-post of an entry I wrote for the JBossTS/Narayana blog. Microservices and transactions, oh my!

Tuesday, October 01, 2013

HPTS 2013

Just back from a JCP-EC meeting, JavaOne and HPTS. Whilst I enjoyed them all, HPTS has to be my favourite. Unfortunately this year its schedule conflicted with JavaOne so I wasn't able to attend either event fully. But even just the 3 days that I was at HPTS were well worth the trip: it's a great workshop where you get the chance to meet people from all areas of our industry and talk without fear of confidentiality. "What's said at HPTS stays at HPTS".

I had the privilege of presenting again this year, on the topic of transactions, NoSQL and Big Data. I was also chairing a session immediately afterwards on a range of topics including hardware transactional memory. The sessions are great, but it's the dinner and drinks discussions that provide the real value of the workshop. And it's a great chance to catch up with friends I tend to see only once every two years!

Friday, May 03, 2013

Banks and ACID transactions

For the millionth time, I hate it when people exaggerate. (Yes, that's meant as humour!) But seriously, when you have a single point on a graph, you can make any line or curve you want fit it! In order to make true judgements, we need all of the facts. And as a scientist, I'm always on the lookout for experiments, experience, facts etc. that shoot holes in existing theories, since that's how science advances.

Now why do I say all of the above? Because we have a prime example of people jumping to the wrong conclusion based on incomplete facts. Once again it's around ACID transactions, because apparently banks don't use them! I'm sure that'll come as a surprise to many of my banking friends and colleagues, but it has to be true because they're not used in ATMs (in some countries). Based on this one fact (which I'm sure is correct in some circumstances/locales), it seems that all bank transactions are BASE and not ACID. I've discussed ACID, BASE and CAP before, so I won't go into details. However, it seems that this meme has been picked up by a lot of people.


Therefore, I just wanted to make sure everyone understood that there's a lot more to banking than ATMs. ACID transactions are a backbone of a lot of what goes on throughout the financial services sector. Of course there are areas where ACID transactions aren't needed and shouldn't be used; I've said as much myself over the years. With all of the work we've been involved with around extended (non-ACID) transactions, that should be obvious.

So I do not have an issue with someone suggesting that ACID is not suitable for a specific use case. Where I do have a problem is when people jump to the conclusion that, just because ACID isn't right for one use case, it isn't right for any use case. That's like saying that atoms don't exist because you can't see them with the naked eye!

Sunday, March 31, 2013

A Raspberry Pi and vert.x related cross-posting

I've been making progress on another Pi-related project. Since it also involves transactions, I posted it on the JBossTS blog, but wanted to cross-post here for those who may not track that blog separately.

Sunday, March 17, 2013

Travelling is a PITA at times!

I spent pretty much all of the week before last in New York and Boston visiting many of our financial services customers. It's a great opportunity to hear what they like, what they don't like and work together to make things better. I'd do this much more often though if it didn't involve flying! It seems that every time I fly I end up getting ill; usually a cold or (man) flu. Unfortunately this time was no different and when I got home at the weekend I could feel something coming. Sure enough, it was a heavy cold and it laid me up for days (I'm still not recovered completely yet).

Then, while recovering, I remembered that I'd missed QCon London. I vaguely recall noticing many months ago, while planning, that this trip conflicted with QCon, but it had slipped from my mind until last week. It's a shame, because I love the QCon events. However, what made this one worse was that I appear to have completely missed the fact that Barbara Liskov was presenting! It's been the best part of 20 years since I last saw Barbara, so it would have been good to hear her talk and try to catch up. Back in the 1980s I visited her group for a while due to the similarities between what they were doing around Argus, replication (both strong and gossip-based) and transactions, and of course what we were doing in Arjuna. She was a great host and I remember that visit very fondly. Oh well, maybe in another 20 years we'll get a chance to meet up again!

Thursday, February 07, 2013

HPTS 2013 CfP


15th International Workshop on High Performance Transaction Systems (HPTS)
September 22-25, 2013
Asilomar Conference Grounds, Pacific Grove, CA
http://hpts.ws

Every two years, HPTS brings together a lively and opinionated group of participants to discuss and debate the pressing topics that affect today's systems and their design and implementation, especially where scalability is concerned. The workshop includes position paper presentations, panels, moderated discussions, and significant time for casual interaction. And of course beer.

Since its inception in 1985, HPTS has always been about large scale. Over the years the focus has shifted from scalable transaction processing to very large databases to cloud computing. Today, scalability is about big data. What interesting but out-of-the-spotlight big-data applications are out there? How are datacenter software and hardware abstractions evolving to support big data apps? How has big data changed the role of data stewardship - not just data security, but data provenance and dealing with noisy data? How are big data apps affected by limitations in energy consumption? What advances have occurred in identifying patterns and even approximate schemas at petabyte scale? How has the provisioning of networking, storage and computing in datacenters had to shift to support these apps?

We ask potential participants to submit a brief technical summary or position, presenting a viewpoint on a controversial topic, a summary of lessons learned, experience with a large or unusual system, an innovative mechanism, an enormous problem looming on the horizon, or anything else that convinces the program committee that the participant has something interesting to say. The submission process is purposely lightweight, but we require each submission to have only a single author.

The workshop is by invitation only and is limited to under 100 participants. The submissions drive both the invitation process and the workshop agenda. Participants may be asked to be part of a presentation or discussion session at the workshop. Students are particularly encouraged to submit.

What to submit:
A 1 page position statement or extended abstract
Optional: the written submission can include a link to one or both of the following as an expanded part of the submission:
Maximum of 3 PowerPoint-type slides
 Maximum 2 minute video presentation - can be of you speaking with or without slides, a video demo or other video illustration of your proposed presentation, etc.
  Even if you choose NOT to submit these, the PC may decide to ask you for them later during consideration of submissions.
The length limits will be strictly observed. We won't consider too-long submissions.

How to submit:  Go to http://bit.ly/hpts2013submit

When to submit:  Now would be good. Official deadlines are:
Submission of Papers:  March 11, 2013
Notification of Acceptance:  May 24, 2013
HPTS Workshop:  September 22-25, 2013

Organizing committee:  Pat Helland, Salesforce; Pat Selinger, IBM (General Chair); Shel Finkelstein, SAP; Mark Little, Red Hat

Program committee
Anastasia Ailamaki, EPFL
David Cheriton, Arista Networks/Stanford
Adrian Cockcroft, Netflix
Bill Coughran, Sequoia Capital
Armando Fox, UC Berkeley (Chair)
Sergey Melnik, Google
Adam Messinger, Twitter
Margo Seltzer, Harvard
Wang-Chiew Tan, UC Santa Cruz

Poster session chair
Michael Armbrust, Google