Friday, December 07, 2007

Large-scale distributed transactions

I've been working with transactions for quite a while and in the area of large-scale (numbers of participants, physical distance) since the original work on the Additional Structuring Mechanisms for the OTS (aka Activity Service). However, it wasn't until Web Services transactions, BTP, WS-CAF and WS-TX that theory started to get put into practice. We first started to talk about relaxing the ACID properties back with the CORBA Activity Service, but it was with the initial submissions to BTP that things started to be made more explicit and directly relevant.

Within the specifications/standards and associated papers or presentations, we made statements along the lines that isolation should be a back-end issue for services or the transaction model (remembering that one-size does not fit all). The notions of global consistency and global atomicity were relaxed by all of the standards. For instance, sometimes it is necessary to commit some participants in a transaction and roll back others (similar to what nested transactions would give us). Likewise, globally consistent updates and a globally consistent view of the transaction outcome have to be relaxed as you scale up and out.

Now I didn't find this as much of a leap of faith as some others, but I think that's because when I was doing my PhD I spent a lot of time working with weak consistency replication protocols. There's always been a close relationship between transactions and replication. Traditional replica consistency protocols are strongly consistent: all of the replicas are kept identical and this is fine for closely coupled groups, but it doesn't scale. Therefore, weak consistency replication protocols evolved in the 1980's and 1990s, where the states of replicas are allowed to diverge, either forever or for a defined period of time (see gossip protocols for some background). You trade of consistency for performance and availability. For many kinds of applications, this works really well.

It turns out that the same is true for transactions: in fact, it's necessary in Web Services if you want to glue together disparate services and domains, some of which may not be using the same transaction implementation behind the service boundary. I still think the best specification to illustrate this relaxation of the various properties is WS-BusinessProcess, part of WS-TransactionManager (OASIS WS-CAF). Although Eric and I came up with the original concept, we were never able to sell it to our co-authors on WS-TX (so far). I think one of our failings was to not write enough papers, articles or blogs about the benefits it offered and the practicalities it fit. However, every time I explained it to people in the field it was such an easy sell for them to understand how it fit into the Web Services world so much better than other approaches. (The original idea behind WS-BP came from some of the RESTful transactions work we did in HP, where it was code-named the JFDI-transaction implementation.)

I still find it a pleasant surprise that although our co-authors from Microsoft on WS-TX didn't get the reasons behind WS-BP, other friends and colleagues such as Pat Helland started to write about the necessity to relax transactionality. I like Pat's use of relativity to explain some of the problems. However, when I had to come and talk about what we'd been doing in the world of transactions for the past decade I thought Heisenberg's Uncertainty Principle was perhaps slightly better: you can either know the state that all participants will have, but not when; or vice versa.

No comments: