Monday, December 13, 2004

Web Services transactions, entropy, heuristics and the information society

Imagine you walk into a bank and want to perform a transaction (banks are very useful things in transaction examples). That transaction involves you transferring money from one account (savings) to another (current). You obviously want this to happen with some kind of guarantee, so for the sake of this example let's assume we use an ACID transaction.

Now there's no such thing as a guarantee where physical media are concerned. The second law of thermodynamics states that the entropy of an isolated system never decreases, and entropy is related to the level of chaos/disorder in the universe. Put simply, a less entropic system is more ordered and a more entropic system is more chaotic. I won't go into what the definitions of "order" and "chaos" are here, but another way of looking at this is to consider what happens when you buy an apple (the fruit, not the hardware!): it's fairly "ordered" in that the molecules that go to make it up are pretty much all "apple". However, if you leave it in the fruit bowl for too long it goes wrinkly and fuzzy with mould and eventually starts to decay entirely. (Kind of reminds me of some of the "experiments" we used to do in my undergraduate days to see how long unwashed plates would take to mould-over - though looking back I think they were really excuses for not washing up and nothing to do with physics experiments!)

Anyway, back to the apple. Over time, the molecules break down from the action of light, natural chemical reactions etc. The molecules form a host of other molecules and become less ordered, i.e., more entropy enters the system.

This is a very long-winded way of saying that everything decays eventually. The same thing that happens to the apple happens to physical media. And statistics/probabilities say that even a new hard disk can fail on the first use.
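To put a rough number on "eventually", here is a back-of-the-envelope sketch in Python. It assumes every disk operation fails independently with the same small probability p, which is a simplification, and the one-in-a-million figure is made up purely for illustration. Under that assumption the chance of at least one failure in n operations is 1 - (1 - p)^n, which creeps towards 1 as n grows.

```python
def prob_at_least_one_failure(p, n):
    """Chance of one or more failures in n independent operations,
    each failing with probability p (an illustrative assumption)."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-operation failure probability: one in a million.
P_FAIL = 1e-6

for n in (1, 1_000, 1_000_000, 100_000_000):
    print(n, prob_at_least_one_failure(P_FAIL, n))
```

Even with n = 1 the answer isn't zero, which is the new-disk-fails-on-first-use case.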

So, in our bank example, despite the fact that we're using transactions and assuming that the transaction system is reliable, certain failures will always occur, given enough time and probabilities. The kinds of failure we're interested in for this example are those that occur after the participants in the two-phase commit transaction have said they will do the work requested of them (transfer the money), i.e., during the second (commit) phase. So, the money has been moved out of the savings account (it's really gone) and is being added to the current account, when the disk hosting the current account dies. Usually what this means is that we have a non-atomic outcome, or a heuristic outcome: the transaction coordinator has said commit, one participant (savings account) has said DONE, but the second one (current account) has said OOPS. There's no going back with the work the savings participant has done, so this transaction isn't going to be atomic (all or nothing).
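For anyone who prefers code to prose, the scenario above can be sketched roughly as follows. This is a toy in-memory coordinator, not any real transaction service: the Participant class, the fail_on_commit flag and the Outcome names are all invented for illustration, and a real coordinator would also log its decisions durably and retry before giving up.

```python
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"
    # Real systems distinguish several kinds of heuristic outcome;
    # this sketch lumps them together.
    HEURISTIC = "heuristic"


class Participant:
    """Hypothetical participant; fail_on_commit simulates a dying disk."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit

    def prepare(self):
        # Phase one: promise to do the work if asked to commit.
        return True

    def commit(self):
        if self.fail_on_commit:
            raise OSError(f"{self.name}: disk failed during commit")

    def rollback(self):
        pass


def two_phase_commit(participants):
    """Run both phases and return (outcome, list of commit failures)."""
    # Phase one: if anyone refuses, nothing has changed yet, so roll back.
    if not all(p.prepare() for p in participants):
        for p in participants:
            p.rollback()
        return Outcome.ROLLED_BACK, []

    # Phase two: the decision is commit and there is no going back.
    failed = []
    for p in participants:
        try:
            p.commit()
        except OSError as exc:
            failed.append((p.name, str(exc)))

    # Any failure here breaks atomicity, so report it rather than hide it.
    if failed:
        return Outcome.HEURISTIC, failed
    return Outcome.COMMITTED, []


outcome, failed = two_phase_commit(
    [Participant("savings"), Participant("current", fail_on_commit=True)]
)
print(outcome, failed)
```

The point of returning the failure list alongside the outcome is simply that someone, somewhere, gets told which participant said OOPS.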

Most enterprise transaction specifications and implementations allow for this via a heuristic error. This basically means that the transaction system can be informed (and hence can inform) that such an error has happened. There's not a lot that can be done automatically to fix these types of error. They often require semantic information about the application in order to restore consistency, so have to be handled by a system administrator. However, the important thing is that someone knows there's been a problem.

Imagine that this error happens and you don't know about it! Or at least don't know about it until the next time you check your account. Not good. Personally I'd like to know if there's been a screw-up as soon as possible. In our bank scenario, I can go and talk to someone in the branch. If I was doing this via the internet there's usually a number I can call to talk to someone (probably located in a different country these days ;-)

Now why is this important? Well, there are a few Web Services transaction specifications around that can be used in this scenario: BTP, WS-Atomic Transaction and WS-ACID Transaction. The first and last both allow for heuristic-like errors to be sent from participant to coordinator and from coordinator to end-user, whereas the second one (from IBM, Microsoft and BEA) doesn't. This seems like a strange omission, because errors do happen.

OK, it's not as bad as it might first seem. Of course I can use WS-Atomic Transaction to communicate these errors; unfortunately, I just can't do it within the specification. I'd have to overload SOAP faults (for example), or maybe use some proprietary extension (repeat after me: vendor lock-in is not good). Not exactly good for interoperability and/or portability. The fact that protocols like WS-Atomic Transaction and WS-ACID Transaction are really meant for interoperability of existing transaction service implementations (e.g., Tuxedo-to-CICS, or ATS-to-Encina), where heuristics originated, makes this omission even more striking.

Oh well. Maybe failures don't happen. The 2nd law of thermodynamics does fall down if time flows backwards ;-)
