I work for Red Hat, where I lead JBoss technical direction and research/development. Prior to this I was SOA Technical Development Manager and Director of Standards. Before that I was Chief Architect and co-founder at Arjuna Technologies, an HP spin-off (where I was a Distinguished Engineer). I've been working in the area of reliable distributed systems since the mid-80s. My PhD was on fault-tolerant distributed systems, replication and transactions. I'm also a Professor at Newcastle University and at Lyon.
Friday, April 27, 2012
Java Forum at the Titanic Centre
Java or the JVM
Friday, April 06, 2012
Transactions and parallelism and actors, oh my!
In just four years' time I'll have spent three decades researching and developing transactional systems. I've written enough about this over the years not to want to dive into it all again; suffice it to say that I've had the pleasure of investigating a lot of uses for transactions and their variations. Over the years we've looked at how transactions are a great building block for fault-tolerant distributed systems, most notably through Arjuna, which with the benefit of hindsight was visionary in a number of ways. A decade ago, using transactions outside the database as a structuring mechanism was more research than anything else, as was using them in massively parallel systems (multi-processor machines were rare).
However, today things have changed. As I've said several times before, computing environments are now inherently multi-core, with true threading and concurrency and all that that entails. Unfortunately, our programming languages, frameworks and teaching methods have not necessarily kept pace with these changes, often resulting in applications and systems that are inherently unreliable or brittle in the presence of concurrent access and, worse still, unable to recover from the failures that result.
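To make "brittle in the presence of concurrent access" concrete, here's the classic check-then-act mistake in Java (a contrived example of my own, not taken from any real codebase): the test and the update are not atomic, so two threads can both pass the check and the invariant is silently broken.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class BrittleAccount {
        private int balance = 100;

        // Looks right single-threaded; under concurrency two threads can both
        // see balance >= amount and both withdraw, driving the balance negative.
        public void withdraw(int amount) {
            if (balance >= amount) {
                balance = balance - amount; // not atomic: read, subtract, write
            }
        }

        public static void main(String[] args) throws InterruptedException {
            BrittleAccount account = new BrittleAccount();
            ExecutorService pool = Executors.newFixedThreadPool(2);
            for (int i = 0; i < 2; i++) {
                pool.execute(() -> account.withdraw(100));
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            System.out.println("balance = " + account.balance); // may print -100
        }
    }

Worse, when code like this does fail part-way through a multi-step update there is nothing to roll the state back, which is exactly the recovery gap the rest of this post is about.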
Now of course you can replicate services to increase their availability in the event of a failure, and maybe use N-version programming to reduce or remove the chance that a bug in one implementation takes out all of the replicas. But whereas strongly consistent replication is relatively easy to understand, it has limitations which have resulted in weak consistency protocols: these trade strict consistency for things like performance and ease of use, pushing consistency concerns up to the application level (e.g., your application may now need to be aware that the data it reads is stale). This is why transactions, either by themselves or in conjunction with replication, have been and continue to be a good tool in the architect's arsenal.
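As a hedged illustration of that trade-off (the class and method names here are mine, purely for exposition), consider a toy store where writes go to a primary and are pushed to a replica asynchronously: reads served by the replica may lag, and it is now the application's job to decide whether a stale answer is acceptable.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class LazyReplicaStore {
        private final ConcurrentHashMap<String, String> primary = new ConcurrentHashMap<>();
        private final ConcurrentHashMap<String, String> replica = new ConcurrentHashMap<>();
        private final ScheduledExecutorService replicator =
                Executors.newSingleThreadScheduledExecutor();

        public void put(String key, String value) {
            primary.put(key, value);
            // Propagation happens later, not now: this delay is the window in
            // which the replica serves stale data.
            replicator.schedule(() -> replica.put(key, value), 100, TimeUnit.MILLISECONDS);
        }

        // Fast and always available, but the caller must tolerate staleness.
        public String readPossiblyStale(String key) {
            return replica.get(key);
        }
    }

A strongly consistent version would make put() wait until both copies agree, which is easier to reason about but slower and blocked by a failed replica; that is the trade-off in miniature.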
We have seen transactions used in other frameworks and approaches, such as the actor model and software transactional memory, sometimes trading off one or more of the traditional ACID properties. But whichever approach is taken, the fundamental reason for using transactions remains the same: they are a useful, straightforward mechanism for creating fault-tolerant services and individual objects that work well under arbitrary degrees of parallelism. They're not just useful for manipulating data in a database, and neither should they be considered purely the domain of distributed systems. Of course there are areas where transactions would be overkill, or where some implementations might impose too much overhead. But we have moved into an era where transaction implementations are lighter weight and more flexible than those of the past, so considering them from the outset of an application's development is no longer something to be eschewed.
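To give a flavour of how lightweight the core idea can be when applied to a single object, here is a minimal, STM-flavoured sketch in Java. It is purely illustrative (the names are mine, not the API of any particular product): each update runs against a private snapshot of the state and commits with a single compare-and-set, so a failure before the commit leaves the shared state untouched, and a conflicting commit by another thread simply triggers a retry.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    public final class TransactionalCell<T> {
        private final AtomicReference<T> state;

        public TransactionalCell(T initial) {
            state = new AtomicReference<>(initial);
        }

        // 'update' must be free of side effects and T immutable, because the
        // function may be re-executed if a conflict forces a retry.
        public T atomically(UnaryOperator<T> update) {
            while (true) {
                T before = state.get();           // snapshot the current state
                T after = update.apply(before);   // work on the snapshot only
                if (state.compareAndSet(before, after)) {
                    return after;                 // atomic commit
                }
                // Another transaction committed first; retry with fresh state.
            }
        }
    }

With a TransactionalCell<Integer>, any number of threads can call cell.atomically(n -> n + 1) concurrently and no update is ever lost; the same shape extends to richer immutable state. Real implementations add durability, distribution and contention management, but the atomicity at the heart of it really is this small.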
Back to the Dark Ages
The other day, due to severe (for the UK) snowstorms, we ended up snowed in and without power or heating for days. During this time I discovered a few things. For a start, gas central heating that is initiated by an electric starter is a major flaw in the design! Next, laptop batteries really don't last long. And a 3G phone without access to 3G (even the phone masts were without power!) is a great brick.
But I think the most surprising thing for me was how ill-prepared I was to deal with the lack of electricity. Now don't get me wrong - we've had power outages before, so we had a good stock of candles, blankets, torches and batteries. But previous outages have been infrequent and lasted only a few hours, maybe up to a day. And fortunately they've tended to be in the summer months (I'm not sure why). So going without wasn't too bad.
Not so this time, and I've been trying to understand why. I think it's a combination of things: the duration, for one, but also the fact that it happened during the week, when I had a lot to do at work. Missing a few hours' connectivity is OK, because there are always things I can do (or do better) when there are no interruptions from email or the phone. But extend that into days and it becomes an issue, especially when the alternative solutions don't work either, such as using my 3G phone for connectivity or as a backup way to read email.
What is interesting is that, coincidentally, we're reviewing our processes for coping with catastrophic events. Whilst I think this power outage hardly counts as such an event, it does drive home that my own personal ability to cope is lacking. After spending a few hours thinking about it (I did have plenty of time, after all!) I'm sure there are things I can do better in the future, but the one thing that remains beyond my control is the lack of network (3G as a backup has shown itself to be limited). I'm not sure I can justify a satellite link! So maybe I just accept the network as a weak link and hope it doesn't happen again; if it does, we may well be investing in a generator.