Friday, October 26, 2012

NoSQL and transactions

I've been thinking about ACID and non-ACID transactions for a number of years. I've spent almost as long working in the industry and on standards, trying to evolve them to cater for environments where strict ACID transactions are too much. Throughout all of this I've been convinced that transactions are the right abstraction for many fault tolerance, reliability and consistency requirements. Over the years transactions have received bad press in some quarters, sometimes from people who don't understand them, overuse them, or don't really want to have to implement them. At times various waves of technology have either helped or hindered the adoption of transactions outside of the traditional database; for instance, some NoSQL efforts eschew transactions entirely (ACID and extended), citing CAP when it's not always right to do so.

I think a good transaction implementation should be at the core of all middleware platforms and databases, because if it's well thought out then it won't add overhead when it's not needed and yet will provide obvious benefits when it is. It should be able to offer a wide range of transaction models (well, at least more than one) and a model that makes it easier to reason about the correctness and consistency of applications and services developed with it.

At the moment most NoSQL or BigData solutions either ignore transactions or support ACID or limited ACID (only in the scope of a single instance). But it's nice to see a change occurring, as seen with Google's Spanner work. And as they say in the paper: "We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions."

And whilst I agree with my long time friend, colleague and co-author on RDBMS versus the efficacy of new approaches, I don't think transactions are to be confined to the history books or traditional back-end data stores. There's more research and development that needs to happen, but transactions (ACID and extended) should form a core component within this new infrastructure. Preconceived notions based on overuse or misunderstanding of transactions shouldn't dissuade their use in the future if it really makes sense - which I obviously think it does.


Stephen Pimentel said...

With Spanner and friends, it certainly does seem like there's a growing recognition of the benefits of transactions in the context of NoSQL systems. In some ways, working with a distributed, scalable system makes transactions more valuable, not less. Having ACID guarantees is usually really important, as well, although there are also use cases where one may want to relax them.

The problem has been that too few NoSQL systems have done the hard engineering work of making transactions performant in a distributed context. Several systems claim transactional semantics, but when you read the fine print, it's only for a limited set of pre-defined, local operations, not application-defined transactions. Application developers need something better than this.

Stephen Pimentel

Manik Surtani said...

Agreed. I've encountered many people who have grown to expect less-than-ACID semantics from NoSQL, and are pleasantly surprised when I talk about Infinispan being transactional. However, I've also had the reaction where people expect Infinispan to be eventually consistent in the wake of network partitions - two things which are mutually exclusive. While Infinispan will have both strongly consistent as well as eventually consistent operation modes down the road, it is also important to note that we provide the ability for application developers to help their transactions perform well - features such as grouping of entries, explicitly controlling colocation, and deadlock detection - as well as new transaction protocols - total order, and the ability to dynamically switch between transaction protocols.

- Manik