Mark Little's WebLog: June 2012

Monday, June 18, 2012

Worried about Big Data

I've been spending quite a lot of time thinking about Big Data over the past year or two and I'm seeing a worrying trend. I understand the arguments made against traditional databases and I won't reiterate them here. Suffice it to say that I understand the issues behind transactions, persistence, scalability etc. I know all about ACID, BASE and CAP. I've spent over two decades looking at extended transactions, weak consistency, replication etc. So I'm pretty sure I can say that I understand the problems with large scale data (size and physical locality). I know that one size doesn't fit all, having spent years arguing that point.

As an industry, we've been working with big data for years. A bit like time, it's all relative. Ten years ago, a terabyte would've been considered big. Ten years before that it was a handful of gigabytes. At each point over the years we've struggled with existing data solutions and made compromises or rearchitected them. New approaches, such as weak consistency were developed. Large scale replication protocols, once the domain of research, became the industrial reality.

However, throughout this period there were constants in terms of transactions, fault tolerance and reliability. For example, whatever you can say against a traditional database, if it's been around for long enough then it'll represent one of the most reliable and performant bits of software you'll use. Put your data in one and it'll remain consistent across failures and concurrent access with a high degree of probability. And several implementations can cope with several terabytes of informations.

We often take these things for granted and forget that they are central to the way in which our systems work (ok you could argue chicken-and-egg). They make it extremely simple to develop complex applications. They typically optimise for the failure case, though, adding some overhead to enable recovery. There are approaches which optimise for the failure free environment, but they impose and overhead on the user who typically has a lot more work to do in the hopefully rare case of failures.

So what's this trend I mentioned at the start around big data? Well it's the fact that some of the more popular implementations haven't even thought about fault tolerance, let alone transactions of whatever flavour. Yes they can have screaming fast performance, but what happens when there's a crash or something goes wrong? Of course transactions, for example, aren't the solution to every problem, but if you understand what they're trying to achieve then at some point somewhere in your big data solution you'd better have an answer. And "roll your own" or "DIY" isn't sufficient.

This lack of automatic or assistive fault tolerance is worrying. I've seen it before in other areas of our industry or research and it rarely ends well! And the argument about it not being possible to provide consistency (whatever flavour) and fault tolerance at the same time as performance doesn't really cut it in my book. As a developer I'd rather trade a bit of performance, especially these days when cores, network, memory and disk speed are all increasing. And again, these are all things we learnt through 40 years of maintaining data in various storage implementations, albeit mostly SQL in recent times. I really hope we don't ignore this experience in the rush towards the next evolution.

Sunday, June 17, 2012

Software engineering and passion

I was speaking with some 16 year old students from my old school recently and one of them told me that he wanted to go to university to become a software engineer. He's acing all of his exams, especially maths and sciences as well as those topics that aren't really of interest. So definitely a good candidate. However, when I asked what he had done in the area of computing so far, particularly programming, the answer was nothing.

This got me thinking. By the time I was his age, I'd been programming for almost four years, written games, a basic word processor and even a login password grabbing "utility". And that's not even touching on the electronics work I'd done. Now you could argue that teaching today is very different than it was 30 years go, but very little of what I did was under the direction of a teacher. Much of it was extra curricula and I did it because I loved it and was passionate enough to make time for it.

Now maybe I've been lucky, but when thinking about all of the people I've worked with over the years and work with today, I'd say that they all share that passion for software engineering. Whether they've only been in the industry for a few years or for several decades, the passion is there for all to see. Therefore, I wonder if this student had what it takes to be a good engineer. But as I said, maybe I'm just lucky in the people with whom I've been able to work, as I'm sure there are those software engineers for whom it really is just a day job and they are still good at that job. But I'd still hate to not have the passion and enthusiasm for this work!

Sunday, June 10, 2012

When did we stop remembering?

Over the past year or so I've been reading articles and papers, or watching recorded presentations, on fault tolerance and distributed systems, produced over the last couple of years. And whilst some of it has been good, a common theme throughout has been the lack of reflection on the large body of work that has been done in this area for the past four decades or more! I've mentioned this issue in the past and had hoped that it was a passing trend. Unfortunately I just finished watching a video from someone earlier this year at the "cutting edge" of this space who described all of the issues with distributed systems, fault tolerance and reliability; not once did he mention Lamport's Time, Clocks and Ordering of Events in a Distributed System (yet he discussed the same issues as if they were "new"), failure suspectors, the work of Gray, Bernstein and others. The list goes on! If this had been a presentation in the 1970's or 80's then it would have been OK. But in the 2nd decade of the 21st century, where most work in the software arena has been digitised and is searchable, there is no excuse!