Friday, June 24, 2005

SOAP: slow or fast?

There's an interesting discussion going on here between Michi and Jim about the performance of SOAP. I wasn't going to get involved in what could easily become a "PC versus Mac, Unix versus Windows" debate, but I'll add my 2 cents' worth.

I agree with them both, to a point.

I'm sure ICE is fast (I haven't used it either). I do know that CORBA implementations these days are very fast too, and message service implementations like AMS are built for speed. When I started my PhD back in the mid-1980s, my first task was to help write and improve Rajdoot, one of the first RPC mechanisms around. Even then, when a fast network was 10 Mbps (if you were lucky) and we used Whitechapels or Sun 3/60s, we could regularly get round trips of 5ms for messages up to 1K in size (packet fragmentation/re-assembly kicks in above that, so it was the critical maximum packet size). Not fast by today's standards, but fast back then. Having been working with SOAP and Web Services for 5 years now, I know SOAP is slow even compared to what we had in 1986, so it simply doesn't compare to what's possible these days. So I agree with Michi on that point (and yes, we have tried compression over the years too, and got the same results as Michi: it works, but you've got to use it carefully).
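To illustrate the "use it carefully" caveat on compression, here's a minimal Java sketch (the envelope and its contents are made up, not from any real service): it gzips a small SOAP payload and compares sizes. For tiny messages the gzip overhead can leave you no better off, and the CPU cost is paid on both ends regardless.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class SoapCompressionSketch {
    public static void main(String[] args) throws Exception {
        // A small, made-up SOAP envelope; real payloads vary widely in size.
        String envelope =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<soap:Body><getQuote><symbol>ABC</symbol></getQuote></soap:Body>"
          + "</soap:Envelope>";
        byte[] raw = envelope.getBytes(StandardCharsets.UTF_8);

        // Gzip the payload, as you might before putting it on the wire.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(raw);
        }
        byte[] compressed = buf.toByteArray();

        // For small messages the gzip header and dictionary overhead can make
        // the "compressed" form as big as (or bigger than) the original.
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
    }
}
```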

However, and this is where I also agree with Jim, SOAP performance can be improved. The sorts of things that go on under the covers today in terms of XML parsing, for example, are pretty inefficient. Next time you want to see for yourself, just fire up something like OptimizeIt and watch what happens. I'm pretty confident that developers can and will improve on this. As an analogy, when IONA released the first version of Orbix it was the market leader, but its performance was terrible compared to later revisions. (Opcodes were shipped as strings, for a start!) I'm not singling out IONA - this is a pattern that many other ORB providers followed. So I agree with Jim: SOAP doesn't have to be this slow - it can be improved.
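If you don't have a profiler to hand, a crude way to see the parsing cost is a timing loop like the sketch below (plain JAXP DOM parsing; the envelope contents are made up). Attach OptimizeIt, or any profiler you like, to the same loop and the hotspots show up quickly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class ParseOverheadSketch {
    public static void main(String[] args) throws Exception {
        // A made-up SOAP envelope for a trivial "add" request.
        String envelope =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<soap:Body><add><a>2</a><b>3</b></add></soap:Body>"
          + "</soap:Envelope>";
        byte[] bytes = envelope.getBytes(StandardCharsets.UTF_8);

        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Warm up, then time repeated parses of the same small document.
        for (int i = 0; i < 1000; i++) {
            builder.reset();
            builder.parse(new ByteArrayInputStream(bytes));
        }
        int iterations = 10000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            builder.reset();
            builder.parse(new ByteArrayInputStream(bytes));
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("average parse time: %.1f microseconds%n",
                          elapsed / (iterations * 1000.0));
    }
}
```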

But this is where I stop agreeing and come back to the fact that it's beginning to sound like the "PC versus Mac, Unix versus Windows" debates of old. You're not comparing like with like.

This is definitely a case of using the right tool for the right job, combined with some unfortunate commercial realities. If you want interoperability with other vendors (eventually pretty much any other vendor on the planet), then you'd go the SOAP route: there is no logical argument to the contrary. CORBA didn't get mass adoption, DCE failed before it, and despite Microsoft's power, so did DCOM. Eric has some interesting things to say on the subject here, but the reason SOAP works well is because of XML, HTTP (IMO) and pretty much universal adoption. I can't see that changing. In the foreseeable future, I can't see the likes of Microsoft, IBM, Oracle, BEA etc. agreeing on a single protocol and infrastructure as they have with SOAP. To be honest, I think they were forced into the current situation by the mass take-up of the original Web: they like vendor lock-in and had managed to maintain it for decades prior to Tim's arrival on the scene.

But you pay a heavy price for this kind of interoperability. There are inherent performance problems in SOAP that I just can't see going away. We may be able to chip away at the surface and perhaps even make some big dents, but fundamentally I'm confident that SOAP performance versus something like ICE (or CORBA) will always be a one-sided contest. However, a contest of interoperability will be just as one-sided, with SOAP winning. From the moment I got into Web Services, I've said that I can't see it (and SOAP) replacing distributed environments like CORBA everywhere. It frustrates me at times when I see clients trying to do just that, though, and then complaining that the results aren't fast enough! If I want to go off-road, I'll buy a Land Rover; but if I want speed, give me a Ferrari any day! Distributed systems such as CORBA have been heavily optimised over the years and use binary encodings as much as possible - with the resultant impact on interoperability and performance. But that is fine. That's what they're intended for. Certainly if I was interested in high performance, I wouldn't be looking at SOAP or Web Services, but at CORBA (or something similar).
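To make the "binary encodings" point concrete, here's a rough sketch (the opcode and layout are invented for illustration, not any real ORB's wire format) comparing the bytes needed for the same trivial request as XML text versus a compact binary frame. The binary form is not only smaller; it also needs no text parsing at the receiver, which is where much of the CPU difference comes from.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class EncodingSizeSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical request: operation "add", arguments 2 and 3.

        // Text/XML form, roughly what a SOAP body carries on the wire.
        String xml =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<soap:Body><add><a>2</a><b>3</b></add></soap:Body>"
          + "</soap:Envelope>";

        // An invented binary form in the spirit of a binary RPC protocol:
        // a one-byte opcode followed by two 32-bit integers.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(0x01);   // opcode for "add" (illustrative only)
        out.writeInt(2);
        out.writeInt(3);
        out.flush();

        System.out.println("XML bytes:    " + xml.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("binary bytes: " + buf.size());
    }
}
```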

So in summary: of course there will be performance improvements for the SOAP infrastructure. There may even be a slow evolution to a pure binary, extremely efficient distributed invocation mechanism that looks similar to those systems that have gone before. But it's not strictly necessary and I don't see it happening as a priority. Use SOAP for interoperability. It lowers the integration barrier. But if you are really interested in performance and/or can impose a single solution on your corporate infrastructure, you may be better off looking elsewhere, to something like CORBA, or maybe even ICE.


Michi Henning said...

Hi Mark, long time no talk ;-)

Thanks for your thoughtful comments. I agree with much of what you say. But there is one point I keep tripping over:

"But you pay a heavy price for this kind of interoperability."

People repeat this like a mantra over and over. There seems to be an assumption that, in order to be interoperable, a protocol can't use binary. That is simply incorrect, and there is plenty of precedent to prove it: just think of the whole internet suite of protocols (many of which are binary) that interoperate just fine, not to mention TCP/IP itself. Just because CORBA has interoperability problems does not mean that binary protocols have interoperability problems. CORBA's interoperability problems stem from poor protocol design, unnecessary complexity, and sheer incompetence. If you look at the Ice protocol, you'll find that it does everything CORBA does, yet in ways that are very simple (and, as a side-effect of the simplicity, efficient as well). Having a third party interoperate with the Ice protocol is trivial (and has been done without difficulty in the past).
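For what it's worth, here is a toy illustration of why "binary" and "interoperable" aren't opposites: a frame with a few magic bytes, a version, a message type, and a length prefix can be produced and consumed from any language with basic byte I/O. The layout below is invented for the sketch; it is not the actual Ice (or GIOP) header.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

public class BinaryFrameSketch {
    // Writes an illustrative frame: magic bytes, a protocol version, a message
    // type, and a big-endian length prefix, followed by the payload.
    public static byte[] frame(byte messageType, byte[] payload) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeBytes("DEMO");          // magic: identifies the (made-up) protocol
        out.writeByte(1);                // protocol version
        out.writeByte(messageType);      // request, reply, close, ...
        out.writeInt(payload.length);    // length prefix
        out.write(payload);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] frame = frame((byte) 0, "hello".getBytes("UTF-8"));
        System.out.println("frame size: " + frame.length + " bytes");
    }
}
```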

If people want to ship XML documents around and build document-centric applications, by all means, let them do that. But why did we have to standardize on a transport infrastructure that uses technology singularly ill-suited to the job? Let's face it: the choice of XML as an encoding was made for historical and political reasons. The XML craze of the late nineties and early two-thousands was responsible for that, not any technical consideration. Basically, if it was XML, it sold, regardless of whether XML was suitable technology or not. So we end up stuck with this idiotic transport, and then people come up with band-aids in an attempt to make this poor choice more palatable and, in the process, without even realizing it, make things even worse. We could have just as easily standardized on an efficient binary transport and saved ourselves all the grief. (But, of course, at the time, that would have been a much harder sell.)

So, we repeat history, and the world yet again gets stuck with a business solution and a standard that rest on a technical foundation of sand. And, as always, reality will eventually catch up with us, and we'll throw the entire SOAP/XML nonsense away in favor of something better. But, this industry being what it is, we won't do that until after we've spent billions of dollars. And then, we'll have a new generation of people who, like their predecessors, will ignore the lessons of the past and reinvent a wheel with lots of corners, and the cycle will repeat itself. ("I've never designed a distributed computing infrastructure before but, heck, how hard can it possibly be?")

Distributed objects are a good idea. CORBA was one attempt at building an infrastructure for that. CORBA failed to become a ubiquitous infrastructure because of technical deficiencies, vendor bickering (MS versus the rest of the world), vendor greed (IBM asking $100,000 at one point for a CORBA license), and unnecessary complexity. What is the industry's reaction? It goes and concludes that distributed objects are bad, invents a completely new thing that is inferior at both the conceptual and technology levels, and throws out the baby with the bath water. And then proudly writes papers about its achievements, such as 7ms latency...

We really could have the best of both worlds, XML and binary protocols, if we only bothered to stop and think about it, and were willing to learn from the lessons of the past. But I'm not holding my breath...



Mark Little said...

Hi Michi - yes, it's been a while, but as always, it's good to talk to you.

So, I couldn't agree with you more: interoperability doesn't mean XML and it could have been done using binary (for example, there was a push for ASN.1 many years back that, if successful, could have made a big difference today).

When I said "But you pay a heavy price for this kind of interoperability", the important word in that sentence is "this", which really means: SOAP over HTTP, universally adopted by everyone. If we'd been successful in persuading everyone to adopt CORBA (and I think we came pretty close), then I still believe things today would be different. But we didn't, and as an industry we have to suffer the consequences. With any luck, given quite a few years, the binary evolution of SOAP that I mentioned might actually result in something more akin to CORBA or DCOM - and yes, it's frustrating that people keep forgetting history ;-)

BTW, I think SOAP is overused in many situations because people see that they want to ship XML around and immediately assume SOAP is the best way of doing it. Hey, if all you want to do is exchange XML documents, then treat them as application-level payloads: use something like CORBA, RMI, or ICE to carry them and put the overhead where it belongs, in the application.
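As a sketch of what I mean (plain Java RMI, with a made-up interface and method name): the XML document travels as an ordinary string argument over RMI's binary wire protocol, so the middleware does no XML work at all, and the application parses the document only if and when it needs to.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: the XML payload is just an opaque String
// to the middleware; any parsing happens in the application, where it belongs.
public interface DocumentExchange extends Remote {
    // Submit an XML document and receive an XML reply, both as plain strings.
    String submit(String xmlDocument) throws RemoteException;
}
```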

So hopefully we're not in disagreement. I definitely believe that binary interoperability is possible. But I don't think it's going to happen this year, next year or in the foreseeable future. That's not because of technical drawbacks, but simply because of how long it took to get to this point with SOAP (and probably how certain vendors were pushed into it in the first place). Because we are using SOAP, we as an industry have to pay a price. But maybe that price isn't such a bad thing, given what it gets us?