Python Is Middleware
Tim Daneliuk (firstname.lastname@example.org)
(Draft originally posted on comp.lang.python, 31 July, 2001)
Copyright © 2001 TundraWare Inc. Permission to freely reproduce this material is hereby granted under the following conditions: 1) The material must be reproduced in its entirety without modification, editing, condensing, or any change. 2) No fee may be charged for the dissemination of this material. Commercial use such as publishing this material in a book or anthology is expressly forbidden. 3) Full attribution of the author and source of this material must be included in the reproduction.
There has been an ongoing discussion about Python, C, and many other languages' relevance as "systems" languages on comp.lang.python recently. This started as a brief (Ha!) response, and evolved into the Epistle below.
I would venture to guess that there is at least as much, and maybe more "systems" code in Forth as in C. Forth is the language of choice for a great many real-time and embedded systems like those used by NASA and the Aerospace/Military folks. It is compact, easily ported, very fast, and has a pretty interesting programming paradigm (something of a cross between a high-level macro assembler, an HP RPN calculator, and C itself). I'm no Forth expert by any means, but it is very widely used - it's just invisible, the way really good code should be ;) It even shows up in general systems programming. IIRC, the Mac firmware bootstrap loader is written in Forth (?) and I am certain that part of the FreeBSD bootloader is in Forth - I just looked at the code a while ago.
For general OS development, 'C' is certainly the dominant language of choice nowadays. But I would point out that one of the very biggest bodies of systems code, and one that has been around a very long time, is IBM's OS/390 (or whatever they call it these days - MVS by any other name), a big part of which is written in assembler.
As I've mentioned on comp.lang.python before, the biggest (size of data * work arrival rate * number of users...) TP systems in the world (Visa, Mastercard, American Airlines, United Airlines...) run on TPF, a special transaction OS which is pretty much all assembler. (And no, you couldn't get a C based OS to do what TPF does even if you did have a couple hundred million dollars to redo it, IMHO.)
But, for most (systems) things, 'C' is a fine choice. I've done everything from real-time kernel work to OS utilities in 'C'. But... times are a' changin'. Systems languages are great when you need fast, compact code. But they crater big time when you have really big development efforts (only Microsoft thinks an OS needs to be huge - on most scales, an OS is a medium-sized effort, compared, say to an ERP system, a transaction manager, or even something as mundane as Payroll and GL). Systems languages also tend to reek when you have lots of programmers involved, because the language makes it so easy for programmers to shoot each other in the foot (or, in some malevolent cases, The Back).
Before bandwidth and memory got cheap (256M = $39 today), we either had to throw mainframes at problems or write in efficient languages like 'C'. Well, we don't have to do it for most things any more. That's why I like the 'C'-Python combo. Python is still a little rough around the edges (as the many PEP discussions reveal), but it is a superb general purpose programming language, very well suited to the vast majority of applications and utility programming tasks. It has a way to go in efficiency - we really do need, and will soon see, native code compilers - but it makes the programmer efficient in ways I've never seen in over 20 years of doing this stuff. I've been leading technology teams for the better part of the last decade - this means I wrote very little code myself - and I'm amazed at how fast I can sling correct Python even with my rusty old middle-aged programming chops. (I forgot just how much FUN programming can and should be. For an example of how the Elderly program, go have a giggle at: http://www.tundraware.com/Software/hb ;)
Programming language debates have always been with us - you youngsters missed doozies in the Assembler-FORTRAN and, later, the FORTRAN-COBOL wars. But what motivates these wars never goes away:
How do we write more correct, maintainable code faster?
How do we minimize applications' dependency on particular systems and infrastructure choices?
How do we spend most of our time/energy/money on our applications logic and not the plumbing underneath?
How do we make enough money so <gender/species of choice> will like us?
Cheap Energy was literally the fuel of the first Industrial Revolution. Cheap Software is the fuel of this Economic Revolution. For those of you keeping score and who still read instead of watching Jerry Springer, the current high tech market behavior is remarkably similar to what took place in the auto manufacturing consolidations at the beginning of the last century. He/She who wins the Software Battle wins this economy.
The winning technology will not emerge victorious because of paradigmatic elegance, but because of a proven track record in meeting the criteria above. By that measure, C++ is a clear loser. Perl does well, but not in the "maintainability" arena. Java started out very well, but is starting to show its middle-aged spread because too much about using Java involves "optional" additions (J2EE, JMS, EJB, EIEIO...) which have poor implementation track records across vendors. Moreover, too many of the initiatives in the Java world start and end with the Web, even when the intended audience is behind the server. The really hard/interesting/commercially lucrative opportunities are not in/on the web. They are in the back rooms of multi-billion dollar corporations who need way more than just a shiny new interface.
(For those of you Neo-Marxists in the audience who think that "Information Just Wants To Be FREE", I should mention that the survivability of any technology has always been primarily a function of commercial adoption, at least in the long run. How many people write in COBOL today? (Many!) How many program in Eiffel? Snobol? ML? Oberon? Haskell? 'Nuff said. Economic Reality trumps Bad Collectivist Theory every time. Thank-You, Adam Smith.)
This, BTW, is something that Microsoft seems to be finally getting, but only recently. .NET may be a "distributed web infrastructure" in drag, but make no mistake about it, .NET is Microsoft's Trojan Horse to get themselves entrenched in the back office of the biggest technology buyers in the commercial world. The only things holding them back at the moment are:
They have a really hard time admitting to themselves that the world was/is/will remain heterogeneous. For .NET to really win the day, it has to run equally well on Win32/*nix/MVS/AS400/Tandem... I remember talking to a couple of their Enterprise computing people almost 10 years ago who really got this, but were frustrated by the Top Brass' lack of vision. Had Gates and Ballmer even just Rented A Clue on this issue in the 80s, Microsoft would own the world - and then Gates could have fired Janet Reno!
Their OS core is so UI centric, it is doubtful that it can easily make the jump to being a serious back-room large-scale contender. Yes, Win2K is a quantum leap forward for them, but then, its predecessors were such unredeemable garbage, a (big) step forward was long overdue. Moreover, there's more to this picture than a clean kernel. There are issues of systems management, interoperability, recovery from failure, and such where Win32 is just plain dreadful. (OK, your enterprise servers just Blue Screened. Every minute of downtime is $1M of lost income. Lessee now, that's about $5-$10M per reboot for a typical Win32 barf event.)
But... they have a Big Bag Of Money, really smart people working there, and an unrelenting focus on their future. (That's why the only way their competitors can stay in the game is to go whining to the DOJ about how "unfair" the real world is - And I'm a lifelong Unix weenie!)
This is the real reason to stay on top of Python. The vagaries of the "Mine is bigger than yours" battles between McNealy, Gates, and Ellison will be with us until they aren't. Python lives outside this Billionaire Battlezone because none of the warring factions own the technology and it plays nicely with them all. It is precisely because both Perl and Python have avoided choosing sides in these silly ego competitions that they have and will survive. Just watch, it's already happening. CIOs and CTOs are being asked to make significant decisions about their companies' technology future. Picking .NET means choosing C# (yeah, yeah, Microsoft pitches the language neutrality of the CLR, but where do you suppose they will innovate? In Perl, Python, or C#?) Picking Sun, and most every other Unix, means picking Java.
But there's something even more profound here that has dawned on me as I've explored Python. Every serious CIO/CTO you'll meet (No, not the ones that don't shave yet and spent Everyone Else's Money in the last 5 years) will tell you that infrastructure is a necessary evil. It is applications logic that runs their world.
The problem is that applications live a really long time - 20 years is not unusual - but they have to change infrastructure like underwear (every two or three years ;)) This means that in the lifetime of one business application, there may be 5 or 6 major upheavals in operating systems, disk farms, communications technologies, coffee makers... This just kills big IT operations in costs - not the costs of buying all this crap, but the cost of keeping those old apps running across all these changes. And, no, rewriting the apps is not a realistic option. YOU try convincing the CIO at Schwab she needs to rewrite their trading system because you have a "Really Cool New OS Upgrade."
To respond to this problem, large IT shops have increasingly turned to "Middleware" - a layer of code between the application and the infrastructure to "insulate" the apps from the actual syntax and semantics of the underlying system. Examples of this include (and these are all somewhat arguable), RPCs, JMS, and ODBC. Even the original 'sockets' implementation at Berkeley had this in mind - the code has provision (never well/completely implemented) for protocols other than AF_UNIX and AF_INET. In principle, we should have seen AF_SNA...
But, there's a rub. When you buy Middleware, you are (if you did your homework right) removing a large part of the dependency the apps have on networking, OS, and so forth. BUT, you are marrying the Middleware vendor until that application goes away. Middleware vendors know this, so they act like drug dealers: The first shot is cheap - thereafter, they'll get their pound of flesh when you must have the latest version of their product to run your General Ledger or Billing System on the newest FuzzWuzzy 101 supercomputer (that still takes 5 minutes to boot Windows 3000).
Back In The Day, Middleware was the only way to crack the dependence between apps and the underlying plumbing. The cost of machinery and network bandwidth prevented you from having a generic Middleware layer between every OS service and the application. You abstracted only those things (like networking, especially) that you expected would change a lot, because it was simply too computationally intensive (== $$$) to abstract everything in the system.
BUT, as I said before, things are changin'. Machine and network bandwidth are really cheap. What we can now afford to do is abstract pretty much all the OS services that modern applications need so that they never actually directly touch the "plumbing".
This is, more-or-less, exactly what .NET is all about (and for that matter, J2EE). Microsoft (and Sun) finally figured out that the battle really isn't about operating systems or languages (those are the "Trojan Horse" to which I alluded above). The battle is about object and content interoperability. .NET will "fix" these problems for you, and all you have to do is go down to the Crossroads in Seattle (or Silicon Gulch) and sell your soul to, um, ... Microsoft (or Sun) forever, because their middleware will always be in your shorts, no matter how many times you want to change them.
But, techno-politics aside, the best place to abstract a system is In The Programming Language. Let interpreter, compiler, and library writers worry about the faucets, pipes, and fittings of a modern system, and present a unified model to applications writers. To do this, you have to have several things:
A runtime environment that can handle this level of abstraction without killing performance or requiring ghastly amounts of computer to do the Usual Things Applications Do. Why? Because the first sell to The Boss is an economic one. If you have The Answer, but it needs 30 teraflops to run General Ledger - You lose, thanks for playing.
A small core language with a big, standardized library that does most of the Usual Things Applications Do. Why? Because a big language is hard to port and hard to optimize. Libraries need to be standardized so that Applications Do The Things They Do in mostly the same way across different OS, networking, and distribution infrastructures. You'll never have 100% system transparency, but you can get close.
Stability in the core language. Why? Because if CIOs hate infrastructure churn, they Really hate applications churn. Remember, they want to focus on their golf games, not whether their programming language of choice returns floats or floors in a division. (This has been a long-running debate in the Python world - I just couldn't resist ;)))
A meaningful version/feature control system within the language. Why? So applications can survive across language upgrades.
Both Sun and Microsoft will tell you that this is precisely what (Java/J2EE, C#/.NET) give you. BUT, that's cuz they want to get married to you - forever - with no possible future divorce.
Now notice, ahem, cough, cough, Python does exactly these things:
The level of programming abstraction is just about perfect for large applications, and the machinery needed to run it is quite reasonable.
The core language is very small and (reasonably) stable.
The "Batteries Included" modules approach of Python covers a huge part of What Applications Programs Do and this gets richer release by release. This Standard Set Of Abstractions is the technical core of why Python is the ideal middleware.
Constructs like "from __future__ import ..." give the programmer the ability to build armor into their programs in anticipation of language evolution. It has been mentioned on comp.lang.python, and I agree heartily, that a "requires ..." verb needs to be added to allow the programmer to stipulate the minimum versions of the Python language and libraries needed to run properly.
But, Python has one very important commercial advantage. It's not owned by a vendor! So long as they don't mirror the vendor systems architecture too closely, CIOs can deploy applications which have an excellent chance of surviving Yet Another Infrastructure Upgrade.
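As a rough sketch of the version/feature control idea (an illustration only, not anything built into the language): the standard __future__ module records, for each feature, the release in which it became optional and the release in which it becomes the default, and a hand-rolled check against sys.version_info can stand in for the wished-for "requires" verb. The names require_python and MINIMUM_PYTHON are hypothetical, and the (2, 2) minimum is just an assumed example.

```python
import sys
import __future__

# The statement form, which must be the first statement in a module, is:
#     from __future__ import division
# Each feature object also records when it became optional and when it
# became (or will become) the default behaviour.
feature = __future__.division
print("optional since:", feature.getOptionalRelease()[:2])
print("default since:", feature.getMandatoryRelease()[:2])

# Hypothetical minimum version; a stand-in for a "requires" verb.
MINIMUM_PYTHON = (2, 2)


def require_python(minimum, current=None):
    """Raise RuntimeError if `current` (default: this interpreter) is older than `minimum`."""
    if current is None:
        current = tuple(sys.version_info[:3])
    if tuple(current) < tuple(minimum):
        raise RuntimeError(
            "needs Python %s or newer, found %s"
            % (".".join(map(str, minimum)), ".".join(map(str, current)))
        )
    return True


# Fail early, at import time, instead of somewhere deep inside the application.
require_python(MINIMUM_PYTHON)
```

Putting the check at the top of the main module means an application deployed on too old an interpreter refuses to start with a clear message, rather than misbehaving later.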
Well, it's never really that simple. Even with something as powerful as Python there are plenty of issues that make the Applications-Infrastructure boundary forever problematic. There are also commercial concerns:
If applications are deployed (in ANY language) using a distribution topology and architecture which closely mirrors the underlying infrastructure's topology, migration to other infrastructures is Really Painful (DAMHIKT). For instance, if you write a Python application which depends on socket datagram broadcasting, and then have to accommodate a new (or old) network that does not support broadcast, umm, you have a problem. SEMANTICS MATTER.
Infrastructure vendors always offer stuff that is hard/impossible to do at higher layers of abstraction. It is inevitable that real systems will have to reach into the guts now and then to get things done. The issue here is whether the CIO and team are savvy enough to localize this sort of thing in places that can later be easily changed. STRUCTURE MATTERS.
The Pain And Suffering of infrastructure churn has caused more than one CIO to lose their job. Knowing this, some technology leaders resist doing those upgrades "On their watch." In this situation, applications are forced to cope with infrastructure deficiencies by coding around them. This leads to really ugly, hard-to-maintain systems. CHANGING UNDERWEAR MATTERS.
As a matter of living in the Real World, it is always preferable to Buy rather than Build applications. The promise of both EJB and COM+ (and now J2EE and .NET) was that you'd be able to buy at least major subsystems and plug them together with a lot less effort than writing them from scratch. This has turned out to be laughably not true, at least insofar as behind-the-server enterprise class applications go. (Who cares about the web, it's just a better VT100 ;) IN-STOCK AT K-MART MATTERS.
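To make the SEMANTICS and STRUCTURE points above a bit more concrete, here is a minimal sketch of localizing an infrastructure assumption (datagram broadcast) behind one small seam. Everything in it is hypothetical: the class names, the announce_price function, and the injected send_datagram callable are made up for illustration, and no real network is touched.

```python
class BroadcastTransport:
    """Hypothetical adapter for a network that supports datagram broadcast."""

    def __init__(self, send_datagram):
        self._send = send_datagram  # callable(address, payload), injected

    def send_to_all(self, peers, payload):
        # One datagram to the broadcast address is assumed to reach every peer.
        self._send("<broadcast>", payload)
        return 1


class UnicastTransport:
    """Hypothetical adapter for a network with no broadcast support."""

    def __init__(self, send_datagram):
        self._send = send_datagram

    def send_to_all(self, peers, payload):
        # Fall back to one datagram per peer.
        for peer in peers:
            self._send(peer, payload)
        return len(peers)


def announce_price(transport, peers, symbol, price):
    """Application logic: it has no idea how delivery actually happens."""
    payload = ("%s=%.2f" % (symbol, price)).encode("ascii")
    return transport.send_to_all(peers, payload)


# Stand-in for the real network: just record what would have been sent.
sent = []


def record(address, payload):
    sent.append((address, payload))


peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
announce_price(BroadcastTransport(record), peers, "XYZ", 12.5)
announce_price(UnicastTransport(record), peers, "XYZ", 12.5)
print(len(sent))  # 1 broadcast datagram + 3 unicast datagrams = 4
```

When the network changes, only the transport class changes; announce_price and everything built on it stays put.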
Now, there's not a lot of that kind of software being vended that is Python-based AFAIK, but there is a kind of Python "Trojan Horse" here we can exploit to sneak in when they're not looking - it's OK, they'll thank us later. There are two problems every large IT shop has. These problems never go away and anyone who helps solve any part of them will be a Hero. These problems are: 1) Making the old applications talk to each other and to new media like Da Web and Mobile. 2) Normalizing data for exchange between and among old and new apps. For you XML-weenies: XML, in-and-of-itself, cannot do this no matter how many times you say "semantic markup". The data I'm talking about is domain specific and requires human intelligence to understand and encode in the first place. XML will help, but we need specialized loaders and tools to do all the heavy lifting. Crack some part of these two problems - and Python is ideal for both, so long as the performance issues don't get in the way - and you'll live Happily Ever After - or until the CTO starts losing at golf and needs another "win".
I worry about one thing and one thing only in the Python world. It is something I have witnessed in every new technology I've ever seen. Python is dangerously close to becoming a victim of Feeping Creaturism - not so much In Fact, but rather in this community's mindset of forever wanting to fiddle one more feature into the language. This will Kill Python commercially if it happens. Too many variations on the core language theme make deployment and management of real systems too expensive. I, for one, would like to see a date picked for a permanent moratorium on the language proper, after which, only bug fixes and new modules could be added. After that date, language changes would have to be part of some new language ("Grail"?) which would owe no allegiance to Python at all.