Monday, December 19, 2005

A spirited defense of XML that I posted tonight on the Daily WTF:

People hate XML because it isn't a rigorous standard, but they haven't heard of any of the higher-level standards that are implemented *in* XML. People hate XML because the entire stream has to be read before they can do anything with it, but they've never heard of SAX. People hate XML because they don't like implementing schemas as DTDs, but they've never heard of XSD. People think XML is ridiculous because it's "BASED OFF OF HTML", but they've apparently never heard of SGML.
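
To make the SAX point concrete, here is a minimal sketch using Python's standard `xml.sax` module. The element name `item`, its `name` attribute, and the helper `collect_item_names` are made up for illustration; the idea is only that the handler gets called as the parser works through each chunk it is fed, so nothing requires the whole document to be in memory first.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Records the name attribute of every <item> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called by the parser for each start tag it encounters.
        if name == "item":
            self.names.append(attrs.get("name"))


def collect_item_names(chunks):
    """Feed the document to the parser piece by piece (hypothetical helper)."""
    handler = ItemHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # incremental: no need to hold the full stream
    parser.close()
    return handler.names


if __name__ == "__main__":
    pieces = ['<items><item name="a"/>', '<item name="b"/>', "</items>"]
    print(collect_item_names(pieces))
```

In a real application the chunks would come from a socket or a large file rather than a list of strings.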

The WTF here is that people are shooting their mouths off about technologies they *clearly* don't understand.

XML, in and of itself, is a simple, loose, generic, open standard that other standards can be implemented upon. Yeah, if you're Billy Joe Bob implementing BJBML, you're not going to get much benefit in the way of interop, granted. But if you're the W3C, or a coalition of companies involving, say, IBM or Adobe, it makes a lot of sense to base your standard on XML. You get an established syntax, which means you already have a bunch of parsers, validators, APIs, editors, and more to deal with your data. And, because of all that, converting data into or out of your format from other XML-based formats is relatively simple. Hell, even if you are Billy Joe Bob, you still get all those advantages with your format -- you just don't get the audience, unless your standard is pretty good and catches on.

This is why something like SVG is cool even though OMFG IT'S TEH XML!!! You can pull data from somewhere in XML, run it through XSL to merge it with SVG you've written, and dynamically generate complex graphics, animations, or UIs. That can be extended even further if you're using those graphics within, say, a DocBook or DITA project. Your data is cleanly separated from your business and presentation logic, you only need a small toolset, all of it open-source, with a consistent underlying syntax, and it will work on almost every platform you can imagine. And you can even process it in a SAX event pipeline.
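
As a rough illustration of the data-to-graphics idea, here is a sketch that turns a small XML data document into SVG bars. Two caveats: it uses Python's `xml.etree.ElementTree` as a stand-in for XSL (the standard library has no XSLT processor), and the `point` element, its `value` attribute, and the function `bars_from_xml` are invented for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def bars_from_xml(data_xml, bar_width=20, gap=5):
    """Build one SVG <rect> per <point> in the source data (hypothetical format)."""
    source = ET.fromstring(data_xml)
    # Serialize SVG elements with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg")
    for i, point in enumerate(source.iter("point")):
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(i * (bar_width + gap)),
            "y": "0",
            "width": str(bar_width),
            "height": str(int(point.get("value"))),
        })
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    data = '<data><point value="30"/><point value="45"/></data>'
    print(bars_from_xml(data))
```

An XSL stylesheet would express the same mapping declaratively; the point is just that both the input and the output are ordinary XML trees, so the same tooling handles both.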

Of course, it's not perfect. Like any other technology, it has its limitations and there are plenty of ways to abuse it. In particular, it's not great for encapsulating a lot of binary data, and it can sometimes be overkill if you're working with a pretty flat data structure that's not going to be used outside of your application (for example, a Java properties file). And, like in today's example, it's completely pointless if you're just going to embed giant CDATA blocks in it.
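
To put a number on the binary-data complaint, here is a small sketch assuming the common approach of base64-encoding the bytes into element text (my assumption; the `blob` element and both function names are hypothetical). Base64 turns every 3 bytes into 4 characters, so the payload grows by about a third before you even count the markup.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(payload: bytes) -> str:
    """Wrap raw bytes in a hypothetical <blob> element as base64 text."""
    elem = ET.Element("blob", {"encoding": "base64"})
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Recover the original bytes from the <blob> element."""
    return base64.b64decode(ET.fromstring(xml_text).text)


if __name__ == "__main__":
    payload = bytes(range(256)) * 12  # 3072 bytes of arbitrary binary data
    doc = embed_binary(payload)
    encoded_length = len(ET.fromstring(doc).text)
    print(len(payload), encoded_length)  # encoded text is 4/3 the size
    assert extract_binary(doc) == payload
```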

As far as the syntax goes, yeah, it's wordy. I'd imagine that most programmers here can appreciate the value in being able to edit XML as plaintext, and that wordiness can be useful in keeping things straight for the degree of nesting that can happen when you're trying to model even moderately complex data in a tree structure. If you don't like that, though, there are plenty of editors out there that will streamline things for you. And yeah, XML isn't lean, but if all we wanted was lean we'd be writing everything in assembler. Portability costs, deal with it. Storage space is cheap, memory is cheap, processors are cheap, bandwidth is getting pretty cheap. Parsing, storing, and transmitting XML are generally pretty painless here in the 21st century, unless you're dealing with truly massive amounts of data, in which case you should be using an XML store to deal with all the pain for you.
