XML and RDBMS: 10 years on

As we approach the 10-year anniversary of XML, Jim Fuller provides a personal retrospective, focussing on how XML has been and will be used with the RDBMS.

“XML is the digital dial tone of the web” — Jon Bosak (The father of XML)

November 2006 will mark the 10th year of XML’s rise to prominence in a world that continues to be dominated by RDBMS and SQL. During this time, XML has caused noisy debate within every corner of our software development multi-verse. A recent XML-DEV discussion spurred me to review the reasons for its success, and especially how it intersected with past, and will intersect with future, uses of RDBMS.

Attractions of XML

Hard-bitten SQL yeomen denigrated the performance characteristics of a half-baked hierarchical data model instantiated in a concrete syntax of angle brackets. The heir-apparent SGML purists thought that XML was finally going to sort out publishing on the web, with the more esoteric among them imagining a way to hitch a ride to the stratospheric Semantic Web.

There were the unwashed many, like you and me, spending cycles on serving up data for web sites and wondering why we needed to do anything new. Then something surprising happened…XML started to be used in every conceivable form, propelling its status from the hyperbole of the ‘new’ to an essential tool in the kit of the perspicacious developer. To understand how this happened in a world full of tried and true solutions, I try to list some of the adoption characteristics that were attractive to me personally.

Easy to learn

People learn better when they have an existing analogy onto which to ‘map’ a concept. For example, those familiar with HTML were one small step away from using and understanding XML. I should probably remind everyone that there are a *lot* more people who have heard of, and even seen, HTML than there are people using tables in a relational database, and it is only natural for them to seek the simplest ‘next step’, regardless of whether that next step is the right or wrong solution.

Easy to make

All one needs to work with XML is a text editor. Finally, an emacs lisper in the same room as a Dreamweaver designer has something other than HTML to talk about. ‘Data for the masses’ is a term I like to use, though in design-pattern speak…lingua franca feels more appropriate.

Easy to debug

Finding what is wrong with your XML data is as easy as ‘view source’ in an HTML browser…easier to cogitate on, and faster, than grokking some binary data format indirectly through some proprietary debugging tool.

Informal and lightweight

No dependency on server-side application software means that people like designers and business domain experts can get on without waiting for the overworked DBA to set something up for them. Admittedly, few designers and even fewer business domain experts generate XML directly…though it was dead simple to get the tools they were working with to produce XML.

Unicode

Taking a ‘cue’ from Java and insisting on Unicode at its foundation meant that XML could represent text in practically any language from the outset, rather than having internationalisation bolted on later.

Existing technologies were immediately applicable and available

Technologies such as DOM were already being used in ‘anger’ on HTML and were ready to be applied to XML, and SAX emerged from the XML-DEV list soon afterwards.
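
As a rough illustration of the two parsing styles, the sketch below reads the same document once with a DOM-style tree and once with SAX-style events, using only Python's standard library. The sample document and the names ids_with_dom, ids_with_sax and OrderIdHandler are invented for this example and are not taken from any particular project.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, invented for illustration.
SAMPLE = "<orders><order id='1'>widget</order><order id='2'>gadget</order></orders>"


def ids_with_dom(text):
    """Tree style: parse the whole document into memory, then walk it."""
    doc = xml.dom.minidom.parseString(text)
    return [el.getAttribute("id") for el in doc.getElementsByTagName("order")]


class OrderIdHandler(xml.sax.ContentHandler):
    """Event style: react to each start tag as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "order":
            self.ids.append(attrs.getValue("id"))


def ids_with_sax(text):
    """Feed the document through the SAX parser and collect the ids seen."""
    handler = OrderIdHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.ids
```

Both functions return the same list of ids; the DOM version holds the entire tree in memory, while the SAX version keeps only what the handler chooses to remember, which is why it tends to suit very large documents.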

XML and the Web

There was a lot of semi-structured, document-type data being normalised into RDBMS tables at the time of XML’s birth (and there still is). A lot of work was, and is, spent on marshalling data between the web and relational tables. There was a benefit to having data marked up using XML, as it could go over the wire, be stored as a document and generally ‘play nice’ in the web stratum. Using XML to take care of ‘document orientated data’ and RDBMS to manage ‘data orientated data’ became an architectural breakpoint in data modelling. Technologies such as XSLT also started to come into play, applying the ‘separate data from presentation’ mantra.

Barriers to XML adoption

At this point, I list some of the factors that should have been barriers to XML adoption:

  • Existing RDBMS technology is more performant, by one or more orders of magnitude, in most data scenarios
  • We were all just getting used to mapping RDBMS to OO in our code, not to mention that a lot of useful tools were emerging to help us do this
  • XSLT’s functional approach proved to be a ‘steep’ learning curve for the more procedurally minded…not to mention sending ‘SQL join freaks’ into a spin when they found out how difficult it was to do what they had already been doing easily using SQL.

On top of this, a phalanx of related technologies came riding on the back of the XML beast and simply failed to make an impact…

Where are the links?

It is still hard for me to believe that one of the primary features that made HTML so popular would be ‘missing’, or woefully represented, in efforts such as XLink.

Web Services distraction

All I can think about is ‘Enterprise cathedral building’ at its best when I see the SOAP stack and all its complementary bits. A lot of brainpower was drained into this bottomless pit with the most useful result being the rise of REST, which is essentially the web as we know it + XML.

To schema or not?

I have no doubt that activities like the W3C’s XML Schema will eventually inform us on how we will all validate and constrain our XML data…but for the time being one can opt in or out, or use some other schema technology such as RELAX NG, XSLT, or Schematron. Having the ability to have formal and informal data means you can choose whichever best suits the application requirements; e.g. a web Content Management system can live with informal data, whilst your company’s finance systems will need a rigorous data definition.

Semantic Web distraction

The application of XML in the development of a smarter web ‘smells’ of the early ’80s and all the effort poured into such things as AI, genetic algorithms and LISP…it’s great to see the old become new again, but I can’t help but worry that a new generation will get seduced by the promise of the impossible.

Why XML gained a foothold

Ultimately, XML’s adoption was probably a natural result of the unique process by which the original specification was created. A hard core of SGML old salts had absolute control over what went into the specification itself, though they allowed themselves to be informed by a layer of software developers who provided in-depth comment, advice, tests, and use cases.

These software developers in turn opened up the review process to the public. These days this is the normal way in which specification bodies (even those who have hard commercial cores!) go about their duties, and it mirrors Open Source software development in general. With no lack of experience and the ‘long memory’ that is SGML, along with groups such as HyTime, it shouldn’t surprise us that the XML specification has had an impact.

Note
XML was ‘fortunate’ to have been created near enough to the Internet bubble bursting; a lot of idle hands were ‘ready to serve’ the cause.

With all this said I think there are some deeper reasons why XML gained a foothold in a world dominated by RDBMS.

Multiple data models: hierarchical versus relational data

We already mentioned how XML helped characterise ‘document orientated data’ whereas RDBMS were good at managing ‘data orientated data’. XML can also be considered a good choice for representing hierarchical data, i.e. data in the form of a tree. In fact, we use hierarchical data models all the time; all one has to do is interact with an OS file system to know that.
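
To make the file system analogy concrete, here is a minimal sketch that turns a nested directory layout into an XML tree and then queries it with a single path expression. The layout, the <dir> and <file> element names and the to_xml helper are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory layout, invented for illustration:
# a dict is a directory, None marks a file.
LAYOUT = {"home": {"jim": {"notes.txt": None, "xml": {"draft.xml": None}}}}


def to_xml(name, children):
    """Recursively turn one directory (a dict) into a <dir> element."""
    node = ET.Element("dir", name=name)
    for child_name, grandchildren in children.items():
        if grandchildren is None:
            ET.SubElement(node, "file", name=child_name)
        else:
            node.append(to_xml(child_name, grandchildren))
    return node


root = to_xml("/", LAYOUT)
# One path expression reaches every file, however deeply it is nested.
file_names = [f.get("name") for f in root.findall(".//file")]
```

The nesting of the elements carries the parent/child relationships directly, whereas a relational table would typically need a parent-id column and a recursive query to answer the same question.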

Over the past 20 years developers have become used to working ‘at the interfaces’ of their code, so working with multiple data models was and is nothing new, albeit still painful.

Hybrid approach: RDBMS is the anchor, XML is the sail…

Long-term data storage requirements are different from, let’s say, the needs of a client application querying a subset of data. Having two approaches, in the form of RDBMS and XML, means you can better fulfil both kinds of requirement.

Using RDBMS as the foundation of your data layer and XML for distributing data can be an effective technique, though the performance and functionality of today’s RDBMS mean that it won’t sit idly by as the holder of a company’s ‘crown jewels’ and not participate in any of the fun. At a minimum, the ability to marshal data back and forth between RDBMS and XML means that you can accommodate future integration requirements more ably.
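
A minimal sketch of that marshalling, assuming an in-memory SQLite database and an invented customer table (the table, its columns and the helper names rows_to_xml and xml_to_rows are hypothetical), might look like this in Python:

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn):
    """Export every row of the (hypothetical) customer table as XML text."""
    root = ET.Element("customers")
    for cid, name in conn.execute("SELECT id, name FROM customer ORDER BY id"):
        el = ET.SubElement(root, "customer", id=str(cid))
        el.text = name
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(conn, text):
    """Load a <customers> document back into the customer table."""
    rows = [(int(el.get("id")), el.text)
            for el in ET.fromstring(text).iter("customer")]
    conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)


# The relational store is the anchor...
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# ...and the XML document is what travels between systems.
document = rows_to_xml(source)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
xml_to_rows(target, document)
```

Here the XML text is only the interchange format; each side keeps its own relational store, and either could be swapped for a different database without the other noticing.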

The use of XML and RDBMS together represents a sophisticated hybrid approach to solving all your data problems and not having to make compromises.

Outside developments

“Memory is the new hard drive and hard drive is the new tape drive”

Let’s not forget that developments outside the world of software can have an impact on the architectural decisions we make. I have taken the above unattributed quote as an example of how advances in the production of memory and hard drives are driving applications to ‘keep more in RAM’ and use cheap, robust hard drives to take care of long-term storage. This trend is evidenced by the emergence of things such as ‘persistence layers’ in software development.

For example, there are many arguments about how inefficient XML is as a text-based encoding. These arguments may become moot with more processing power and cheaper storage. The point is that we should be aware that new technology can render our current assumptions invalid.

Where is XML going?

I would like to think that using XML *and* RDBMS represents a better way of solving problems. Inevitably there will be those specialists who have invested considerable time and effort in one approach and will argue against the use of any other.

One exciting development is the emergence of native XML Databases…I am a fan of eXist-db (exist.sourceforge.net). Recently, I was on the selection committee for XML Prague 2006 (www.xmlprague.cz) and was able to get the core development team of eXist-db over to Prague. In my opinion, eXist-db has some impressive features:

  • Core level 1 XML:DB compliance
  • Efficient indexing
  • Full database recovery
  • XQuery, XPath, XSLT
  • Full DOM/SAX support
  • Multiple interfaces e.g. SOAP, XML-RPC, WebDAV
  • HTTP/REST API providing all data via its built-in web server
  • Updates achieved using XQuery Update or XUpdate
  • Unix-like access permissions and XACML for XQuery access control

The challenge of using native XML Databases (NXD), such as eXist-db, relates to how easy it will be to ‘fit together’ with RDBMS implementations, without duplicate effort or unnecessary complexity. Perhaps we will see database vendors picking up a few tricks from the NXD crowd. I for one would rather work with a single database product, but it might be a while before this occurs.

Other developments illustrate just how good things can get when using RDBMS and XML together; one that springs to mind is Microsoft’s SQL Server Reporting Services (SSRS) and its XML-based report objects, which:

  • Uses the XML-based Report Definition Language
  • Can perform XSLT transformations
  • Provides a sophisticated Web Services interface

Since SSRS is included for free in SQL Server 2005, I would highly recommend taking it for a spin to understand where convergence can really provide true benefit.

In any event, this November will mark 10 years that XML has been with us – just as long as “established” web technologies such as Flash; it’s hard to believe!