JSON and other data serialization languages

The easiest way to speed up an Ajax application is to take out the 'X' and use JSON rather than XML. Of course, it isn't that simple, as William Brewer explains, but JSON and YAML are nonetheless fascinating solutions to the old problem of transferring complex data between modules, services and applications.

JSON rides the XMLy YAML

I suppose it shouldn’t be an issue. The way to represent, store, and transfer object data is XML. Everyone seems to be agreed on that point.

At first glance, it might seem surprising that a subset of a document markup language has become the standard way of moving complex data structures between objects and databases. However, XML is versatile. It has been used to create, by sheer effort and determination, a document format, a message packet format, a secure RPC channel (SOAP or XML-RPC), a data format, and an object database. XML would seem to be the obvious choice for a data-serialization language, providing the means by which data is transported between modules, services and applications. There are so many tools around for XML that it would seem eccentric to question its supremacy. It is now an intrinsic part of SQL Server. Why should anyone ever bother with anything else?

Why avoid existing standards?

The problem is that XML doesn’t do ‘simple’. Often, all that the application designer actually wants is a programming-language-neutral, platform-agnostic, and transport-independent interchange mechanism for structured data (simple data types, associative arrays and lists). This is frustrating, since XML should be ideal. Your data is tagged. You have given it semantic meaning. You can extend your XML data file with completely different data without the problem of compatibility. You can render and style your data. The trouble is that all this complexity and versatility comes at a cost. XML isn’t fast. When AJAX programmers hit the performance problem of digesting XML, they turned to JSON and found, at that time, that it was around a hundred times quicker to process a 10K JSON data file in the browser than an XML file.

XML itself isn’t intrinsically a cross-platform serialization format. It can be turned into one, but you are forced to invent your own format using XML first. WDDX (Web Distributed Data eXchange) came close to creating an XML serialization format, but it has been abandoned (its envelope/header/body structure was later adopted by SOAP, along with its transport and interaction neutrality). XML-RPC and SOAP are sometimes suggested as being appropriate, but they are full-blown RPC mechanisms with transports. To use them merely for serialization of structured data is overkill.

Why should this matter? Normally it doesn’t. The .NET tools for handling XML are highly developed, to the point where using XML on adequate hardware and network is as quick as anything else. There isn’t a bottleneck under these circumstances. The problem lies in the new browser-based applications that are pulling data dynamically across the web. It is going to get worse when the Semantic Web kicks in. Developers seldom use low-powered machines or low-bandwidth connections, so are less aware of the importance of performance on the users’ machine. They tend to just shrug and suggest that you buy a better CPU, or more memory. You can’t do that if you don’t ‘own’ the desktop device. Even if the difference in time to assimilate XML and JSON data is down to a factor of 10, it can make all the difference to the perceived performance of a browser-based application, particularly where you can’t insist on minimum hardware requirements. You can solve design issues on the server end of an application by souping up the hardware, but you can’t do that at the client end.

JSON, an expedient solution

Many AJAX programmers, myself included, hit big problems with the performance of data-intensive applications, and those problems were solved by using JSON.

JSON (short for JavaScript Object Notation) is still very much in use. Google Data APIs and Yahoo Web Services can both serve data in JSON format. SPARQL will produce its results in JSON format.

It has been described as the ‘Fat-free alternative to XML’. It provides a lightweight standard for representing the core data types by adopting a subset of the ECMAScript (JavaScript) standard. Unlike XML, it doesn’t specify the character set, but assumes a Unicode representation of the data (defaulting to 16-bit). It represents just the data, not the transport. It is very easy to read and understand. It is actually a subset of the ‘object literal notation’ of JavaScript, with some extra limitations to make it easier to implement a parser. It is an elegant shorthand syntax for encoding generic data structures in JavaScript, and so can be executed.
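The relationship with object literal notation is easy to see. The fragment below is an illustrative sketch (the names and data are invented): the same text is both valid JSON and a valid JavaScript literal, so it round-trips cleanly through the native parser found in modern engines.

```javascript
// A string of JSON text, using only the JSON-compatible subset of
// JavaScript's object literal notation: strings, numbers, booleans,
// null, arrays and objects.
const text = '{"name": "Alice", "tags": ["sql", "json"], "active": true}';

// JSON.parse (the native parser in modern engines) turns the text
// into a live object.
const obj = JSON.parse(text);

// JSON.stringify serializes the object back to a JSON string.
const roundTrip = JSON.stringify(obj);

console.log(obj.name);        // Alice
console.log(obj.tags.length); // 2
```

The round-tripped string parses back to an identical structure, which is precisely the property that makes the notation usable as a serialization format.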

The client must, obviously, either have full trust in the server, pre-check the text with a regex for well-formedness, or use a validator, because the JSON is actually executable code that will be evaluated by the JavaScript engine.
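A minimal sketch of the two routes, with invented data: `eval()` will execute whatever arrives, whereas a strict parser such as `JSON.parse` accepts only the JSON grammar, so embedded code is a syntax error rather than a payload.

```javascript
const trusted = '{"id": 42, "user": "alice"}';

// The risky route: the text is handed straight to the JavaScript
// engine as executable code (the parentheses force it to be read
// as an expression rather than a block).
const viaEval = eval('(' + trusted + ')');

// The safe route: JSON.parse only accepts pure data.
const viaParse = JSON.parse(trusted);

// A hostile 'JSON' document containing a function call. We never
// eval it; JSON.parse rejects it outright.
const malicious = '{"id": (function(){ return "pwned"; })()}';
let rejected = false;
try {
  JSON.parse(malicious);
} catch (e) {
  rejected = true; // SyntaxError: not valid JSON
}
```

This is why the pre-check or validator the paragraph mentions mattered so much in the days when `eval()` was the only practical decoder in the browser.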

Here is a simple SQL call that produces an XML result.
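A sketch of such a call, assuming a hypothetical Customers table and using SQL Server's FOR XML clause to do the serialization:

```sql
-- Hypothetical table and columns, for illustration only.
SELECT CustomerID, CompanyName, City
FROM   Customers
FOR XML PATH('customer'), ROOT('customers');
```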

Produces XML like this
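For a hypothetical Customers table, the shape would be along these lines (the rows are invented):

```xml
<customers>
  <customer>
    <CustomerID>1</CustomerID>
    <CompanyName>Island Trading</CompanyName>
    <City>Cowes</City>
  </customer>
  <customer>
    <CustomerID>2</CustomerID>
    <CompanyName>Seven Seas Imports</CompanyName>
    <City>London</City>
  </customer>
</customers>
```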

… whereas the JSON contains …
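The JSON rendering of the same illustrative rows is noticeably flatter:

```json
[
  {"CustomerID": 1, "CompanyName": "Island Trading", "City": "Cowes"},
  {"CustomerID": 2, "CompanyName": "Seven Seas Imports", "City": "London"}
]
```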

… but was achieved with delightfully simple SQL.
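A sketch of the kind of statement meant, again assuming the hypothetical Customers table; dbo.jsEscape is a stand-in name for the user-defined escaping function discussed next:

```sql
-- dbo.jsEscape is a hypothetical user-defined function that escapes
-- quotes, backslashes and control characters for JSON strings.
SELECT '{"CustomerID":'   + CAST(CustomerID AS varchar(10))
     + ',"CompanyName":"' + dbo.jsEscape(CompanyName)
     + '","City":"'       + dbo.jsEscape(City) + '"}'
FROM   Customers;
```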

Of course, you will need a user-defined function to ‘escape’ the strings, but that is not too taxing. In practice, producing JSON from a SQL result is even simpler than this. All you need to do is to output in XML, and let the application layer convert the results with something like Json.NET. This tool will also allow you to convert LINQ to JSON, quickly convert .NET objects to JSON and back again, and produce well-formatted, indented JSON for debugging or display. Getting data back uses the same easy route: JSON can be converted to XML, and SQL Server can then digest the XML.

You’ll notice rather more difference between this JSON
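Take the menu example from the JSON site (lightly adapted):

```json
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New",   "onclick": "CreateNewDoc()"},
      {"value": "Open",  "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
```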

And this XML:
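The conventional XML rendering of the same menu data, following the JSON site's example, carries the items as attributes on repeated elements:

```xml
<menu id="file" value="File">
  <popup>
    <menuitem value="New"   onclick="CreateNewDoc()" />
    <menuitem value="Open"  onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
```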

… in that we have an array, delimited by [] (an example adapted from the one on the JSON site).

The JSON is very clear: show JSON data to anyone and it doesn’t take long for them to figure it out. What is lacking in comparison with the XML is, of course, the namespace information. With JSON, both ends of the communication need to know about the data being transmitted.

In JavaScript, an object is just an associative array. JSON takes that syntax and makes it generally applicable. JSON syntax is pretty intuitive, and close to object notation in several languages. There are encoders and decoders in every current computer language, and validators on every platform. JSON wasn’t invented as such; the object literal notation was merely discovered by developers and used to fill a requirement that wasn’t being met elsewhere. The fact that it bypassed the restrictive same-domain policy of the XMLHttpRequest object was a real bonus for the spread of JSON.
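A quick illustration of that first point, with invented names: the same object answers to dot notation and to string indexing, and keys can be added at run time, exactly as with an associative array.

```javascript
// A JavaScript object behaves as an associative array:
// property access and string indexing are interchangeable.
const customer = { name: "Alice", city: "Cowes" };

console.log(customer.name === customer["name"]); // true

// Keys can be added and enumerated at run time.
customer["country"] = "UK";
const keys = Object.keys(customer); // name, city, country
```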

JSON has two drawbacks:

  • Both sides of the transfer have to understand the schema.
  • It cannot represent relational data.

Because of these drawbacks, JSON has spawned a superset called YAML (‘YAML Ain’t Markup Language’). YAML (not to be confused with YAML – ‘Yet Another Multicolumn Layout’) allows you to represent both relational and object data, and is able to transfer the metadata. Its greatest importance to us is that it can represent relational data. YAML is primarily a data serialization language, like JSON, but it is easier to read and more able to represent complex data models. The big problem with YAML is that it has not yet got the same maturity as JSON. It is dauntingly complex; it is a great piece of design, but it hasn’t got the weight of users behind it in the same way as JSON. However, if you need to transfer relational data between a database and your application’s objects, it is the only way to go. As a data-interchange format between different relational databases, it would take some beating.
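A sketch of what that buys you, with invented data: YAML's anchors (&) and aliases (*) let one node refer to another, so a relationship can be stored once and referenced many times, which plain JSON cannot express.

```yaml
customers:
  - &island          # anchor: this node can be referenced below
    id: 1
    name: Island Trading
orders:
  - order_id: 501
    customer: *island   # alias: a reference, not a copy
  - order_id: 502
    customer: *island
```

A YAML loader resolves both aliases to the same customer node, giving you the shared-row semantics of a foreign key rather than two duplicated values.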

For more reading: