Back at the start of 2007, the bods at W3C announced the birth of a new language. Its name was to be XQuery and it was, they said, to be to XML what SQL is to databases.

Whoa, hold a second. That’s a huge claim. That’s kinda like taking your toddler for their first piano lesson and announcing to the class that they’re going to be the next Mozart.

XML, since its start back in the mid-nineties, had taken over the world. The beauty of XML was that it was a simple markup language that could contain all sorts of data in a structure that was not constrained by too many rules. However, the problem with XML was that it was a simple markup language that could contain all sorts of data in a structure that was not constrained by too many rules. XML had become the sock drawer of data. What had started out as a nice and orderly thing had become (potentially) a huge jumble of barely-readable data. Something had to be done.

Something had been done. It was called XPath – and to understand XQuery we must first become friends with XPath.

Nodes

The XML Path Language – XPath to its friends – is a language that enables you to query XML files by travelling along – and selecting – their nodes. And what’s a node? Take the following XML file as an example:

<?xml version="1.0" encoding="UTF-8"?>

<friends>
  <characters>
    <character status="Lead" actorid="001">
      <firstname>Ross</firstname>
      <lastname>Geller</lastname>
      <gender>Male</gender>
      <birthdate>1967-10-18</birthdate>
      <occupation>Palaeontologist</occupation>
    </character>
    <character status="Lead" actorid="002">
      <firstname>Phoebe</firstname>
      <lastname>Buffay</lastname>
      <gender>Female</gender>
      <birthdate>1965-02-16</birthdate>
      <occupation>Masseuse</occupation>
    </character>
    <character status="Secondary" actorid="007">
      <firstname>Janice</firstname>
      <lastname>Goralnik</lastname>
      <gender>Female</gender>
    </character>
  </characters>
  <cast>
    <actor id="001">
      <firstname>David</firstname>
      <lastname>Schwimmer</lastname>
      <birthdate>1966-11-02</birthdate>
    </actor>
    <actor id="002">
      <firstname>Lisa</firstname>
      <lastname>Kudrow</lastname>
      <birthdate>1962-07-30</birthdate>
    </actor>
    <actor id="007">
      <firstname>Maggie</firstname>
      <lastname>Wheeler</lastname>
      <birthdate>1961-08-07</birthdate>
    </actor>
  </cast>
</friends>


In this document, <friends> is the root element node; <character>, <lastname> and <birthdate> are some of the document’s element nodes; and Ross, Male and Masseuse are examples of its atomic values. And finally, status is an attribute, as is actorid.

The various nodes in a document are related to each other as Parents (<character> is the parent of <firstname> and <gender>), Children, Siblings (<firstname> and <gender> are siblings), Ancestors and Descendants.
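
If you’d like to poke at those node types yourself, here’s a minimal sketch in Python using the standard library’s xml.etree.ElementTree module. It assumes a trimmed copy of friends.xml pasted inline; the FRIENDS_XML constant and the variable names are mine, purely for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined copy of friends.xml so the snippet runs on its own.
FRIENDS_XML = """<friends>
  <characters>
    <character status="Lead" actorid="001">
      <firstname>Ross</firstname>
      <lastname>Geller</lastname>
      <gender>Male</gender>
    </character>
  </characters>
</friends>"""

root = ET.fromstring(FRIENDS_XML)                 # <friends>, the root element node
character = root.find("./characters/character")   # an element node

root_tag = root.tag
children = [child.tag for child in character]     # its children (siblings of each other)
attributes = dict(character.attrib)               # status and actorid
first_value = character.find("firstname").text    # the value held by <firstname>

print(root_tag, children, attributes, first_value)
```

Here <character> is the parent of everything in the children list, and the entries in that list are siblings of one another.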

So basically, XPath allows you to query a document by zip-lining down its nodes and reporting on its atomic values. The next question, obviously, is how.

Syntax

If you can read a computer’s file path then you can read and write XPath. This is because the language was intentionally designed to ape the structure of a file path. Look, I’ll show you what I mean.

/     Selects from the root node
//    Selects matching nodes at any depth, starting from the current node
.     Selects the current node
..    Selects the parent of the current node
@     Selects an attribute
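
To see those symbols doing something, here’s a rough sketch in Python with the standard library’s xml.etree.ElementTree. Two assumptions to flag: ElementTree only understands a subset of XPath and wants paths written relative to an element (hence the leading dot), and the FRIENDS_XML constant is a trimmed copy of friends.xml that I’ve inlined for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined copy of friends.xml so the snippet runs on its own.
FRIENDS_XML = """<friends>
  <characters>
    <character status="Lead" actorid="001">
      <firstname>Ross</firstname><lastname>Geller</lastname>
      <occupation>Palaeontologist</occupation>
    </character>
    <character status="Secondary" actorid="007">
      <firstname>Janice</firstname><lastname>Goralnik</lastname>
    </character>
  </characters>
  <cast>
    <actor id="001"><firstname>David</firstname><lastname>Schwimmer</lastname></actor>
  </cast>
</friends>"""

root = ET.fromstring(FRIENDS_XML)

# "." is the current node, which here is <friends> itself.
current_is_root = root.find(".") is root

# "//" digs through every level below the current node.
all_lastnames = [e.text for e in root.findall(".//lastname")]

# ".." climbs from each <occupation> back up to its parent.
occupation_parents = [e.tag for e in root.findall(".//occupation/..")]

# "@" tests an attribute inside a predicate.
leads = [c.findtext("firstname")
         for c in root.findall("./characters/character[@status='Lead']")]

print(current_is_root, all_lastnames, occupation_parents, leads)
```

Notice that the // search returns the actor’s last name as well as the characters’; it doesn’t care which branch of the document a node lives in.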

XPath bolsters this simple syntax with additional functionality: predicates, operators and functions.

Let’s start off by taking a look at the operators. They’re pretty much what you’d expect.

+      Addition
-      Subtraction
*      Multiplication
div    Division
=      Equal
!=     Not equal
<      Less than
>      Greater than
|      Union of two node-sets
or     Or
and    And

And here are some of the more important functions that you probably need to know.

last()              Returns the position of the last node in the current context (in other words, how many nodes are in it).
position()          Returns the position of the node that is being processed.
contains(a,b)       Returns true if a contains b. Otherwise it returns false.
starts-with(a,b)    Returns true if a starts with b (where both a and b are strings). Otherwise it returns false.
count(node-set)     Returns a count of the nodes in the node-set passed in.
current-date()      Returns the current date (XPath 2.0 and later).
text()              Selects the text nodes of an element (or, put simply, returns the text value of an element). Strictly speaking it’s a node test rather than a function.
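
Here’s a hedged sketch of what a few of these do, written in Python with the standard library’s xml.etree.ElementTree. ElementTree accepts last() as a positional predicate, but as far as I know it doesn’t implement position(), count(), contains() or starts-with(), so those are imitated with ordinary Python. The FRIENDS_XML constant is a trimmed copy of friends.xml and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined copy of friends.xml so the snippet runs on its own.
FRIENDS_XML = """<friends>
  <characters>
    <character status="Lead"><firstname>Ross</firstname></character>
    <character status="Lead"><firstname>Phoebe</firstname></character>
    <character status="Secondary"><firstname>Janice</firstname></character>
  </characters>
</friends>"""

root = ET.fromstring(FRIENDS_XML)
characters = root.findall("./characters/character")

# last(): pick the final <character> among its siblings.
last_firstname = root.find("./characters/character[last()]/firstname").text

# count(node-set): imitated with len() on the list of matches.
character_count = len(characters)

# position(): imitated with a 1-based enumerate, since XPath counts from 1.
positions = {c.findtext("firstname"): i for i, c in enumerate(characters, start=1)}

# contains(a,b) and starts-with(a,b): imitated with Python string tests.
contains_oe = [name for name in positions if "oe" in name]
starts_with_j = [name for name in positions if name.startswith("J")]

print(last_firstname, character_count, positions, contains_oe, starts_with_j)
```

A full XPath engine would let you write these as predicates directly; the Python stand-ins are only there to show the behaviour each function describes.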

Actually, it’s pompous of me to claim that these are the important functions that you need to know. These are just a few of the ones I use; there’s a whole load more that you might want to check out. Here’s a link – but don’t click it now; stay with me. I wanna tell you about predicates.

XPath predicates are pretty much the equivalent of the SQL WHERE clause; they’re how you filter your result set, weeding out the items you’re not interested in. XPath predicates are always encased in square brackets, and are expressions that can take advantage of the functions and operators that we’ve just been talking about. Let me show you.

If, for instance, you wanted to find out who the first friend in our XML file is, you’d write something along these lines:

/friends/characters/character[1]/firstname 

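But if you wanted to know their first name and their last name, you’d need to make use of one of those operators we listed above: the union operator. Both halves of the union have to spell out the full path, because a bare //lastname would match every lastname in the document, the actors’ included.

/friends/characters/character[1]/firstname | /friends/characters/character[1]/lastname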
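
A word of caution about mixing // into a union: //lastname starts its search from the top of the document, so it returns every last name in the file rather than just the first character’s. The sketch below checks that in Python with the standard library’s xml.etree.ElementTree. ElementTree has no | operator, so the union is imitated by joining two lists; the FRIENDS_XML constant is a trimmed copy of friends.xml inlined for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined copy of friends.xml so the snippet runs on its own.
FRIENDS_XML = """<friends>
  <characters>
    <character status="Lead" actorid="001">
      <firstname>Ross</firstname><lastname>Geller</lastname>
    </character>
    <character status="Lead" actorid="002">
      <firstname>Phoebe</firstname><lastname>Buffay</lastname>
    </character>
    <character status="Secondary" actorid="007">
      <firstname>Janice</firstname><lastname>Goralnik</lastname>
    </character>
  </characters>
  <cast>
    <actor id="001"><firstname>David</firstname><lastname>Schwimmer</lastname></actor>
    <actor id="002"><firstname>Lisa</firstname><lastname>Kudrow</lastname></actor>
    <actor id="007"><firstname>Maggie</firstname><lastname>Wheeler</lastname></actor>
  </cast>
</friends>"""

root = ET.fromstring(FRIENDS_XML)

# //lastname: every <lastname> in the document, characters and actors alike.
everywhere = [e.text for e in root.findall(".//lastname")]

# The anchored path: only the first character's <lastname>.
anchored = [e.text for e in root.findall("./characters/character[1]/lastname")]

# Imitating the union of the two anchored paths.
first_friend = [e.text for e in root.findall("./characters/character[1]/firstname")] + anchored

print(everywhere)
print(first_friend)
```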

Let’s build on our example. Let’s say we wanted to know the first two friends in our file. We’ll need to use a function in our predicate to find that out.

/friends/characters/character[position()<3]/firstname | /friends/characters/character[position()<3]/lastname

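Let’s find a way to complicate our predicate a little. Say you wanted to find the first secondary character in our XML file. It’s tempting to put the status test and position()=1 into a single predicate, but position() counts among all the character elements, so that would only match if the very first character in the file happened to be secondary. Instead, filter on the status first and then take the first of what’s left, by chaining two predicates:

/friends/characters/character[@status="Secondary"][1]/firstname | /friends/characters/character[@status="Secondary"][1]/lastname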
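
The order in which a predicate filters and counts is easy to get wrong, so here’s a small Python sketch of the two readings. It doesn’t run XPath for the comparison; it imitates the XPath rules with list comprehensions over the elements that xml.etree.ElementTree hands back. The FRIENDS_XML constant is a trimmed copy of friends.xml and all the names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined copy of friends.xml so the snippet runs on its own.
FRIENDS_XML = """<friends>
  <characters>
    <character status="Lead"><firstname>Ross</firstname></character>
    <character status="Lead"><firstname>Phoebe</firstname></character>
    <character status="Secondary"><firstname>Janice</firstname></character>
  </characters>
</friends>"""

root = ET.fromstring(FRIENDS_XML)
characters = root.findall("./characters/character")

# One predicate, character[@status="Secondary" and position()=1]:
# position is counted among ALL characters, so both tests must hold for the same node.
one_predicate = [
    c.findtext("firstname")
    for position, c in enumerate(characters, start=1)
    if c.get("status") == "Secondary" and position == 1
]

# Chained predicates, character[@status="Secondary"][1]:
# filter on status first, then take the first of the survivors.
secondary_only = [c for c in characters if c.get("status") == "Secondary"]
chained = [c.findtext("firstname") for c in secondary_only[:1]]

print(one_predicate)  # nothing: the first character is Ross, a lead
print(chained)        # Janice
```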

Even at our beginner’s level, XPath can get a little unwieldy and complicated. Let’s say, for example, we wanted to find out the name of the first secondary character and the name of the actor that portrayed them. I don’t know why we’d want to do that, but go with me on this.

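/friends/characters/character[@status="Secondary"][1]/firstname | /friends/characters/character[@status="Secondary"][1]/lastname | /friends/cast/actor[@id=/friends/characters/character[@status="Secondary"][1]/@actorid]/firstname | /friends/cast/actor[@id=/friends/characters/character[@status="Secondary"][1]/@actorid]/lastname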

Actually, we’re not quite finished. Our path will give us the names we want, but they’ll be all wrapped up in their XML tags; what we really want is just the text. To get that we’ll also need to add text() to the end of each path.

/friends/characters/character[@status="Secondary"][1]/firstname/text() | /friends/characters/character[@status="Secondary"][1]/lastname/text() | /friends/cast/actor[@id=/friends/characters/character[@status="Secondary"][1]/@actorid]/firstname/text() | /friends/cast/actor[@id=/friends/characters/character[@status="Secondary"][1]/@actorid]/lastname/text()
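
For comparison, here’s roughly the same lookup broken into steps in Python with the standard library’s xml.etree.ElementTree. It’s a sketch under a couple of assumptions: the FRIENDS_XML constant is a trimmed copy of friends.xml inlined for illustration, and because ElementTree handles only a subset of XPath, the “first secondary character” step and the text extraction are done in Python rather than in the path itself.

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined copy of friends.xml so the snippet runs on its own.
FRIENDS_XML = """<friends>
  <characters>
    <character status="Lead" actorid="001">
      <firstname>Ross</firstname><lastname>Geller</lastname>
    </character>
    <character status="Secondary" actorid="007">
      <firstname>Janice</firstname><lastname>Goralnik</lastname>
    </character>
  </characters>
  <cast>
    <actor id="001"><firstname>David</firstname><lastname>Schwimmer</lastname></actor>
    <actor id="007"><firstname>Maggie</firstname><lastname>Wheeler</lastname></actor>
  </cast>
</friends>"""

root = ET.fromstring(FRIENDS_XML)

# Step 1: filter on status, then take the first match.
secondary = root.findall("./characters/character[@status='Secondary']")
character = secondary[0]

# Step 2: follow the character's actorid across to the matching <actor>.
actor_id = character.get("actorid")
actor = root.find(f"./cast/actor[@id='{actor_id}']")

# Step 3: pull out just the text, the job text() does in the XPath version.
character_name = f"{character.findtext('firstname')} {character.findtext('lastname')}"
actor_name = f"{actor.findtext('firstname')} {actor.findtext('lastname')}"

print(character_name, "is played by", actor_name)
```

Splitting the work into named steps like this is the kind of readability the one-line path is missing.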

As I’m sure you can now see, it doesn’t take much for XPath to begin to stretch out and become harder and harder to read. And that’s partly what sent the W3C guys back to the drawing board, leading to that announcement in 2007 and the birth of XQuery.

XQuery, however, wasn’t a wholesale replacement of XPath: XPath is excellent at the things it does well; it’s an easily learned, easily read language for simple XML queries. It’d be a shame to get rid of it. And so what they did instead was subsume XPath into XQuery, making all XPath syntax valid XQuery (XQuery isn’t its only parent; it shares custody of XPath with something called XSLT, but we won’t go into that now).

No, what they did with XQuery was introduce a syntax that was better suited for handling more complex queries, and one that is structured better for SQL people – database developers – to understand.

Speaking of databases, you may have noticed how this article, so far, has made no mention of Oracle or of the role databases play in all of this. Strictly speaking, databases do not need to play any role when it comes to XPath and XQuery; they’re designed to read XML, and XML can live quite happily outside of a database. However, we can also load XML into our Oracle database and use XQuery to interrogate it. In the next article in this series, we’ll talk about how XPath and XQuery are implemented in Oracle.

