Xparse-J Update 1.1 (1/2) - exploring XML | WebReference


Xparse-J Update 1.1

Xparse-J grew out of the need to come up with the smallest possible XML parser to be plugged into the RSSViewerApplet. While the parser seems to work great in the parsing phase, it is apparent that accessing the parsed content afterwards is not as comfortable as it could be.

The XML document is parsed into a tree of Nodes which can be navigated using the standard elementAt(int position) method of JSArray. Since this navigation is fairly awkward, Node provides a find() method to locate nodes at a certain position in the XML tree.

Until now, Node.find() was not very generic: it only worked properly for the scenario needed in the RSSViewerApplet, which was to extract an RSS channel's title, description and items. With more and more people using Xparse-J for other projects, this shortcoming became increasingly obvious.

While a full DOM and XPath implementation is beyond the scope (and size) of a small XML parser, a limited version of XML path specifications should be provided for more conveniently locating nodes in the XML document tree.

So far Node.find() took two arguments: a path string and a single int occurrence.

The problem here is that an occurrence parameter would need to be added to every element of the path in order to be generic and unambiguous: does Node.find("/item/title", 2) denote the first title of the second item, or the second title of the first item? In the case of RSS the answer is obvious, as there should be only one title per item, but in arbitrary XML documents this is not so clear.

The desired functionality is something equivalent to the XPath expression "/item[x]/title[y]", where x and y specify the respective occurrence for each element of the path expression. While implementing this syntax is certainly feasible (what isn't in software?) the parsing code for these expressions would once again add unpleasant weight to the parser.

A compromise was found in changing the occurrence parameter from int to int[], that is, from a single integer to an array of integers with one entry for each element of the path.

This disambiguates the aforementioned example by making it possible to distinguish Node.find("/item/title", {1, 2}) from Node.find("/item/title", {2, 1}). The former tries to find the second title element in the first item (which should not exist in RSS) whereas the latter correctly finds the title of the second item.
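To make the lookup concrete, here is a minimal sketch of how a find() with an int[] occurrence array could walk a tree. It is not the Xparse-J source: the SketchNode class, its fields and the nthChild() helper are hypothetical stand-ins, and treating a missing array entry as "first occurrence" is an assumption of this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed XML node; NOT the actual Xparse-J Node class.
class SketchNode {
    final String name;
    final String text;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Appends a child and returns this node so calls can be chained.
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    // Follows the path one element at a time. occurrences[i] is the 1-based
    // occurrence wanted for the i-th path element. Returns null if nothing matches.
    SketchNode find(String path, int[] occurrences) {
        SketchNode current = this;
        int level = 0;
        for (String wanted : path.split("/")) {
            if (wanted.isEmpty()) {
                continue; // skip the empty piece in front of the leading "/"
            }
            // Assumption of this sketch: a missing entry means "first occurrence".
            int n = level < occurrences.length ? occurrences[level] : 1;
            current = current.nthChild(wanted, n);
            if (current == null) {
                return null;
            }
            level++;
        }
        return current;
    }

    // Returns the n-th direct child with the given name, or null if there is none.
    private SketchNode nthChild(String wanted, int n) {
        int seen = 0;
        for (SketchNode child : children) {
            if (child.name.equals(wanted) && ++seen == n) {
                return child;
            }
        }
        return null;
    }
}
```

With a channel node holding two items that each carry one title, find("/item/title", new int[] {2, 1}) returns the second item's title, while find("/item/title", new int[] {1, 2}) returns null. Note that real Java code has to spell out the array as new int[] {2, 1}; the bare {2, 1} used in the text is shorthand.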

Note that this is equivalent to the XPath syntax: Node.find("/item/title", {2, 1}) corresponds to "/item[2]/title[1]".

Splitting the occurrences from the path string simply saves some extra parsing code for separating them out again in software.
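For a sense of what is being saved, the following sketch shows roughly the kind of code that would be needed to split an expression such as "/item[2]/title[1]" back into a plain path and an occurrence array. The PathSpec class and its parse() method are hypothetical and exist only for this illustration; nothing like them is claimed to be part of Xparse-J.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Xparse-J.
class PathSpec {
    final String path;        // e.g. "/item/title"
    final int[] occurrences;  // e.g. {2, 1}

    PathSpec(String path, int[] occurrences) {
        this.path = path;
        this.occurrences = occurrences;
    }

    // Splits "/item[2]/title[1]" into the path "/item/title" and the array {2, 1}.
    // A step without a bracketed number is assumed to mean the first occurrence.
    static PathSpec parse(String expression) {
        String[] steps = expression.split("/");
        StringBuilder path = new StringBuilder();
        int[] found = new int[steps.length];
        int count = 0;
        for (String step : steps) {
            if (step.isEmpty()) {
                continue; // skip the empty piece in front of the leading "/"
            }
            String name = step;
            int occurrence = 1;
            int open = step.indexOf('[');
            if (open >= 0) {
                if (!step.endsWith("]")) {
                    throw new IllegalArgumentException("Unclosed bracket in: " + step);
                }
                name = step.substring(0, open);
                occurrence = Integer.parseInt(step.substring(open + 1, step.length() - 1));
            }
            path.append('/').append(name);
            found[count++] = occurrence;
        }
        // Trim the array to the number of path elements actually seen.
        return new PathSpec(path.toString(), Arrays.copyOf(found, count));
    }
}
```

Even this stripped-down version needs bracket matching, number parsing and error handling, which is the extra weight the int[] parameter avoids.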

Let's look at how the code was changed.

Produced by Michael Claßen

Created: Aug 01, 2001
Revised: Aug 01, 2001