SAX and DOM and Rock'n Roll (3/4) - exploring XML
SAX vs. DOM
The DOM is a convenient way to access and manipulate XML data, but it comes at a price:
- The underlying XML needs to be fully parsed before processing can occur. As most DOM implementations are purely memory-based, this limits the amount of XML data that can be processed this way. The possibility of pipelining the various stages of processing is also limited.
- The DOM structure only defines generic nodes, whereas most languages are strongly typed, so one might want to map specific nodes to specific classes of, say, Java code.
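Both points can be seen in a small Java sketch using the standard JAXP DOM API. The whole document is parsed into a tree before any node can be touched, and the tree only hands back generic nodes, so the mapping to a typed class has to be written by hand. The Song class, the song element and its title and artist attributes are assumptions made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical example: maps generic DOM nodes to a typed Java class by hand.
class DomSongMapper {

    // Hypothetical typed class; the DOM itself knows nothing about it.
    static final class Song {
        final String title;
        final String artist;

        Song(String title, String artist) {
            this.title = title;
            this.artist = artist;
        }
    }

    static List<Song> parseSongs(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse() returns only after the complete document has been read
            // and the whole tree has been built in memory.
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // The DOM hands back generic nodes; the cast and the field
            // mapping below are our own responsibility.
            NodeList nodes = doc.getElementsByTagName("song");
            List<Song> songs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                songs.add(new Song(e.getAttribute("title"), e.getAttribute("artist")));
            }
            return songs;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<songs>"
            + "<song title=\"Hound Dog\" artist=\"Elvis Presley\"/>"
            + "<song title=\"Johnny B. Goode\" artist=\"Chuck Berry\"/>"
            + "</songs>";
        for (Song s : parseSongs(xml)) {
            System.out.println(s.artist + ": " + s.title);
        }
    }
}
```

The hand-written loop is exactly the kind of glue code that data binding tools aim to generate or replace.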
SAX is more useful in cases where:
- Huge amounts of XML need to be processed, but the information needed is highly local, meaning only a small amount of data needs to be stored. This is usually the case in transforming linear documents, where little cross-linking occurs.
- Various stages of XML processing are interconnected to form a pipeline. The next stage can then begin its work as soon as the first character comes out of the previous one, instead of having to wait for the full document to be converted into objects.
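The first case can be sketched with the standard Java SAX API. The handler below counts occurrences of one element and keeps nothing but a single counter, no matter how large the input is. The class name SaxElementCounter and the track element in the sample document are assumptions chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: a SAX handler whose only state is one counter.
class SaxElementCounter extends DefaultHandler {
    private final String elementName;
    private int count = 0; // the only data stored while the document streams by

    SaxElementCounter(String elementName) {
        this.elementName = elementName;
    }

    // Called by the parser for every start tag, as soon as it has been read.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // The default factory is not namespace-aware, so qName is the tag name.
        if (elementName.equals(qName)) {
            count++;
        }
    }

    int getCount() {
        return count;
    }

    static int count(String xml, String elementName) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxElementCounter handler = new SaxElementCounter(elementName);
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
            return handler.getCount();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<album><track>One</track><track>Two</track><bonus/></album>";
        System.out.println(count(xml, "track")); // prints 2
    }
}
```

Because the parser pushes events to the handler while it is still reading, the same pattern works when the input stream is the output of a previous processing stage.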
Some projects have tried to improve on DOM in this respect. The Java community in particular has looked closely at ways to tie specific XML elements to specific Java classes. Popular efforts include:
- Quick, and its predecessors MDSAX and Coins
- XML Data Binding
- Xbeans
XParse
XParse is an XML-compliant parser written in less than five kilobytes of Javascript. It takes a Javascript string of XML and converts it into a Javascript array representing the object model, one array element per XML element. For more information see Jeremy's Web site. I took the liberty of adapting this fine piece of software to Java. Since the syntax of these two languages is fairly similar, it was not too difficult. In fact, the main challenge lay in mimicking Javascript objects, especially their dynamic behavior for attaching new properties and accessing arrays.
XParse for Java
Translating the XParse code itself from Javascript to Java was as simple as commenting out three unreachable return statements, which Javascript tolerates but the Java compiler rejects. The commented source can be inspected at SourceForge.
JSArray
The Javascript code in XParse makes heavy use of the language's built-in Array type, so the easiest strategy was to write a corresponding Java class, JSArray, that mimics the behavior of the Javascript data structure, at least to the extent needed in this specific case.
Without going into too much detail, the key feature of a JSArray is, not surprisingly, the ability to hold a set of objects referenced by indices. Furthermore, it is possible to directly address the contained object's properties by specifying array[index].property. This behavior is approximated by checking for specific object types and property names and then "manually" setting the property on the object. A fully generic implementation would be feasible through the use of Java's reflection capabilities. Another powerful feature is the ability to split a string into an array at a given delimiter and, in reverse, to join an array of strings back into one string, using a potentially different delimiter. If you know Perl, this feature should sound familiar. See the source code with comments if you like.
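To give a rough idea of what such a class involves, here is a minimal sketch covering indexed storage, split and join. It is not the JSArray source from the XParse port: the class name JsArraySketch and all of its method names are invented for this illustration, and the property handling described above is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a Javascript-like array; not the actual JSArray class.
class JsArraySketch {
    private final List<Object> items = new ArrayList<>();

    // Reading outside the current bounds yields null, standing in for
    // Javascript's "undefined".
    Object get(int index) {
        return (index >= 0 && index < items.size()) ? items.get(index) : null;
    }

    // Writing past the end grows the array, as a Javascript array would.
    void set(int index, Object value) {
        while (items.size() <= index) {
            items.add(null);
        }
        items.set(index, value);
    }

    int length() {
        return items.size();
    }

    // Splits text at every occurrence of a literal, non-empty delimiter.
    static JsArraySketch split(String text, String delimiter) {
        JsArraySketch result = new JsArraySketch();
        // Pattern.quote treats the delimiter literally instead of as a regex;
        // the limit of -1 keeps trailing empty strings, as Javascript does.
        String[] parts = text.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < parts.length; i++) {
            result.set(i, parts[i]);
        }
        return result;
    }

    // Joins all elements into one string; empty slots contribute nothing.
    String join(String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            Object item = items.get(i);
            if (item != null) {
                sb.append(item);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        JsArraySketch parts = JsArraySketch.split("a,b,c", ",");
        System.out.println(parts.length());    // prints 3
        System.out.println(parts.join(" | ")); // prints a | b | c
    }
}
```

Storing plain Object references keeps the sketch close to Javascript's untyped arrays, at the cost of casts wherever an element is used.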
Produced by Michael Claßen
URL: https://www.webreference.com/xml/column11/4.html
Created: Apr. 18, 2000
Revised: Apr. 26, 2000