SAX and DOM and Rock'n Roll (4/4) - exploring XML
Simplistic XPath
Earlier we learned that the DOM gives us access to the tree structure underlying the parsed document. What our RSSViewerApplet ideally needs is a function that addresses a specific element in the tree and returns its content. For tree navigation, the W3C defined the XPath specification, which lets us address nodes much as we would specify files in a hierarchical directory structure. Remember column 2?
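For comparison, this is roughly how the same addressing looks with a full XPath engine. The sketch uses the standard javax.xml.xpath API, which arrived with Java 5, years after this column was written. The class name FullXPathDemo and the sample document are made up for illustration. The local-name() tests ignore namespace prefixes, similar to the prefix stripping that findChildElement() performs, and the positional predicate [2] plays the role of the occur parameter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the column's source code.
class FullXPathDemo {
    // Made-up RSS 1.0-style sample; real feeds carry many more elements.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
        + "<channel><title>Demo</title></channel>"
        + "<item><title>First</title></item>"
        + "<item><title>Second</title></item>"
        + "</rdf:RDF>";

    // Parses xml and returns the string value of the first node selected
    // by expr, or "" if nothing matches.
    static String evaluate(String xml, String expr) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // lets local-name() report "RDF" for rdf:RDF
        Document doc = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, doc);
    }
}
```

With this sample, evaluating "/*[local-name()='RDF']/*[local-name()='channel']/*[local-name()='title']" yields "Demo", and "/*[local-name()='RDF']/*[local-name()='item'][2]/*[local-name()='title']" yields "Second".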
I added simple XPath functionality to my Node class, which represents nodes in the parsed document tree:

    public Node find(String path, int occur) {
        Node n = this;
        // split the path into its slash-delimited steps
        JSArray a = new JSArray();
        a.split(path, "/");
        int i = 0;
        while (i …

This function finds the nth occurrence of a node matching the given path expression. See column 2 for a list of valid expressions. The method splits the path into its steps (delimited by slashes) and successively calls findChildElement() to resolve the full path one step at a time. If the path cannot be resolved, null is returned.
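The step-by-step resolution can be sketched in a self-contained way. SimpleNode is a hypothetical stand-in for the column's Node class, and String.split() stands in for JSArray.split(). One assumption to flag: this sketch counts the nth match across all nodes reached by the full path, in document order, which is the behavior load() relies on when it asks for "RDF/item/title" with increasing occurrence numbers. It is not a claim about how the original passes occur to the intermediate steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the column's Node class: a tag name plus children.
class SimpleNode {
    final String name;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name) { this.name = name; }

    // Appends a child and returns this node, so trees can be built by chaining.
    SimpleNode add(SimpleNode child) { children.add(child); return this; }

    // Returns the occur-th (1-based) node matching a slash-separated path of
    // element names, counted in document order, or null if there is none.
    SimpleNode find(String path, int occur) {
        List<SimpleNode> current = new ArrayList<>();
        current.add(this);
        // Resolve one step at a time, like walking down a directory tree.
        for (String step : path.split("/")) {
            List<SimpleNode> next = new ArrayList<>();
            for (SimpleNode n : current) {
                for (SimpleNode c : n.children) {
                    if (step.equals(localName(c.name))) next.add(c);
                }
            }
            current = next;
        }
        return (occur >= 1 && occur <= current.size()) ? current.get(occur - 1) : null;
    }

    // Strips a namespace prefix such as "rdf:" from a tag name.
    static String localName(String tag) {
        if (tag == null) return "";
        int colon = tag.indexOf(':');
        return colon == -1 ? tag : tag.substring(colon + 1);
    }
}
```

Collecting all matches per step is simple but not the cheapest approach; for the small documents an RSS applet handles, that trade-off seems acceptable.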
    Node findChildElement(Node parent, String simplePath, int occur) {
        JSArray a = parent.contents;
        Node n;
        int found = 0;
        int i = 0;
        String tag;
        do {
            n = (Node)a.elementAt(i);
            ++i;
            // nodes without a name compare as ""
            tag = (n.name != null) ? n.name : "";
            // strip a namespace prefix such as "rdf:"
            int colonPos = tag.indexOf(':');
            tag = (colonPos == -1) ? tag : tag.substring(colonPos + 1);
            if (simplePath.equals(tag)) ++found;
        } while (i …

This function is the actual workhorse of the tree navigation. It iterates over a set of sibling nodes and returns the nth occurrence of the specified simplePath, or null if it is not found. Namespace prefixes are stripped from the tag names before they are compared. These two functions are all that is needed to implement basic search for elements in an XML tree. Of course, the specification has much more to offer, and full implementations are far more sophisticated than this one, but for our purposes it gets the job done nicely. See the full commented source code.
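The sibling scan can also be sketched in isolation. To stay self-contained, this hypothetical version (SiblingSearch.findChildIndex) works on a plain list of tag names and returns the index of the match instead of a Node. It applies the same prefix stripping as findChildElement(), but uses an ordinary for loop, so a parent without children is handled without reading element 0, something a do/while body always does on its first pass.

```java
import java.util.List;

// Hypothetical helper illustrating the sibling scan; not the column's API.
class SiblingSearch {
    // Returns the index of the occur-th (1-based) tag whose local name,
    // with any namespace prefix removed, equals simplePath; -1 if none.
    static int findChildIndex(List<String> siblingTags, String simplePath, int occur) {
        int found = 0;
        for (int i = 0; i < siblingTags.size(); i++) {
            String tag = siblingTags.get(i);
            if (tag == null) tag = "";              // unnamed nodes never match a name
            int colonPos = tag.indexOf(':');
            if (colonPos != -1) tag = tag.substring(colonPos + 1); // drop "rdf:" etc.
            if (simplePath.equals(tag) && ++found == occur) return i;
        }
        return -1; // fewer than occur matches among the siblings
    }
}
```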
The modified RSSChannel
The RSSViewerApplet has not changed at all, since the RSSChannel interface it relies on has stayed the same. The implementation of the channel, however, has changed a bit:
    private String readChannel(String srcURL) throws Exception {
        URL u = new URL(srcURL);
        // decodes with the platform's default encoding
        InputStreamReader r = new InputStreamReader(u.openStream());
        StringBuffer sb = new StringBuffer();
        try {
            int c;
            // read the whole document, one character at a time
            while ((c = r.read()) != -1) {
                sb.append((char)c);
            }
        } finally {
            r.close(); // release the network connection
        }
        return sb.toString();
    }

This time we read the whole file into a single string, which is only feasible when the file size is limited; fortunately, that is usually the case with RSS.
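Reading one character at a time from an unbuffered reader works, but in modern Java a buffered read with an explicit character set is the more common way to slurp a stream into a string. The sketch below assumes UTF-8 (a real feed may declare another encoding in its XML declaration) and relies on try-with-resources, which did not exist when this applet was written. StreamSlurp and readAll are hypothetical names; the method takes an InputStream so that the stream from URL.openStream() can be passed in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical utility class; not part of the column's source code.
class StreamSlurp {
    // Reads the whole stream into one String, decoding it as UTF-8.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }
}
```

In the applet, readAll(new URL(srcURL).openStream()) would then play the role of readChannel().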
    public void load(String srcURL) throws Exception {
        // parse the whole document into a tree
        root = new Xparse().parse(readChannel(srcURL));
        // channel metadata
        channelTitle = root.find("RDF/channel/title", 1).getCharacters();
        channelLink = new URL(root.find("RDF/channel/link", 1).getCharacters());
        channelDescription = root.find("RDF/channel/description", 1).getCharacters();
        // channel image
        imageTitle = root.find("RDF/image/title", 1).getCharacters();
        imageLink = new URL(root.find("RDF/image/link", 1).getCharacters());
        imageURL = new URL(root.find("RDF/image/url", 1).getCharacters());
        // items are stored interleaved: title at 2*pos, link at 2*pos+1
        items = new Vector();
        int pos = 0;
        while (true) {
            Node n = root.find("RDF/item/title", pos+1);
            if (n == null) break;
            items.insertElementAt(n.getCharacters(), 2*pos);
            n = root.find("RDF/item/link", pos+1);
            if (n == null) break;
            items.insertElementAt(new URL(n.getCharacters()), 2*pos+1);
            ++pos;
        }
    }

Loading the channel now means having Xparse build the document tree and then locating the right nodes in it with find() and getCharacters(). The rest remains unchanged.
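The item loop in load() fills a single Vector with alternating titles and URLs. The hypothetical sketch below (ItemPairs.collect) isolates that loop so it can run without a parser: the two IntFunction lookups stand in for root.find(...).getCharacters() and return null when there are no more items. One deliberate difference from the listing: it looks up the link before storing the title, so an item with a title but no link does not leave an unpaired title at the end of the vector.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Vector;
import java.util.function.IntFunction;

// Hypothetical helper mirroring the item loop of load().
class ItemPairs {
    // Fetches the pos-th (1-based) title and link until either lookup
    // returns null. Even indexes hold titles, odd indexes the matching URLs.
    static Vector<Object> collect(IntFunction<String> titleAt,
                                  IntFunction<String> linkAt)
            throws MalformedURLException {
        Vector<Object> items = new Vector<>();
        for (int pos = 1; ; pos++) {
            String title = titleAt.apply(pos);
            if (title == null) break;
            String link = linkAt.apply(pos);
            if (link == null) break;          // never store a title without a link
            items.addElement(title);          // index 2*(pos-1)
            items.addElement(new URL(link));  // index 2*(pos-1)+1
        }
        return items;
    }
}
```

A consumer then reads item i as (String) items.elementAt(2*i) and (URL) items.elementAt(2*i+1).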
Conclusion
We successfully put the RSSViewerApplet on a diet by switching its programming style from event-driven to object-based. We discussed the advantages and challenges you have to trade off when processing XML. Even if you do not intend to write your own XML processor, you should understand these concepts so that you know which tool to choose for which job. You have seen how much impact this choice can have, even on a tool as small as our RSSViewerApplet. Or then again, can you see it? Here it is in source and binary form.
Produced by Michael Claßen
All Rights Reserved. Legal Notices.
URL: https://www.webreference.com/xml/column11/5.html
Created: Apr. 18, 2000
Revised: Apr. 26, 2000