SAX and DOM and Rock'n Roll (4/4) - exploring XML

SAX and DOM and Rock'n Roll

Simplistic XPath

Earlier on we learned that the DOM gives us the ability to access the tree structure that underlies the parsed document. What we ideally need for our RSSApplet is a function that lets us address a specific element in the tree and have the content returned. For tree navigation, the W3C defined the XPath specification, allowing us to specify nodes in a manner similar to the way we would specify files in a hierarchical directory structure. Remember column2?

I added simple XPath functionality to my Node class, which represents nodes in the parsed document tree:

  public Node find(String path, int occur) {
    Node n = this;
    JSArray a = new JSArray();
    a.split(path, "/");
    int i = 0;
    while (i

This function finds the nth occurrence of a node matching the given path expression.
See column2 for a list of valid expressions.
The method splits the path into its elements (delimited by slashes) and
successively calls findChildElement() to resolve the full path, step by step, to the
specified node. If the path cannot be resolved, null is returned.

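The path resolution described above can be sketched in a self-contained form. This is not the article's listing: SketchNode is a hypothetical stand-in for the Node class, and the sketch assumes that "nth occurrence" means the nth node, in document order, that matches the whole path, which is how load() later uses find(). Namespace prefixes are not handled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the article's Node class: a name, some text, and child nodes.
class SketchNode {
    final String name;
    final String text;
    final List<SketchNode> children = new ArrayList<SketchNode>();

    SketchNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Appends a child and returns this node, so small trees can be built inline.
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    // Returns the occur-th node (1-based, document order) reached by path, or null.
    SketchNode find(String path, int occur) {
        String[] steps = path.split("/");
        int[] remaining = { occur };  // shared countdown of matches still to skip
        return walk(this, steps, 0, remaining);
    }

    private static SketchNode walk(SketchNode node, String[] steps, int depth, int[] remaining) {
        if (depth == steps.length) {
            // Every step matched: count this node as one occurrence.
            return (--remaining[0] == 0) ? node : null;
        }
        for (SketchNode child : node.children) {
            if (steps[depth].equals(child.name)) {
                SketchNode hit = walk(child, steps, depth + 1, remaining);
                if (hit != null) return hit;
            }
        }
        return null;  // path could not be resolved below this node
    }
}
```

Called on a document node whose only child is RDF, find("RDF/item/title", 2) would return the title of the second item, and null if there are fewer than two items.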
  Node findChildElement(Node parent, String simplePath, int occur) {
    JSArray a = parent.contents;
    Node n;
    int  found = 0;
    int i = 0;
    String tag;
    do {
      n = (Node)a.elementAt(i);
      ++i;
      tag = (n.name != null) ? n.name : "";
      int colonPos = tag.indexOf(':');
      tag = (colonPos == -1) ? tag : tag.substring(colonPos + 1);
      if (simplePath.equals(tag)) ++found;
    } while (i

This function is the actual workhorse of the tree navigation. It iterates over a set of
sibling nodes and returns the nth occurrence of the specified simplePath, or null
if it is not found. Namespace prefixes are stripped before comparing. These two functions are all
that is needed to implement basic search capabilities for elements in an XML tree. Of course,
the specification has much more to offer, and full implementations are much more sophisticated
than this one, but for us it gets the job done nicely. See the full commented source code.
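The sibling scan can be illustrated on its own. The following is a minimal sketch, not the article's code: SiblingScan, localName() and indexOfOccurrence() are hypothetical names, and the siblings are represented only by their tag names rather than by Node objects.

```java
import java.util.List;

class SiblingScan {
    // Drops a namespace prefix such as "rdf:" from a tag name; null becomes "".
    static String localName(String tag) {
        if (tag == null) return "";
        int colonPos = tag.indexOf(':');
        return (colonPos == -1) ? tag : tag.substring(colonPos + 1);
    }

    // Returns the index of the occur-th (1-based) sibling whose local name
    // equals simplePath, or -1 if there are fewer than occur matches.
    static int indexOfOccurrence(List<String> siblingTags, String simplePath, int occur) {
        int found = 0;
        for (int i = 0; i < siblingTags.size(); i++) {
            if (simplePath.equals(localName(siblingTags.get(i))) && ++found == occur) {
                return i;
            }
        }
        return -1;
    }
}
```

Because the prefix is dropped before comparing, a search for "RDF" matches both RDF and rdf:RDF. The price is that elements with the same local name in different namespaces cannot be told apart.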

The modified RSSChannel

The RSSViewerApplet has not changed at all, since the interface of RSSChannel that it
relies on stayed the same. Nevertheless, the implementation of the channel has changed a bit:

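  // Reads the whole document at srcURL into a single string.
  // Needs java.io.InputStreamReader and java.net.URL.
  private String readChannel(String srcURL) throws Exception {
    URL u = new URL(srcURL);
    InputStreamReader r = new InputStreamReader(u.openStream());
    StringBuffer sb = new StringBuffer();
    try {
      int c;
      while ((c = r.read()) != -1) {
        sb.append((char)c);
      }
    } finally {
      r.close();  // release the connection even if reading fails
    }
    return sb.toString();
  }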

This time we read the whole file into a single string. That is only feasible when the
file is small, which is usually the case with RSS.

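If the size assumption needs to be enforced rather than trusted, the read loop can be capped. The sketch below is an illustration under assumptions, not part of the applet: BoundedRead, readAll() and the maxChars parameter are hypothetical, and it uses StringBuilder, which did not exist when the applet was written.

```java
import java.io.IOException;
import java.io.Reader;

class BoundedRead {
    // Reads everything from r into a string, failing once more than maxChars arrive.
    static String readAll(Reader r, int maxChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];  // read in blocks instead of one character at a time
        int n;
        while ((n = r.read(buf)) != -1) {
            if (sb.length() + n > maxChars) {
                throw new IOException("document exceeds " + maxChars + " characters");
            }
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

A caller could wrap the URL's input stream in an InputStreamReader and pass it in with whatever limit seems reasonable for a channel file.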
  public void load(String srcURL) throws Exception {
    // Parse the complete document into a tree of Node objects.
    root = new Xparse().parse(readChannel(srcURL));
    // Channel and image data occur once each, so ask for the first occurrence.
    channelTitle = root.find("RDF/channel/title", 1).getCharacters();
    channelLink = new URL(root.find("RDF/channel/link", 1).getCharacters());
    channelDescription = root.find("RDF/channel/description", 1).getCharacters();
    imageTitle = root.find("RDF/image/title", 1).getCharacters();
    imageLink = new URL(root.find("RDF/image/link", 1).getCharacters());
    imageURL = new URL(root.find("RDF/image/url", 1).getCharacters());
    // Items are stored as consecutive pairs: title at 2*pos, link at 2*pos+1.
    items = new Vector();
    int pos = 0;
    while (true) {
      Node title = root.find("RDF/item/title", pos+1);
      Node link = root.find("RDF/item/link", pos+1);
      // Stop at the first item lacking a title or a link, so no pair is left half-filled.
      if (title == null || link == null) break;
      items.insertElementAt(title.getCharacters(), 2*pos);
      items.insertElementAt(new URL(link.getCharacters()), 2*pos+1);
      ++pos;
    }
  }

Loading the channel now means having Xparse build the document tree and then locating the
right nodes in that tree, using find() and getCharacters().
The rest remains unchanged.
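For comparison only, the same kind of lookup can be written against the XPath implementation in the javax.xml.xpath package of current JDKs. This is a sketch, not what the applet does: StandardXPathSketch, byLocalName() and text() are hypothetical names. It mimics the prefix stripping by matching each step with local-name(), and returns an empty string rather than null when nothing matches.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StandardXPathSketch {
    // Builds an XPath that matches each step by local name, ignoring namespace prefixes.
    static String byLocalName(String simplePath) {
        StringBuilder sb = new StringBuilder();
        for (String step : simplePath.split("/")) {
            sb.append("/*[local-name()='").append(step).append("']");
        }
        return sb.toString();
    }

    // Returns the text of the occur-th (1-based) node matching simplePath, or "" if none.
    static String text(String xml, String simplePath, int occur) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);  // required for local-name() to drop the prefix
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        // Parenthesize the path so [occur] counts matches across the whole document.
        return xp.evaluate("(" + byLocalName(simplePath) + ")[" + occur + "]", doc);
    }
}
```

Here text(xml, "RDF/item/title", 2) plays the role of root.find("RDF/item/title", 2).getCharacters(). The sketch parses the document on every call, which a real program would avoid.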

Conclusion

We successfully put the RSSViewerApplet on a diet by changing the programming style from
event-driven to object-based. We discussed the advantages and challenges
you have to trade off when processing XML. Even if you do not intend
to program your own XML processor, you should still understand the concepts so you know
which tool to choose for which job. You have seen what an impact this choice can have, even on a tool
as small as our RSSViewerApplet. Or then again, can you see it?
Here it is for you in source and binary form.

  


  
  
  
  
  Produced by Michael Claßen

  All Rights Reserved. Legal Notices.
  URL: https://www.webreference.com/xml/column11/5.html

  Created: Apr. 18, 2000

  Revised: Apr. 26, 2000
