WebReference.com - Part 1 of Chapter 10 from Professional PHP4 XML, from Wrox Press Ltd (5/7)
Professional PHP4 XML
SAX Filters for XML Transformations
We learned a lot about SAX and filters in Chapter 5. In this chapter we revisit the concept of filters and look at some techniques for using them. PEAR is expected to define a standard for SAX filters in PHP in the near future. In the meantime we can use the approach discussed in Chapter 5, or our own approach, provided we understand the concept well.
A SAX filter is a class that transforms the output of a SAX parser or of another SAX filter. A filter can encapsulate a simple transformation, so we can write many filters and chain them later. Passing the output of one filter as the input to another builds a complex transformation from simple ones.
We'll be building a very simple example to illustrate these concepts. Let's now write the class.
The AbstractSAXParser receives an XML file and produces SAX events that must be passed to a filter. The SAX parser alone is useless: it serves only as a generator of SAX events for another class.
The abstract class is important as it helps us to understand the methods that implementations must have.
Here are some observations:
- The class hides the SAX parser from the application
  If we change the parser from Expat to another one, nothing will have to be changed except this class.
- The class allows parsing non-XML data
  We can write an AbstractSAXParser for non-XML data, parse the data with it, and produce SAX events. This way we can transform non-XML to XML.
- The class can hide the XML source from its users
  We can write a SAX parser for XML files, XML strings, and for XML stored in a database.
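The second observation can be sketched as follows. This is only an illustration under stated assumptions, not code from the book: KeyValueParser and EventRecorder are hypothetical names, and the "key=value" line format is an assumed input chosen for brevity. The sketch uses __construct() so that it also runs on current PHP versions; in PHP4 style the constructor would be named after the class.

```php
<?php
// Hypothetical listener that simply records the events it receives.
class EventRecorder
{
    var $events = array();

    function StartElementHandler($name, $attribs)
    {
        $this->events[] = "start:$name";
    }

    function EndElementHandler($name)
    {
        $this->events[] = "end:$name";
    }

    function CharacterDataHandler($data)
    {
        $this->events[] = "text:$data";
    }
}

// Hypothetical parser for "key=value" lines (an assumed input format).
// It offers the same SetListener()/Parse() interface as the SAX parser
// class, but no XML parser is involved.
class KeyValueParser
{
    var $listener;
    var $text;

    function __construct($text)
    {
        $this->text = $text;
    }

    function SetListener($obj)
    {
        $this->listener = $obj;
    }

    function Parse()
    {
        // Wrap all lines in one <record> element
        $this->listener->StartElementHandler("record", array());
        foreach (explode("\n", trim($this->text)) as $line) {
            list($key, $value) = explode("=", $line, 2);
            $this->listener->StartElementHandler(trim($key), array());
            $this->listener->CharacterDataHandler(trim($value));
            $this->listener->EndElementHandler(trim($key));
        }
        $this->listener->EndElementHandler("record");
        return 1;
    }
}

$recorder = new EventRecorder();
$parser = new KeyValueParser("name=Ann\ncity=Lima");
$parser->SetListener($recorder);
$parser->Parse();
print implode("\n", $recorder->events) . "\n";
```

Because the listener only sees start, end, and character data events, it cannot tell whether they came from Expat or from a plain-text reader like this one.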
The abstract class is as follows:
class AbstractSAXParser
{
    var $listener;

    function AbstractSAXParser() {}

    function ParserSetOption($opt, $val) {}

    function SetListener($obj)
    {
        $this->listener = $obj;
    }

    function StartElementHandler($parser, $name, $attribs)
    {
        $this->listener->StartElementHandler($name, $attribs);
    }

    function EndElementHandler($parser, $name)
    {
        $this->listener->EndElementHandler($name);
    }

    function CharacterDataHandler($parser, $data)
    {
        $this->listener->CharacterDataHandler($data);
    }

    function Parse() {}
}
The listener is the object that receives the SAX events produced by this class. As we can see, the handlers in this class are fixed: each one only calls its counterpart in the listener object. The abstract Parse() method depends on the implementation and is supposed to parse the XML data. It generates the events that are intercepted by the handlers in this class, which in turn call the same methods in the listener. Note that the listener methods don't receive a parser argument since they don't need it. The SetListener() method allows us to set the object that will be used as a listener for the events produced in this class. Finally, the ParserSetOption() method is there to set options specific to the SAX parser, such as case folding.
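To make the case folding option concrete, here is a small sketch of its effect on the element names a handler receives. XML_OPTION_CASE_FOLDING and the xml_* functions belong to PHP's XML extension; collectElementNames() is a hypothetical helper written for this illustration, and the closures it uses require PHP 5.3 or later, so this is not PHP4 code.

```php
<?php
// Hypothetical helper: parse $xml and return the element names that
// Expat reports for the given case-folding setting.
function collectElementNames($xml, $caseFolding)
{
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attribs) use (&$names) {
            $names[] = $name;   // record each start tag name
        },
        function ($p, $name) {
            // end tags are not needed for this illustration
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<person><name>ann</name></person>';
$folded = collectElementNames($xml, 1);  // folding on: names are uppercased
$asIs   = collectElementNames($xml, 0);  // folding off: names as written

print implode(",", $folded) . "\n";
print implode(",", $asIs) . "\n";
```

With folding enabled, which is the default, handlers see PERSON and NAME rather than person and name, so any code that compares element names has to allow for the setting in use.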
Note that, to save space, only the startElement, endElement, and characterData events are considered. It is easy to extend the example to the full range of SAX events with very minor changes.
We can write a class for the Expat parser, as follows:
class ExpatParser extends AbstractSAXParser
{
    var $parser;
    var $filename;
    var $buffer;
    var $error_string;
    var $line;

    function ExpatParser($xmlfile)
    {
        $this->filename = $xmlfile;
        $this->parser = xml_parser_create();
        $this->buffer = 4096;
        // xml_set_object() receives the object by reference on its own,
        // so no call-time & is needed
        xml_set_object($this->parser, $this);
        xml_set_element_handler($this->parser, "StartElementHandler",
                                "EndElementHandler");
        xml_set_character_data_handler($this->parser,
                                       "CharacterDataHandler");
    }

    function ParserSetOption($opt, $val)
    {
        return xml_parser_set_option($this->parser, $opt, $val);
    }

    function Parse()
    {
        if (!($fp = fopen($this->filename, "r"))) {
            return 0;
        }
        // Feed the file to Expat in chunks of $this->buffer bytes
        while ($data = fread($fp, $this->buffer)) {
            if (!xml_parse($this->parser, $data, feof($fp))) {
                // Query the parser held by this object for the error details
                $this->error_string =
                    xml_error_string(xml_get_error_code($this->parser));
                $this->line = xml_get_current_line_number($this->parser);
                die("Error: ".$this->error_string." on line ".$this->line);
            }
        }
        fclose($fp);
        xml_parser_free($this->parser);
        return 1;
    }
}
This class is just an implementation of the abstract class using the Expat parser. Some methods, such as getErrorString() and getErrorLineNumber(), are missing, but we won't need them for our examples.
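For readers who would rather report errors than call die(), the following sketch shows how the error message and line number can be obtained from Expat. It is an assumption about how such accessors might gather their data, not the book's implementation; parseOrError() is a hypothetical helper, while the xml_* calls are the same ones used in Parse().

```php
<?php
// Hypothetical helper: parse a string and return null on success, or an
// array with the Expat error message and line number on failure.
function parseOrError($xml)
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    return array(
        'message' => xml_error_string(xml_get_error_code($parser)),
        'line'    => xml_get_current_line_number($parser),
    );
}

$ok  = parseOrError('<a><b/></a>');        // well-formed input
$bad = parseOrError("<a>\n<b>\n</a>");     // <b> is never closed

if ($bad !== null) {
    print "Error: " . $bad['message'] . " on line " . $bad['line'] . "\n";
}
```

Returning the details instead of terminating the script leaves the decision about what to do with a malformed document to the caller.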
Now we define an AbstractFilter class. The abstract filter receives SAX events, manipulates them, and re-transmits the events to a listener. The filter can also change tag names, modify the content of elements, add elements by generating events, and so on.
Here is the abstract class:
class AbstractFilter
{
    var $listener;

    //to set the listener of the filter and handlers for the events
    function SetListener($obj)
    {
        $this->listener = $obj;
    }

    function StartElementHandler($name, $attribs) {}

    function EndElementHandler($name) {}

    function CharacterDataHandler($data) {}
}
The SetListener() method sets the listener of the filter, and the three handler methods receive the events. In each handler we have to perform a task and then pass the event on to the next listener.
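As an illustration of "perform a task, then pass the event on", here is a sketch of a filter that changes tag names. RenameTagFilter and EventRecorder are hypothetical classes invented for this example; AbstractFilter is repeated in condensed form only so that the block runs on its own, and __construct() is used so that it also runs on current PHP versions.

```php
<?php
// Same shape as the AbstractFilter class, repeated so the sketch runs alone.
class AbstractFilter
{
    var $listener;
    function SetListener($obj) { $this->listener = $obj; }
    function StartElementHandler($name, $attribs) {}
    function EndElementHandler($name) {}
    function CharacterDataHandler($data) {}
}

// Hypothetical filter: renames one element and forwards everything else.
class RenameTagFilter extends AbstractFilter
{
    var $from;
    var $to;

    function __construct($from, $to)
    {
        $this->from = $from;
        $this->to = $to;
    }

    // Map the element name if it matches, otherwise leave it alone
    function mapName($name)
    {
        return ($name == $this->from) ? $this->to : $name;
    }

    function StartElementHandler($name, $attribs)
    {
        $this->listener->StartElementHandler($this->mapName($name), $attribs);
    }

    function EndElementHandler($name)
    {
        $this->listener->EndElementHandler($this->mapName($name));
    }

    function CharacterDataHandler($data)
    {
        $this->listener->CharacterDataHandler($data);
    }
}

// Hypothetical listener that records what it receives.
class EventRecorder
{
    var $events = array();
    function StartElementHandler($name, $attribs) { $this->events[] = "start:$name"; }
    function EndElementHandler($name) { $this->events[] = "end:$name"; }
    function CharacterDataHandler($data) { $this->events[] = "text:$data"; }
}

$recorder = new EventRecorder();
$filter = new RenameTagFilter("name", "fullname");
$filter->SetListener($recorder);

// Feed events by hand, standing in for a parser
$filter->StartElementHandler("name", array());
$filter->CharacterDataHandler("Ann");
$filter->EndElementHandler("name");

print implode("\n", $recorder->events) . "\n";
```

The start and end handlers must apply the same mapping, otherwise the listener would receive mismatched element names.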
We will use the same example that we used earlier in this chapter, and write a filter to convert the text inside <name> tags to uppercase:
class FilterName extends AbstractFilter
{
    var $flag = 0;

    function StartElementHandler($name, $attribs)
    {
        if (strtolower($name) == "name") {
            $this->flag = 1;
        } else {
            $this->flag = 0;
        }
        $this->listener->StartElementHandler($name, $attribs);
    }

    function EndElementHandler($name)
    {
        if (strtolower($name) == "name") {
            $this->flag = 0;
        }
        $this->listener->EndElementHandler($name);
    }

    function CharacterDataHandler($data)
    {
        if ($this->flag) {
            $data = strtoupper($data);
        }
        $this->listener->CharacterDataHandler($data);
    }
}
Note that we used the simple approach described before: a flag tracks when we are inside a <name> element and when we are not. Also notice how each event is passed to the listener immediately after the modification is done. The listener receives the same events as this filter, except that the text of <name> elements arrives in uppercase.
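To see the pieces working together, the following sketch wires a parser, the uppercase filter, and a final listener into one pipeline. It rests on several assumptions: UppercaseNameFilter is a condensed stand-in for FilterName, repeated so the block runs on its own; XmlTextSink and parseString() are hypothetical; and the closures require PHP 5.3 or later, so this is not PHP4 code.

```php
<?php
// Condensed stand-in for FilterName, repeated so the sketch runs alone.
class UppercaseNameFilter
{
    var $listener;
    var $flag = 0;

    function SetListener($obj) { $this->listener = $obj; }

    function StartElementHandler($name, $attribs)
    {
        $this->flag = (strtolower($name) == "name") ? 1 : 0;
        $this->listener->StartElementHandler($name, $attribs);
    }

    function EndElementHandler($name)
    {
        if (strtolower($name) == "name") {
            $this->flag = 0;
        }
        $this->listener->EndElementHandler($name);
    }

    function CharacterDataHandler($data)
    {
        if ($this->flag) {
            $data = strtoupper($data);
        }
        $this->listener->CharacterDataHandler($data);
    }
}

// Hypothetical final listener: rebuilds XML text from the events.
// Attributes are ignored to keep the sketch short.
class XmlTextSink
{
    var $output = "";
    function StartElementHandler($name, $attribs) { $this->output .= "<$name>"; }
    function EndElementHandler($name) { $this->output .= "</$name>"; }
    function CharacterDataHandler($data) { $this->output .= htmlspecialchars($data); }
}

// Hypothetical driver standing in for ExpatParser: parses a string and
// forwards each event to $listener without the parser argument.
function parseString($xml, $listener)
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attribs) use ($listener) {
            $listener->StartElementHandler($name, $attribs);
        },
        function ($p, $name) use ($listener) {
            $listener->EndElementHandler($name);
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use ($listener) {
            $listener->CharacterDataHandler($data);
        }
    );
    return xml_parse($parser, $xml, true);
}

// Chain: parser -> filter -> sink
$sink = new XmlTextSink();
$filter = new UppercaseNameFilter();
$filter->SetListener($sink);
parseString('<person><name>ann lee</name><city>Lima</city></person>', $filter);
print $sink->output . "\n";
```

Since a filter is itself a valid listener, further filters can be inserted between the parser and the sink simply by calling SetListener() on each link of the chain.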
Created: August 12, 2002
Revised: August 12, 2002
URL: https://webreference.com/programming/php/php4xml/chap10/1/5.html