WebReference.com - Part 1 of Chapter 10 from Professional PHP4 XML, from Wrox Press Ltd (4/7)
Professional PHP4 XML
Using SAX for XML Transforming
SAX is an event-driven API for parsing XML. PHP's XML parser functions are built on Expat, an event-driven, SAX-style parser written by James Clark. In this section we will see how to use SAX for XML transformations.
SAX is described in Chapter 5 of this book.
Why SAX?
A SAX parser reads the XML file in small chunks (buffers of raw data) and parses each chunk as it arrives, so the memory used by the parser stays roughly constant no matter how big the XML file is, as long as our handlers do not accumulate data themselves. Because no document tree is built in memory, SAX is usually faster than DOM or XSLT, and it scales better than these standards for large files.
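The chunked reading described above can be sketched with PHP's XML parser functions. This is a minimal sketch, not code from the book: the helper name `count_elements_streaming` and the 4096-byte chunk size are assumptions made for illustration, and the closures assume PHP 5.3 or later (in PHP 4 the handlers would be named functions).

```php
<?php
// Hypothetical helper: count the elements in a file while holding only one
// small chunk of it in memory at a time.
function count_elements_streaming($path, $chunkSize = 4096)
{
    $count = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        // Start handler: called once per opening tag.
        function ($p, $name, $attrs) use (&$count) {
            $count++;
        },
        // End handler: nothing to do for this example.
        function ($p, $name) {
        }
    );

    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($fp))) {
            $error = xml_error_string(xml_get_error_code($parser));
            $line = xml_get_current_line_number($parser);
            fclose($fp);
            throw new RuntimeException("XML error: $error at line $line");
        }
    }
    fclose($fp);
    return $count;
}
```

Only `$chunk` and the counter are kept between iterations, which is why memory use does not grow with the size of the file.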
How To Use SAX for Transformations
The first intuitive approach for using SAX to transform XML documents is to write handlers for the SAX events, and, keeping context between the handlers, perform the transformation inside them. In our example we can think about the following solution:
- Step 1--Initialize a flag to 0
- Step 2--Define the handlers for SAX events:
    StartElementHandler: If the element is <name>, turn the flag on (1).
    EndElementHandler: If the element is <name>, turn the flag off (0).
    CharacterDataHandler: If the flag is on (1) then output uppercased data, else output data.
- Step 3--Parse the document
We are assuming that no subelements may appear inside a <name> tag (no mixed content). If subelements do appear then we can use an integer flag and increment or decrement it. This way we can keep track of the text of a <name> tag and not its subelement <name>.
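The three steps above can be sketched with the PHP XML parser functions roughly as follows. This is an illustrative sketch under some assumptions: the helper name `uppercase_names` and the sample document are hypothetical, the closures assume PHP 5.3 or later, and the output is rebuilt by hand (the XML declaration, comments, and processing instructions are not copied).

```php
<?php
// Hypothetical helper: copy the document, uppercasing text inside <name>.
function uppercase_names($xml)
{
    $inName = 0;  // Step 1: the flag starts at 0
    $out = '';

    $parser = xml_parser_create();
    // Expat folds element names to upper case by default; turn that off
    // so that we see "name" rather than "NAME".
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // Step 2: define the handlers.
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inName, &$out) {
            if ($name === 'name') {
                $inName = 1;  // flag on
            }
            $out .= "<$name";
            foreach ($attrs as $key => $value) {
                $out .= " $key=\"" . htmlspecialchars($value, ENT_QUOTES) . '"';
            }
            $out .= '>';
        },
        function ($p, $name) use (&$inName, &$out) {
            if ($name === 'name') {
                $inName = 0;  // flag off
            }
            $out .= "</$name>";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$inName, &$out) {
            // Uppercase first, then escape, so entities such as &amp;
            // are not turned into &AMP;. strtoupper() only handles ASCII.
            $text = $inName ? strtoupper($data) : $data;
            $out .= htmlspecialchars($text, ENT_NOQUOTES);
        }
    );

    // Step 3: parse the document.
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(
            xml_error_string(xml_get_error_code($parser))
        );
    }
    return $out;
}

// Hypothetical sample input, not the book's example file.
echo uppercase_names(
    '<people><person><name>alice</name><city>Paris</city></person></people>'
);
// prints <people><person><name>ALICE</name><city>Paris</city></person></people>
```

The flag is shared between the three handlers through variables captured by reference; this shared state is the "context" that the handlers have to keep.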
We just wrote the sketch of a SAX-based program that transforms our XML file. We can easily write this using the PHP XML parser functions, as we learned in Chapter 5, and the result will be the desired transformation. While this works, it is not a very good long-term solution.
After writing a few applications that use SAX, we learn that complex processing becomes a nightmare because of how little the parser tells us. It only reports when an element starts, when an element ends, and what character data it finds in between. As processing becomes more complex we have to write a lot of code to keep context: maintaining stacks, setting and clearing flags, using integer counters, and many other obscure programming techniques.
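To give an idea of the bookkeeping involved, here is a minimal sketch of keeping context with a stack of open elements. The helper name `person_names` and the element names `person` and `company` are hypothetical, chosen only for illustration; the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical helper: collect the text of <name> elements, but only
// those whose parent element is <person>.
function person_names($xml)
{
    $stack = array();  // names of the currently open elements, outermost first
    $names = array();

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack, &$names) {
            $stack[] = $name;  // push on every start tag
            if (array_slice($stack, -2) === array('person', 'name')) {
                $names[] = '';  // start collecting a new name
            }
        },
        function ($p, $name) use (&$stack) {
            array_pop($stack);  // pop on every end tag
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$stack, &$names) {
            // The parser does not say where we are; the stack does.
            if (array_slice($stack, -2) === array('person', 'name')) {
                $names[count($names) - 1] .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(
            xml_error_string(xml_get_error_code($parser))
        );
    }
    return $names;
}
```

Even this small rule (a <name> counts only when it sits directly inside a <person>) needs a stack and a check in two handlers, and every further rule adds more shared state of this kind.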
Imagine a situation where we have to perform a complex transformation. It may be the worst coding nightmare ever, and even if we succeed, it will still be difficult for us to understand the code after some weeks. Every programmer has to be extra careful when faced with such situations. As programmers we know that the best way to address a complex problem is to break it into pieces, solve each piece, and then unite all of them in the end. This is called modularization or factoring. Modularization of SAX driven applications is crucial whenever the task is complex. We can do this using SAX filters.
Created: August 12, 2002
Revised: August 12, 2002
URL: https://webreference.com/programming/php/php4xml/chap10/1/4.html