WebReference.com - Part 1 of Chapter 10 from Professional PHP4 XML, from Wrox Press Ltd (3/7)


Professional PHP4 XML

Here we used a PHP class to hide the XSLT processor from the application. This is a key recommendation if we use XSLT. Separating the XSLT implementation from its use has the following advantages:

We can change the XSLT processor without changing the code that uses it. For example, we can write a wrapper class for Xalan C and test whether Xalan performs worse than, better than, or the same as Sablotron.
If the PHP XSLT extension API changes we won't need to modify all the sources.
We can use the class to abstract different sources of XML data for the XSLT transformation. In our class we can pass an XML document as a string or as a file, and with a little more code we could also pass a SAX parser, a DOM tree, or a handle to an XML database and perform the transformation there.
We can add a caching mechanism to the XSLT class and keep it transparent to applications. We can write the class so that caching is switched on or off according to user preferences, which lets us cache at the level we want if some other class also caches.

It may sound trivial, but abstracting the XML processing tools from the application is a key design decision in a large project. The SAX parser, DOM extension, or XSLT processor should be encapsulated in wrapper classes to ease software maintenance. The PEAR project will surely provide such abstractions in the near future. Stay tuned to https://pear.php.net/.
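
As a rough illustration of the separation described above, the sketch below hides the processor behind a small class. It uses current PHP syntax rather than PHP 4, and the names XsltBackend, Transformer, and UppercaseBackend are hypothetical; they are not the class from this chapter. The stand-in backend only uppercases its input so that the example runs without any XSLT extension. A real backend would call Sablotron, Xalan, or whichever processor is installed.

```php
<?php
// Hypothetical interface: anything that turns XML plus a stylesheet into output.
interface XsltBackend
{
    public function process(string $xml, string $xsl): string;
}

// Stand-in backend so the sketch runs without an XSLT extension.
// A real implementation would delegate to the installed processor here.
class UppercaseBackend implements XsltBackend
{
    public function process(string $xml, string $xsl): string
    {
        return strtoupper($xml);
    }
}

// The application only ever talks to this class.
class Transformer
{
    private XsltBackend $backend;

    public function __construct(XsltBackend $backend)
    {
        $this->backend = $backend;
    }

    // XML document and stylesheet passed as strings.
    public function transformString(string $xml, string $xsl): string
    {
        return $this->backend->process($xml, $xsl);
    }

    // XML document and stylesheet passed as file names.
    public function transformFile(string $xmlFile, string $xslFile): string
    {
        $xml = file_get_contents($xmlFile);
        $xsl = file_get_contents($xslFile);
        if ($xml === false || $xsl === false) {
            throw new RuntimeException('Could not read input files');
        }
        return $this->transformString($xml, $xsl);
    }
}

$transformer = new Transformer(new UppercaseBackend());
echo $transformer->transformString('<greeting>hello</greeting>', ''), "\n";
```

Swapping processors then means writing another class that implements the interface; the code that calls Transformer does not change.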

What Happens If the Transformation Is Complex?

If the transformation we need to perform is really complex, then using XSLT is the best option as it can handle complex transformations without any problem. XSLT is rich with features such as the document() function, sort capabilities, and features related to writing templates. We can use <xsl:include> to modularize our stylesheets, if they get very large. Thus, it is beneficial to use XSLT when we are dealing with large and complex transformations.

What Happens If the XML Document Is Large?

The Sablotron XSLT processor used by PHP parses both the whole XML document and the XSLT stylesheet into memory for processing. This means that the larger the document to transform, the more memory the application consumes and the longer the transformation takes.

If the XSLT stylesheet is well written (XSLT does some things faster than others; certain expressions, for example, tend to be slow), we'll be able to transform medium- to large-sized documents without much trouble, and we could even do it online.

However, if the documents get really big and the time consumed by the transformation is not acceptable, we can try some of these workarounds:

Use a caching mechanism for XSLT transformations
Use batch transformations
Use SAX

Using a Caching Mechanism for XSLT Transformations

Many times we'll be transforming the same XML document with the same XSLT stylesheet. This is very common, for example, if we use XSLT to create an HTML representation of an XML document for web presentation. It is clear that after the first transformation is done all the others are a waste of resources. Caching implies making the result of XSLT transformations persistent and outputting that result if the same transformation is requested. A caching mechanism can be used in:

The XSLT wrapper class used for the XSLT processor
A transformation class used to abstract XML transformations
The application level

It is advisable that we implement caching at the XSLT processor abstraction level or at the transformation abstraction level, because at the application level our code will be harder to maintain.

A caching mechanism must identify the XML source and the XSLT stylesheet used (storing MD5 checksums of both is generally a good way to do it) and store the result of the transformation along with the checksums. It also has to decide what policy will be used to prevent the cache from growing ad infinitum: for example, keeping a maximum cache size and, when the cache is full, cleaning out the n least-used transformations.
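
The bookkeeping described above can be sketched in a few lines. This is a minimal illustration in current PHP, not the book's implementation. CachingTransformer is a hypothetical name, the processor is passed in as a callable so that the example does not depend on any XSLT extension, and the cache lives in a PHP array rather than in a database. Eviction here drops the least recently used entry, which is only one possible reading of "least used".

```php
<?php
class CachingTransformer
{
    /** @var callable function (string $xml, string $xsl): string */
    private $processor;
    private int $maxEntries;
    /** @var array<string, string> cache key => result, least recently used first */
    private array $cache = [];

    public function __construct(callable $processor, int $maxEntries = 100)
    {
        $this->processor = $processor;
        $this->maxEntries = $maxEntries;
    }

    public function transform(string $xml, string $xsl): string
    {
        // Identify the transformation by the checksums of both inputs.
        $key = md5($xml) . ':' . md5($xsl);

        if (isset($this->cache[$key])) {
            // Cache hit: move the entry to the end to mark it as recently used.
            $result = $this->cache[$key];
            unset($this->cache[$key]);
            $this->cache[$key] = $result;
            return $result;
        }

        // Cache miss: run the real transformation and remember the result.
        $result = ($this->processor)($xml, $xsl);
        $this->cache[$key] = $result;

        // Keep the cache bounded by dropping the least recently used entry.
        if (count($this->cache) > $this->maxEntries) {
            array_shift($this->cache);
        }
        return $result;
    }
}

// Usage with a stand-in processor that counts how often it is called.
$calls = 0;
$cached = new CachingTransformer(
    function (string $xml, string $xsl) use (&$calls): string {
        $calls++;
        return strtoupper($xml);
    },
    2
);

$cached->transform('<a/>', 'sheet');
$cached->transform('<a/>', 'sheet'); // served from the cache
echo $calls, "\n";                   // prints 1
```

A persistent version would store the same key and result in a table or file instead of an array; the lookup and eviction logic stays the same.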

With this in mind, it will be easy for us to add caching to our transformation classes using, for example, a MySQL database to store the results. Since MySQL is fast, it is a good option for caching. Once caching is enabled, we can check whether the resources and time used for our transformations are down to acceptable levels. Even if we don't have a performance problem, caching is a good idea because it consumes fewer resources on the server: we may not need them, but other processes might.

Using Batch Transformations

If caching is not enough, we must think about not transforming the documents online. We can write a batch transformation system to queue up the transformations and produce the outputs as they are processed.

If we are in an online system this may be difficult, but if we are at the backend of a system we will find that we can transform documents in the background while the application runs in the foreground. This is a good alternative in situations such as the following:

Transforming XML documents into PDF files that may be downloaded later by the application
Transforming XML documents to SQL data that the application may use later
Transforming XML documents into reports that will be stored or sent later

However, if we need to use the result of the transformation immediately in the application then this approach is not very useful. Web publishing systems are a clear example of systems that can't benefit from batch transformations.

Batch transformations can be done in many different ways. What we need is a client-server model with a transformation server. We store the document and the XSLT stylesheet at some other location, and the server does the transformation when it is available (that is, when it is not doing other transformations), storing the result at a desired location. Another batch system can then pick up the results and put them in the proper place for the application. A database, files, or TCP/IP can be used for communication between the transformation server and the client application.
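
One of the simplest forms of this model uses files for communication. The sketch below is an assumption-laden illustration in current PHP, not a design from the book: the function names enqueueJob and runWorker, the spool directory layout, and the JSON job format are all hypothetical, and the processor is again a callable stand-in so that the example runs without an XSLT extension.

```php
<?php
// Client side: drop a job description into the spool directory.
function enqueueJob(string $spoolDir, string $xmlFile, string $xslFile, string $outFile): string
{
    $job = ['xml' => $xmlFile, 'xsl' => $xslFile, 'out' => $outFile];
    $jobFile = $spoolDir . '/' . uniqid('job_', true) . '.json';
    file_put_contents($jobFile, json_encode($job));
    return $jobFile;
}

// Server side: process every queued job once, sorted by job file name.
// $processor is any callable taking (xml, xsl) strings and returning the output.
function runWorker(string $spoolDir, callable $processor): int
{
    $done = 0;
    $jobFiles = glob($spoolDir . '/job_*.json') ?: [];
    sort($jobFiles);
    foreach ($jobFiles as $jobFile) {
        $job = json_decode(file_get_contents($jobFile), true);
        $result = $processor(
            file_get_contents($job['xml']),
            file_get_contents($job['xsl'])
        );
        file_put_contents($job['out'], $result); // store where the client expects it
        unlink($jobFile);                        // remove the job from the queue
        $done++;
    }
    return $done;
}

// Usage: queue one job in a temporary spool directory and run the worker once.
$spool = sys_get_temp_dir() . '/xslt_spool_' . uniqid();
mkdir($spool);
file_put_contents("$spool/doc.xml", '<doc/>');
file_put_contents("$spool/style.xsl", '');
enqueueJob($spool, "$spool/doc.xml", "$spool/style.xsl", "$spool/doc.out");
echo runWorker($spool, fn ($xml, $xsl) => strtoupper($xml)), "\n"; // prints 1
```

In practice the worker would run from a scheduler or a long-lived loop, and the client would poll for the output file or be notified when it appears.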

The following table shows some of the advantages and disadvantages of using XSLT:

Advantages	Disadvantages
Solid standard by the W3C	May scale badly if the XML documents are really huge
Eases maintainability and reuse of transformations
Fast enough for online processing of small- to mid-sized documents
Can be used to perform complex transformations

Using SAX

If neither of the above options is enough, then we can conclude that an XSLT transformation is not feasible in our application. Though this may happen, it is not a very common scenario: we would have to be transforming huge and mutable (non-cacheable) documents. If it does happen, it is best to use SAX instead of XSLT for our transformations. In the next section we will see how to use SAX.
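
Ahead of that section, a brief sketch may help show why SAX copes better with large input: output is produced as the parser reports each event, so no tree of the whole document is built. The example uses PHP's expat-based XML parser functions (this requires the xml extension). The function name itemsToHtml and the list/item vocabulary are hypothetical, chosen only for illustration.

```php
<?php
// Convert <item> elements to HTML list entries while the document streams past.
function itemsToHtml(string $xml): string
{
    $html = '<ul>';
    $inItem = false;

    $parser = xml_parser_create();
    // Keep element names as written instead of folding them to upper case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$html, &$inItem) {
            if ($name === 'item') {
                $inItem = true;
                $html .= '<li>';
            }
        },
        function ($parser, string $name) use (&$html, &$inItem) {
            if ($name === 'item') {
                $inItem = false;
                $html .= '</li>';
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$html, &$inItem) {
            // Text may arrive in several pieces; escape and append each one.
            if ($inItem) {
                $html .= htmlspecialchars($data);
            }
        }
    );

    // A large file would instead be fed chunk by chunk through repeated
    // xml_parse() calls, with the final flag set only on the last chunk.
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $html . '</ul>';
}

echo itemsToHtml('<list><item>One</item><item>Two</item></list>'), "\n";
```

The price of this approach is that the transformation logic lives in handler code rather than in a declarative stylesheet, which is why it is best kept for documents that XSLT cannot handle.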


Created: August 12, 2002
Revised: August 12, 2002

URL: https://webreference.com/programming/php/php4xml/chap10/1/3.html
