WebReference.com - Excerpt from Inside XSLT, Chapter 2, Part 1 (3/4)

Inside XSLT

Whitespace

The example XML document we've been working on so far is nicely indented to show the hierarchical structure of its elements, like this:

<?xml version="1.0"?>
<library>
   <book>
       <title>
           Earthquakes for Lunch
       </title>
       <title>
           Volcanoes for Dinner
       </title>
   </book>
</library>

However, from an XSLT point of view, the whitespace I've used to indent elements in this example actually represents text nodes. This means that by default, that whitespace is kept in the source tree and will typically be copied to the output document. Whitespace handling is a major source of confusion in XSLT, so I'll take a quick look at it here and cover it in detail in the next chapter.

In XSLT, four characters count as whitespace: spaces (#x20), tabs (#x9), carriage returns (#xD), and line feeds (#xA). That means that from an XSLT processor's point of view, the input document looks like this, where each period stands for a space (the line breaks are whitespace as well):

<?xml version="1.0"?>
<library>
....<book>
........<title>
............Earthquakes for Lunch
........</title>
........<title>
............Volcanoes for Dinner
........</title>
....</book>
</library>

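All the whitespace between the elements is treated as whitespace text nodes in XSLT. That means that there are five whitespace text nodes we have to add to our diagram: one before the <book> element, one after the <book> element, as well as one before, after, and in between the <title> elements. The whitespace surrounding each title's text is not a separate node; it belongs to the same text node as the title itself, so the diagram shows only the title text: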

                                        root
                                          |
                                  element: <library>
                                          |
                             |------------|---------------------|
                   text: whitespace   element: <book>   text: whitespace
                                          |
     |------------------|-----------------|---------------------|------------------|
text: whitespace   element: <title>  text: whitespace    element: <title>    text: whitespace
                        |                                       |
           text: "Earthquakes for Lunch"             text: "Volcanoes for Dinner"
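
One way to check that count is to ask the processor itself. The stylesheet below is a minimal sketch of my own, not one of the book's example files: it counts the text nodes whose content normalizes to an empty string. It assumes the XML parser feeding the processor passes whitespace text nodes through untouched; some parsers can be configured to discard them before XSLT ever sees them.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Plain text output, so the result is just the number. -->
    <xsl:output method="text"/>

    <!-- Match the root node and count every text node in the
         document that contains nothing but whitespace. -->
    <xsl:template match="/">
        <xsl:value-of select="count(//text()[normalize-space(.) = ''])"/>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the indented document above, a processor that preserves whitespace should report 5, one for each whitespace node in the diagram. The two title text nodes are not counted, because they contain characters other than whitespace.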

Whitespace nodes such as these are text nodes that contain nothing but whitespace. Because XSLT processors preserve this whitespace by default, you should not be surprised when it shows up in result documents. This extra whitespace is usually not a problem in HTML, XML, and XHTML documents, and I'll remove it from the result documents shown in this text so that the indentation reflects the actual document structure. Later, we'll see how XSLT processors can strip whitespace nodes from documents, as well as how they can indent result documents. Note that text nodes that contain characters other than whitespace are not considered whitespace nodes, and so will never be stripped from a document.
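
As a brief preview, here is a sketch of the two XSLT 1.0 features usually involved: the xsl:strip-space element, which tells the processor to drop whitespace-only text nodes from the named source elements, and the indent attribute of xsl:output. The identity template that copies everything else is my own choice for illustration, not something taken from this chapter.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Strip whitespace-only text nodes from every source element. -->
    <xsl:strip-space elements="*"/>

    <!-- Ask the processor to indent the result. This is a request,
         and the exact layout is left to the processor. -->
    <xsl:output method="xml" indent="yes"/>

    <!-- Identity template: copy each node and attribute unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

With this stylesheet, the five whitespace nodes are removed before any template runs. The text inside each <title> element keeps its leading and trailing whitespace, because those text nodes also contain non-whitespace characters.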

Another thing to note is that attributes are themselves treated as nodes. Although attribute nodes are not considered child nodes of the elements in which they appear, the element is considered their parent node. (This is different from the XML DOM model, in which attributes are not children of an element and also have no parent node.) If I add an attribute to an element like this:

<?xml version="1.0"?>
<library>
   <book>
       <title>
           Earthquakes for Lunch
       </title>
       <title pub_date="2001">
           Volcanoes for Dinner
       </title>
   </book>
</library>

Then here's how this attribute appears in the document tree:

                                        root
                                          |
                                  element: <library>
                                          |
                             |------------|---------------------|
                   text: whitespace   element: <book>   text: whitespace
                                          |
     |------------------|-----------------|---------------------|------------------|
text: whitespace   element: <title>  text: whitespace    element: <title>    text: whitespace
                        |                                       |
           text: "Earthquakes for Lunch"      |---------------------------|
                                              |                           |
                                  text: "Volcanoes for Dinner"   attribute: pub_date="2001"
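
The one-way relationship between an attribute and its element can be seen from XPath. The following stylesheet is a minimal sketch of my own, assuming the document with the pub_date attribute shown above: it starts at the attribute, steps to its parent with "..", and then counts that parent's child nodes.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Select the pub_date attribute wherever it appears. -->
        <xsl:for-each select="//@pub_date">
            <xsl:text>parent: </xsl:text>
            <!-- ".." is the parent of the attribute node. -->
            <xsl:value-of select="name(..)"/>
            <xsl:text>, children of parent: </xsl:text>
            <!-- node() on the child axis does not select attributes. -->
            <xsl:value-of select="count(../node())"/>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

The result should read "parent: title, children of parent: 1". The attribute's parent is the <title> element, yet the only child of that element is its text node; the attribute is reachable only through the attribute axis (@pub_date).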

Created: September 12, 2001
Revised: September 12, 2001

URL: https://webreference.com/authoring/languages/xml/insidexslt/chap2/1/3.html
