The XML Schema Companion
The XML Schema Companion. Chapter 15: Patterns
Reproduced from Neil Bradley's The XML Schema Companion by permission of Addison-Wesley. ISBN 0321136179, copyright 2004. All rights reserved. See https://www.awprofessional.com/titles/0321136179 for more information.
The 'pattern' facet requires more explanation than the brief description given in Section 14.6 provides. This XML Schema feature is based on the regular expression capabilities of the Perl programming language. It is therefore very powerful, but this strength comes at the cost of some complexity.
15.1 Introduction
Although the XML Schema language has a large number of built-in data types that can be used, restricted, and extended, some requirements demand much finer control over the exact structure of a value. For example, a simple code might need to consist of three lowercase letters:
<Code>abc</Code>     <!-- OK -->
<Code>ABC</Code>     <!-- ERROR -->
<Code>abcd</Code>    <!-- ERROR -->
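A constraint like this can be sketched with a pattern facet. The fragment below is a minimal illustration rather than the book's own example: it assumes the XML Schema namespace is the default namespace, and it uses a character class and a quantifier, which go beyond the literal patterns described in the next section.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Code">
    <simpleType>
      <restriction base="string">
        <!-- exactly three characters, each in the range a to z -->
        <pattern value="[a-z]{3}" />
      </restriction>
    </simpleType>
  </element>
</schema>
```

Because a pattern must match the whole value, 'abcd' is rejected as well as 'ABC'.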
Similarly, when an element or attribute contains an ISBN (International Standard Book Number), it should be possible to apply constraints that reflect the nature of ISBN codes. All ISBN codes are composed of three identifiers (location, publisher, and book) and a check digit, separated by hyphens (or spaces). Valid values would include '0-201-41999-8' and '963-9131-21-0'. The schema processor should detect any error in an ISBN attribute:
<Book ISBN="0-201-77059-8" ...>    <!-- OK -->
<Book ISBN="X-999999-" ...>        <!-- ERRORS -->
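One way such a check might look is sketched below. This is an assumption-laden illustration, not the book's own pattern: it assumes the XML Schema namespace is the default namespace, declares only the ISBN attribute of the Book element, and accepts three groups of digits followed by a final character that is a digit or 'X'.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Book">
    <complexType>
      <attribute name="ISBN">
        <simpleType>
          <restriction base="token">
            <!-- three digit groups, each followed by a hyphen or a space,
                 then one check character (a digit or X) -->
            <pattern value="[0-9]+[\- ][0-9]+[\- ][0-9]+[\- ][0-9X]" />
          </restriction>
        </simpleType>
      </attribute>
    </complexType>
  </element>
</schema>
```

A pattern of this kind checks only the shape of the value. It cannot confirm that the check digit is arithmetically correct, and this sketch does not enforce the total number of digits.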
Some programming languages, such as Perl, include a regular expression language, which defines a pattern against which a series of characters can be compared. Typically, this feature is used to search for fragments of a text document, but the XML Schema language has co-opted it for sophisticated validation of element content and attribute values.
15.2 Simple Templates
The pattern facet element holds a pattern in its value attribute. The simplest possible form of pattern is a series of characters that must be present, in the order specified, in the content of every element or the value of every attribute that uses the data type constrained by the pattern facet.
The pattern 'abc' might be specified as the fixed value of a Code element:
<Code>abc</Code>
The pattern '0-201-41999-8' might be specified as the fixed value of an ISBN attribute:
<Book ISBN="0-201-41999-8" ... >
In this simple form, a pattern is similar to an enumeration, except that in the case of patterns the match must be exact, regardless of the data type used (recall that Section 14.6 explains how patterns differ from enumerations in this respect).
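To make the mechanics concrete, here is a minimal sketch of how the literal pattern 'abc' might be attached to the Code element. It assumes the XML Schema namespace is the default namespace, and the choice of 'string' as the base type is an assumption made for illustration.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Code">
    <simpleType>
      <restriction base="string">
        <!-- the whole value must be exactly the three characters abc -->
        <pattern value="abc" />
      </restriction>
    </simpleType>
  </element>
</schema>
```

With this declaration, <Code>abc</Code> is valid, while <Code>abcd</Code> and <Code> abc </Code> are not: the whole value must match the pattern, and the 'string' type preserves whitespace.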
Although specifying an exact sequence of characters is among the simplest things that can be achieved with the pattern language, specifying a sequence of characters that must not appear in a value is much harder.
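It is often a good idea to use the 'token' data type as the base data type for the restriction when the presence of surrounding whitespace should not be allowed to trigger an error, because leading and trailing whitespace is removed from a 'token' value before the pattern is applied. Note that the 'normalizedString' data type only replaces tabs and line breaks with spaces; it does not remove surrounding spaces.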
Just as a restriction element can contain multiple enumeration elements, it can also contain multiple pattern elements. The element content or attribute value is valid if it matches any of the patterns:
<restriction base="token">
  <pattern value="abc" />
  <pattern value="xyz" />
</restriction>
<Code>abc</Code>      <!-- OK -->
<Code> abc </Code>    <!-- OK -->
<Code>acb</Code>      <!-- ERROR -->
<Code>xzy</Code>      <!-- ERROR -->
<Code>abcc</Code>     <!-- ERROR -->
Alternatively, a single pattern can contain multiple 'branches'. Each branch is a distinct, alternative expression, separated by the '|' symbol from previous or following branches. Again, the pattern test succeeds if the value matches any one of the branches (the '|' symbol therefore performs a function similar to its use in DTD content models). The following example is equivalent to the multipattern example above:
<restriction base="token">
  <pattern value="abc|xyz" />
</restriction>
Note that, although branches are never essential at this level, because multiple pattern elements can be used instead, they are the only technique available in another circumstance discussed later (involving subexpressions).
Created: March 27, 2003
Revised: January 1, 2004
URL: https://webreference.com/programming/awxml1