Author:
James Clark (Thai Open Source Software Center) <jjc@thaiopensource.com>
Date:
2001-02-13
Copyright © 2001 Thai Open Source Software Center Ltd
This document is the definitive specification of the TREX language.
The primary concept in TREX is the pattern. An unordered collection of attributes and an ordered sequence of elements and characters are matched jointly against a pattern with respect to an environment. An environment is a mapping from names to patterns together with a possibly null reference to a parent environment. The result of matching is true or false.
An XML document is valid with respect to a TREX pattern if an empty collection of attributes and a sequence containing just the document element of the document matches the pattern with respect to an empty environment.
TREX also has the concept of a name-class, which is a set of expanded-names.
An element is a triple <name, attributes, children>, where name is an expanded-name, attributes is a unordered collection of zero or more attributes and children is an ordered sequence of zero or more elements or characters. An attribute is a pair <name, value>, where name is an expanded-name and value is a sequence of zero or more characters. An expanded-name is a pair <namespace URI, local name>, where namespace URI is a string containing a URI reference and local name is a string matching the NCName production of the XML Recommendation.
We first of all define a primitive syntax, which allows only a subset of what the full syntax allows, but which can express everything that the full syntax can.
The primitive syntax is as follows:
| pattern | ::= | <t:element> name-class pattern </t:element> |
| <t:attribute>
name-class
pattern
</t:attribute> |
||
| <t:anyString/> |
||
| <t:string white-space>
string
</t:string>
|
||
| <t:data type="NCName" ns="namespaceURI"/>
|
||
| <QName t:role="datatype" attributes>
anything
</QName>
|
||
| <t:empty/> |
||
| <t:notAllowed/> |
||
| <t:oneOrMore>
pattern
</t:oneOrMore> |
||
| <t:group>
pattern
pattern
</t:group> |
||
| <t:choice>
pattern
pattern
</t:choice> |
||
| <t:interleave>
pattern
pattern
</t:interleave> |
||
| <t:concur>
pattern
pattern
</t:concur> |
||
| <t:ref name="NCName" parent/> |
||
| <t:grammar>
<t:start> pattern
</t:start>
definition* </t:grammar> |
||
| definition | ::= | <t:define name="NCName"> pattern
</t:define> |
| name-class | ::= | <t:name ns="namespaceURI">
NCName
</t:name>
|
| <t:anyName/> |
||
| <t:nsName ns="namespaceURI"/> |
||
| <t:choice>
name-class
name-class
</t:choice> |
||
| <t:difference>
name-class
name-class
</t:difference> |
||
| white-space | ::= | whiteSpace="preserve" |
| whiteSpace="normalize" |
||
| parent | ::= | parent="true" |
| parent="false" |
||
The above syntax assumes a namespace declations of
xmlns:t="http://www.thaiopensource.com/trex" .
We now define the semantics of the primitive syntax.
We use the following:
First, we define M.
M [[<t:element> name-class
pattern </t:element>]]
(a, c, e) if and only if
M [[<t:attribute> name-class
pattern </t:attribute>]]
(a, c, e) if and only if
M [[<t:anyString/>]]
(a, c, e) if and only if
M [[<t:string
whiteSpace="preserve"> string
</t:string>]](a, c,
e) if and only if
M [[<t:string whiteSpace="normalize"> string
</t:string>]](a, c,
e) if and only if
M [[<t:data type="NCName" ns="namespaceURI"/>]](a, c,
e) if and only if
M [[<QName t:role="datatype" attributes>
anything
</QName>]](a, c,
e) if and only if
<QName attributes>
anything
</QName>]](s) where s the string formed by concatenating
the members of cM [[<t:empty/>]]
(a, c, e) if and only if
M [[<t:notAllowed/>]]
(a, c, e) is always false
M [[<t:choice>
pattern1
pattern2
</t:choice>]]
(a, c, e) if and only if
M [[<t:concur>
pattern1
pattern2
</t:concur>]]
(a, c, e) if and only if
M [[<t:oneOrMore>
pattern
</t:oneOrMore>]]
(a, c, e) if and only if
<t:group>
pattern
<t:choice>
<t:empty/>
<t:oneOrMore>
pattern
</t:oneOrMore>
</t:choice>
</t:group>]]
M [[<t:group>
pattern1
pattern2
</t:group>]](a, c,
e) if and only if there exist a1,
a2, c1,
c2 such that
M [[<t:interleave>
pattern1
pattern2
</t:interleave>]](a, c,
e) if and only if there exist a1,
a2, c1,
c2 such that
M [[<t:ref parent="false"
name="name"/>]](a,
c, e) if and only if
M [[<t:ref parent="true"
name="name"/>]](a,
c, e) if and only if
M [[<t:grammar>
<t:start> pattern
</t:start>
definition* </t:grammar>]](a,
c, e) if and only if
Next, we define C.
C [[<t:name
ns="namespaceURI"> NCName
</t:name>]](n) if and only if
C [[<t:anyName/>]](n) is
always true
C [[<t:nsName ns="namespaceURI"/>]](n) if and only if
C [[<t:choice>
name-class1
name-class2
</t:choice>]](n) if and only if
C [[<t:difference>
name-class1
name-class2
</t:difference>]](n) if and only if
Recursion is restricted. Informally, recursive references are
permitted only within an element. In other words, within
any definition there must be an element enclosing any
reference to that definition.
We write prohibit(e, NCName,
n) to denote an environment that is the same as e
except that NCName is bound to the pattern
<t:prohibit>n</t:prohibit>.
We write R [[p]](e,d) to mean that pattern p has no illegal recursion with respect to environment e at element depth d. We can then define R as follows:
<t:element> name-class
pattern </t:element>]](e,
d) if and only if R
[[pattern]](e, d + 1)
<t:attribute> name-class
pattern </t:attribute>]](e,
d) if and only if R
[[pattern]](e, d)
<t:anyString/>]](e, d) is always true
<t:string white-space> string
</t:string>]](e, d) is always true
<t:empty/>]](e,
d) is always true
<t:notAllowed/>]](e,
d) is always true
<t:choice>
pattern1 pattern2
</t:choice>]](e, d) if and only
if R [[pattern1]](e,
d) and R
[[pattern2]](e, d)
<t:concur>
pattern1 pattern2
</t:concur>]](e, d) if and only
if R [[pattern1]](e,
d) and R
[[pattern2]](e, d)
<t:oneOrMore>
pattern
</t:oneOrMore>]](e, d) if and only if R [[pattern]](e, d)
<t:group>
pattern1 pattern2
</t:group>]](e, d) if and only
if R [[pattern1]](e,
d) and R
[[pattern2]](e, d)
<t:interleave>
pattern1
pattern2
</t:interleave>]](e, d) if and only if R [[pattern1]](e, d) and
R [[pattern2]](e, d)
<t:ref parent="false"
name="NCName"/>]](e,
d) if and only if R [[lookup(e,
NCName)]](prohibit(e, NCName,
d), d)
<t:ref parent="true"
name="NCName"/>]](e,
d) if and only if R
[[lookup(parent(e),
NCName)]](prohibit(parent(e),
NCName, d), d)
<t:grammar> <t:start>
pattern </t:start> definition*
</t:grammar>]](e, d) if and
only if R [[pattern]](E
[[definition*]](e), d)
<t:prohibit> n
</t:prohibit>]](e, d)
if and only if n is not equal to dThe restriction on recursion is simply that for the outermost pattern p it must be the case that R [[p]]({}, 0).
Next, we define the full syntax.
| pattern | ::= | <t:element> name-class pattern+ </t:element> |
| <t:element name="QName">
pattern+ </t:element> |
||
| <t:attribute>
name-class
pattern?
</t:attribute> |
||
| <t:attribute name="QName"
global?>
pattern?
</t:attribute> |
||
| <t:anyString/> |
||
| <t:string white-space?>
string
</t:string>
|
||
| <t:data type="NCName"/>
|
||
| <QName t:role="datatype" attributes>
anything
</QName>
|
||
| <t:empty/> |
||
| <t:notAllowed/> |
||
| <t:oneOrMore>
pattern+
</t:oneOrMore> |
||
| <t:zeroOrMore>
pattern+
</t:zeroOrMore> |
||
| <t:optional>
pattern+
</t:optional> |
||
| <t:mixed>
pattern+
</t:mixed> |
||
| <t:group>
pattern+
</t:group> |
||
| <t:choice>
pattern+
</t:choice> |
||
| <t:interleave>
pattern+
</t:interleave> |
||
| <t:concur>
pattern+
</t:concur> |
||
| <t:ref name="NCName" parent?/> |
||
| <t:grammar>
grammar-element* </t:grammar> |
||
| <t:include href="URIReference"/> |
||
| grammar-element | ::= | <t:define name="NCName" combine?> pattern+
</t:define> |
| <t:start combine?> pattern+
</t:start> |
||
| <t:start name="NCName" combine?> pattern+
</t:start> |
||
| <t:include href="URIReference"/> |
||
| name-class | ::= | <t:name>
QName
</t:name>
|
| <t:anyName/> |
||
| <t:nsName/> |
||
| <t:choice>
name-class+
</t:choice> |
||
| <t:difference>
name-class+
</t:difference> |
||
| <t:not>
name-class
</t:not> |
||
| global | ::= | global="true" |
| global="false" |
||
| white-space | ::= | whiteSpace="preserve" |
| whiteSpace="normalize" |
||
| parent | ::= | parent="true" |
| parent="false" |
||
| combine | ::= | combine="replace" |
| combine="choice" |
||
| combine="group" |
||
| combine="interleave" |
||
| combine="concur" |
||
In addition, any element may have an ns attribute.
It is an error if a grammar element contains multiple
grammar-elements with the same name.
Now, we define the semantics of the full syntax by showing how to transform the full syntax into the primitive syntax.
The first stage of the transformation expands include
elements.
When an include element appears as a
pattern, it is replaced by the element referenced by the
URI reference contained in the href attribute. The
referenced element must specify a pattern. If an ns
attribute is present on the include element but no
ns attribute is present on the referenced element, then,
before the include element is replaced, the
ns attribute is added to the referenced element.
When an include element occurs as a child of a
grammar element, then the element referenced by the URI
reference contained in the href attribute must also be a
grammar element, and the include element is
replaced by the children of the referenced grammar
element. If an ns attribute is present on the
include element or on the referenced grammar
element, then, before the include element is replaced, an
ns attribute is added to each child element of the
referenced grammar element that does not have an ns
attribute; the value of the added ns attribute is equal
to the value of the ns attribute on the referenced
grammar element if it has one, and otherwise is equal to
the value of the ns attribute on the include
element.
It is an error if, before the replacement of any
include elements, a grammar element has two
or more child elements with the same name or has two or more
start child elements, even if those child elements have
combine attributes.
If the replacement of an include element also contains
include elements, then those elements are replaced
recursively. It is an error a pattern included by a some URI directly
or indirectly includes a pattern with that same URI; relative URIs
must be resolved to absolute URIs before testing whether they are the
same URI.
Namespace expansion makes namespaces explicit on all elements that specify names.
First, an ns attribute is added to any of the
following categories of element if they do not already have an
ns attribute:
name element
nsName element
element element with a name
attribute
data element
attribute element with a name
attribute and with a global attribute with a value of
trueThe value of the added ns attribute is equal to the
value of the ns attribute of the nearest ancestor that
has an ns attribute, or the empty string if there is no
such ancestor. Note that ancestor means ancestor in the tree after
inclusions have been resolved. Thus, ns attributes
inherit into included patterns.
Next, for
element or attribute
element where the value of the name attribute contains a
prefix, and
data element where the value of the
type attribute contains a prefix, and
name element where the content
includes a prefix,
the prefix together with the colon is removed and an
ns attribute is added, replacing any existing
ns attribute. The value of the added ns
attribute is equal to the namespace URI declared by the in-scope
namespace declarations for that prefix. It is an error if there is no
declaration for that prefix. Note that the in-scope namespace
declarations of an element are not changed when it is included. Thus
namespace declarations are not inherited into included
patterns.
Next, an ns="" attribute is added to any
attribute element with a name attribute that
does not yet have an ns attribute.
Finally, any ns attribute on an element that is not
one of
name element
nsName element
element element with a name
attribute
attribute element with a name
attribute
data element
is removed. Also any global attribute on an
attribute element is removed.
The following rules specify how to transform a pattern using the full syntax into a pattern using the primitive syntax.
<t:mixed> pattern
<t:mixed> => <t:interleave>
<t:anyString/> pattern
</t:interleave>
<t:optional> pattern
</t:optional> => <t:choice>
<t:empty/> pattern
</t:choice>
<t:zeroOrMore> pattern
</t:zeroOrMore> => <t:optional>
<t:oneOrMore> pattern
</t:oneOrMore> </t:optional>
<t:group> pattern
</t:group> => pattern
<t:interleave> pattern
</t:interleave> => pattern
<t:choice> pattern
</t:choice> => pattern
<t:concur> pattern
</t:concur> => pattern
<t:element> name-class
pattern1
pattern2+
</t:element> => <t:element>
name-class <t:group>
pattern1
pattern2+ </t:group>
</t:element>
<t:oneOrMore>
pattern1
pattern2+
</t:oneOrMore> => <t:oneOrMore>
<t:group> pattern1
pattern2+ </t:group>
</t:oneOrMore>
<t:optional>
pattern1
pattern2+
</t:optional> => <t:optional>
<t:group> pattern1
pattern2+ </t:group>
</t:optional>
<t:zeroOrMore>
pattern1
pattern2+
</t:zeroOrMore> => <t:zeroOrMore>
<t:group> pattern1
pattern2+ </t:group>
</t:zeroOrMore>
<t:mixed> pattern1
pattern2+ </t:mixed> =>
<t:mixed> <t:group>
pattern1
pattern2+ </t:group>
</t:mixed>
<t:not> pattern1
pattern2+ </t:not> =>
<t:not> <t:group>
pattern1
pattern2+ </t:group>
</t:not>
<t:start> pattern1
pattern2+ </t:start> =>
<t:start> <t:group>
pattern1
pattern2+ </t:group>
</t:start>
<t:define>
pattern1
pattern2+ </t:define> =>
<t:define> <t:group>
pattern1
pattern2+ </t:group>
</t:define>
<t:attribute> name-class
<t:/attribute> => <t:attribute>
name-class <t:anyString/>
<t:/attribute>
<t:start name="name">
pattern+ <t:/start> =>
<t:start> pattern+
<t:/start> <t:define name="name">
pattern+ <t:/define>
definitions1 <t:start>
pattern <t:/start>
definitions2 => <t:start>
pattern <t:/start>
definitions1
definitions2
<t:element name="NCName"
ns="ns">
pattern+ <t:/element> =>
<t:element> <t:name
ns="ns">NCName<t:/name>
pattern+ <t:/element>
<t:attribute name="NCName"
ns="ns"> pattern?
<t:/attribute> => <t:attribute>
<t:name
ns="ns">NCName<t:/name>
pattern? <t:/attribute>
<t:group> pattern1
pattern2
pattern3+ </t:group>
=> <t:group> <t:group>
pattern1
pattern2 </t:group>
pattern3+
</t:group>
<t:choice>
pattern1
pattern2
pattern3+ </t:choice>
=> <t:choice> <t:choice>
pattern1
pattern2 </t:choice>
pattern3+
</t:choice>
<t:interleave>
pattern1
pattern2
pattern3+
</t:interleave> => <t:interleave>
<t:interleave>
pattern1
pattern2
</t:interleave>
pattern3+
</t:interleave>
<t:concur>
pattern1
pattern2
pattern3+ </t:concur>
=> <t:concur> <t:concur>
pattern1
pattern2 </t:concur>
pattern3+
</t:concur>
<t:string> string
</t:string> => <t:string
whiteSpace="normalize"> string
</t:string>
<t:ref
name="NCName"/> => <t:ref
parent="false" name="NCName"/>
<t:choice>
name-class1
name-class2
name-class3+ </t:choice>
=> <t:choice> <t:choice>
name-class1
name-class2 </t:choice>
name-class3+
</t:choice>
<t:difference>
name-class1
name-class2
name-class3+ </t:difference>
=> <t:difference> <t:difference>
name-class1
name-class2 </t:difference>
name-class3+
</t:difference>
<t:choice> name-class
</t:choice> => name-class
<t:difference> name-class
</t:difference> => name-class
<t:not> name-class
</t:not> => <t:difference>
<t:anyName/> name-class
</t:difference>
Finally, conflicting define and start
elements are combined. The combine attribute controls
how the elements are combined. A define element conflicts
with a preceding define element if it has the same name.
A start element conflicts with any preceding
start element. Each define and
start child element is examined in turn. If the current
element conflicts with a preceding element, then if the current
element does not have a combine attribute, it is an
error; otherwise, it is combined with the preceding element with which
it conflicts according to the value of the combine
attribute. If the value of the combine attribute is
replace, then the current element replaces the
conflicting element. Otherwise, the value of the combine
attribute is the name of the element to use to combine the pattern
contained in the current element with the pattern contained in the
conflicting element. In other words,
<t:define name="x"> p1</t:define>
and
<t:define name="x" combine="e"> p2</t:define>
will be combined into
<t:define name="x"> <t:e> p1 p2 </t:e> </t:define>
If there is a combine attribute, but there is no
conflicting preceding definition, then the combine
attribute is ignored.
<grammar ns="http://www.thaiopensource.com/trex"
xmlns="http://www.thaiopensource.com/trex">
<start name="pattern">
<choice>
<element name="element">
<choice>
<attribute name="name"/>
<ref name="open-name-class"/>
</choice>
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="attribute">
<ref name="common-atts"/>
<choice>
<group>
<attribute name="name"/>
<optional>
<attribute name="global">
<choice>
<string>true</string>
<string>false</string>
</choice>
</attribute>
</optional>
</group>
<ref name="open-name-class"/>
</choice>
<interleave>
<ref name="other"/>
<optional>
<ref name="pattern"/>
</optional>
</interleave>
</element>
<element name="group">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="interleave">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="choice">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="concur">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="optional">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="zeroOrMore">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="oneOrMore">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="mixed">
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="ref">
<attribute name="name"/>
<optional>
<attribute name="parent">
<choice>
<string>true</string>
<string>true</string>
</choice>
</attribute>
</optional>
<ref name="common-atts"/>
</element>
<element name="empty">
<ref name="common-atts"/>
<ref name="other"/>
</element>
<element name="anyString">
<ref name="common-atts"/>
<ref name="other"/>
</element>
<element name="string">
<optional>
<attribute name="whiteSpace">
<choice>
<string>preserve</string>
<string>normalize</string>
</choice>
</attribute>
</optional>
<ref name="common-atts"/>
<mixed>
<ref name="other"/>
</mixed>
</element>
<element name="data">
<attribute name="type"/>
<ref name="common-atts"/>
<ref name="other"/>
</element>
<ref name="anonymous-datatype"/>
<element name="notAllowed">
<ref name="common-atts"/>
<ref name="other"/>
</element>
<element name="include">
<attribute name="href"/>
<ref name="common-atts"/>
<ref name="other"/>
</element>
<element name="grammar">
<ref name="common-atts"/>
<interleave>
<ref name="other"/>
<oneOrMore>
<ref name="grammar-element"/>
</oneOrMore>
</interleave>
</element>
</choice>
</start>
<define name="grammar-element">
<choice>
<element name="define">
<attribute name="name"/>
<ref name="combine-att"/>
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="start">
<optional>
<attribute name="name"/>
</optional>
<ref name="combine-att"/>
<ref name="common-atts"/>
<ref name="open-patterns"/>
</element>
<element name="include">
<attribute name="href"/>
</element>
</choice>
</define>
<define name="anonymous-datatype">
<element>
<not>
<nsName/>
</not>
<attribute name="role" global="true">
<string>datatype</string>
</attribute>
<zeroOrMore>
<choice>
<attribute>
<not>
<nsName/>
</not>
</attribute>
<ref name="any"/>
<anyString/>
</choice>
</zeroOrMore>
</element>
</define>
<define name="combine-att">
<optional>
<attribute name="combine">
<choice>
<string>replace</string>
<string>choice</string>
<string>group</string>
<string>interleave</string>
<string>concur</string>
</choice>
</attribute>
</optional>
</define>
<define name="open-patterns">
<interleave>
<ref name="other"/>
<oneOrMore>
<ref name="pattern"/>
</oneOrMore>
</interleave>
</define>
<define name="name-class">
<choice>
<element name="name">
<ref name="common-atts"/>
<mixed>
<ref name="other"/>
</mixed>
</element>
<element name="anyName">
<ref name="common-atts"/>
<ref name="other"/>
</element>
<element name="nsName">
<ref name="common-atts"/>
<ref name="other"/>
</element>
<element name="not">
<ref name="common-atts"/>
<ref name="open-name-class"/>
</element>
<element name="difference">
<ref name="common-atts"/>
<oneOrMore>
<ref name="open-name-class"/>
</oneOrMore>
</element>
<element name="choice">
<ref name="common-atts"/>
<oneOrMore>
<ref name="open-name-class"/>
</oneOrMore>
</element>
</choice>
</define>
<define name="open-name-class">
<interleave>
<ref name="other"/>
<ref name="name-class"/>
</interleave>
</define>
<define name="common-atts">
<optional>
<attribute name="ns"/>
</optional>
<zeroOrMore>
<attribute>
<not>
<choice>
<nsName/>
<nsName ns=""/>
</choice>
</not>
</attribute>
</zeroOrMore>
</define>
<define name="other">
<zeroOrMore>
<element>
<not>
<nsName/>
</not>
<zeroOrMore>
<choice>
<attribute>
<not>
<nsName/>
</not>
</attribute>
<anyString/>
<ref name="any"/>
</choice>
</zeroOrMore>
</element>
</zeroOrMore>
</define>
<define name="any">
<element>
<anyName/>
<zeroOrMore>
<choice>
<attribute>
<anyName/>
</attribute>
<anyString/>
<ref name="any"/>
</choice>
</zeroOrMore>
</element>
</define>
</grammar>