Author:
James Clark (Thai Open Source Software Center) <jjc@thaiopensource.com>
Date:
2001-02-13
Copyright © 2001 Thai Open Source Software Center Ltd
This document is the definitive specification of the TREX language.
The primary concept in TREX is the pattern. An unordered collection of attributes and an ordered sequence of elements and characters are matched jointly against a pattern with respect to an environment. An environment is a mapping from names to patterns together with a possibly null reference to a parent environment. The result of matching is true or false.
An XML document is valid with respect to a TREX pattern if an empty collection of attributes and a sequence containing just the document element of the document matches the pattern with respect to an empty environment.
TREX also has the concept of a name-class, which is a set of expanded-names.
An element is a triple <name, attributes, children>, where name is an expanded-name, attributes is a unordered collection of zero or more attributes and children is an ordered sequence of zero or more elements or characters. An attribute is a pair <name, value>, where name is an expanded-name and value is a sequence of zero or more characters. An expanded-name is a pair <namespace URI, local name>, where namespace URI is a string containing a URI reference and local name is a string matching the NCName production of the XML Recommendation.
We first of all define a primitive syntax, which allows only a subset of what the full syntax allows, but which can express everything that the full syntax can.
The primitive syntax is as follows:
pattern | ::= | <t:element> name-class pattern </t:element> |
| <t:attribute>
name-class
pattern
</t:attribute> |
||
| <t:anyString/> |
||
| <t:string white-space>
string
</t:string>
|
||
| <t:data type=" NCName" ns=" namespaceURI"/>
|
||
| < QName t:role="datatype" attributes>
anything
</ QName>
|
||
| <t:empty/> |
||
| <t:notAllowed/> |
||
| <t:oneOrMore>
pattern
</t:oneOrMore> |
||
| <t:group>
pattern
pattern
</t:group> |
||
| <t:choice>
pattern
pattern
</t:choice> |
||
| <t:interleave>
pattern
pattern
</t:interleave> |
||
| <t:concur>
pattern
pattern
</t:concur> |
||
| <t:ref name=" NCName" parent/> |
||
| <t:grammar>
<t:start> pattern
</t:start>
definition* </t:grammar> |
||
definition | ::= | <t:define name=" NCName"> pattern
</t:define> |
name-class | ::= | <t:name ns=" namespaceURI">
NCName
</t:name>
|
| <t:anyName/> |
||
| <t:nsName ns=" namespaceURI"/> |
||
| <t:choice>
name-class
name-class
</t:choice> |
||
| <t:difference>
name-class
name-class
</t:difference> |
||
white-space | ::= | whiteSpace="preserve" |
| whiteSpace="normalize" |
||
parent | ::= | parent="true" |
| parent="false" |
The above syntax assumes a namespace declations of
xmlns:t="http://www.thaiopensource.com/trex"
.
We now define the semantics of the primitive syntax.
We use the following:
First, we define M.
M [[<t:element>
name-class
pattern </t:element>
]]
(a, c, e) if and only if
M [[<t:attribute>
name-class
pattern </t:attribute>
]]
(a, c, e) if and only if
M [[<t:anyString/>
]]
(a, c, e) if and only if
M [[<t:string
whiteSpace="preserve">
string
</t:string>
]](a, c,
e) if and only if
M [[<t:string whiteSpace="normalize">
string
</t:string>
]](a, c,
e) if and only if
M [[<t:data type="
NCName" ns="
namespaceURI"/>
]](a, c,
e) if and only if
M [[<
QName t:role="datatype"
attributes>
anything
</
QName>
]](a, c,
e) if and only if
<
QName
attributes>
anything
</
QName>
]](s) where s the string formed by concatenating
the members of cM [[<t:empty/>
]]
(a, c, e) if and only if
M [[<t:notAllowed/>
]]
(a, c, e) is always false
M [[<t:choice>
pattern1
pattern2
</t:choice>
]]
(a, c, e) if and only if
M [[<t:concur>
pattern1
pattern2
</t:concur>
]]
(a, c, e) if and only if
M [[<t:oneOrMore>
pattern
</t:oneOrMore>
]]
(a, c, e) if and only if
<t:group>
pattern
<t:choice>
<t:empty/>
<t:oneOrMore>
pattern
</t:oneOrMore>
</t:choice>
</t:group>
]]
M [[<t:group>
pattern1
pattern2
</t:group>
]](a, c,
e) if and only if there exist a1,
a2, c1,
c2 such that
M [[<t:interleave>
pattern1
pattern2
</t:interleave>
]](a, c,
e) if and only if there exist a1,
a2, c1,
c2 such that
M [[<t:ref parent="false"
name="
name"/>
]](a,
c, e) if and only if
M [[<t:ref parent="true"
name="
name"/>
]](a,
c, e) if and only if
M [[<t:grammar>
<t:start>
pattern
</t:start>
definition* </t:grammar>
]](a,
c, e) if and only if
Next, we define C.
C [[<t:name
ns="
namespaceURI">
NCName
</t:name>
]](n) if and only if
C [[<t:anyName/>
]](n) is
always true
C [[<t:nsName ns="
namespaceURI"/>
]](n) if and only if
C [[<t:choice>
name-class1
name-class2
</t:choice>
]](n) if and only if
C [[<t:difference>
name-class1
name-class2
</t:difference>
]](n) if and only if
Recursion is restricted. Informally, recursive references are
permitted only within an element
. In other words, within
any definition there must be an element
enclosing any
reference to that definition.
We write prohibit(e, NCName,
n) to denote an environment that is the same as e
except that NCName is bound to the pattern
<t:prohibit>
n</t:prohibit>
.
We write R [[p]](e,d) to mean that pattern p has no illegal recursion with respect to environment e at element depth d. We can then define R as follows:
<t:element>
name-class
pattern </t:element>
]](e,
d) if and only if R
[[pattern]](e, d + 1)
<t:attribute>
name-class
pattern </t:attribute>
]](e,
d) if and only if R
[[pattern]](e, d)
<t:anyString/>
]](e, d) is always true
<t:string
white-space>
string
</t:string>
]](e, d) is always true
<t:empty/>
]](e,
d) is always true
<t:notAllowed/>
]](e,
d) is always true
<t:choice>
pattern1 pattern2
</t:choice>
]](e, d) if and only
if R [[pattern1]](e,
d) and R
[[pattern2]](e, d)
<t:concur>
pattern1 pattern2
</t:concur>
]](e, d) if and only
if R [[pattern1]](e,
d) and R
[[pattern2]](e, d)
<t:oneOrMore>
pattern
</t:oneOrMore>
]](e, d) if and only if R [[pattern]](e, d)
<t:group>
pattern1 pattern2
</t:group>
]](e, d) if and only
if R [[pattern1]](e,
d) and R
[[pattern2]](e, d)
<t:interleave>
pattern1
pattern2
</t:interleave>
]](e, d) if and only if R [[pattern1]](e, d) and
R [[pattern2]](e, d)
<t:ref parent="false"
name="
NCName"/>
]](e,
d) if and only if R [[lookup(e,
NCName)]](prohibit(e, NCName,
d), d)
<t:ref parent="true"
name="
NCName"/>
]](e,
d) if and only if R
[[lookup(parent(e),
NCName)]](prohibit(parent(e),
NCName, d), d)
<t:grammar>
<t:start>
pattern </t:start>
definition*
</t:grammar>
]](e, d) if and
only if R [[pattern]](E
[[definition*]](e), d)
<t:prohibit>
n
</t:prohibit>
]](e, d)
if and only if n is not equal to dThe restriction on recursion is simply that for the outermost pattern p it must be the case that R [[p]]({}, 0).
Next, we define the full syntax.
pattern | ::= | <t:element> name-class pattern+ </t:element> |
| <t:element name=" QName">
pattern+ </t:element> |
||
| <t:attribute>
name-class
pattern?
</t:attribute> |
||
| <t:attribute name=" QName"
global?>
pattern?
</t:attribute> |
||
| <t:anyString/> |
||
| <t:string white-space?>
string
</t:string>
|
||
| <t:data type=" NCName"/>
|
||
| < QName t:role="datatype" attributes>
anything
</ QName>
|
||
| <t:empty/> |
||
| <t:notAllowed/> |
||
| <t:oneOrMore>
pattern+
</t:oneOrMore> |
||
| <t:zeroOrMore>
pattern+
</t:zeroOrMore> |
||
| <t:optional>
pattern+
</t:optional> |
||
| <t:mixed>
pattern+
</t:mixed> |
||
| <t:group>
pattern+
</t:group> |
||
| <t:choice>
pattern+
</t:choice> |
||
| <t:interleave>
pattern+
</t:interleave> |
||
| <t:concur>
pattern+
</t:concur> |
||
| <t:ref name=" NCName" parent?/> |
||
| <t:grammar>
grammar-element* </t:grammar> |
||
| <t:include href=" URIReference"/> |
||
grammar-element | ::= | <t:define name=" NCName" combine?> pattern+
</t:define> |
| <t:start combine?> pattern+
</t:start> |
||
| <t:start name=" NCName" combine?> pattern+
</t:start> |
||
| <t:include href=" URIReference"/> |
||
name-class | ::= | <t:name>
QName
</t:name>
|
| <t:anyName/> |
||
| <t:nsName/> |
||
| <t:choice>
name-class+
</t:choice> |
||
| <t:difference>
name-class+
</t:difference> |
||
| <t:not>
name-class
</t:not> |
||
global | ::= | global="true" |
| global="false" |
||
white-space | ::= | whiteSpace="preserve" |
| whiteSpace="normalize" |
||
parent | ::= | parent="true" |
| parent="false" |
||
combine | ::= | combine="replace" |
| combine="choice" |
||
| combine="group" |
||
| combine="interleave" |
||
| combine="concur" |
In addition, any element may have an ns
attribute.
It is an error if a grammar
element contains multiple
grammar-elements with the same name.
Now, we define the semantics of the full syntax by showing how to transform the full syntax into the primitive syntax.
The first stage of the transformation expands include
elements.
When an include
element appears as a
pattern, it is replaced by the element referenced by the
URI reference contained in the href
attribute. The
referenced element must specify a pattern. If an ns
attribute is present on the include
element but no
ns
attribute is present on the referenced element, then,
before the include
element is replaced, the
ns
attribute is added to the referenced element.
When an include
element occurs as a child of a
grammar
element, then the element referenced by the URI
reference contained in the href
attribute must also be a
grammar
element, and the include
element is
replaced by the children of the referenced grammar
element. If an ns
attribute is present on the
include
element or on the referenced grammar
element, then, before the include
element is replaced, an
ns
attribute is added to each child element of the
referenced grammar element that does not have an ns
attribute; the value of the added ns
attribute is equal
to the value of the ns
attribute on the referenced
grammar
element if it has one, and otherwise is equal to
the value of the ns
attribute on the include
element.
It is an error if, before the replacement of any
include
elements, a grammar
element has two
or more child elements with the same name or has two or more
start
child elements, even if those child elements have
combine
attributes.
If the replacement of an include
element also contains
include
elements, then those elements are replaced
recursively. It is an error a pattern included by a some URI directly
or indirectly includes a pattern with that same URI; relative URIs
must be resolved to absolute URIs before testing whether they are the
same URI.
Namespace expansion makes namespaces explicit on all elements that specify names.
First, an ns
attribute is added to any of the
following categories of element if they do not already have an
ns
attribute:
name
element
nsName
element
element
element with a name
attribute
data
element
attribute
element with a name
attribute and with a global
attribute with a value of
true
The value of the added ns
attribute is equal to the
value of the ns
attribute of the nearest ancestor that
has an ns
attribute, or the empty string if there is no
such ancestor. Note that ancestor means ancestor in the tree after
inclusions have been resolved. Thus, ns
attributes
inherit into included patterns.
Next, for
element
or attribute
element where the value of the name
attribute contains a
prefix, and
data
element where the value of the
type
attribute contains a prefix, and
name
element where the content
includes a prefix,
the prefix together with the colon is removed and an
ns
attribute is added, replacing any existing
ns
attribute. The value of the added ns
attribute is equal to the namespace URI declared by the in-scope
namespace declarations for that prefix. It is an error if there is no
declaration for that prefix. Note that the in-scope namespace
declarations of an element are not changed when it is included. Thus
namespace declarations are not inherited into included
patterns.
Next, an ns=""
attribute is added to any
attribute
element with a name
attribute that
does not yet have an ns
attribute.
Finally, any ns
attribute on an element that is not
one of
name
element
nsName
element
element
element with a name
attribute
attribute
element with a name
attribute
data
element
is removed. Also any global
attribute on an
attribute
element is removed.
The following rules specify how to transform a pattern using the full syntax into a pattern using the primitive syntax.
<t:mixed>
pattern
<t:mixed>
=> <t:interleave>
<t:anyString/>
pattern
</t:interleave>
<t:optional>
pattern
</t:optional>
=> <t:choice>
<t:empty/>
pattern
</t:choice>
<t:zeroOrMore>
pattern
</t:zeroOrMore>
=> <t:optional>
<t:oneOrMore>
pattern
</t:oneOrMore>
</t:optional>
<t:group>
pattern
</t:group>
=> pattern
<t:interleave>
pattern
</t:interleave>
=> pattern
<t:choice>
pattern
</t:choice>
=> pattern
<t:concur>
pattern
</t:concur>
=> pattern
<t:element>
name-class
pattern1
pattern2+
</t:element>
=> <t:element>
name-class <t:group>
pattern1
pattern2+ </t:group>
</t:element>
<t:oneOrMore>
pattern1
pattern2+
</t:oneOrMore>
=> <t:oneOrMore>
<t:group>
pattern1
pattern2+ </t:group>
</t:oneOrMore>
<t:optional>
pattern1
pattern2+
</t:optional>
=> <t:optional>
<t:group>
pattern1
pattern2+ </t:group>
</t:optional>
<t:zeroOrMore>
pattern1
pattern2+
</t:zeroOrMore>
=> <t:zeroOrMore>
<t:group>
pattern1
pattern2+ </t:group>
</t:zeroOrMore>
<t:mixed>
pattern1
pattern2+ </t:mixed>
=>
<t:mixed>
<t:group>
pattern1
pattern2+ </t:group>
</t:mixed>
<t:not>
pattern1
pattern2+ </t:not>
=>
<t:not>
<t:group>
pattern1
pattern2+ </t:group>
</t:not>
<t:start>
pattern1
pattern2+ </t:start>
=>
<t:start>
<t:group>
pattern1
pattern2+ </t:group>
</t:start>
<t:define>
pattern1
pattern2+ </t:define>
=>
<t:define>
<t:group>
pattern1
pattern2+ </t:group>
</t:define>
<t:attribute>
name-class
<t:/attribute>
=> <t:attribute>
name-class <t:anyString/>
<t:/attribute>
<t:start name="
name">
pattern+ <t:/start>
=>
<t:start>
pattern+
<t:/start>
<t:define name="
name">
pattern+ <t:/define>
definitions1 <t:start>
pattern <t:/start>
definitions2 => <t:start>
pattern <t:/start>
definitions1
definitions2
<t:element name="
NCName"
ns="
ns">
pattern+ <t:/element>
=>
<t:element>
<t:name
ns="
ns">
NCName<t:/name>
pattern+ <t:/element>
<t:attribute name="
NCName"
ns="
ns">
pattern?
<t:/attribute>
=> <t:attribute>
<t:name
ns="
ns">
NCName<t:/name>
pattern? <t:/attribute>
<t:group>
pattern1
pattern2
pattern3+ </t:group>
=> <t:group>
<t:group>
pattern1
pattern2 </t:group>
pattern3+
</t:group>
<t:choice>
pattern1
pattern2
pattern3+ </t:choice>
=> <t:choice>
<t:choice>
pattern1
pattern2 </t:choice>
pattern3+
</t:choice>
<t:interleave>
pattern1
pattern2
pattern3+
</t:interleave>
=> <t:interleave>
<t:interleave>
pattern1
pattern2
</t:interleave>
pattern3+
</t:interleave>
<t:concur>
pattern1
pattern2
pattern3+ </t:concur>
=> <t:concur>
<t:concur>
pattern1
pattern2 </t:concur>
pattern3+
</t:concur>
<t:string>
string
</t:string>
=> <t:string
whiteSpace="normalize">
string
</t:string>
<t:ref
name="
NCName"/>
=> <t:ref
parent="false" name="
NCName"/>
<t:choice>
name-class1
name-class2
name-class3+ </t:choice>
=> <t:choice>
<t:choice>
name-class1
name-class2 </t:choice>
name-class3+
</t:choice>
<t:difference>
name-class1
name-class2
name-class3+ </t:difference>
=> <t:difference>
<t:difference>
name-class1
name-class2 </t:difference>
name-class3+
</t:difference>
<t:choice>
name-class
</t:choice>
=> name-class
<t:difference>
name-class
</t:difference>
=> name-class
<t:not>
name-class
</t:not>
=> <t:difference>
<t:anyName/>
name-class
</t:difference>
Finally, conflicting define
and start
elements are combined. The combine
attribute controls
how the elements are combined. A define
element conflicts
with a preceding define
element if it has the same name.
A start
element conflicts with any preceding
start
element. Each define
and
start
child element is examined in turn. If the current
element conflicts with a preceding element, then if the current
element does not have a combine
attribute, it is an
error; otherwise, it is combined with the preceding element with which
it conflicts according to the value of the combine
attribute. If the value of the combine
attribute is
replace
, then the current element replaces the
conflicting element. Otherwise, the value of the combine
attribute is the name of the element to use to combine the pattern
contained in the current element with the pattern contained in the
conflicting element. In other words,
<t:define name="
x">
p1</t:define>
and
<t:define name="
x" combine="
e">
p2</t:define>
will be combined into
<t:define name="
x"> <t:
e>
p1 p2 </t:
e> </t:define>
If there is a combine
attribute, but there is no
conflicting preceding definition, then the combine
attribute is ignored.
<grammar ns="http://www.thaiopensource.com/trex" xmlns="http://www.thaiopensource.com/trex"> <start name="pattern"> <choice> <element name="element"> <choice> <attribute name="name"/> <ref name="open-name-class"/> </choice> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="attribute"> <ref name="common-atts"/> <choice> <group> <attribute name="name"/> <optional> <attribute name="global"> <choice> <string>true</string> <string>false</string> </choice> </attribute> </optional> </group> <ref name="open-name-class"/> </choice> <interleave> <ref name="other"/> <optional> <ref name="pattern"/> </optional> </interleave> </element> <element name="group"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="interleave"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="choice"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="concur"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="optional"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="zeroOrMore"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="oneOrMore"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="mixed"> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="ref"> <attribute name="name"/> <optional> <attribute name="parent"> <choice> <string>true</string> <string>true</string> </choice> </attribute> </optional> <ref name="common-atts"/> </element> <element name="empty"> <ref name="common-atts"/> <ref name="other"/> </element> <element name="anyString"> <ref name="common-atts"/> <ref name="other"/> </element> <element name="string"> <optional> <attribute name="whiteSpace"> <choice> <string>preserve</string> <string>normalize</string> </choice> </attribute> </optional> <ref name="common-atts"/> <mixed> <ref name="other"/> </mixed> </element> <element name="data"> <attribute name="type"/> <ref name="common-atts"/> <ref name="other"/> </element> <ref name="anonymous-datatype"/> <element name="notAllowed"> <ref name="common-atts"/> <ref name="other"/> </element> <element name="include"> <attribute name="href"/> <ref name="common-atts"/> <ref name="other"/> </element> <element name="grammar"> <ref name="common-atts"/> <interleave> <ref name="other"/> <oneOrMore> <ref name="grammar-element"/> </oneOrMore> </interleave> </element> </choice> </start> <define name="grammar-element"> <choice> <element name="define"> <attribute name="name"/> <ref name="combine-att"/> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="start"> <optional> <attribute name="name"/> </optional> <ref name="combine-att"/> <ref name="common-atts"/> <ref name="open-patterns"/> </element> <element name="include"> <attribute name="href"/> </element> </choice> </define> <define name="anonymous-datatype"> <element> <not> <nsName/> </not> <attribute name="role" global="true"> <string>datatype</string> </attribute> <zeroOrMore> <choice> <attribute> <not> <nsName/> </not> </attribute> <ref name="any"/> <anyString/> </choice> </zeroOrMore> </element> </define> <define name="combine-att"> <optional> <attribute name="combine"> <choice> <string>replace</string> <string>choice</string> <string>group</string> <string>interleave</string> <string>concur</string> </choice> </attribute> </optional> </define> <define name="open-patterns"> <interleave> <ref name="other"/> <oneOrMore> <ref name="pattern"/> </oneOrMore> </interleave> </define> <define name="name-class"> <choice> <element name="name"> <ref name="common-atts"/> <mixed> <ref name="other"/> </mixed> </element> <element name="anyName"> <ref name="common-atts"/> <ref name="other"/> </element> <element name="nsName"> <ref name="common-atts"/> <ref name="other"/> </element> <element name="not"> <ref name="common-atts"/> <ref name="open-name-class"/> </element> <element name="difference"> <ref name="common-atts"/> <oneOrMore> <ref name="open-name-class"/> </oneOrMore> </element> <element name="choice"> <ref name="common-atts"/> <oneOrMore> <ref name="open-name-class"/> </oneOrMore> </element> </choice> </define> <define name="open-name-class"> <interleave> <ref name="other"/> <ref name="name-class"/> </interleave> </define> <define name="common-atts"> <optional> <attribute name="ns"/> </optional> <zeroOrMore> <attribute> <not> <choice> <nsName/> <nsName ns=""/> </choice> </not> </attribute> </zeroOrMore> </define> <define name="other"> <zeroOrMore> <element> <not> <nsName/> </not> <zeroOrMore> <choice> <attribute> <not> <nsName/> </not> </attribute> <anyString/> <ref name="any"/> </choice> </zeroOrMore> </element> </zeroOrMore> </define> <define name="any"> <element> <anyName/> <zeroOrMore> <choice> <attribute> <anyName/> </attribute> <anyString/> <ref name="any"/> </choice> </zeroOrMore> </element> </define> </grammar>