These are the slides for a talk given at the XML 2002 conference in Baltimore. They have been combined into a single HTML file.


Converting RELAX NG to XSD

James Clark


Goals

Good-quality XSD

Preserve structure (define/include)

Approximate where necessary

Useful rather than perfect


Conversion strategy

Build RELAX NG object model

Convert RELAX NG object model to intermediate form

Perform transformations on intermediate form

Generate XSD from intermediate form


Intermediate form

Schema language between RELAX NG and XSD

Abstract, no syntax

No mixed element/attribute content models

Clean, simple semantics

Schema structure more controlled than RELAX NG


Intermediate schema components

Simple type definition associates local name with simple type

Attribute group definition associates local name with attribute use

Group definition associates local name with particle

Start declaration declares particle that document element must match

Include references a schema


Points to note

Order of components not semantically significant

Simple type definitions, attribute group definitions, group definitions have distinct symbol spaces

Definitions are named with local names not QNames

Builtin simple types do not have simple type definitions

No target namespace associated with a schema

No complex type declarations, element declarations or attribute declarations


Simple types

Restriction contains the name of builtin simple type and list of facets

List contains a simple type and a minimum/maximum number of occurrences

Union contains a list of simple types

Reference contains a local name referring to a simple type definition
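
The four kinds of simple type listed above can be pictured as a small algebraic data type. This is only an illustrative sketch: the class and field names (`Restriction`, `ListType`, `UnionType`, `SimpleTypeRef`, `min_occurs`, `max_occurs`) are hypothetical and are not taken from the talk or from any implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Restriction:
    """A builtin simple type narrowed by facets, given as (name, value) pairs."""
    builtin: str
    facets: tuple[tuple[str, str], ...] = ()


@dataclass(frozen=True)
class ListType:
    """A list of items of one simple type with occurrence bounds."""
    item: SimpleType
    min_occurs: int = 0
    max_occurs: Optional[int] = None  # None stands for "unbounded" in this sketch


@dataclass(frozen=True)
class UnionType:
    """A union of several simple types."""
    members: tuple[SimpleType, ...]


@dataclass(frozen=True)
class SimpleTypeRef:
    """A reference, by local name, to a simple type definition."""
    name: str


SimpleType = Union[Restriction, ListType, UnionType, SimpleTypeRef]

# One or more tokens, each either an integer or the literal "unbounded".
example = ListType(
    item=UnionType((
        Restriction("integer"),
        Restriction("token", (("enumeration", "unbounded"),)),
    )),
    min_occurs=1,
)
```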


Particles

Element contains an expanded QName and a complex type

Wildcard element contains a wildcard

Repeat contains a particle and a minimum/maximum number of occurrences

Sequence contains one or more particles

Choice contains one or more particles

Interleave contains one or more particles

Reference contains a local name referring to a group definition
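
As a rough sketch, the particle kinds above could be modelled as follows. All names here (`Element`, `Repeat`, `GroupRef`, `group_refs`, and so on) are hypothetical, and the complex type and wildcard are left as opaque objects because they are described on other slides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

# An expanded QName is modelled as a (namespace URI, local name) pair.
ExpandedName = tuple[str, str]


@dataclass(frozen=True)
class Element:
    name: ExpandedName
    complex_type: object  # opaque stand-in for a complex type


@dataclass(frozen=True)
class WildcardElement:
    wildcard: object  # opaque stand-in for a wildcard


@dataclass(frozen=True)
class Repeat:
    child: Particle
    min_occurs: int
    max_occurs: Optional[int]  # None stands for "unbounded" in this sketch


@dataclass(frozen=True)
class Sequence:
    children: tuple[Particle, ...]


@dataclass(frozen=True)
class Choice:
    children: tuple[Particle, ...]


@dataclass(frozen=True)
class Interleave:
    children: tuple[Particle, ...]


@dataclass(frozen=True)
class GroupRef:
    name: str  # local name of a group definition


Particle = Union[Element, WildcardElement, Repeat, Sequence, Choice, Interleave, GroupRef]


def group_refs(p: Particle) -> set[str]:
    """Collect the local names of the group definitions that p refers to."""
    if isinstance(p, GroupRef):
        return {p.name}
    if isinstance(p, Repeat):
        return group_refs(p.child)
    if isinstance(p, (Sequence, Choice, Interleave)):
        return set().union(*(group_refs(c) for c in p.children))
    # Element and WildcardElement: this sketch does not look inside them.
    return set()


body = Sequence((
    GroupRef("head"),
    Repeat(Choice((GroupRef("para"), GroupRef("list"))), 0, None),
))
```

Here `group_refs(body)` yields the three names `head`, `para` and `list`, which is the kind of information a later pass would need to decide which group definitions are actually used.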


Complex types

Complex content contains attribute use, a particle, a mixed flag

Simple content contains attribute use, a simple type


Attribute uses

Attribute contains an expanded QName and a simple type

Optional attribute contains an attribute and a default value

Wildcard attribute contains a wildcard

Attribute group contains a list of zero or more attributes

Attribute use choice contains a list of one or more attribute uses

Reference contains a local name referring to an attribute group definition
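
The attribute uses above, together with the two kinds of complex type from the previous slide, might be sketched like this. Every class and field name is hypothetical. Treating the members of an attribute group as arbitrary attribute uses, and allowing an optional attribute to have no default (`None`), are assumptions made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

ExpandedName = tuple[str, str]  # (namespace URI, local name)


@dataclass(frozen=True)
class Attribute:
    name: ExpandedName
    simple_type: object  # opaque stand-in for a simple type


@dataclass(frozen=True)
class OptionalAttribute:
    attribute: Attribute
    default: Optional[str] = None  # assumption: None means "no default value"


@dataclass(frozen=True)
class WildcardAttribute:
    wildcard: object


@dataclass(frozen=True)
class AttributeGroup:
    members: tuple[AttributeUse, ...] = ()  # zero or more


@dataclass(frozen=True)
class AttributeUseChoice:
    alternatives: tuple[AttributeUse, ...]  # one or more


@dataclass(frozen=True)
class AttributeGroupRef:
    name: str  # local name of an attribute group definition


AttributeUse = Union[
    Attribute, OptionalAttribute, WildcardAttribute,
    AttributeGroup, AttributeUseChoice, AttributeGroupRef,
]


@dataclass(frozen=True)
class ComplexContent:
    attribute_use: AttributeUse
    particle: object  # opaque stand-in for a particle
    mixed: bool


@dataclass(frozen=True)
class SimpleContent:
    attribute_use: AttributeUse
    simple_type: object


# An element type with a required "id", an optional "lang" defaulting to "en",
# and string content. Builtin types are named by plain strings in this sketch.
note_type = SimpleContent(
    attribute_use=AttributeGroup((
        Attribute(("", "id"), "ID"),
        OptionalAttribute(Attribute(("", "lang"), "language"), default="en"),
    )),
    simple_type="string",
)
```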


Wildcard

Positive/negative flag

Set of namespace URIs

Set of excluded expanded QNames
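
One plausible reading of these three components is sketched below. The matching rule is an assumption, not something stated on the slide: a positive wildcard accepts names whose namespace is in the set, a negative wildcard accepts names whose namespace is not in the set, and excluded names are never accepted. The `Wildcard` class and its `matches` method are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wildcard:
    positive: bool
    namespaces: frozenset  # namespace URIs
    excluded: frozenset    # (namespace URI, local name) pairs

    def matches(self, namespace: str, local_name: str) -> bool:
        """Return True if the expanded name is accepted (assumed semantics)."""
        if (namespace, local_name) in self.excluded:
            return False
        in_set = namespace in self.namespaces
        return in_set if self.positive else not in_set


XHTML = "http://www.w3.org/1999/xhtml"

# "Any name outside the XHTML namespace, except {urn:example}secret".
other = Wildcard(
    positive=False,
    namespaces=frozenset({XHTML}),
    excluded=frozenset({("urn:example", "secret")}),
)
```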


Conversion from RELAX NG to intermediate form


Pattern analysis

Flags computed based on possible matches of the pattern

empty says if there is a match whose content is empty

text says if there is a match whose content includes a text node that is matched against a text pattern

data says if there is a match whose content includes a text node that is matched against a data, value or list pattern

attribute says if there is a match that includes an attribute

element says if there is a match whose content includes an element

Sufficient to allow conversion to intermediate form

Can compute flags for patterns from subpatterns
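
A bottom-up computation of the five flags might look like the sketch below. Patterns are modelled as nested tuples such as `("group", p1, p2)`, which is a hypothetical representation. The combination rules are assumptions chosen for illustration (for example, that a group can only be empty if every member can be empty, and that an attribute contributes no child content); the talk does not spell them out.

```python
EMPTY, TEXT, DATA, ATTRIBUTE, ELEMENT = "empty", "text", "data", "attribute", "element"

# Flags for patterns that are treated as leaves. The bodies of element and
# attribute patterns are not inspected: only the node itself matters here.
LEAF_FLAGS = {
    "empty": frozenset({EMPTY}),
    "text": frozenset({TEXT, EMPTY}),  # text also matches the empty string
    "data": frozenset({DATA}),
    "value": frozenset({DATA}),
    "list": frozenset({DATA}),
    "attribute": frozenset({ATTRIBUTE, EMPTY}),  # assumption: adds no child content
    "element": frozenset({ELEMENT}),
}


def analyze(pattern):
    """Return the set of flags for a pattern given as a nested tuple."""
    kind = pattern[0]
    if kind in LEAF_FLAGS:
        return LEAF_FLAGS[kind]
    children = [analyze(child) for child in pattern[1:]]
    if kind == "choice":
        # Any match of any alternative is a match of the choice.
        return frozenset().union(*children)
    if kind in ("group", "interleave"):
        # Content is combined, so it is empty only if every member can be empty.
        combined = frozenset().union(*children) - {EMPTY}
        if all(EMPTY in flags for flags in children):
            combined |= {EMPTY}
        return combined
    if kind == "oneOrMore":
        return children[0]
    if kind in ("optional", "zeroOrMore"):
        return children[0] | {EMPTY}
    raise ValueError(f"unsupported pattern kind: {kind}")


# attribute id { data }, (text | element em { text })*
body = (
    "group",
    ("attribute", "id", ("data",)),
    ("zeroOrMore", ("choice", ("text",), ("element", "em", ("text",)))),
)
```

For this `body`, `analyze` reports the attribute, text, element and empty flags but not the data flag, because the only `data` pattern sits inside the attribute.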


Converting patterns

A pattern can be converted in three ways:

A pattern may be converted to a particle

A pattern may be converted to a simple type

A pattern may be converted to an attribute use

A single pattern may be converted both to an attribute use and to either a particle or a simple type

element patterns are treated like empty when converting to an attribute use

attribute patterns are treated like empty when converting to a particle or simple type


Converting name classes

A name class is converted to:

a set of expanded QNames

a wildcard


Converting an element pattern to a particle

Split name class into wildcard and list of expanded names

Generate wildcard element particle for wildcard

Generate element particle for each expanded name by converting body of element pattern to a complex type

Combine with choice particle
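
The steps above can be sketched as a small function. It assumes the name class has already been split into a list of expanded names and an optional wildcard, and that the element body has already been converted to a complex type. The function name, the tuple representation of particles and the choice to skip the wrapper when there is only one alternative are all assumptions.

```python
def element_pattern_to_particle(names, wildcard, complex_type):
    """Build a particle for an element pattern.

    names        -- list of (namespace URI, local name) pairs
    wildcard     -- a wildcard object, or None if the name class has none
    complex_type -- the converted body of the element pattern
    """
    alternatives = []
    if wildcard is not None:
        alternatives.append(("wildcardElement", wildcard))
    for name in names:
        alternatives.append(("element", name, complex_type))
    if not alternatives:
        raise ValueError("name class matches no names")
    if len(alternatives) == 1:
        return alternatives[0]  # assumption: no choice wrapper needed
    return ("choice", *alternatives)


# element (h1 | h2) { ... } with the body converted to a type called "Heading"
particle = element_pattern_to_particle(
    [("urn:example", "h1"), ("urn:example", "h2")], None, "Heading"
)
```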


Converting element body to a complex type

If body has element flag, then use a complex type with complex content and convert body to a particle

Mixed if body has either data or text flag

If body has data flag but neither text nor element flag, then use a complex type with simple content and convert body to a simple type

In addition, convert body to an attribute use
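
The decision described above can be written down directly in terms of the pattern-analysis flags. In this sketch the flags are a plain set of strings and `ComplexTypePlan` is a hypothetical record of the outcome. Cases that the slide does not cover (for example a body with only the text flag) return `None` rather than guessing. In every case the body is additionally converted to an attribute use, which is not shown.

```python
from typing import NamedTuple, Optional


class ComplexTypePlan(NamedTuple):
    content: str           # "complex" or "simple"
    body_converts_to: str  # "particle" or "simple type"
    mixed: bool


def plan_complex_type(flags) -> Optional[ComplexTypePlan]:
    """Choose the kind of complex type for an element body from its flags."""
    if "element" in flags:
        # Complex content; mixed if the body can also contain data or text.
        mixed = bool(set(flags) & {"data", "text"})
        return ComplexTypePlan("complex", "particle", mixed)
    if "data" in flags and "text" not in flags:
        return ComplexTypePlan("simple", "simple type", False)
    return None  # not covered by the rules on this slide
```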


Converting a define

If body has attribute flag, then generate an attribute group definition by converting body to an attribute use

If body has element flag, then generate a group definition by converting body to a particle

If body has data flag but neither text nor element flag, then generate a simple type definition by converting body to simple type
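
A sketch of these three rules follows. One define can yield both an attribute group definition and a group (or simple type) definition under the same local name, which works because the three kinds of definition have distinct symbol spaces. The function name and the string labels are hypothetical.

```python
def definitions_for(name, flags):
    """List the (kind, local name) definitions generated for one define."""
    generated = []
    if "attribute" in flags:
        generated.append(("attributeGroup", name))
    if "element" in flags:
        generated.append(("group", name))
    elif "data" in flags and "text" not in flags:
        # Reached only when the element flag is absent, as the rule requires.
        generated.append(("simpleType", name))
    return generated


inline = definitions_for("inline", {"attribute", "element", "text"})
size = definitions_for("size", {"data"})
```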


Converting a list

Intermediate form represents lists like XSD, not like RELAX NG

Compute minimum and maximum number of tokens in list

Compute union of simple types of possible members of list
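
The two computations above can be sketched as one recursive function over the body of a RELAX NG list pattern. The nested-tuple representation and the function name are hypothetical, only a few pattern kinds are handled, and `None` stands for an unbounded maximum.

```python
def list_bounds(pattern):
    """Return (min tokens, max tokens or None, set of member type names)."""
    kind = pattern[0]
    if kind == "empty":
        return 0, 0, frozenset()
    if kind == "data":
        return 1, 1, frozenset({pattern[1]})  # ("data", type name): one token
    parts = [list_bounds(child) for child in pattern[1:]]
    types = frozenset().union(*(t for _, _, t in parts))
    unbounded = any(hi is None for _, hi, _ in parts)
    if kind == "group":
        lo = sum(lo for lo, _, _ in parts)
        hi = None if unbounded else sum(hi for _, hi, _ in parts)
    elif kind == "choice":
        lo = min(lo for lo, _, _ in parts)
        hi = None if unbounded else max(hi for _, hi, _ in parts)
    elif kind in ("optional", "zeroOrMore", "oneOrMore"):
        lo, hi, _ = parts[0]
        if kind != "oneOrMore":
            lo = 0
        if kind != "optional" and hi != 0:
            hi = None  # repetition removes the upper bound
    else:
        raise ValueError(f"unsupported pattern kind: {kind}")
    return lo, hi, types


# list { xsd:double, xsd:double, xsd:double? }
point = ("group", ("data", "double"), ("data", "double"), ("optional", ("data", "double")))

# list { xsd:integer, (xsd:integer | xsd:token)* }
mixed = (
    "group",
    ("data", "integer"),
    ("zeroOrMore", ("choice", ("data", "integer"), ("data", "token"))),
)
```

The second example shows where the approximation comes in: the result records one or more tokens that are each an integer or a token, but no longer records that the first token must be an integer.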


Transformations on intermediate form

Transform out attribute choice

Transform out interleave except where XSD allows it

Combine attribute wildcards

Combine unions of simple types with enumeration facet
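
As an illustration of the last transformation, the sketch below merges the members of a union that are enumerations over the same builtin type into a single member. This is one plausible reading of the bullet, not a description of the actual algorithm. Each member is modelled as a hypothetical `(builtin name, enumeration values)` pair, with an empty tuple meaning that the member has no enumeration facet.

```python
def merge_enumerations(members):
    """Merge union members that enumerate values of the same builtin type."""
    values_by_builtin = {}
    order = []  # (builtin, is_enumeration) in order of first appearance
    for builtin, values in members:
        if not values:
            order.append((builtin, False))  # kept as a separate member
            continue
        if builtin not in values_by_builtin:
            values_by_builtin[builtin] = []
            order.append((builtin, True))
        for value in values:
            if value not in values_by_builtin[builtin]:
                values_by_builtin[builtin].append(value)
    return [
        (builtin, tuple(values_by_builtin[builtin]) if is_enum else ())
        for builtin, is_enum in order
    ]


# "left" | "right" | xsd:integer | "center", as a union of four simple types
union = [
    ("token", ("left",)),
    ("token", ("right",)),
    ("integer", ()),
    ("token", ("center",)),
]
merged = merge_enumerations(union)
```

After merging, the union has two members instead of four: a single token restriction with three enumeration values, and the integer type.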


Conversion from intermediate form into XSD


Namespace analysis

Assign target namespace to each file in intermediate schema

Choose or create principal file for every namespace

Determine which attributes, element particles need to be moved

Determine which negative wildcards need to be moved

Determine which attributes, element particles should be global

Null namespace needs special treatment


Complex type analysis

Identify cases where complex type can be used instead of

Simple type definition and optionally attribute group definition, or

Group definition and optionally attribute group definition

All references must be such that they can turn into

the type of an element

the base type of a complex type extension


XSD output

Take advantage of XSD shorthands

Generate complex type definitions

Generate global element/attribute declarations

Generate bridging definitions for non-global moved elements/attributes

Generate bridging definitions for negative wildcards

Deal with attribute wildcards


Possible improvements

Avoid violating unique particle attribution constraint

Avoid violating element declarations consistent constraint

Take advantage of substitution groups

Better handling of interleave

Inform user about all approximations

Generate annotations using, e.g., Schematron to make approximations exact

Handle RELAX NG overrides using redefine


Implementation

Trang (Translator for RELAX NG Schemas)

Open source

http://www.thaiopensource.com/relaxng/trang.html