Copyright © 2001 Thai Open Source Software Center Ltd
See the file copying.txt for copying permission.
DTDinst is a program for converting XML DTDs into XML instance format. The XML instance can be in either a format specific to DTDinst or RELAX NG format.
The key feature of DTDinst is its handling of parameter entities. It is able to reliably turn parameter entity declarations and references into a variety of higher-level semantic constructs. It can do this even in the presence of arbitrarily deep nesting of parameter entity references within parameter entity declarations. At the same time, it accurately follows XML 1.0 rules on parameter entity expansion, so that any valid XML 1.0 DTD can be handled. If a parameter entity is used in a way that does not correspond to any of the higher-level semantic constructs supported by DTDinst, then references to that parameter entity will be expanded in the DTDinst output.
DTDinst is available as a precompiled JAR file. The source is also available.
You also need to have a Java runtime environment installed on your system.
To run DTDinst, use a command of the form:
java -jar dtdinst.jar [ -i ] [ -r dir ] DTD
The DTD argument can be either a file or a URL.
If the -r option is not specified, DTDinst writes an XML representation of the DTD in DTDinst format to the standard output. For example, the command
java -jar dtdinst.jar http://www.w3.org/XML/1998/06/xmlspec-v21.dtd >xmlspec.xml
would write an XML representation of the W3C xmlspec DTD to the file xmlspec.xml.
With the -r option, DTDinst writes one or more files containing a RELAX NG schema to the directory dir. For example, the command
java -jar dtdinst.jar -r relax http://www.xml.gr.jp/relax/relaxCore.dtd
would write a RELAX NG schema for RELAX Core to the directory relax. The directory would contain a relaxCore.rng file corresponding to relaxCore.dtd, and would also contain a datatypes.rng file corresponding to datatypes.dtd, which is referenced by relaxCore.dtd.
The -i option tells DTDinst to inline ATTLIST declarations. Without this option, DTDinst will generate a define in the RELAX NG schema for each ATTLIST declaration in the DTD. With this option, DTDinst will move the patterns generated from ATTLIST declarations into the corresponding element pattern.
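The command-line forms above can also be assembled from a script. The following is a minimal Python sketch, assuming that java is on the PATH and that dtdinst.jar is in the current directory. The helper names dtdinst_command and run_dtdinst, their parameters, and the default jar path are hypothetical and not part of DTDinst; only the -i and -r options and the argument order come from the usage line above.

```python
import subprocess


def dtdinst_command(dtd, rng_dir=None, inline_attlist=False, jar="dtdinst.jar"):
    """Build the argument list for: java -jar dtdinst.jar [ -i ] [ -r dir ] DTD

    dtd may be a file name or a URL. The function name, its parameters and
    the default jar path are assumptions made for this sketch.
    """
    args = ["java", "-jar", jar]
    if inline_attlist:
        args.append("-i")  # inline ATTLIST declarations
    if rng_dir is not None:
        args += ["-r", rng_dir]  # write RELAX NG files into rng_dir
    args.append(dtd)
    return args


def run_dtdinst(dtd, **options):
    """Run DTDinst and return whatever it wrote to standard output.

    Without rng_dir, this is the DTD in DTDinst format.
    """
    result = subprocess.run(
        dtdinst_command(dtd, **options),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For instance, dtdinst_command("relaxCore.dtd", rng_dir="relax", inline_attlist=True) yields the arguments for java -jar dtdinst.jar -i -r relax relaxCore.dtd.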
The DTDinst format is designed to represent the parameterization of the DTD as fully as possible.
There is a schema for this format in RELAX NG compact syntax; the schema is also available in RELAX NG format.
Each parameter entity declaration is represented by one of the following elements:
  modelGroup is used for a parameter entity that represents all or part of the content model of an element
  attributeGroup is used for a parameter entity containing zero or more attribute definitions, which can be referenced in an ATTLIST declaration
  attributeDefault is used for a parameter entity that represents the default value of an attribute
  datatype is used for a parameter entity that represents an attribute type
  enumGroup is used for a parameter entity that contains zero or more enumerated values
  flag is used for a parameter entity with replacement text INCLUDE or IGNORE, which can be used to control a conditional section
  nameSpec is used for a parameter entity that represents the name of an element or attribute
  externalId is used for an external parameter entity that does not fall into any of the above categories
  param is used for an internal parameter entity that does not fall into any of the above categories
  overridden is used for a parameter entity declaration that is overridden by an earlier declaration of the same parameter entity
The element used to represent a parameter entity reference depends on the element used to represent the declaration of the parameter entity.
  If the declaration is represented by a modelGroup, attributeGroup, attributeDefault, datatype, enumGroup, flag or nameSpec element, then the reference will be represented by a modelGroupRef, attributeGroupRef, attributeDefaultRef, datatypeRef, enumGroupRef, flagRef or nameSpecRef element respectively.
  If the declaration is represented by an externalId element and the reference occurs at the declaration level (i.e. at a point where a declaration would be allowed), then the reference will be represented by an externalIdRef element containing the declarations from the external entity.
An XSLT stylesheet is available that converts DTDinst format to RELAX NG. It has many more limitations than the converter built into DTDinst, but it may be useful as a basis for XSLT-based processing of DTDinst format.
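Processing of DTDinst format does not have to use XSLT. As a rough illustration, the Python sketch below counts how many declaration and reference elements of each kind occur in a DTDinst-format document, which gives a quick picture of how a DTD has been parameterized. The element names are taken from the lists above; everything else (the function names, and matching on local names so that no namespace URI or document structure has to be assumed) is a choice made for this sketch, not something defined by DTDinst.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Element names that represent parameter entity declarations (from the list above).
DECLARATION_ELEMENTS = [
    "modelGroup", "attributeGroup", "attributeDefault", "datatype",
    "enumGroup", "flag", "nameSpec", "externalId", "param", "overridden",
]

# Declaration kinds whose references have a dedicated "...Ref" element.
# Note that externalIdRef is only used for references at the declaration level.
REFERENCE_ELEMENTS = {
    name: name + "Ref"
    for name in DECLARATION_ELEMENTS
    if name not in ("param", "overridden")
}


def local_name(tag):
    """Strip a leading '{namespace}' so that no namespace URI has to be assumed."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text):
    """Count declaration and reference elements in a DTDinst-format document."""
    wanted = set(DECLARATION_ELEMENTS) | set(REFERENCE_ELEMENTS.values())
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        if name in wanted:
            counts[name] += 1
    return counts
```

Applied to the output of a command such as the xmlspec example earlier, summarize(open("xmlspec.xml", encoding="utf-8").read()) would return a Counter keyed by element name.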
You may find it interesting to experiment with the following XML DTDs which are available online:
DTDinst does not attempt to understand the contents of ignored conditional sections: DTDinst format represents the contents of an ignored section as a string. If you wish to preserve information about conditional sections, you should therefore arrange for as many conditional sections as possible to be included rather than ignored. You can do this by creating a wrapper DTD that declares the controlling parameter entities with the replacement text INCLUDE and then references the real DTD. For example, you might use this wrapper DTD to convert the TEI P4 DTD.
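The wrapper technique works because, under XML 1.0, the first declaration of an entity is the binding one, so declarations in the wrapper take precedence over those in the real DTD. A minimal Python sketch of generating such a wrapper follows. The function name wrapper_dtd, the entity name wrapped.dtd, and the flag and file names in the docstring are hypothetical; the parameter entities that actually control conditional sections must be looked up in the DTD being converted.

```python
def wrapper_dtd(flag_names, real_dtd):
    """Return the text of a wrapper DTD.

    Each name in flag_names is declared as a parameter entity with the
    replacement text INCLUDE; real_dtd (a system identifier) is then pulled
    in through an external parameter entity. For example,
    wrapper_dtd(["draft.only", "tables"], "real.dtd") returns:

        <!ENTITY % draft.only "INCLUDE">
        <!ENTITY % tables "INCLUDE">
        <!ENTITY % wrapped.dtd SYSTEM "real.dtd">
        %wrapped.dtd;
    """
    lines = ['<!ENTITY % {} "INCLUDE">'.format(name) for name in flag_names]
    # "wrapped.dtd" is an arbitrary entity name chosen for this sketch.
    lines.append('<!ENTITY % wrapped.dtd SYSTEM "{}">'.format(real_dtd))
    lines.append("%wrapped.dtd;")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file, and that file passed to DTDinst as the DTD argument in place of the real DTD.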
DTDinst does not attempt to understand the contents of parameter entities that are never referenced.
The conversion to RELAX NG preserves neither conditional sections nor overridden parameter entity declarations. If you need to preserve these, the recommended approach is to generate RELAX NG from DTDinst format using a transformation (perhaps written in XSLT) customized for your particular DTD.
Please send bug reports to jjc@thaiopensource.com. Be sure to include a complete DTD for which DTDinst exhibits the bug.
James Clark