[libxml2.wiki] Create Validation and DTDs



commit 35abc7b3660a44e99d75809b702b8393406e837e
Author: Nick Wellnhofer <wellnhofer aevum de>
Date:   Sat Feb 12 17:54:12 2022 +0000

    Create Validation and DTDs

 Validation-and-DTDs.md | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
---
diff --git a/Validation-and-DTDs.md b/Validation-and-DTDs.md
new file mode 100644
index 0000000..2369eda
--- /dev/null
+++ b/Validation-and-DTDs.md
@@ -0,0 +1,110 @@
+### General overview
+
+What is validation, and what is a DTD?
+
+DTD stands for Document Type Definition: a description of the content of a family of XML files. DTDs are part of the XML 1.0 specification. They make it possible to describe the structure and content of a document type, and to verify that a given document instance conforms to that set of rules.
+
+Validation is the process of checking a document against a DTD (more generally against a set of construction 
rules).
+
+Validating documents and building DTDs are the two most difficult parts of the XML life cycle. Briefly, a DTD defines all the elements that may appear in your document and the formal shape of your document tree. It does so by defining the allowed content of each element: either text, a regular expression for the allowed list of children, or mixed content (both text and children). The DTD also defines the valid attributes for all elements and the types of those attributes.
+
+### The definition
+
+The [W3C XML Recommendation](http://www.w3.org/TR/REC-xml) ([Tim Bray's annotated version of 
Rev1](http://www.xml.com/axml/axml.html)):
+
+* [Declaring elements](http://www.w3.org/TR/REC-xml#elemdecls)
+* [Declaring attributes](http://www.w3.org/TR/REC-xml#attdecls)
+
+Unfortunately, all of this is inherited from the SGML world, and the syntax is ancient.
+
+### Simple rules
+
+DTDs can be written in many ways. The rules for building them can differ radically depending on whether you need something permanent or something that can evolve over time. Really complex DTDs such as the DocBook ones are flexible but much harder to design. I will focus only on DTDs for formats with a fixed, simple structure. What follows is a set of basic rules; it is neither exhaustive nor sufficient for complex DTD design.
+
+#### How to reference a DTD from a document:
+
+Assuming the top element of the document is `spec` and the DTD is placed in the file `mydtd` in the subdirectory `dtds` of the directory from which the document was loaded:
+
+`<!DOCTYPE spec SYSTEM "dtds/mydtd">`
+
+Notes:
+
+* The system string is actually a URI reference (as defined in [RFC 2396](http://www.ietf.org/rfc/rfc2396.txt)), so you can use a full URL indicating the location of your DTD on the Web. This is a really good thing to do if you want others to validate your document.
+* It is also possible to associate a `PUBLIC` identifier (a magic string) so that the DTD is looked up in 
catalogs on the client side without having to locate it on the web.
+* A DTD contains a set of element and attribute declarations, but it does not define what the root of the document should be. The root is given explicitly to the parser/validator as the name that follows `DOCTYPE` in the declaration.
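+
+As a sketch of the first two notes, the declarations below show the same reference written with a full URL and with a `PUBLIC` identifier. The host `example.org` and the public identifier string are made up for illustration. A document carries only one `DOCTYPE` declaration, so these are two alternatives rather than one file:
+
+```xml
+<!-- Alternative 1: system identifier given as a full URL (hypothetical location) -->
+<!DOCTYPE spec SYSTEM "http://example.org/dtds/mydtd">
+
+<!-- Alternative 2: PUBLIC identifier (hypothetical) followed by a system identifier,
+     which is typically used when no catalog entry matches the public one -->
+<!DOCTYPE spec PUBLIC "-//Example//DTD Spec 1.0//EN" "http://example.org/dtds/mydtd">
+```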
+
+#### Declaring elements:
+
+The following declares an element `spec`:
+
+`<!ELEMENT spec (front, body, back?)>`
+
+It also expresses that the `spec` element contains one `front`, one `body` and one optional `back` child element, in this order. The declaration of an element and the definition of its content are done in a single declaration. Similarly, the following declares `div1` elements:
+
+`<!ELEMENT div1 (head, (p | list | note)*, div2?)>`
+
+which means `div1` contains one `head`, then any number of `p`, `list` and `note` elements in any order (possibly none), and then an optional `div2`. Last but not least, an element can contain text:
+
+`<!ELEMENT b (#PCDATA)>`
+
+Here `b` contains only text. An element can also have mixed content (text and elements in no particular order):
+
+`<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>`
+
+`p` can contain text or `a`, `ul`, `b`, `i` or `em` elements in no particular order.
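+
+To see these content models at work, here is a minimal sketch of a document that carries its declarations in an internal subset. The `div1` declaration is the one shown above; the declarations for `head`, `list`, `note` and `div2`, and the reduced content model for `p`, are assumptions made only to keep the example self-contained:
+
+```xml
+<?xml version="1.0"?>
+<!DOCTYPE div1 [
+  <!-- element content: one head, any mix of p/list/note, optional div2 -->
+  <!ELEMENT div1 (head, (p | list | note)*, div2?)>
+  <!-- the declarations below are hypothetical, for illustration only -->
+  <!ELEMENT head (#PCDATA)>
+  <!ELEMENT list (#PCDATA)>
+  <!ELEMENT note (#PCDATA)>
+  <!ELEMENT div2 (head, p*)>
+  <!-- mixed content: text and b elements in no particular order -->
+  <!ELEMENT p (#PCDATA | b)*>
+  <!-- text only -->
+  <!ELEMENT b (#PCDATA)>
+]>
+<div1>
+  <head>Introduction</head>
+  <p>Some <b>bold</b> text.</p>
+  <note>A note between two paragraphs.</note>
+  <p>More text.</p>
+  <div2>
+    <head>Details</head>
+    <p>A nested section.</p>
+  </div2>
+</div1>
+```
+
+Moving `head` after the first `p`, or adding a second `div2`, would make the document invalid against these declarations.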
+
+#### Declaring attributes:
+
+Again, an attribute declaration includes the definition of its content:
+
+`<!ATTLIST termdef name CDATA #IMPLIED>`
+
+means that the element `termdef` can have a `name` attribute which contains text (`CDATA`) and is optional (`#IMPLIED`). The attribute value can also be restricted to a set:
+
+`<!ATTLIST list type (bullets|ordered|glossary) "ordered">`
+
+means that `list` elements have a `type` attribute with 3 allowed values, "bullets", "ordered" or "glossary", which defaults to "ordered" if the attribute is not explicitly specified.
+
+The content type of an attribute can be text (`CDATA`), anchor/reference/references (`ID`/`IDREF`/`IDREFS`), entity(ies) (`ENTITY`/`ENTITIES`) or name(s) (`NMTOKEN`/`NMTOKENS`). The following defines that a `chapter` element can have an optional `id` attribute of type `ID`, which attributes of type `IDREF` can then refer to:
+
+`<!ATTLIST chapter id ID #IMPLIED>`
+
+The last value of an attribute definition can be `#REQUIRED`, meaning that the attribute has to be given, `#IMPLIED`, meaning that it is optional, or the default value (possibly prefixed by `#FIXED` if it is the only value allowed).
+
+Notes:
+
+* Usually the attributes pertaining to a given element are declared in a single expression, but it is just a 
convention adopted by a lot of DTD writers:
+
+  ```
+  <!ATTLIST termdef
+            id      ID      #REQUIRED
+            name    CDATA   #IMPLIED>
+  ```
+
+  The previous construct defines both `id` and `name` attributes for the element `termdef`.
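+
+The attribute rules above can be sketched in one small document with an internal subset. The `chapter` and `list` attribute declarations are the ones shown earlier; the `doc` and `ref` elements and the `target` attribute are hypothetical names added so that the `ID`/`IDREF` link and the default value have something to act on:
+
+```xml
+<?xml version="1.0"?>
+<!DOCTYPE doc [
+  <!-- doc, ref and target are hypothetical, for illustration only -->
+  <!ELEMENT doc (chapter+, list, ref)>
+  <!ELEMENT chapter (#PCDATA)>
+  <!ATTLIST chapter id ID #IMPLIED>
+  <!ELEMENT list (#PCDATA)>
+  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
+  <!ELEMENT ref EMPTY>
+  <!ATTLIST ref target IDREF #REQUIRED>
+]>
+<doc>
+  <chapter id="intro">Introduction</chapter>
+  <!-- id is #IMPLIED, so it may be omitted -->
+  <chapter>Another chapter</chapter>
+  <!-- no type given: a validating parser supplies the default "ordered" -->
+  <list>first, second, third</list>
+  <!-- target must match the value of some ID attribute in the document -->
+  <ref target="intro"/>
+</doc>
+```
+
+Writing `type="numbered"` on `list`, dropping `target` from `ref`, or pointing `target` at a value that no `id` carries would each be reported as a validity error.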
+
+### Some examples
+
+The directory `test/valid/dtds/` in the libxml2 distribution contains some complex DTD examples. The example 
in the file `test/valid/dia.xml` shows an XML file where the simple DTD is directly included within the 
document.
+
+### How to validate
+
+The simplest way is to use the `xmllint` program included with libxml2. The `--valid` option turns on validation of the files given as input. For example, the following validates a copy of the first revision of the XML 1.0 specification:
+
+`xmllint --valid --noout test/valid/REC-xml-19980210.xml`
+
+The `--noout` option disables output of the resulting tree.
+
+The `--dtdvalid dtd` option validates the document(s) against the given DTD.
+
+Libxml2 exports an API to handle DTDs and validation; see the [associated description](http://xmlsoft.org/html/libxml-valid.html).
+
+### Other resources
+
+DTDs are as old as SGML, so there are a number of examples online. I will list just one for now; other pointers are welcome:
+
+* [XML-101 DTD](http://www.xml101.com:8081/dtd/)
+
+I suggest looking at the examples found under `test/valid/dtds` and at any of the large number of books available on XML. The dia example in `test/valid` should be both simple and complete enough to allow you to build your own.
+
+Daniel Veillard
\ No newline at end of file

