[libxml2.wiki] Create Parser interfaces



commit 1c00fa33508c40b874436a7d5271864d843017b4
Author: Nick Wellnhofer <wellnhofer aevum de>
Date:   Sat Feb 12 18:13:41 2022 +0000

    Create Parser interfaces

 Parser-interfaces.md | 217 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)
---
diff --git a/Parser-interfaces.md b/Parser-interfaces.md
new file mode 100644
index 0000000..03461ba
--- /dev/null
+++ b/Parser-interfaces.md
@@ -0,0 +1,217 @@
+This section is intended to help programmers get started with the XML toolkit from the C 
language. It is not intended to be exhaustive. I hope the automatically generated documents will provide the 
completeness required, but as a separate set of documents. The interfaces of the XML parser are by design 
low level. Those interested in a higher-level API should [look at DOM](http://xmlsoft.org/library.html#DOM).
+
+The [parser interfaces for XML](http://xmlsoft.org/html/libxml-parser.html) are separated from the [HTML 
parser interfaces](http://xmlsoft.org/html/libxml-htmlparser.html). Let's have a look at how the XML parser 
can be called:
+
+### Invoking the parser: the pull method
+
+Usually, the first thing to do is to read an XML input. The parser accepts documents either from in-memory 
strings or from files. The functions are defined in "parser.h":
+
+<dl>
+<dt>
+
+`xmlDocPtr xmlParseMemory(const char *buffer, int size);`
+
+</dt>
+<dd>Parse a document held in a memory buffer of the given size in bytes.</dd>
+</dl>
+<dl>
+<dt>
+
+`xmlDocPtr xmlParseFile(const char *filename);`
+
+</dt>
+<dd>Parse an XML document contained in a (possibly compressed) file.</dd>
+</dl>The parser returns a pointer to the document structure (or NULL in case of failure).
+
+### Invoking the parser: the push method
+
+To let the application keep control while the document is being fetched (which is common for GUI-based 
programs), libxml2 also provides a push interface, available since version 1.8.3. Here are the interface functions:
+
+```
+xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
+                                         void *user_data,
+                                         const char *chunk,
+                                         int size,
+                                         const char *filename);
+int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
+                                         const char *chunk,
+                                         int size,
+                                         int terminate);
+```
+
+and here is a simple example showing how to use the interface:
+
+```
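+            FILE *f;
+            xmlDocPtr doc = NULL;
+
+            /* Binary mode so the parser sees the bytes unmodified. */
+            f = fopen(filename, "rb");
+            if (f != NULL) {
+                int res, size = 1024;
+                char chars[1024];
+                xmlParserCtxtPtr ctxt;
+
+                /* The first 4 bytes let the parser detect the encoding. */
+                res = (int) fread(chars, 1, 4, f);
+                if (res > 0) {
+                    ctxt = xmlCreatePushParserCtxt(NULL, NULL,
+                                chars, res, filename);
+                    if (ctxt != NULL) {
+                        while ((res = (int) fread(chars, 1, size, f)) > 0) {
+                            xmlParseChunk(ctxt, chars, res, 0);
+                        }
+                        /* A final call with terminate = 1 ends the document. */
+                        xmlParseChunk(ctxt, chars, 0, 1);
+                        doc = ctxt->myDoc;
+                        xmlFreeParserCtxt(ctxt);
+                    }
+                }
+                fclose(f);
+            }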
+```
+
+The HTML parser embedded into libxml2 also has a push interface; the functions are just prefixed by "html" 
rather than "xml".
+
+### Invoking the parser: the SAX interface
+
+The tree-building interface makes the parser memory-hungry, first loading the document in memory and then 
building the tree itself. Reading a document without building the tree is possible using the SAX interfaces 
(see SAX.h and [James Henstridge's documentation](http://www.daa.com.au/\~james/gnome/xml-sax/xml-sax.html)). 
Note also that the push interface can be limited to SAX: just pass a SAX handler and user data as the first two arguments of 
`xmlCreatePushParserCtxt()`.
+
+### Building a tree from scratch
+
+The other way to get an XML tree in memory is by building it. Basically there is a set of functions 
dedicated to building new elements. (These are also described in <libxml/tree.h>.) For example, here is a 
piece of code that produces the XML document used in the previous examples:
+
+```
+    #include <libxml/tree.h>
+    xmlDocPtr doc;
+    xmlNodePtr root, tree, subtree;
+
+    /* BAD_CAST converts string literals to the xmlChar * the API expects. */
+    doc = xmlNewDoc(BAD_CAST "1.0");
+    root = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
+    xmlDocSetRootElement(doc, root);    /* links root in as doc->children */
+    xmlSetProp(root, BAD_CAST "prop1", BAD_CAST "gnome is great");
+    xmlSetProp(root, BAD_CAST "prop2", BAD_CAST "& linux too");
+    tree = xmlNewChild(root, NULL, BAD_CAST "head", NULL);
+    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "Welcome to Gnome");
+    tree = xmlNewChild(root, NULL, BAD_CAST "chapter", NULL);
+    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "The Linux adventure");
+    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
+    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
+    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");
+```
+
+Not really rocket science ...
+
+### Traversing the tree
+
+[Including "tree.h"](http://xmlsoft.org/html/libxml-tree.html) gives your code access to the 
internal structure of every node in the tree. The field names are meant to be self-explanatory: **parent**, 
**children**, **next**, **prev**, **properties**, etc. For example, still with the previous example:
+
+```
+doc->children->children->children
+```
+
+points to the title element,
+
+```
+doc->children->children->next->children->children
+```
+
+points to the text node containing the chapter title "The Linux adventure".
+
+**NOTE**: XML allows _PI_s and _comments_ to be present before the document root, so `doc->children` may 
point to a node that is not the document's root element; the function `xmlDocGetRootElement()` was added for 
this purpose.
+
+### Modifying the tree
+
+Functions are provided for reading and writing the document content. Here is an excerpt from the [tree 
API](http://xmlsoft.org/html/libxml-tree.html):
+
+<dl>
+<dt>
+
+`xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);`
+
+</dt>
+<dd>This sets (or changes) an attribute carried by an ELEMENT node. The value can be NULL.</dd>
+</dl>
+<dl>
+<dt>
+
+`xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar *name);`
+
+</dt>
+<dd>This function returns a pointer to a new copy of the property content. Note that the user must deallocate 
the result with xmlFree().</dd>
+</dl>Two functions are provided for reading and writing the text associated with elements:
+
+<dl>
+<dt>
+
+`xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar *value);`
+
+</dt>
+<dd>This function takes an "external" string and converts it to one text node or possibly to a list of 
entity and text nodes. All non-predefined entity references like &Gnome; will be stored internally as entity 
nodes, hence the result of the function may not be a single node.</dd>
+</dl>
+<dl>
+<dt>
+
+`xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int inLine);`
+
+</dt>
+<dd>
+
+This function is the inverse of `xmlStringGetNodeList()`. It generates a new string containing the content 
of the text and entity nodes. Note the extra argument inLine. If this argument is set to 1, the function will 
expand entity references. For example, instead of returning the &Gnome; XML encoding in the string, it will 
substitute it with its value (say, "GNU Network Object Model Environment").
+
+</dd>
+</dl>
+
+### Saving a tree
+
+Basically 3 options are possible:
+
+<dl>
+<dt>
+
+`void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int *size);`
+
+</dt>
+<dd>Serializes the document into a newly allocated buffer returned through mem, with its length in size. The caller must free the buffer with xmlFree().</dd>
+</dl>
+<dl>
+<dt>
+
+`extern void xmlDocDump(FILE *f, xmlDocPtr doc);`
+
+</dt>
+<dd>Dumps a document to an open FILE stream.</dd>
+</dl>
+<dl>
+<dt>
+
+`int xmlSaveFile(const char *filename, xmlDocPtr cur);`
+
+</dt>
+<dd>Saves the document to a file. In this case, the compression interface is triggered if it has been turned 
on.</dd>
+</dl>
+
+### Compression
+
+The library transparently handles compression when doing file-based accesses. The level of compression on 
saves can be set either globally or individually for one document:
+
+<dl>
+<dt>
+
+`int xmlGetDocCompressMode (xmlDocPtr doc);`
+
+</dt>
+<dd>Gets the document compression ratio (0-9).</dd>
+</dl>
+<dl>
+<dt>
+
+`void xmlSetDocCompressMode (xmlDocPtr doc, int mode);`
+
+</dt>
+<dd>Sets the document compression ratio.</dd>
+</dl>
+<dl>
+<dt>
+
+`int xmlGetCompressMode(void);`
+
+</dt>
+<dd>Gets the default compression ratio.</dd>
+</dl>
+<dl>
+<dt>
+
+`void xmlSetCompressMode(int mode);`
+
+</dt>
+<dd>Sets the default compression ratio.</dd>
+</dl>Daniel Veillard
\ No newline at end of file

