[libxml2.wiki] Create Entities
- From: Nick Wellnhofer <nwellnhof src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [libxml2.wiki] Create Entities
- Date: Sat, 12 Feb 2022 18:14:31 +0000 (UTC)
commit 38ca66ae1be0328c302c960f7b4852a0002b6fb9
Author: Nick Wellnhofer <wellnhofer aevum de>
Date: Sat Feb 12 18:14:31 2022 +0000
Create Entities
Entities.md | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
---
diff --git a/Entities.md b/Entities.md
new file mode 100644
index 0000000..a13628d
--- /dev/null
+++ b/Entities.md
@@ -0,0 +1,50 @@
+Entities in principle are similar to simple C macros. An entity defines an abbreviation for a given string
that you can reuse many times throughout the content of your document. Entities are especially useful when a
given string may occur frequently within a document, or to confine the change needed to a document to a
restricted area in the internal subset of the document (at the beginning). Example:
+
+```
+1 <?xml version="1.0"?>
+2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
+3 <!ENTITY xml "Extensible Markup Language">
+4 ]>
+5 <EXAMPLE>
+6 &xml;
+7 </EXAMPLE>
+```
+
+Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing its name with '&' and following it
by ';' without any spaces added. There are 5 predefined entities in libxml2 allowing you to escape characters
with predefined meaning in some parts of the XML document content: `&lt;` for the character '<', `&gt;`
for the character '>', `&apos;` for the character ''', `&quot;` for the character '"', and `&amp;` for
the character '&'.
+
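As a rough illustration of the five predefined entities, here is a minimal sketch using Python's standard library (`xml.sax.saxutils`) rather than libxml2 itself. The helper names `escape_predefined` and `unescape_predefined` are hypothetical and exist only for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape()/unescape() handle '&', '<' and '>' on their own; the two quote
# characters have to be supplied through the optional mapping argument.
QUOTES_TO_ENTITIES = {"'": "&apos;", '"': "&quot;"}
ENTITIES_TO_QUOTES = {"&apos;": "'", "&quot;": '"'}


def escape_predefined(text):
    """Replace the five special characters with their predefined entities."""
    return escape(text, QUOTES_TO_ENTITIES)


def unescape_predefined(text):
    """Turn the five predefined entity references back into characters."""
    return unescape(text, ENTITIES_TO_QUOTES)


raw = "<a href=\"x\">Tom & 'Jerry'</a>"
escaped = escape_predefined(raw)
print(escaped)
print(unescape_predefined(escaped) == raw)
```

Escaping the ampersand first (and unescaping it last) is what keeps the round trip lossless; the library functions already take care of that ordering.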
+One of the problems related to entities is that you may want the parser to substitute an entity's content so
that you can see the replacement text in your application. Or you may prefer to keep entity references as
such in the content to be able to save the document back without losing this usually precious information (if
users went through the pain of explicitly defining entities, they may have a rather negative attitude if
you blindly substitute them at save time). The
[xmlSubstituteEntitiesDefault()](http://xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault)
function allows you to check and change the behaviour, which is to not substitute entities by default.
+
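The same keep-or-substitute choice exists in other parsers. The sketch below is not libxml2 code: it uses Python's standard `xml.parsers.expat` binding, where installing `DefaultHandler` leaves internal entity references unexpanded and `DefaultHandlerExpand` substitutes them. The function name `parse` and the simplified document (without the external DTD reference) are assumptions made for this example.

```python
import xml.parsers.expat

# The example document, minus the SYSTEM "example.dtd" reference so that the
# sketch stays self-contained.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE EXAMPLE [
<!ENTITY xml "Extensible Markup Language">
]>
<EXAMPLE>
&xml;
</EXAMPLE>"""


def parse(substitute):
    """Return (text, references) found in the element content of DOC."""
    text, references = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append

    def on_default(data):
        # Unexpanded entity references arrive here as the raw "&name;" token.
        if data.startswith("&") and data.endswith(";"):
            references.append(data[1:-1])

    if substitute:
        # Keeps expat's normal behaviour: internal entities are expanded.
        parser.DefaultHandlerExpand = on_default
    else:
        # A plain default handler turns expansion of internal entities off.
        parser.DefaultHandler = on_default
    parser.Parse(DOC, True)
    return "".join(text).strip(), references


print(parse(substitute=False))
print(parse(substitute=True))
```

In the first call the application sees a reference to `xml` and no replacement text; in the second it sees only the replacement text and no trace of the reference.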
+Here is the DOM tree built by libxml2 for the previous document in the default case:
+
+```
+/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
+DOCUMENT
+version=1.0
+   ELEMENT EXAMPLE
+     TEXT
+     content=
+     ENTITY_REF
+       INTERNAL_GENERAL_ENTITY xml
+       content=Extensible Markup Language
+     TEXT
+     content=
+```
+
+And here is the result when substituting entities:
+
+```
+/gnome/src/gnome-xml -> ./xmllint --debug --noent test/ent1
+DOCUMENT
+version=1.0
+   ELEMENT EXAMPLE
+     TEXT
+     content= Extensible Markup Language
+```
+
+So, entities or no entities? Basically, it depends on your use case. I suggest that you keep the
non-substituting default behaviour and avoid using entities in your XML document or data if you are not
willing to handle the entity reference nodes in the DOM tree.
+
+Note that at save time libxml2 enforces the conversion of the predefined entities where necessary to prevent
well-formedness problems, and at parse time it transparently replaces them with the corresponding characters
(i.e. it will not generate entity reference nodes in the DOM tree or call the reference() SAX callback when
finding them in the input).
+
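The round trip described above (predefined entities become plain characters on input and are escaped again on output) can be observed in most XML toolkits. This sketch uses Python's standard `xml.etree.ElementTree` as a stand-in, so it illustrates the behaviour and not libxml2's own serializer; the element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# On input, predefined entity references are replaced with plain characters.
element = ET.fromstring("<note>1 &lt; 2 &amp; 3</note>")
text_in_tree = element.text

# On output, characters that would break well-formedness are escaped again,
# including quotes inside attribute values.
element.set("title", 'say "hi"')
serialized = ET.tostring(element, encoding="unicode")

print(text_in_tree)
print(serialized)
```

The tree itself never holds `&lt;` or `&amp;`; only the serialized form does, which is why no entity reference nodes are needed for these five.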
+**WARNING**: handling entities on top of the libxml2 SAX interface is difficult! If you
plan to use non-predefined entities in your documents, then the learning curve to handle them using the SAX
API may be long. If you plan to use complex documents, I strongly suggest you consider using the DOM
interface instead and let libxml2 deal with the complexity rather than trying to do it yourself.
+
+Daniel Veillard
\ No newline at end of file