Re: [xml] Keeping entity references unchanged with xmlParseFile
- From: Daniel Veillard <veillard redhat com>
- To: bvh <bvh-public irule be>
- Cc: xml gnome org
- Subject: Re: [xml] Keeping entity references unchanged with xmlParseFile
- Date: Tue, 9 Dec 2003 17:31:43 -0500
On Tue, Dec 09, 2003 at 10:26:19PM +0100, bvh wrote:
> I am using libxml 2.5.11 to parse a DocBook file. Currently I use xmlParseFile(...)
> to parse and convert the whole document to a tree. Later I traverse the tree and
> output a LaTeX file. I'd like libxml to leave entity references like &agrave;
> alone so that I can convert them myself.
They are left in the tree. Apparently you didn't find them, or you asked
for their replacement, which is not libxml2's default behaviour.
> I already figured out that I had to do
>
> xmlLoadExtDtdDefaultValue = XML_DETECT_IDS;
>
> to get libxml to look up the catalog file instead of complaining about unknown
> entity references.
Well, libxml2 doesn't load the DTD by default. If you reference entities that
are declared only in the DTD, the parser emits a *warning* that the entity is not defined.
> Although this is suboptimal (the input itself is
> generated and known to be valid, well-formed, etc., so I don't really need
> libxml to verify that yes, it's valid according to the DTD) I can live with it.
Well, you know it, so just ignore the warning (that can be done programmatically
too, of course!).
> However, the entity references seem to be skipped over completely. For
> example, for
>
> <para>Foo &agrave; bar</para>
>
> I get simply two text nodes with "Foo " and " bar" as content under the para node.
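I bet you missed the entity reference node between those two nodes!!!
By default libxml2
1/ does not replace entities by their content (unless you ask for it!)
2/ coalesces adjacent text nodes,
so two text nodes under the same parent can only be separated by some other
node: here, the entity reference.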
> Is there some documentation that explains how the API is supposed to
> work together? The doc on the website leaves me a little frustrated because it
> is either too superficial or too incoherent.
There are 1400+ functions in the API. Using the tree requires being able to
walk the tree and analyze it. If you have trouble doing so, use the xmlReader.
> Character entities get through but are converted to UTF-8. Although
> not critical, I'd much rather have them as character entities in the text
> nodes.
No way! "Character entities" are *not* entities!!! They are character
*references*, and they reference a Unicode code point, which has *nothing* to
do with entities; you're confused. Parsers are not supposed to keep that
information, and libxml2 doesn't.
> One more thing : could you please make it more clear on the website which
> parts of the API are in the latest release and which only in CVS? I spent quite
> some time hunting for xmlReadFile before reading in the mailing list archive that
> it's a proposal for a new API...
It has been in the last 3 public releases! 2.6.0, 2.6.1 and 2.6.2.
This information is all over the place, in the archives and in the news
section. If you think it's still not sufficient, we take patches...
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/