
Re: [xml] Keeping entity references unchanged with xmlParseFile



On Tue, Dec 09, 2003 at 10:26:19PM +0100, bvh wrote:
> I am using libxml2.5.11 to parse a docbook file. Currently I use xmlParseFile(...)
> to parse and convert the whole document to a tree. Later I traverse the tree and
> output a LaTeX file. I'd like libxml to leave entity references like &agrave;
> alone so that I can convert them myself.

  They are left in the tree; apparently you didn't find them, or you asked
for their replacement, but that is not libxml2's default behaviour.

> I already figured out that I had to do
> 
> xmlLoadExtDtdDefaultValue = XML_DETECT_IDS;
> 
> to get libxml to look up the catalog file instead of complaining about unknown
> entity references.

  Well, libxml2 doesn't load the DTD by default. If you reference entities,
the parser emits a *warning* about the fact that the entity is not defined.

> Although this is suboptimal (the input itself is
> generated and known to be valid, well-formed, etc.. so I don't really need
> libxml to verify that yes it's valid according to the dtd) I can live with it.

  Well, if you know it, then ignore the warning (that can be done
programmatically too, of course!).

> However the reference entities seem to be skipped over completely. For
> example for
> 
> <para>Foo &agrave; bar</para>
> 
> I get simply two text nodes with "Foo " and " bar" as content under the para node.

  I bet you missed the entity reference node between those 2 nodes!
By default libxml2
   1/ does not replace entities by their content (unless you asked for it!)
   2/ coalesces adjacent text nodes.

> Is there some documentation that explains how the API is supposed to
> work together? The doc on the website leaves me a little frustrated because it
> is either too superficial or too incoherent.

  There are 1400+ functions in the API. Using the tree requires being able to
walk the tree and analyze it. If you have trouble doing so, use the xmlReader.

> Character entities get through but are converted to UTF-8. Although
> not critical, I'd much rather have them as character entities in the character
> nodes.

  No way, "character entities" are *not* entities! They are character
*references*, and they reference a Unicode code point, which has *nothing* to
do with entities; you're confused. Parsers are not supposed to keep that
information, and libxml2 doesn't.

> One more thing : could you please make it more clear on the website which
> parts of the API are in the latest release and which only in CVS? I spent quite
> some time hunting for xmlReadFile before reading in the mailing list archive that
> it's a proposal for a new API...

  This has been in the last 3 public releases: 2.6.0, 2.6.1 and 2.6.2.
This information is all over the place, in the archives and in the news
section. If you think it's still not sufficient, we take patches...

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


