[xml] Keeping entity references unchanged with xmlParseFile



Hi,

I am using libxml 2.5.11 to parse a DocBook file. Currently I use xmlParseFile(...)
to parse and convert the whole document to a tree. Later I traverse the tree and
output a LaTeX file. I'd like libxml to leave entity references like &agrave;
alone so that I can convert them myself.

I already figured out that I had to do

xmlLoadExtDtdDefaultValue = XML_DETECT_IDS;

to get libxml to look up the catalog file instead of complaining about unknown
entity references. Although this is suboptimal (the input itself is
generated and known to be valid, well-formed, etc., so I don't really need
libxml to verify that it is valid according to the DTD), I can live with it.

However, the entity references seem to be skipped over completely. For
example, for

<para>Foo &agrave; bar</para>

I simply get two text nodes with "Foo " and " bar" as content under the para node.

Is there some documentation that explains how the parts of the API are supposed
to work together? The documentation on the website leaves me a little frustrated
because it is either too superficial or too incoherent.

Character entities get through but are converted to UTF-8-encoded characters.
Although this is not critical, I'd much rather have them as character entities
in the text nodes.
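
One possible workaround, sketched here under the assumption that it is
acceptable to re-escape the text on output rather than in the tree itself, is
to turn every non-ASCII character back into a numeric character reference
while writing. The helper name escape_non_ascii is hypothetical and uses no
libxml calls; it does only basic checks on the UTF-8 input.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the UTF-8 string `in` to `out`, replacing each
 * non-ASCII character with a decimal character reference such as &#224;.
 * Returns the number of bytes written (excluding the terminating NUL), or
 * -1 if the input is malformed or `out` is too small. */
int escape_non_ascii(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    while (*p != '\0') {
        unsigned long cp;
        int extra;   /* number of continuation bytes that follow */
        int n;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;               /* not a valid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return -1;            /* truncated or malformed sequence */
            cp = (cp << 6) | (unsigned long) (*p & 0x3F);
            p++;
        }

        if (cp < 0x80)
            n = snprintf(out + used, outsize - used, "%c", (int) cp);
        else
            n = snprintf(out + used, outsize - used, "&#%lu;", cp);
        if (n < 0 || (size_t) n >= outsize - used)
            return -1;                /* output buffer too small */
        used += (size_t) n;
    }
    return (int) used;
}
```

The same loop could emit LaTeX commands instead of &#...; references by
switching on the decoded code point.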

One more thing: could you please make it clearer on the website which
parts of the API are in the latest release and which are only in CVS? I spent quite
some time hunting for xmlReadFile before reading in the mailing list archive that
it's a proposal for a new API...

I still like libxml, though. Thanks for taking time and care!

cu bart

-- 
http://www.irule.be/bvh/


