Re: [xml] Questions about libxml2's API...



Daniel Veillard wrote:
On Tue, Dec 04, 2007 at 06:19:53PM +0100, Roland Mainz wrote:
I am currently working on SAX/xmlSAXParseFile libxml2 bindings for
ksh93/kash and have a few questions about the API:
- Is there a way to provide a "default encoding setting" which should be
used if the document itself doesn't define a character set ?

  I don't understand the question. The XML standard defines how things
should be checked in the absence of information in the document. If you
provide that information it overrides the normal processing (and if you
guess wrong you get a fatal error).
More docs:
  http://xmlsoft.org/encoding.html
  http://www.w3.org/TR/REC-xml/#sec-guessing

Erm... the issue is a bit POSIX shell specific. A POSIX shell always
operates on "character boundaries in the current locale". If we
allow something like
$ echo '<?xml version="1.0...' | xmlsaxparse myfunctions -
then the input data will be in the current locale's encoding. Either we
assume that only ASCII data are passed to the SAX parser, or the shell
script code needs to look up the current locale's encoding and pass it
with the XML document, or the "xmlsaxparse" shell builtin command needs
to handle the situation somehow (e.g. convert input data to the expected
encoding).

- How can I turn off the libxml2 feature of resolving all entities
(i.e. how can I do my own entity resolving) ?

  By default in SAX mode, if you don't ask for entity replacement, I
think libxml lets you provide it (see the entity callback). NOTE:
this is hairy, complex to get right in the general case, and one reason
I recommend not using SAX at all.

What would you recommend using instead ?

- How can I abort a SAX parser run from within a callback function ?

  xmlStopParser()

Thanks! :-)

- Is there a way to get |xmlSAXParseFile()| to accept stdin as input to
allow its use in pipe chains ?

  "-"

Somehow it seems it doesn't like the pipe input much... the libxml2 code
sometimes prints stuff like:
-- snip --
I/O error : Invalid seek
I/O error : Invalid seek
I/O error : Invalid seek
I/O error : Invalid seek
-- snip --

- Is |xmlSAXParseFile()| fully thread-safe (important since ksh93 will
get thread support soon) ?

doc here
  http://xmlsoft.org/threads.html

Thanks! :-)

- Is |xmlSAXParseFile()| re-entrant, e.g. can I call |xmlSAXParseFile()|
from within a callback during another |xmlSAXParseFile()| ? For example

  That should work, since the parsing contexts are different.

Ok...

for a simple RSS browser I would have to call |xmlSAXParseFile()| to
decode the RSS stream and then |xmlSAXParseFile()| a 2nd time (from
within a callback) to decode the XHTML data (I've already tried, but
something weird is going on: somehow the '<' and '>' characters seem to
"disappear" from the character data stream).

  No idea, sounds weird.

Yes... even weirder, the error sometimes disappears and then comes
back - sounds like a job for "dbx -check access" or "valgrind" ...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland mainz nrubsig org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)


