Re: [xml] Questions about libxml2's API...



On Wed, Dec 05, 2007 at 02:43:04AM +0100, Roland Mainz wrote:
Daniel Veillard wrote:
On Tue, Dec 04, 2007 at 06:19:53PM +0100, Roland Mainz wrote:
I am currently working on SAX/xmlSAXParseFile libxml2 bindings for
ksh93/kash and have a few questions about the API:
- Is there a way to provide a "default encoding setting" which should be
used if the document itself doesn't define a character set?

  I don't understand the question. The XML standard defines how the
encoding is to be determined in the absence of information in the
document. If you provide that information, it overrides the normal
processing (and if you guess wrong you get a fatal error).
More documentation:
  http://xmlsoft.org/encoding.html
  http://www.w3.org/TR/REC-xml/#sec-guessing

Erm... the issue is a bit specific to POSIX shells. A POSIX shell always
operates on "character boundaries in the current locale". If we were to
allow something like
$ echo '<?xml version="1.0..."' | xmlsaxparse myfunctions -
then the input data will be in the current locale's encoding. Either we
assume that only ASCII data are passed to the SAX parser, or the shell
script code needs to look up the current locale's encoding and pass it
along with the XML document, or the "xmlsaxparse" shell builtin command
needs to handle the situation somehow (e.g. convert the input data to
the expected encoding).

  And if it does
    cat foo.xml |
  instead, you won't be able to assume anything.

  Assuming ASCII-only input is just wrong.

  Sorry, no answer for that; best is to have the encoding in the XML
declaration describe the actual encoding. Any guessing is just asking
for trouble.

- How can I turn off libxml2's resolution of all entities (i.e. how can
I do my own entity resolving)?

  By default in SAX mode, if you don't ask for entity replacement, I
think libxml lets you provide it (see the entity callback). NOTE:
this is hairy and complex to get right in the general case, and is one
reason I recommend not using SAX at all.

What would you recommend using instead?

  the Reader API
    http://xmlsoft.org/xmlreader.html
I have no idea what you are trying to achieve though.

- How can I abort a SAX parser run from within a callback function ?

  xmlStopParser()

Thanks! :-)

- Is there a way to get |xmlSAXParseFile()| to accept stdin as input, to
allow its use in pipe chains?

  Use "-" as the filename.

Somehow it doesn't seem to like pipe input much... libxml2 sometimes
prints things like:
-- snip --
I/O error : Invalid seek
I/O error : Invalid seek
I/O error : Invalid seek
I/O error : Invalid seek
-- snip --

  No idea why; we have been using that for ages for various kinds of processing:
    paphio:~/XML -> cat tst.xml | xmllint --noout  -
    paphio:~/XML -> 

First time I've heard of a problem there.

For a simple RSS browser I would have to call |xmlSAXParseFile()| to
decode the RSS stream and then call |xmlSAXParseFile()| a second time
(from within a callback) to decode the XHTML data. I've already tried,
but something weird is going on: somehow the '<' and '>' characters
seem to "disappear" from the character data stream.

  No idea, sounds weird.

Yes... even weirder, the error sometimes disappears and then comes
back; sounds like a job for "dbx -check access" or "valgrind"...

  
Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


