Re: [xml] sax and entities



On Sat, Jun 16, 2007 at 08:48:02AM +0200, Petr Pajas wrote:
Daniel, All,
before it's forgotten, does anyone have some clues about this, 

  You must implement an entity handler as part of the SAX callback
block, one that is compatible with both libxml2's entity processing and
your own needs.

please? Shall I file it in Bugzilla?
Thanks,
-- Petr

On Sunday 10 June 2007 23:10, Petr Pajas wrote:
Hi,

I have two files (also attached)

1) test.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE a [
  <!ENTITY b SYSTEM "b.txt">
]>
<a>&b;</a>

2) b.txt, which contains just "B"

When parsing test.xml via the SAX2 interface, I get two character
callbacks for the string "B". The problem can be reproduced with
testSAX --noent from the libxml2 distribution:

$ /home/pajas/h2/compile/gnome-xml/testSAX --noent test.xml
SAX.setDocumentLocator()
SAX.startDocument()
SAX.internalSubset(a, , )
SAX.entityDecl(b, 2, (null), b.txt, (null))
SAX.externalSubset(a, , )
SAX.startElement(a)
SAX.getEntity(b)
SAX.characters(B, 1)
SAX.characters(B, 1)  <--- why?

  One when the entity is parsed to make sure it's well-formed, which
happens the first time you use the entity.
  One each time the entity content must be delivered to user land.

SAX.endElement(a)
SAX.endDocument()

(similarly if b.txt is complex XML - I get the same callbacks for
nodes in the entity twice)

Is this expected behavior? If yes, can I somehow distinguish
between the two calls (e.g. based on ctxt) so that I can filter
one of them out?

P.S. This was observed by one of the users of the Perl bindings
for libxml2. We also have an interface to libxml2's reader API in
Perl, but there are hundreds of very popular Perl modules
built upon the SAX interface (mainly because Perl has really
advanced SAX filtering and pipelining with interchangeable SAX
implementations varying from pure-Perl, expat, to libxml2;
libxml2 is the fastest among them, which makes it very popular and
thus worth maintaining).

  It all depends on how your entity handler is implemented, I think.
It's very tricky, I agree; that's why I suggest not using SAX in general.
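
  For illustration only, here is a minimal sketch of what "an entity
handler as part of the SAX callbacks" can look like. It assumes Python's
standard xml.sax module, which is backed by expat rather than libxml2, so
it shows the general shape of the hook and is not expected to reproduce
the testSAX output quoted above. The names RecordingResolver,
RecordingHandler, FAKE_FILES and parse_with_resolver are made up for this
example, and b.txt is served from an in-memory dict instead of the disk.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# test.xml from the report, inlined so the sketch is self-contained.
TEST_XML = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b'<!DOCTYPE a [\n'
    b'  <!ENTITY b SYSTEM "b.txt">\n'
    b']>\n'
    b'<a>&b;</a>'
)

# Assumption: an in-memory stand-in for the file system; b.txt holds just "B".
FAKE_FILES = {"b.txt": b"B"}


class RecordingResolver(EntityResolver):
    """Entity handler: decides which bytes an external entity expands to."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        # Record every request so the caller can see how often the parser asks.
        self.requests.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(FAKE_FILES[systemId]))
        return source


class RecordingHandler(ContentHandler):
    """Content handler that records each characters() callback."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_resolver():
    resolver = RecordingResolver()
    handler = RecordingHandler()
    parser = xml.sax.make_parser()
    # External general entities are only expanded when this feature is on.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(TEST_XML))
    return handler.chunks, resolver.requests


if __name__ == "__main__":
    chunks, requests = parse_with_resolver()
    print("characters callbacks:", chunks)
    print("entities requested:", requests)
```

  Recording both the resolver requests and the characters() callbacks
makes it possible to see which text came from an entity expansion, which
is the kind of bookkeeping a filter would need. How libxml2 itself
behaves still depends on the getEntity callback installed there.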

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


