Re: [xml] sax and entities



Daniel, All,
before it's forgotten, does anyone have any clues about this,
please? Shall I file it in Bugzilla?
Thanks,
-- Petr

On Sunday 10 June 2007 23:10, Petr Pajas wrote:
Hi,

I have two files (also attached)

1) test.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE a [
  <!ENTITY b SYSTEM "b.txt">
]>
<a>&b;</a>

2) b.txt, which contains just "B"

When parsing test.xml via the SAX2 interface, I get two character
callbacks for the string "B". The problem can be reproduced with
testSAX --noent from the libxml2 distribution:

$ /home/pajas/h2/compile/gnome-xml/testSAX --noent test.xml
SAX.setDocumentLocator()
SAX.startDocument()
SAX.internalSubset(a, , )
SAX.entityDecl(b, 2, (null), b.txt, (null))
SAX.externalSubset(a, , )
SAX.startElement(a)
SAX.getEntity(b)
SAX.characters(B, 1)
SAX.characters(B, 1)  <--- why?
SAX.endElement(a)
SAX.endDocument()

(similarly if b.txt is complex XML - I get the same callbacks for
nodes in the entity twice)
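
For anyone who wants to recreate the two files quickly, here is a
minimal sketch in Python. It writes test.xml and b.txt exactly as
described above, so testSAX --noent can be run against the result.
As a cross-check it also records the characters() callbacks seen by
Python's own xml.sax module. Note the assumptions: xml.sax is backed
by expat, not libxml2, so this shows only what another SAX
implementation reports for the same input and says nothing about
libxml2's behavior. The names write_fixture, CharacterRecorder and
record_characters are made up for this illustration.

```python
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges

# The document from the report, with one external general entity.
TEST_XML = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<!DOCTYPE a [\n'
    '  <!ENTITY b SYSTEM "b.txt">\n'
    ']>\n'
    '<a>&b;</a>\n'
)


def write_fixture(directory):
    """Create test.xml and b.txt in `directory`; return the path of test.xml."""
    directory = Path(directory)
    (directory / "b.txt").write_text("B", encoding="ascii")
    xml_path = directory / "test.xml"
    xml_path.write_text(TEST_XML, encoding="iso-8859-1")
    return xml_path


class CharacterRecorder(ContentHandler):
    """Keeps every characters() callback so repeated data is easy to spot."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def record_characters(xml_path):
    """Parse with Python's expat-based SAX parser (NOT libxml2)."""
    parser = xml.sax.make_parser()
    # External general entities are disabled by default in recent
    # Python versions, so they have to be switched on explicitly.
    parser.setFeature(feature_external_ges, True)
    handler = CharacterRecorder()
    parser.setContentHandler(handler)
    parser.parse(str(xml_path))
    return handler.chunks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_fixture(tmp)
        print(record_characters(path))
```

A SAX parser may split text into several characters() calls, so the
recorded chunks are best compared after joining them; the number of
calls alone is not meaningful.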

Is this expected behavior? If so, can I somehow distinguish
between the two calls (e.g. based on ctxt) so that I can filter
one of them out?

P.S. This was observed by one of the users of the Perl bindings
for libxml2. We also have an interface to libxml2's reader API in
Perl, but there are hundreds of very popular Perl modules
built upon the SAX interface (mainly because Perl has really
advanced SAX filtering and pipelining with interchangeable SAX
implementations ranging from pure-Perl through expat to libxml2;
libxml2 is the fastest among them, which makes it very popular and
thus worth maintaining).

Thanks in advance,
-- Petr


