Re: [xml] Still trying to handle entities using the SAX interface
- From: Fabrizio Ammollo <f ammollo reitek com>
- To: "Peter Jacobi" <pj walter-graphtek com>
- Cc: Daniel Veillard <veillard redhat com>, xml gnome org
- Subject: Re: [xml] Still trying to handle entities using the SAX interface
- Date: Thu, 19 Apr 2001 12:15:12 +0200
On Thursday 19 April 2001 10:53, you wrote:
> Hi Daniel, Fabrizio, All,
Hello
> I'm trying to dive into this issue a.s.a.p. (which may not be that soon),
> as I assume my apps are affected too.
> I don't think it would be wise to treat the SAX API as minor and less
> important. I'm using libxml for all sorts of XML handling; sometimes DOM
> is the better fit and sometimes it wouldn't work at all (file size). And I
> wouldn't like to use two different libs for SAX and DOM.
I agree with you, although I also understand and agree with Daniel's opinion:
it's better to work on and improve what is used by most of the users, instead
of working on something else which is much less used and that may cause
problems to the main users.
Even so, I'd really like to see the library working (ideally) equally well
with both the DOM and the SAX API, letting the users choose what's more
appropriate for them.
Daniel, I don't want to put more work on your shoulders, I pretty much
understand that I have to do something myself.
I can tell you what I have achieved with the very simple changes I have made
to parser.c: the normal DOM behaviour of the library does not seem to be
broken (at least according to the included regression tests); the SAX
behaviour shows one glitch which I don't know if I'll have the
time/patience/will to examine, partly because I don't know whether, once that
one is solved, others will emerge.
This is the document:
<?xml version="1.0"?>
<!DOCTYPE body [
<!ENTITY xml "Extensible Markup Language">
]>
<body>
<Record>
text1
<Title id="x">
</Title>
text2 and text4
<Title id="&xml;">
&xml;
</Title>
text3
</Record>
</body>
If I choose to substitute entities, I obtain the following (from the log of my
application):
---
EntityDeclElement : name [xml] type [1] public ID [(null)] system ID [(null)]
content [Extensible Markup Language]
XML_INTERNAL_GENERAL_ENTITY
EntityElement (userdata: [80477cc] - xml): path:
startElement (body) - path: body
charElement (userdata: [80477cc] - len = 1) - path: body
startElement (Record) - path: body/Record
charElement (userdata: [80477cc] - len = 7) - path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [x]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ ]
charElement (userdata: [80477cc] - len = 17) - path: body/Record
EntityElement (userdata: [80477cc] - xml): path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [Extensible Markup Language]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
EntityElement (userdata: [80477cc] - xml): path: body/Record/Title
charElement (userdata: [80477cc] - len = 26) - path: body/Record/Title
charElement (userdata: [80477cc] - len = 26) - path: body/Record/Title
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ Extensible Markup LanguageExtensible Markup
Language ]
charElement (userdata: [80477cc] - len = 7) - path: body/Record
endElement (Record) - path: body/Record
Current stack element value : [ text1 text2 and text4 text3 ]
charElement (userdata: [80477cc] - len = 1) - path: body
endElement (body) - path: body
---
What happens is that the entity in the element content is delivered twice to
my characters callback (the two consecutive charElement lines with len = 26
above), I think because my EntityElement callback is called when the entity
is reached.
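
For comparison, here is a minimal sketch of what a SAX consumer would normally
expect for this document when entities are substituted: the replacement text
reaches the character data of the second Title once, not twice. It uses
Python's standard xml.sax module (expat-based) rather than libxml, purely as a
reference point; the TextCollector class and its fields are hypothetical names
made up for the illustration, loosely mirroring the "stack element value" in
the log.

```python
import xml.sax

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE body [
<!ENTITY xml "Extensible Markup Language">
]>
<body>
<Record>
text1
<Title id="x">
</Title>
text2 and text4
<Title id="&xml;">
&xml;
</Title>
text3
</Record>
</body>
"""


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: keeps one text buffer per open element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (name, id attribute, text chunks) per open element
        self.titles = []   # (id attribute, collected text) per closed <Title>

    def startElement(self, name, attrs):
        self.stack.append((name, attrs.get("id"), []))

    def characters(self, content):
        # The parser may split one text node into several calls; just append.
        if self.stack:
            self.stack[-1][2].append(content)

    def endElement(self, name):
        _, ident, chunks = self.stack.pop()
        if name == "Title":
            self.titles.append((ident, "".join(chunks).strip()))


handler = TextCollector()
xml.sax.parseString(DOC, handler)
for ident, text in handler.titles:
    print(f"Title id=[{ident}] text=[{text}]")
```

With this parser the second Title should end up with the entity text a single
time, which is the behaviour the log above deviates from.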
If instead I choose NOT to substitute entities, I obtain this:
---
EntityDeclElement : name [xml] type [1] public ID [(null)] system ID [(null)]
content [Extensible Markup Language]
XML_INTERNAL_GENERAL_ENTITY
EntityElement (userdata: [80477cc] - xml): path:
startElement (body) - path: body
charElement (userdata: [80477cc] - len = 1) - path: body
startElement (Record) - path: body/Record
charElement (userdata: [80477cc] - len = 7) - path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [x]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ ]
charElement (userdata: [80477cc] - len = 17) - path: body/Record
EntityElement (userdata: [80477cc] - xml): path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [&xml;]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
EntityElement (userdata: [80477cc] - xml): path: body/Record/Title
charElement (userdata: [80477cc] - len = 26) - path: body/Record/Title
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ Extensible Markup Language ]
charElement (userdata: [80477cc] - len = 7) - path: body/Record
endElement (Record) - path: body/Record
Current stack element value : [ text1 text2 and text4 text3 ]
charElement (userdata: [80477cc] - len = 1) - path: body
endElement (body) - path: body
---
What happens here is that the entity in the element content is still
substituted (while the attribute value is correctly left as &xml;), again
because of some bad interaction with my callback.
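
As a point of reference for the non-substituting case, the sketch below shows
a parser reporting the reference in element content as its own event instead
of as character data. It uses Python's xml.parsers.expat, not libxml, and a
trimmed version of the document; the handler function names and the events
list are hypothetical. It relies on expat's documented behaviour that setting
DefaultHandler turns off expansion of internal entities in content. Note that
expat, unlike the libxml log above, always expands references inside
attribute values.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE body [
<!ENTITY xml "Extensible Markup Language">
]>
<body><Title id="&xml;">&xml;</Title></body>
"""

events = []


def on_start(name, attrs):
    events.append(("start", name, dict(attrs)))


def on_chars(data):
    events.append(("chars", data))


def on_default(data):
    # Receives everything no other handler claims, including the
    # unexpanded reference "&xml;" found in element content.
    events.append(("default", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.CharacterDataHandler = on_chars
# Assigning DefaultHandler (rather than DefaultHandlerExpand) disables
# expansion of internal general entities in content.
parser.DefaultHandler = on_default
parser.Parse(DOC, True)

for event in events:
    print(event)
```

Here the replacement text should never reach the character data handler; the
application sees the bare reference and decides what to do with it.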
--
Bye,
Fabrizio