Re: [xml] Still trying to handle entities using the SAX interface



On Thursday 19 April 2001 10:53, you wrote:

Hi Daniel, Fabrizio, All,

Hello

I'm trying to dive into this issue a.s.a.p. (which may not be that soon),
as I assume my apps are affected too.

I don't think it would be wise to treat the SAX API as minor and less
important. I'm using libxml for all sorts of XML handling: sometimes DOM
is the better fit, and sometimes it wouldn't work at all (because of file
size). And I wouldn't like to use two different libs for SAX and DOM.

I agree with you, although I also understand and agree with Daniel's opinion: 
it's better to work on and improve what most users use, rather than on 
something that is used far less and may cause problems for the main users.
Even so, I'd really like to see the library working (ideally) equally well 
with both the DOM and the SAX API, letting users choose whichever is more 
appropriate for them.

Daniel, I don't want to put more work on your shoulders; I fully
understand that I have to do something myself.

Here is what I have achieved with the very simple changes I made to 
parser.c: the normal DOM behaviour of the library does not seem to be broken 
(at least according to the included regression tests), while the SAX 
behaviour shows one glitch that I'm not sure I'll have the time/patience/will 
to examine, partly because I don't know whether fixing it would just bring 
out others.

This is the document:

<?xml version="1.0"?>
<!DOCTYPE body [
<!ENTITY xml "Extensible Markup Language">
]>
<body>
<Record>
text1
<Title id="x">
</Title>
text2 and text4
<Title id="&xml;">
&xml;
</Title>
text3
</Record>
</body>

If I choose to substitute entities, I get the following (from my 
application's log):

---
EntityDeclElement : name [xml] type [1] public ID [(null)] system ID [(null)] content [Extensible Markup Language]
XML_INTERNAL_GENERAL_ENTITY
EntityElement (userdata: [80477cc] - xml): path:
startElement (body) - path: body
charElement (userdata: [80477cc] - len = 1) - path: body
startElement (Record) - path: body/Record
charElement (userdata: [80477cc] - len = 7) - path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [x]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ ]
charElement (userdata: [80477cc] - len = 17) - path: body/Record
EntityElement (userdata: [80477cc] - xml): path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [Extensible Markup Language]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
EntityElement (userdata: [80477cc] - xml): path: body/Record/Title
charElement (userdata: [80477cc] - len = 26) - path: body/Record/Title
charElement (userdata: [80477cc] - len = 26) - path: body/Record/Title
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ Extensible Markup LanguageExtensible Markup Language ]
charElement (userdata: [80477cc] - len = 7) - path: body/Record
endElement (Record) - path: body/Record
Current stack element value : [ text1  text2 and text4  text3 ]
charElement (userdata: [80477cc] - len = 1) - path: body
endElement (body) - path: body
---

What happens is that the entity in the element content is delivered twice 
to my characters callback (the two len = 26 chunks), I think because my 
EntityElement callback is called when the entity is reached.
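To make the doubled value concrete, here is a minimal sketch of a per-element text accumulator of the kind the log suggests. All names are hypothetical (this is not libxml API and not my actual application code), and storing newlines as spaces is only inferred from the "Current stack element value" lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-element text accumulator (not libxml API).
 * Each open element gets a frame; character data is appended to the
 * innermost frame. Newlines are stored as spaces, which is what the
 * "Current stack element value" lines in the log appear to do. */
#define MAX_DEPTH 16
#define MAX_TEXT 256

typedef struct {
    char name[32];
    char text[MAX_TEXT];
    size_t len;
} Frame;

static Frame stack[MAX_DEPTH];
static int depth = 0;

/* Meant to be called from a startElement handler. */
static void start_element(const char *name) {
    assert(depth < MAX_DEPTH);
    Frame *f = &stack[depth++];
    snprintf(f->name, sizeof f->name, "%s", name);
    f->text[0] = '\0';
    f->len = 0;
}

/* Meant to be called from a characters handler:
 * ch is NOT NUL-terminated, only len bytes are valid. */
static void characters(const char *ch, int len) {
    assert(depth > 0);
    Frame *f = &stack[depth - 1];
    for (int i = 0; i < len && f->len + 1 < MAX_TEXT; i++)
        f->text[f->len++] = (ch[i] == '\n') ? ' ' : ch[i];
    f->text[f->len] = '\0';
}

/* Builds the "path:" column of the log, e.g. "body/Record/Title". */
static const char *element_path(void) {
    static char path[MAX_DEPTH * 33];
    path[0] = '\0';
    for (int i = 0; i < depth; i++) {
        if (i > 0)
            strcat(path, "/");
        strcat(path, stack[i].name);
    }
    return path;
}

/* Meant to be called from an endElement handler: pops the frame and
 * returns its value (static buffer, overwritten by the next call). */
static const char *end_element(void) {
    static char value[MAX_TEXT];
    assert(depth > 0);
    Frame *f = &stack[--depth];
    memcpy(value, f->text, f->len + 1);
    return value;
}
```

Replaying the second Title's events from the first log (chunks of 1, 26, 26 and 1 bytes) yields " Extensible Markup LanguageExtensible Markup Language ", the doubled value shown there; a single 26-byte chunk yields " Extensible Markup Language ". With a plain accumulator like this, the doubling points at the characters callback being invoked twice rather than at the concatenation.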

If instead I choose NOT to substitute entities, I get this:

---
EntityDeclElement : name [xml] type [1] public ID [(null)] system ID [(null)] content [Extensible Markup Language]
XML_INTERNAL_GENERAL_ENTITY
EntityElement (userdata: [80477cc] - xml): path:
startElement (body) - path: body
charElement (userdata: [80477cc] - len = 1) - path: body
startElement (Record) - path: body/Record
charElement (userdata: [80477cc] - len = 7) - path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [x]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ ]
charElement (userdata: [80477cc] - len = 17) - path: body/Record
EntityElement (userdata: [80477cc] - xml): path: body/Record
startElement (Title) - path: body/Record/Title
atts[0] : [id]
atts[1] : [&xml;]
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
EntityElement (userdata: [80477cc] - xml): path: body/Record/Title
charElement (userdata: [80477cc] - len = 26) - path: body/Record/Title
charElement (userdata: [80477cc] - len = 1) - path: body/Record/Title
endElement (Title) - path: body/Record/Title
Current stack element value : [ Extensible Markup Language ]
charElement (userdata: [80477cc] - len = 7) - path: body/Record
endElement (Record) - path: body/Record
Current stack element value : [ text1  text2 and text4  text3 ]
charElement (userdata: [80477cc] - len = 1) - path: body
endElement (body) - path: body
---

What happens here is that the entity in the element content is still 
substituted (the single len = 26 chunk) even though I asked for no 
substitution, again because of some bad interaction with my callback.
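With substitution off, the attribute value arrives as the literal &xml; (atts[1] in the second log), so a SAX application that still wants the replacement text there has to expand it itself. A minimal sketch, assuming the application records each internal entity from its entity-declaration callback (the EntityDeclElement lines) into a small table; the table and function names are hypothetical, not libxml API, and only one level of internal general entities is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table filled from an entity-declaration callback:
 * entity name -> replacement text (internal general entities only). */
#define MAX_ENTITIES 8

typedef struct {
    char name[32];
    char content[128];
} EntityDecl;

static EntityDecl decls[MAX_ENTITIES];
static int n_decls = 0;

static void record_entity_decl(const char *name, const char *content) {
    assert(n_decls < MAX_ENTITIES);
    snprintf(decls[n_decls].name, sizeof decls[n_decls].name, "%s", name);
    snprintf(decls[n_decls].content, sizeof decls[n_decls].content, "%s", content);
    n_decls++;
}

/* Looks up name[0..name_len) in the table; NULL if not declared. */
static const char *lookup_entity(const char *name, size_t name_len) {
    for (int i = 0; i < n_decls; i++)
        if (strlen(decls[i].name) == name_len &&
            strncmp(decls[i].name, name, name_len) == 0)
            return decls[i].content;
    return NULL;
}

/* Expands "&name;" references to recorded entities in src into out
 * (always NUL-terminated, truncated if too small). Unknown references,
 * character references and stray '&' are copied through unchanged;
 * references inside replacement text are not expanded again. */
static void expand_entities(const char *src, char *out, size_t outsz) {
    size_t o = 0;
    assert(outsz > 0);
    while (*src != '\0' && o + 1 < outsz) {
        const char *semi = (*src == '&') ? strchr(src, ';') : NULL;
        const char *rep =
            semi ? lookup_entity(src + 1, (size_t)(semi - src - 1)) : NULL;
        if (rep != NULL) {
            size_t n = strlen(rep);
            if (n > outsz - 1 - o)
                n = outsz - 1 - o;
            memcpy(out + o, rep, n);
            o += n;
            src = semi + 1;
        } else {
            out[o++] = *src++;
        }
    }
    out[o] = '\0';
}
```

For the document above, recording ("xml", "Extensible Markup Language") and expanding "&xml;" gives the same attribute value the substituting run reports.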

--
Bye,
        Fabrizio



