Re: [xslt] Embedded Stylesheets



On Aug 14, 2004, at 6:26 AM, Daniel Veillard wrote:
You need to pass XML_PARSE_NOCDATA | XML_PARSE_DTDATTR | XML_PARSE_NOENT
as the parser options for the XSLT and the XML to be sure to have a compliant
XPath data model representation in the tree.


I can't make sense of that code, and the error is there. Why use a
push parser when you already have all the data in memory?

I copied it from our XML+CSS code, which parses incrementally. I can obviously simplify the XSLT case, since that won't be incremental. I can change that.

That will be way easier to maintain IMO.

Yeah, I switched over to this for the stylesheet parsing and the source doc parsing. Much simpler.



Why try to fool the parser about the encoding when libxml2 already
implements the encoding detection specified in appendix F of the XML
specification? xmlCreatePushParserCtxt has encoding detection too, so
you're redoing, in a likely untested way, what libxml2 has done reliably
for ages. By forcing a cast to UTF-16 you're breaking the encoding
detection, you're hurting performance, and you're likely to break the
conformance of the parser as well. Do not force a cast to UTF-16, it's
really really bad! Beware too of the decoupling between the HTTP engine
and the XML parser: you must read http://www.w3.org/TR/REC-xml/#sec-guessing
and RFC 3023, and you will have to pass the encoding as declared in the
Content-Type HTTP header.
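
To illustrate the Content-Type point, here is a rough sketch of pulling the charset parameter out of a header value so that it can be handed to the parser as the declared encoding. The function name content_type_charset is hypothetical, and the parsing is deliberately simplified: it assumes no ';' appears inside a quoted value of an earlier parameter and does not handle backslash escapes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the charset parameter of a Content-Type
 * value into out. Returns 1 if a non-empty charset was found, else 0. */
int content_type_charset(const char *value, char *out, size_t outlen)
{
    static const char key[] = "charset";
    const size_t keylen = sizeof key - 1;
    const char *p = value;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    if (value == NULL)
        return 0;

    /* Each parameter after the media type is introduced by ';'. */
    while ((p = strchr(p, ';')) != NULL) {
        size_t i = 0, n = 0;
        int quoted;

        p++;
        while (*p == ' ' || *p == '\t')
            p++;
        /* Parameter names are compared case-insensitively. */
        while (i < keylen && tolower((unsigned char)p[i]) == key[i])
            i++;
        if (i != keylen)
            continue;
        p += keylen;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p != '=')
            continue;
        p++;
        while (*p == ' ' || *p == '\t')
            p++;
        quoted = (*p == '"');
        if (quoted)
            p++;
        /* Copy up to the closing quote, or up to the next separator. */
        while (*p != '\0' && n + 1 < outlen) {
            if (quoted ? (*p == '"') : (*p == ';' || *p == ' ' || *p == '\t'))
                break;
            out[n++] = *p++;
        }
        out[n] = '\0';
        return n > 0;
    }
    return 0;
}
```

When a charset is found, the resulting string could be passed as the encoding argument of a call such as xmlReadMemory; when none is found, passing NULL leaves the decision to libxml2's own detection.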

I didn't write this particular code, but I believe that it was added to
fix a bug where XHTML with a BOM was not rendering correctly. I'll try
to get you more details so that we can figure out what's going on.

libxml2 will detect and use the BOM if present. But when you create
the push parser context you should pass down the first 4 bytes of the
entity. Again, this is all related to appendix F in the XML Rec. This is
tricky to get fully right, and I think bypassing the libxml2 code, which
implements that part of the spec as exactly as possible, is likely to
break the parser's conformance.
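
To show why the first 4 bytes matter, here is a simplified sketch of the byte patterns listed in appendix F of the XML Recommendation. It is only an illustration of the idea, not libxml2's code (libxml2 exposes its own detection as xmlDetectCharEncoding). The function name guess_xml_encoding and the returned labels are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess an encoding family from the first bytes of
 * an entity. Returns an illustrative label, or NULL when the bytes give
 * no hint (UTF-8 is then assumed, absent other information). */
const char *guess_xml_encoding(const unsigned char *b, size_t len)
{
    assert(b != NULL || len == 0);

    if (len >= 4) {
        /* UCS-4 patterns come first: FF FE 00 00 would otherwise be
         * mistaken for a UTF-16LE BOM followed by a NUL character. */
        if (memcmp(b, "\x00\x00\xFE\xFF", 4) == 0) return "UCS-4BE";
        if (memcmp(b, "\xFF\xFE\x00\x00", 4) == 0) return "UCS-4LE";
        if (memcmp(b, "\x00\x00\xFF\xFE", 4) == 0) return "UCS-4 (unusual order)";
        if (memcmp(b, "\xFE\xFF\x00\x00", 4) == 0) return "UCS-4 (unusual order)";
        /* No BOM: look for the start of "<?xml" in various widths. */
        if (memcmp(b, "\x00\x00\x00\x3C", 4) == 0) return "UCS-4BE";
        if (memcmp(b, "\x3C\x00\x00\x00", 4) == 0) return "UCS-4LE";
        if (memcmp(b, "\x00\x3C\x00\x3F", 4) == 0) return "UTF-16BE";
        if (memcmp(b, "\x3C\x00\x3F\x00", 4) == 0) return "UTF-16LE";
        if (memcmp(b, "\x4C\x6F\xA7\x94", 4) == 0) return "EBCDIC";
        /* "<?xm" in an ASCII-compatible encoding: the encoding
         * declaration has to be read to learn which one. */
        if (memcmp(b, "\x3C\x3F\x78\x6D", 4) == 0) return "ASCII-compatible";
    }
    if (len >= 3 && memcmp(b, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";
    if (len >= 2) {
        if (memcmp(b, "\xFE\xFF", 2) == 0) return "UTF-16BE";
        if (memcmp(b, "\xFF\xFE", 2) == 0) return "UTF-16LE";
    }
    return NULL;
}
```

With fewer than 4 bytes available, several of these patterns cannot be told apart, which is why the initial chunk handed to a push parser context should contain at least that many.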



Ok, looked into this further, and the issue is that WebCore has already done the decoding into a UTF-16 buffer. So that's why we have to specify this explicitly. It turns out the bug was just further up in my code and I was passing in the wrong string. Everything works just fine now even with the encoding override.


dave


