Re: [xslt] Embedded Stylesheets

From: Dave Hyatt <hyatt apple com>
To: veillard redhat com
Cc: The Gnome XSLT library mailing-list <xslt gnome org>
Subject: Re: [xslt] Embedded Stylesheets
Date: Sat, 14 Aug 2004 17:11:53 -0700

On Aug 14, 2004, at 6:26 AM, Daniel Veillard wrote:

> You need to pass XML_PARSE_NOCDATA | XML_PARSE_DTDATTR | XML_PARSE_NOENT as the parser options for the XSLT and the XML to be sure to have a compliant XPath data model representation in the tree.
>>> I can't make sense of that code and the error is there. Why use a
>>> push parse when you already have all the data in memory?
>> I copied it from our XML+CSS code, which parses incrementally.  I can
>> obviously simplify the XSLT case, since that won't be incremental.  I
>> can change that.
> That will be way easier to maintain IMO.

Yeah, I switched over to this for the stylesheet parsing and the source doc parsing. Much simpler.

>>> Why try to fool the parser about encoding when libxml2 already implements the encoding detection specified in appendix F of the XML specification? Also, xmlCreatePushParserCtxt has encoding detection; you're redoing, in a likely untested way, what libxml2 has done reliably for ages. By forcing a cast to UTF-16 you're breaking the encoding detection, you're hurting performance, and you're likely to break the conformance of the parser as well. Do not force a cast to UTF-16, it's really, really bad! Beware too of the decoupling between the HTTP engine and the XML parser: you must read http://www.w3.org/TR/REC-xml/#sec-guessing and RFC 3023, and you will have to pass the encoding as declared in the Content-Type HTTP header.
>> I didn't write this particular code, but I believe that it was added to fix a bug where XHTML with a BOM was not rendering correctly. I'll try to get you more details so that we can figure out what's going on.
> libxml2 will detect and use the BOM if present. But when you create the push parser context you should pass down the first 4 bytes of the entity. Again, this is all related to appendix F in the XML Rec. This is tricky to get fully right, and I think bypassing the libxml2 code, which implements that part of the spec as exactly as possible, is likely to break the parser's conformance.

Ok, looked into this further, and the issue is that WebCore has already done the decoding into a UTF-16 buffer. So that's why we have to specify this explicitly. It turns out the bug was just further up in my code and I was passing in the wrong string. Everything works just fine now even with the encoding override.
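
For readers following along, here is a rough sketch of the kind of prefix sniffing appendix F describes, and the reason the first 4 bytes of the entity matter. It is only an illustration: guess_xml_encoding, the signature table and the encoding labels it returns are hypothetical names made up for this example, not libxml2 code, and the table covers only part of the appendix (no EBCDIC, no unusual UCS-4 byte orders).

```c
#include <stddef.h>
#include <string.h>

/* One byte signature from the appendix F tables (partial list). */
struct xml_sig {
    unsigned char bytes[4];
    size_t len;
    const char *name;   /* illustrative label, not a libxml2 constant */
};

/* 4-byte patterns come first: FF FE 00 00 (UCS-4 LE BOM) would
 * otherwise be mistaken for the 2-byte UTF-16 LE BOM FF FE. */
static const struct xml_sig xml_sigs[] = {
    {{0x00, 0x00, 0xFE, 0xFF}, 4, "UCS-4BE"},   /* BOM */
    {{0xFF, 0xFE, 0x00, 0x00}, 4, "UCS-4LE"},   /* BOM */
    {{0x00, 0x00, 0x00, 0x3C}, 4, "UCS-4BE"},   /* "<" without BOM */
    {{0x3C, 0x00, 0x00, 0x00}, 4, "UCS-4LE"},   /* "<" without BOM */
    {{0x00, 0x3C, 0x00, 0x3F}, 4, "UTF-16BE"},  /* "<?" without BOM */
    {{0x3C, 0x00, 0x3F, 0x00}, 4, "UTF-16LE"},  /* "<?" without BOM */
    /* "<?xm": an ASCII-compatible encoding; the real rule is to go on
     * and read the encoding declaration. "UTF-8" is a simplification. */
    {{0x3C, 0x3F, 0x78, 0x6D}, 4, "UTF-8"},
    {{0xEF, 0xBB, 0xBF, 0x00}, 3, "UTF-8"},     /* BOM */
    {{0xFE, 0xFF, 0x00, 0x00}, 2, "UTF-16BE"},  /* BOM */
    {{0xFF, 0xFE, 0x00, 0x00}, 2, "UTF-16LE"},  /* BOM */
};

/* Hypothetical helper: guess an encoding label from the start of an
 * entity. Returns NULL when the prefix matches no known signature. */
const char *guess_xml_encoding(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof(xml_sigs) / sizeof(xml_sigs[0]); i++) {
        if (len >= xml_sigs[i].len &&
            memcmp(buf, xml_sigs[i].bytes, xml_sigs[i].len) == 0)
            return xml_sigs[i].name;
    }
    return NULL;
}
```

With fewer than 4 bytes available, the UCS-4 and UTF-16 little-endian BOMs cannot be told apart, which is presumably why the push parser context wants that much of the entity up front. None of this applies once the data has already been decoded to UTF-16 by the caller, as in the WebCore case above.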

dave

Follow-Ups:
- Re: [xslt] Embedded Stylesheets
  - From: Daniel Veillard

References:
- [xslt] Embedded Stylesheets
  - From: David Hyatt
- Re: [xslt] Embedded Stylesheets
  - From: Daniel Veillard
- Re: [xslt] Embedded Stylesheets
  - From: Dave Hyatt
- Re: [xslt] Embedded Stylesheets
  - From: Daniel Veillard
