Re: [xslt] xsl:output/@encoding may produce character references in element and attribute names



On Fri, Jul 04, 2008 at 12:57:40PM +0200, Michael Ludwig wrote:
> I stumbled upon an oddity in LibXSLT: Element and attribute names end up
> containing character references in the output when the characters are
> not available in the selected output encoding.
> 
> http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/200807/msg00057.html
> 
> This oddity is actually a bug, so I reported it here:
> 
> http://bugzilla.gnome.org/show_bug.cgi?id=541529

  You ask for something impossible. You get a non-XML document instead of
getting an immediate failure.
  It's a trade-off, and it's not in libxslt: the escaping actually happens in
libxml2. The transcoding is done on a preserialized UTF-8 document (or
document fragment). Detecting the error would mean that each time a character
is not serializable in the target encoding, the code emitting the escaped
sequence would have to do a rewind lookup and try to guess whether it is
within markup or within content. It is only guessing, because at that point
you are manipulating strings and there is no notion of document structure.
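To make the failure mode concrete, here is a minimal Python sketch that mimics escaping on an already-serialized string. It is not libxml2's encoding.c code; it uses Python's standard `xmlcharrefreplace` error handler as a stand-in, and the element and attribute names are made up for illustration.

```python
# Sketch only: mimics escaping on an already-serialized string. This is not
# libxml2's encoding.c; it uses Python's standard "xmlcharrefreplace" handler.
serialized = '<caf\u00e9 n\u00e4me="cr\u00e8me">br\u00fbl\u00e9e</caf\u00e9>'

# Every character ASCII cannot represent becomes a numeric character
# reference, whether it sits in an element name, an attribute name, an
# attribute value or text content: the encoder only sees a string.
output = serialized.encode("ascii", errors="xmlcharrefreplace")
print(output.decode("ascii"))
# <caf&#233; n&#228;me="cr&#232;me">br&#251;l&#233;e</caf&#233;>
```

The references in the attribute value and in the text content are legal, but `caf&#233;` and `n&#228;me` are not valid XML names, so the output is not well-formed.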
  Basically, it would make everybody pay a rather high cost for the few who
ask for something impossible.
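The rewind lookup described above might look roughly like the following hypothetical Python sketch. Nothing here comes from libxml2: `guess_in_markup` and `transcode` are invented names, and the heuristic assumes attribute values are always double-quoted and ignores comments, CDATA sections and processing instructions. Each unencodable character triggers a backward scan whose length depends on how far back the previous `<` or `>` is, which is the extra cost and the guesswork being weighed here.

```python
def guess_in_markup(buf: str, pos: int) -> bool:
    """Hypothetical heuristic: guess whether buf[pos] lies inside an element
    or attribute name by scanning backwards from pos.

    Assumes attribute values are always double-quoted and ignores comments,
    CDATA sections and processing instructions, so it is only a guess."""
    quotes = 0
    for i in range(pos - 1, -1, -1):
        c = buf[i]
        if c == ">":
            return False          # the last tag was closed: we are in content
        if c == '"':
            quotes += 1           # crossed an attribute-value delimiter
        elif c == "<":
            # Inside a tag: an odd quote count means we are inside an
            # attribute value (content); an even count means inside a name.
            return quotes % 2 == 0
    return False


def transcode(buf: str, encoding: str) -> bytes:
    """Escape unencodable characters as decimal character references, but
    raise instead when the heuristic says the character is part of a name."""
    parts = []
    for pos, ch in enumerate(buf):
        try:
            ch.encode(encoding)   # probe only; the result is discarded
            parts.append(ch)
        except UnicodeEncodeError:
            if guess_in_markup(buf, pos):
                raise ValueError(
                    f"{ch!r} at offset {pos} is not representable in "
                    f"{encoding} inside markup"
                ) from None
            parts.append(f"&#{ord(ch)};")
    return "".join(parts).encode(encoding)


print(transcode('<p title="\u00e9">\u00e9</p>', "ascii"))
# b'<p title="&#233;">&#233;</p>'
```

With this sketch, `transcode('<caf\u00e9/>', "ascii")` raises `ValueError` instead of producing a malformed name, but only at the price of a backward scan for every unencodable character.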
  The current behavior has been there since the beginning of libxml2 (nearly a
decade), so hitting this bug is extremely uncommon, which makes me even less
comfortable with adding the cost. Again, it's a trade-off, a conscious one.
For more information, see libxml2's encoding.c around line 2057; that's where
the escaping is done. If you see another way to handle this without heavily
penalizing the normal process, I'm all for fixing it. But right now I don't
see a solution.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

