Re: [xml] Possible bug with Byte Order Marks
- From: Daniel Veillard <veillard redhat com>
- To: Mark Itzcovitz <mark itzcovitz ntlworld com>
- Cc: xml gnome org
- Subject: Re: [xml] Possible bug with Byte Order Marks
- Date: Tue, 3 Jun 2003 10:16:48 -0400
On Tue, Jun 03, 2003 at 03:04:13PM +0100, Mark Itzcovitz wrote:
> On reflection, I think I'm wrong in what I say above. RFC2781 is about MIME
> types - the xml spec seems to say that the encoding declaration should just
> say UTF-16 and is used in conjunction with the BOM.
okay,
> > That should work, with the caveat you saw. I'm a bit concerned about
> > the requirement to add a field to record the encoding, this should already
> > be stored somewhere on the context or in the inputStream block.
>
> The encoding from the encoding declaration is stored, but I can't see where
> the encoding derived from xmlDetectCharEncoding is stored.
okay, now the problem is that this is not parser information but
entity information, so ideally it should be saved in the input structure
block. But I think the encoding="" value is always finer grained than
the result of xmlDetectCharEncoding, except in the case of UTF-16.
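To illustrate the UTF-16 exception: the declaration only says "UTF-16", while the first two bytes of the entity carry the byte order. Below is a minimal sketch of that check in plain C. It is not the libxml2 code; the names bom_kind and detect_utf16_bom are made up for this example, and it deliberately ignores UTF-32, whose little-endian BOM starts with the same two bytes.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not a libxml2 enum. */
typedef enum {
    BOM_NONE,
    BOM_UTF16_BE,
    BOM_UTF16_LE
} bom_kind;

/*
 * Inspect the first two bytes of an entity for a UTF-16 byte order mark.
 * Simplification: a UTF-32LE BOM (FF FE 00 00) is also reported as UTF-16LE.
 */
static bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 2)
        return BOM_NONE;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;   /* U+FEFF stored high byte first */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;   /* U+FEFF stored low byte first */
    return BOM_NONE;
}
```

The result is the extra piece of information that a plain encoding="UTF-16" declaration does not give you.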
> > Seems one way, the other way would be in case of just "UTF-16" being
> > passed
> > to actually serialize a BOM on output to keep something similar, except
> > we would always dump big endian.
> > Either solution should work, the second one is slightly more
> > conservative.
> >
>
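For reference, the second option quoted above (emit a BOM and always dump big endian) amounts to something like the following. This is only a sketch in plain C under my own assumptions: write_utf16be_with_bom is a hypothetical helper, not a libxml2 function, and it takes already-encoded UTF-16 code units rather than going through the library's output buffers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write a byte order mark followed by the given UTF-16 code units,
 * high byte first (big endian).
 * Returns the number of bytes written, or 0 if the arguments are invalid
 * or the output buffer is too small.
 */
static size_t write_utf16be_with_bom(const uint16_t *units, size_t n,
                                     unsigned char *out, size_t cap)
{
    size_t i, pos = 0;

    if (out == NULL || (units == NULL && n > 0))
        return 0;
    if (n > (SIZE_MAX - 2) / 2)       /* guard the size computation */
        return 0;
    if (cap < 2 + 2 * n)
        return 0;

    out[pos++] = 0xFE;                /* BOM: U+FEFF, big endian */
    out[pos++] = 0xFF;
    for (i = 0; i < n; i++) {
        out[pos++] = (unsigned char)(units[i] >> 8);
        out[pos++] = (unsigned char)(units[i] & 0xFF);
    }
    return pos;
}
```

A reader that honours the BOM will then decode the output correctly even though the declaration only says "UTF-16".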
> Returning to my original query, which was that xmlDocDumpMemory and
> xmlDocFormatDump don't work correctly for "UTF-16", and having looked more
> closely at the code for those functions, I think that my proposed changes
> have too broad a scope. I can see a different solution that can easily be
> applied to those two functions but I am confused by what seems to me to be
> an inconsistency, as follows:
>
> A call to xmlFindCharEncodingHandler for "UTF-16" fails.
> A call to xmlParseCharEncoding for "UTF-16" followed by a call to
> xmlGetCharEncodingHandler returns the handler for XML_CHAR_ENCODING_UTF16LE.
the problem is that you add some state information. If you can keep this in
the local variables of the serialization routine, then that's fine.
> The two Dump functions call xmlParseCharEncoding followed by
> xmlFindCharEncodingHandler. I propose putting a call to
> xmlGetCharEncodingHandler (using the result from the call to
> xmlParseCharEncoding), and only calling Find if the Get fails. This is
> hopefully a safe change.
Hm, that sounds better. Could you send a patch?
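Just to pin down the lookup order being proposed, here is a self-contained sketch. It is not the patch and does not call libxml2: model_parse, model_get and model_find are stand-ins I made up to mimic the behaviour reported above for xmlParseCharEncoding, xmlGetCharEncodingHandler and xmlFindCharEncodingHandler, and the tiny encoding tables are assumptions for the example only.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for this sketch only; NOT the libxml2 types or functions. */
typedef enum { ENC_NONE, ENC_UTF8, ENC_UTF16LE } enc_id;
typedef struct { const char *name; } enc_handler;

static enc_handler utf8_handler    = { "UTF-8" };
static enc_handler utf16le_handler = { "UTF-16LE" };
static enc_handler latin1_handler  = { "ISO-8859-1" };

/* Models the name-to-enum step: "UTF-16" maps to the LE variant. */
static enc_id model_parse(const char *name)
{
    if (name == NULL) return ENC_NONE;
    if (strcmp(name, "UTF-8") == 0) return ENC_UTF8;
    if (strcmp(name, "UTF-16") == 0) return ENC_UTF16LE;
    return ENC_NONE;
}

/* Models the enum-to-handler step. */
static enc_handler *model_get(enc_id id)
{
    switch (id) {
    case ENC_UTF8:    return &utf8_handler;
    case ENC_UTF16LE: return &utf16le_handler;
    default:          return NULL;
    }
}

/* Models the lookup by name, which as reported fails for "UTF-16". */
static enc_handler *model_find(const char *name)
{
    if (name == NULL) return NULL;
    if (strcmp(name, "UTF-8") == 0) return &utf8_handler;
    if (strcmp(name, "ISO-8859-1") == 0) return &latin1_handler;
    return NULL;
}

/* Proposed order: try Get on the parsed enum first, then fall back to Find. */
static enc_handler *lookup_handler(const char *name)
{
    enc_handler *h = model_get(model_parse(name));

    if (h == NULL)
        h = model_find(name);
    return h;
}
```

With this order "UTF-16" resolves through the enum path, while names the enum does not cover still go through the by-name lookup as before, and nothing needs to be stored outside the local variables of the dump routine.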
thanks,
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/