Re: [xml] Question about using SHIFT-JIS encoding with libxml2



On Mon, Apr 16, 2007 at 12:55:33PM +0530, Agarwal, Saumya wrote:
 
> Hi Daniel,
> This is because libxml2 sees 'UTF-8' encoding in the XML declaration
> while the data is not in UTF-8.
> I tested by hardcoding the encoding received by libxml2 to 'SJIS' and it
> behaved properly.
> Hardcoding the encoding to SJIS is not an option for me as I want to
> support both UTF-8 and SJIS.

  As far as I can tell, this is a fatal error: if you don't know the encoding
and always label the data as UTF-8, any XML parser *MUST* reject it as not
well-formed. That is what the spec says:
   http://www.w3.org/TR/REC-xml/#NT-EncodingDecl

 "In the absence of information provided by an external transport
  protocol (e.g. HTTP or MIME), it is a fatal error for an entity
  including an encoding declaration to be presented to the XML processor
  in an encoding other than that named in the declaration"

If you have that extra encoding information, pass it to xmlRead... as
stated previously. If you don't have it, what you're receiving is simply
not XML, and no XML parser will be able to handle it.
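
For illustration only, here is a minimal sketch of where that extra encoding
information might come from. It assumes the transport hands you a MIME-style
Content-Type value such as "text/xml; charset=Shift_JIS". The helper
charset_from_content_type() is hypothetical and not part of libxml2; it only
extracts the charset name, which could then be passed as the encoding
argument of xmlReadMemory() (NULL leaves the decision to the parser).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libxml2 function): copy the charset parameter
 * of a Content-Type style header value into out.
 * Returns out on success, or NULL when there is no charset parameter or the
 * value does not fit in out_size bytes.
 */
static const char *charset_from_content_type(const char *header,
                                             char *out, size_t out_size)
{
    static const char key[] = "charset=";
    const size_t key_len = strlen(key);
    const char *p;
    size_t n = 0;

    assert(out != NULL && out_size > 0);
    if (header == NULL)
        return NULL;

    /* Find "charset=" case-insensitively (strcasestr is not standard C). */
    for (p = header; *p != '\0'; p++) {
        size_t i = 0;
        while (i < key_len && tolower((unsigned char)p[i]) == key[i])
            i++;
        if (i == key_len)
            break;
    }
    if (*p == '\0')
        return NULL;              /* no charset parameter present */

    p += key_len;
    if (*p == '"')
        p++;                      /* the value may be quoted */

    /* Copy the value up to a closing quote, ';', whitespace or the end. */
    while (*p != '\0' && *p != '"' && *p != ';' &&
           !isspace((unsigned char)*p)) {
        if (n + 1 >= out_size)
            return NULL;          /* value does not fit: treat as unknown */
        out[n++] = *p++;
    }
    out[n] = '\0';
    return n > 0 ? out : NULL;
}

/*
 * Usage sketch, assuming libxml2 is linked and buf/len hold the document
 * (content_type, buf and len are placeholder names):
 *
 *   char enc[64];
 *   const char *e = charset_from_content_type(content_type, enc, sizeof enc);
 *   xmlDocPtr doc = xmlReadMemory(buf, len, "noname.xml", e, 0);
 */
```

The point of the design is that the encoding is decided outside the document:
when the transport says Shift_JIS, that name is handed to the parser, and when
it says nothing, NULL is passed and the XML declaration has to be truthful.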

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


