Re: [xml] What is the document's encoding?



On Mon, Dec 20, 2004 at 07:49:41AM -0800, Bill Moseley wrote:
I'm using SAX and parsing both xml and html docs.  Can I find out
what encoding the original document was in (or was converted from)?

I can use ctxt->input->encoding to get the encoding specified in a
meta http-equiv, but what I want to know is what encoding libxml2
thought the document was in when it was parsed.  And also what
encoding was assumed if a charset is not specified in the source doc.

  Hum, complex. There are multiple sources of encoding information.
The rules are not the same for XML and HTML. For an XML document, it may
also depend on which entity is being processed. ctxt->input->encoding
should be what libxml2 thinks the encoding is for that entity.
See Appendix F of the spec for more information:
   http://www.w3.org/TR/REC-xml/#sec-guessing
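
To make the Appendix F idea concrete, here is a minimal sketch of guessing
an encoding family from the first bytes of an XML entity. It is not
libxml2's code: guess_xml_encoding and the signature table are hypothetical
names made up for illustration, the returned labels are informal, and the
unusual UCS-4 octet orders listed in the appendix are left out. It covers
XML only; as noted above, HTML follows different rules. libxml2 ships its
own check of this kind, xmlDetectCharEncoding(), in its encoding module.

```c
#include <stddef.h>
#include <string.h>

/* One leading-byte pattern from Appendix F and the family it suggests. */
struct signature {
    unsigned char bytes[4];
    size_t        length;   /* number of significant bytes in `bytes` */
    const char   *name;     /* informal label, not an IANA charset name */
};

/* Order matters: the 4-byte UCS-4 BOMs are listed before the 2-byte
 * UTF-16 BOMs because FF FE is a prefix of FF FE 00 00.
 * The unusual UCS-4 octet orders (2143, 3412) are omitted in this sketch. */
static const struct signature signatures[] = {
    /* With a byte order mark. */
    {{0x00, 0x00, 0xFE, 0xFF}, 4, "UCS-4BE"},
    {{0xFF, 0xFE, 0x00, 0x00}, 4, "UCS-4LE"},
    {{0xEF, 0xBB, 0xBF, 0x00}, 3, "UTF-8"},
    {{0xFE, 0xFF, 0x00, 0x00}, 2, "UTF-16BE"},
    {{0xFF, 0xFE, 0x00, 0x00}, 2, "UTF-16LE"},
    /* Without a byte order mark: encoded forms of "<", "<?" or "<?xm". */
    {{0x00, 0x00, 0x00, 0x3C}, 4, "UCS-4BE"},
    {{0x3C, 0x00, 0x00, 0x00}, 4, "UCS-4LE"},
    {{0x00, 0x3C, 0x00, 0x3F}, 4, "UTF-16BE"},
    {{0x3C, 0x00, 0x3F, 0x00}, 4, "UTF-16LE"},
    /* "<?xm" in an ASCII-compatible encoding: the encoding declaration
     * must then be read to find out which one. */
    {{0x3C, 0x3F, 0x78, 0x6D}, 4, "ASCII-compatible"},
    /* "<?xm" in some flavour of EBCDIC. */
    {{0x4C, 0x6F, 0xA7, 0x94}, 4, "EBCDIC"},
};

/* Hypothetical helper: return a label for the first matching pattern.
 * When nothing matches, Appendix F says to assume UTF-8 (no declaration). */
const char *guess_xml_encoding(const unsigned char *buf, size_t len)
{
    size_t count = sizeof(signatures) / sizeof(signatures[0]);

    for (size_t i = 0; i < count; i++) {
        const struct signature *s = &signatures[i];
        if (len >= s->length && memcmp(buf, s->bytes, s->length) == 0)
            return s->name;
    }
    return "UTF-8";
}
```

This only gives a first guess. For the ASCII-compatible case the parser
still has to read the encoding declaration, and any encoding supplied by
the transport (for example an HTTP header) is yet another source.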

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


