Re: [xml] What is the document's encoding?
- From: Daniel Veillard <veillard redhat com>
- To: Bill Moseley <moseley hank org>
- Cc: xml gnome org
- Subject: Re: [xml] What is the document's encoding?
- Date: Mon, 20 Dec 2004 13:17:33 -0500
On Mon, Dec 20, 2004 at 07:49:41AM -0800, Bill Moseley wrote:
> I'm using SAX and parsing both xml and html docs. Can I find out
> what encoding the original document was in (or was converted from)?
> I can use ctxt->input->encoding to get the encoding specified in a
> meta http-equiv, but what I want to know is what encoding libxml2
> thought the document was in when it was parsed. And also what
> encoding was assumed if a charset is not specified in the source doc.
Hum, complex. There are multiple sources of encoding information,
and the rules are not the same for XML and HTML. For XML documents,
the answer may also depend on which entity is being processed;
ctxt->input->encoding should be what libxml2 thinks the encoding is
for that entity. See also Appendix F of the spec for more information:
http://www.w3.org/TR/REC-xml/#sec-guessing
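Appendix F.1 guesses an XML entity's encoding from its first few bytes (a byte order mark, or the way `<?xm` is encoded). A minimal C sketch of that table follows. It is not libxml2's own detection code, and the function name `guess_xml_encoding` and the labels it returns are hypothetical. It only narrows the encoding down to a family: for the ASCII-compatible and EBCDIC cases the spec says to read the encoding declaration next, and it does not cover the separate HTML rules (such as a meta http-equiv charset).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the XML 1.0 Appendix F.1 detection table (sketch). */
struct xml_sig {
    unsigned char bytes[4]; /* leading bytes to match */
    size_t n;               /* how many of them are significant */
    const char *label;      /* hypothetical label, not a libxml2 name */
};

static const struct xml_sig xml_sigs[] = {
    /* With a byte order mark. The 4-byte UCS-4 marks come first because
       FF FE 00 00 and FE FF 00 00 also begin with a UTF-16 BOM. */
    {{0x00, 0x00, 0xFE, 0xFF}, 4, "UCS-4BE"},    /* order 1234 */
    {{0xFF, 0xFE, 0x00, 0x00}, 4, "UCS-4LE"},    /* order 4321 */
    {{0x00, 0x00, 0xFF, 0xFE}, 4, "UCS-4-2143"}, /* unusual order */
    {{0xFE, 0xFF, 0x00, 0x00}, 4, "UCS-4-3412"}, /* unusual order */
    {{0xFE, 0xFF}, 2, "UTF-16BE"},
    {{0xFF, 0xFE}, 2, "UTF-16LE"},
    {{0xEF, 0xBB, 0xBF}, 3, "UTF-8"},
    /* Without a byte order mark: how '<', "<?" or "<?xm" is encoded. */
    {{0x00, 0x00, 0x00, 0x3C}, 4, "UCS-4BE"},
    {{0x3C, 0x00, 0x00, 0x00}, 4, "UCS-4LE"},
    {{0x00, 0x00, 0x3C, 0x00}, 4, "UCS-4-2143"},
    {{0x00, 0x3C, 0x00, 0x00}, 4, "UCS-4-3412"},
    {{0x00, 0x3C, 0x00, 0x3F}, 4, "UTF-16BE"},
    {{0x3C, 0x00, 0x3F, 0x00}, 4, "UTF-16LE"},
    /* "<?xm" in UTF-8, ASCII, ISO-8859-x, Shift_JIS, EUC, ...; the
       encoding declaration must be read to tell them apart. */
    {{0x3C, 0x3F, 0x78, 0x6D}, 4, "ASCII-compatible"},
    /* "<?xm" in some flavour of EBCDIC; again, read the declaration. */
    {{0x4C, 0x6F, 0xA7, 0x94}, 4, "EBCDIC"},
};

/* Returns a label for the encoding family implied by the first bytes
   of an XML entity, following the Appendix F.1 table in order. */
const char *guess_xml_encoding(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < sizeof xml_sigs / sizeof xml_sigs[0]; i++) {
        if (len >= xml_sigs[i].n &&
            memcmp(buf, xml_sigs[i].bytes, xml_sigs[i].n) == 0)
            return xml_sigs[i].label;
    }
    /* Anything else: per the spec, UTF-8 without an encoding
       declaration (or a mislabeled or corrupt stream). */
    return "UTF-8";
}
```

For example, a buffer starting with `<?xml version="1.0"?>` yields "ASCII-compatible", so a parser would go on to read the `encoding=` pseudo-attribute. A buffer starting with the bytes FF FE 3C 00 yields "UTF-16LE". A document with no BOM and no XML declaration falls through to the UTF-8 default.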
Daniel
--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/