Re: [xml] What is the document's encoding?

On Mon, Dec 20, 2004 at 07:49:41AM -0800, Bill Moseley wrote:
> I'm using SAX and parsing both xml and html docs.  Can I find out
> what encoding the original document was in (or was converted from)?
>
> I can use ctxt->input->encoding to get the encoding specified in a
> meta http-equiv, but what I want to know is what encoding libxml2
> thought the document was in when it was parsed.  And also what
> encoding was assumed if a charset is not specified in the source doc.

  Hmm, that's complex. There are multiple sources of encoding information,
and the rules are not the same for XML and HTML. For an XML document, it may
also depend on which entity is being processed. ctxt->input->encoding
should be what libxml2 thinks the encoding is for that entity.
See Appendix F of the XML spec for more information.
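
For reference, Appendix F of the XML 1.0 spec describes how a parser can guess an entity's encoding from its first few bytes (a byte order mark, or the byte pattern of a leading "<?"). The following is only a rough sketch of that idea in plain C, not libxml2's actual code: the function name guess_xml_encoding and the returned labels are hypothetical, the unusual UCS-4 byte orders are left out, and it applies to XML only (HTML follows different rules).

```c
#include <stddef.h>
#include <string.h>

/* One byte signature from the Appendix F tables and the label we report. */
struct signature {
    unsigned char bytes[4];
    size_t len;
    const char *name;   /* hypothetical labels, not libxml2 constants */
};

static const struct signature signatures[] = {
    /* With a byte order mark. The 4-byte UCS-4 marks come first so that
     * FF FE 00 00 is not mistaken for a UTF-16LE mark. */
    {{0x00, 0x00, 0xFE, 0xFF}, 4, "UCS-4BE"},
    {{0xFF, 0xFE, 0x00, 0x00}, 4, "UCS-4LE"},
    {{0xEF, 0xBB, 0xBF, 0x00}, 3, "UTF-8"},
    {{0xFE, 0xFF, 0x00, 0x00}, 2, "UTF-16BE"},
    {{0xFF, 0xFE, 0x00, 0x00}, 2, "UTF-16LE"},
    /* Without a byte order mark: the encoded form of '<' or "<?". */
    {{0x00, 0x00, 0x00, 0x3C}, 4, "UCS-4BE"},
    {{0x3C, 0x00, 0x00, 0x00}, 4, "UCS-4LE"},
    {{0x00, 0x3C, 0x00, 0x3F}, 4, "UTF-16BE"},
    {{0x3C, 0x00, 0x3F, 0x00}, 4, "UTF-16LE"},
    {{0x4C, 0x6F, 0xA7, 0x94}, 4, "EBCDIC"},
};

/* Return a first guess at the encoding of an XML entity from its leading
 * bytes. Anything unrecognized falls back to "UTF-8"; a real parser would
 * then read the encoding declaration, if any, to refine the guess. */
static const char *guess_xml_encoding(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(signatures) / sizeof(signatures[0]); i++) {
        const struct signature *sig = &signatures[i];
        if (len >= sig->len && memcmp(buf, sig->bytes, sig->len) == 0)
            return sig->name;
    }
    return "UTF-8";
}
```

This only gives the starting guess; an encoding declaration in the XML declaration or external information such as an HTTP charset can still override or refine it, which is why the answer differs per entity.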


Daniel Veillard      | Red Hat Desktop team
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
