Re: [xml] Ignoring Character Encodings



On Thu, Apr 11, 2002 at 12:30:44PM +0100, Richard Jinks wrote:
Hi

Ok, so I realise I may be treading on dangerous ground here (some of the
posts in the archive about encodings get quite scary!), but I have a small
question regarding character encodings.

Currently, all documents opened by our application get converted to our own
standard internal encoding. When I pass the document through to libxml, I
map the doc from our encoding to UTF-8 and map the output from libxml back
again. Fine, no problem.

As is entirely expected (and indeed essential as far as normal XML parsing
is concerned), if the document itself contains an encoding declaration in
the <?xml...?> line, libxml wants to switch encoding to the one specified,
and not the UTF-8 I'm giving it.

As I know the doc has already been converted into UTF-8 before libxml gets
to see it (because I did it), is there any way of telling it to ignore the
encoding declaration contained in the doc, and to stick with the one I've
told it to use?

  I think removing the encoding information on the document top node
should be sufficient. Just free the old value and set the field to NULL.

I'm not supposed to modify the document in any way (i.e. to temporarily hide
the encoding declaration), but I could if I really had to - in reality, I'd
probably just change the encoding code in libxml by hand ending up with our
own 'doctored' version.
Also, it would be a waste of computing power to change the doc back into
its real encoding only for libxml to change it straight back again!

   Yup, agreed.

If there isn't a proper way using the API that I've missed, I've got a patch
which just adds an extra run-time flag in xmlSetFeature(), and an if
statement in xmlSwitchEncoding() that will just break out of the function if
an encoding is already set (and the flag is set, of course).

  That depends on the serialization routine; you did not say which
function you're using. There are specific functions for saving in
a given encoding, see:
    http://xmlsoft.org/encoding.html

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


