Re: [xml] Control over encoding declaration (prolog and meta)
- From: Kasimier Buchcik <kbuchcik 4commerce de>
- To: <xml gnome org>
- Subject: Re: [xml] Control over encoding declaration (prolog and meta)
- Date: Thu, 15 Jan 2004 13:19:13 +0100
on 1/15/2004 11:52 AM Daniel Veillard wrote:
On Thu, Jan 15, 2004 at 10:34:05AM +0100, Kasimier Buchcik wrote:
Would this approach be thread-safe? I would expect this procedure to
temporarily change the encoding handler for e.g. ISO-8859-1 to UTF-16LE,
but if I'm serializing another document to ISO-8859-1 at the same time,
I would get wrong results. Or is the registration of encoding handlers
somehow implemented per thread?
No it's global.
Ok, so I can't use it.
Lying about encodings is bad. I don't know why you want to do this,
but I don't want to start making specialized APIs for this reason.
:-) Lying about encodings: "Hey dude, you tricked me with that encoding.
It says ISO-stuff but it's UTF-stuff. Gimme back my bucks!"
Ok, this issue is DOM 3 related. As you might remember I'm still
struggling with "to DOMString serialization" and "from DOMString
parsing", which always has to be UTF-16 encoded, regardless of the
content; so if I have e.g. an ISO-8859-1 document I still need it to be
serialized to UTF-16, but it still *has to* contain an encoding
declaration of ISO-8859-1. It sounds like no big deal, but if I don't
have control over both the target encoding and the declared encoding, I
can't fulfill the requirements of the DOM 3 spec.
Encodings are registered globally, I think it's a sound decision, it's
a framework capacity and an API that I expect to be used once at startup.
Yes, I agree.
I think it's not an encoding issue, but rather: "let *me* decide how the
declaration goes, I'm big enough to decide if it's wrong or not".
If you have a completely broken requirement, fork, do the unclean stuff
in the forked process and be done with it. If there is a speed penalty,
then that will give people an incentive to fix the receiving side. Sorry
this is not a valuable reason to add even more confusing APIs, increase
libxml2 code and overall complexity.
I know that libxml2 has not much to do with DOM requirements. But I
would not call the implementation of a DOM 3 requirement "unclean stuff".
XHTML is XML, the tools MUST parse it following the XML rules which are
crystal clear, if your instance says "ISO-8859-1" and is encoded in
"UTF-16LE" then it's a well-formedness error, unless you get something
like an HTTP header telling what the real encoding is (and I personally
consider this a terribly bad kludge, but that's how it is).
As stated above, XML spec on the one side, DOM spec on the other.
So the sum of use cases has risen to 2 :-)
Daniel, you wrote in some of your mails on the list that there are too many
entrypoints to the library already; I understand your concern, and
things like the xmlReadxxx API with all the nice options are really
compact and concise. So I wonder if it would be good to have a
xmlSerializexxx API; a serialization context sounds a bit heavy, but
more flexible - allowing extensible options for the future. And I would
be happy about a field "declaredEncoding" taking a custom encoding to be
declared. I really think the serialization will become far more complex,
and should be more customizable, if (hopefully) libxml2 will try to help
out more with DOM stuff in the future.
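
To make the idea of separate "target" and "declared" encodings concrete,
here is a minimal sketch in Python (standard library only, not libxml2).
The names SerializeOptions, declared_encoding and serialize_with_options
are hypothetical and exist only for this illustration; nothing like them
is part of libxml2. The sketch assumes the caller accepts responsibility
for a declaration that does not match the bytes actually written.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerializeOptions:
    """Hypothetical option bag, loosely modelled on the proposal above."""
    target_encoding: str = "utf-8"            # encoding of the bytes written
    declared_encoding: Optional[str] = None   # encoding named in the prolog


def serialize_with_options(root, out, options):
    """Write `root` to the binary stream `out` in a single pass."""
    declared = options.declared_encoding or options.target_encoding
    # The wrapper encodes text to the target encoding as it is written.
    writer = io.TextIOWrapper(out, encoding=options.target_encoding, newline="\n")
    # The declaration is written by hand so it can name any encoding.
    writer.write('<?xml version="1.0" encoding="%s"?>\n' % declared)
    ET.ElementTree(root).write(writer, encoding="unicode", xml_declaration=False)
    writer.flush()
    writer.detach()  # leave `out` open for the caller


root = ET.Element("doc")
root.text = "caf\u00e9"
buf = io.BytesIO()
serialize_with_options(root, buf, SerializeOptions("utf-16-le", "ISO-8859-1"))
```

The point of the design is that the declaration is just data handed to
the serializer, so no global encoding handler has to be swapped.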
Finally I must admit that there would be a workaround for me: I could
serialize with the existing API, then encode to UTF-16LE. But since we
are using quite huge documents, I guess it will not be acceptable in
terms of performance and seems rather stupid.
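
For illustration, the two-pass workaround can be sketched in Python with
the standard library (again, not libxml2; serialize_then_transcode is a
made-up name for this example). The first pass serializes to the
declared encoding, which also produces the matching declaration; the
second pass re-encodes the whole buffer, which is exactly the extra copy
that hurts on huge documents.

```python
import xml.etree.ElementTree as ET


def serialize_then_transcode(root, declared="ISO-8859-1", target="utf-16-le"):
    # Pass 1: serialize in the declared encoding; ElementTree emits a
    # declaration naming that encoding for anything but UTF-8/US-ASCII.
    first_pass = ET.tostring(root, encoding=declared)
    # Pass 2: re-encode the complete buffer to the target encoding.
    return first_pass.decode(declared).encode(target)


doc = ET.Element("doc")
doc.text = "caf\u00e9"
data = serialize_then_transcode(doc)
```

The result is UTF-16LE bytes whose prolog still names ISO-8859-1, at the
cost of holding two full copies of the document in memory at once.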