[xml] Setting the default charset?


  While porting dia from libxml1 to libxml2, I stumbled
upon a problem for which I haven't found a solution in the fine manual:

        All files previously saved through libxml1 are stored in the "local"
charset (ISO-8859-1, KOI8, etc., you name it), but *without* the
encoding="..." attribute in the <?xml?> declaration. Yes, this is incorrect, but
users now have a sizable body of such (incorrectly encoded) files.

        When libxml2 doesn't see the encoding="..." attribute, it defaults
to either UTF-8 or 7-bit ASCII (I don't remember which), and either way
that means trouble reading back the previously saved files. The problem: I'd
like to change libxml2's behaviour (without forking it, of course) so that I
can tell it which charset to use when none is specified in the XML declaration.
I haven't found a clean, proper way to do this.
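One way to get this behaviour without touching libxml2 may be to decide on the caller's side. libxml2's parsing functions such as `xmlReadMemory()` and `xmlReadFile()` take an `encoding` argument, so a caller could pass the local charset only when the file's XML declaration names no encoding, and `NULL` otherwise. The sketch below assumes that a non-NULL `encoding` argument makes libxml2 decode the input with that charset, which is worth checking against the libxml2 version in use. The helper names `has_encoding_decl()` and `pick_encoding()` are hypothetical, and the scan is deliberately simplistic: it looks only for the word `encoding` inside a leading `<?xml ... ?>` and does not handle UTF-16 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `buf` starts with an XML declaration
 * ("<?xml ...?>") that contains an encoding="..." pseudo-attribute, 0 otherwise.
 * Only the declaration is scanned. Simplifications: an optional UTF-8
 * byte-order mark is skipped, UTF-16 input is not handled, and the check is a
 * plain substring search for "encoding" rather than a real parse. */
static int has_encoding_decl(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;

    /* Skip an optional UTF-8 byte-order mark (EF BB BF). */
    if (len >= 3 && (unsigned char)buf[0] == 0xEF &&
        (unsigned char)buf[1] == 0xBB && (unsigned char)buf[2] == 0xBF)
        i = 3;

    /* The declaration must come first: "<?xml" followed by whitespace
     * (this also rejects processing instructions such as "<?xml-stylesheet"). */
    if (len - i < 6 || memcmp(buf + i, "<?xml", 5) != 0)
        return 0;
    char c = buf[i + 5];
    if (c != ' ' && c != '\t' && c != '\r' && c != '\n')
        return 0;

    /* Look for "encoding" before the closing "?>". */
    for (size_t j = i + 5; j + 1 < len; j++) {
        if (buf[j] == '?' && buf[j + 1] == '>')
            return 0;
        if (len - j >= 8 && memcmp(buf + j, "encoding", 8) == 0)
            return 1;
    }
    return 0; /* unterminated declaration: treat as "no encoding given" */
}

/* Hypothetical helper: chooses the value for libxml2's `encoding` argument.
 * NULL leaves libxml2 to honour the encoding the document declares;
 * `fallback` (the local charset) is returned for files that do not declare
 * an encoding, including files with no XML declaration at all. */
static const char *pick_encoding(const char *buf, size_t len, const char *fallback)
{
    return has_encoding_decl(buf, len) ? NULL : fallback;
}
```

A caller would then read the file into memory and call something like `xmlReadMemory(buf, (int)len, filename, pick_encoding(buf, len, "ISO-8859-1"), 0)`, with the fallback charset presumably derived from the user's locale rather than hard-coded. Because `NULL` is passed whenever the file declares its own encoding, correctly labelled files are left entirely to libxml2.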

(I've got an ugly alternative: pre-read every XML file and search for the
encoding="..." attribute. If it is found, let libxml2 handle the file; if not,
first iconv() the file from the local charset to UTF-8, then add an
encoding="UTF-8" attribute, and finally let libxml2 handle the file.
Unfortunately, I find this a bit intrusive.)
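For reference, the iconv() step of this workaround might look like the sketch below. `to_utf8()` is a hypothetical helper that uses only the POSIX `iconv_open()`, `iconv()` and `iconv_close()` calls; the source charset name (ISO-8859-1 in the usage below) is an assumption and would in practice depend on the user's locale.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: converts `len` bytes of `in`, encoded in `from_charset`
 * (e.g. "ISO-8859-1" or "KOI8-R"), into a newly malloc'd, NUL-terminated
 * UTF-8 string. Returns NULL if the charset is unknown, the input contains an
 * invalid sequence, or allocation fails. The caller must free() the result. */
static char *to_utf8(const char *in, size_t len, const char *from_charset)
{
    assert(in != NULL || len == 0);

    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return NULL;

    /* Each input byte of a legacy single-byte charset becomes at most one
     * character, i.e. at most 4 bytes of UTF-8; +1 for the terminating NUL. */
    size_t cap = len * 4 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;    /* POSIX iconv() takes char **, not const char ** */
    size_t src_left = len;
    char *dst = out;
    size_t dst_left = cap - 1; /* keep room for the NUL */

    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }

    *dst = '\0';
    iconv_close(cd);
    return out;
}
```

For example, `to_utf8("caf\xe9", 4, "ISO-8859-1")` should yield the UTF-8 bytes `c a f 0xC3 0xA9`. Since the XML 1.0 specification treats a document with neither an encoding declaration nor a byte-order mark as UTF-8, adding encoding="UTF-8" after the conversion is harmless but, strictly speaking, optional.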

        Could an expert enlighten me?

        Thanks in advance.

                -- Cyrille
