[xml] i18n limitations in libxml2


After my comments and questions a few days ago about i18n limitations in
the current libxml2, it seems there are no takers for a discussion (or
even simple answers) on the subject. Let me try a related issue.

I was going to create a patch to add a new function to libxml2, similar
to the current HTML push mode parser creation function, but accepting a
character set name. This would allow push mode to work properly with,
for example, Chinese HTML. Right now Chinese only works if there is a
language meta tag to select the encoding. There is a problem, though.
The meta tag processing code specifically ignores any language selection
if an explicit textual language selection has already been made for the
current document. Can someone explain the logic of this? The normal
behaviour of HTML parsers is to allow language meta tags to override the
initial language setting of the document, and any previous language meta
tags. The last meta tag, therefore, becomes the active one. The current
behaviour of libxml2 seems to achieve the opposite of general parser
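
To make the two override policies contrasted above concrete, here is a
minimal sketch in plain C. It is a model only, not libxml2 code: the
enc_state type and the enc_* functions are hypothetical names invented
for this illustration, and the encoding names are just examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (not part of libxml2): the active encoding of a document. */
typedef struct {
    char encoding[64];  /* active encoding name, "" if none chosen yet */
    int  explicit_set;  /* nonzero if the caller selected an encoding up front */
} enc_state;

/* Copy an encoding name into the state, truncating if it is too long. */
static void enc_copy(enc_state *st, const char *name)
{
    assert(st != NULL && name != NULL);
    strncpy(st->encoding, name, sizeof st->encoding - 1);
    st->encoding[sizeof st->encoding - 1] = '\0';
}

/* Start a document; explicit_name may be NULL when nothing was selected. */
void enc_init(enc_state *st, const char *explicit_name)
{
    st->encoding[0] = '\0';
    st->explicit_set = 0;
    if (explicit_name != NULL) {
        enc_copy(st, explicit_name);
        st->explicit_set = 1;
    }
}

/* Policy as described for libxml2: a meta tag is ignored once an
 * explicit selection has been made for the document. */
void enc_meta_explicit_wins(enc_state *st, const char *meta_name)
{
    if (st->explicit_set)
        return;
    enc_copy(st, meta_name);
}

/* Policy described as the usual one: every meta tag overrides whatever
 * was active before, so the last one seen wins. */
void enc_meta_last_wins(enc_state *st, const char *meta_name)
{
    enc_copy(st, meta_name);
}
```

With an initial selection of "ISO-8859-1" followed by meta tags naming
"GB2312" and then "Big5", the first policy leaves "ISO-8859-1" active,
while the second ends with "Big5".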

