Re: [xml] i18n limitations in libxml2



On 08/06/01 04:18:05, Steve Underwood wrote:
The meta tag processing code specifically ignores any language selection
if an explicit textual language selection has already been made for the
current document.

Just to avoid misunderstandings, we are talking about character encodings,
not languages.

Can someone explain the logic of this? The normal behaviour of HTML
parsers is to allow language meta tags to override the initial language
setting of the document,

Actually, that's not true.
See <http://www.w3.org/TR/html4/charset.html>, section 5.2.2, "Specifying
the character encoding":
  «To sum up, conforming user agents must observe the following priorities
   when determining a document's character encoding (from highest priority
   to lowest):
      1. An HTTP "charset" parameter in a "Content-Type" field.
      2. A META declaration with "http-equiv" set to "Content-Type" and
         a value set for "charset".
      3. The charset attribute set on an element that designates an
         external resource.
   In addition to this list of priorities, the user agent may use
   heuristics and user settings. For example, many user agents use a
   heuristic to distinguish the various encodings used for Japanese text.
   Also, user agents typically have a user-definable, local default
   character encoding which they apply in the absence of other indicators.»
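To make that ordering concrete, here is a rough Python sketch of the priority list quoted above. It is only an illustration, not libxml2's code: the function names, the regular expressions, and the "iso-8859-1" fallback are all my own assumptions, and a real user agent would parse the markup properly rather than use regexes.

```python
import re
from typing import Optional, Tuple

# Matches a META element whose http-equiv is "Content-Type" (simplified:
# assumes http-equiv appears inside a single <meta ...> tag).
_META_CONTENT_TYPE = re.compile(
    r'<meta\s+[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*>',
    re.IGNORECASE,
)
# Extracts the value of a "charset=" parameter.
_CHARSET_PARAM = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value."""
    if not value:
        return None
    match = _CHARSET_PARAM.search(value)
    return match.group(1).lower() if match else None


def charset_from_meta(html_text: str) -> Optional[str]:
    """Return the charset of the first META http-equiv Content-Type, if any."""
    match = _META_CONTENT_TYPE.search(html_text)
    return charset_from_content_type(match.group(0)) if match else None


def resolve_charset(
    http_content_type: Optional[str],
    html_text: str,
    link_charset: Optional[str] = None,
    default: str = "iso-8859-1",  # assumed local default, user-definable
) -> Tuple[str, str]:
    """Apply the HTML 4 priorities, highest first; return (charset, source)."""
    # 1. HTTP "charset" parameter in the "Content-Type" field.
    charset = charset_from_content_type(http_content_type)
    if charset:
        return charset, "http"
    # 2. META declaration with http-equiv set to "Content-Type".
    charset = charset_from_meta(html_text)
    if charset:
        return charset, "meta"
    # 3. charset attribute on the element designating the resource.
    if link_charset:
        return link_charset.lower(), "link"
    # Otherwise fall back to the local default (heuristics omitted here).
    return default, "default"
```

With this sketch, an encoding supplied from outside the document (the HTTP header here) wins over a META declaration inside it, and the META is only consulted when nothing of higher priority is present.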

and any previous language meta tags.

This is not defined by HTML, and "correct" documents are unlikely to
contain multiple, conflicting META HTTP-EQUIV="Content-Type" elements.

The current behaviour of libxml2 seems to achieve the opposite of
general parser behaviour.

"General" parser behaviour and _conforming_ parser behaviour are not the
same.
If I understand correctly what you're talking about, libxml2's behaviour
is conforming; otherwise, just forget what I said...

Tom.



