Character normalization?



On Mon, Mar 25, 2002 at 03:50:39PM -0500, Gnome CVS User wrote:
> Log message:
> Mon Mar 25 15:46:54 2002  Owen Taylor  <otaylor redhat com>
> 
> * modules/basic/basic-*.c: Convert U+00A0 (NON BREAK SPACE)
> to U+0020 (SPACE)

  Hmm, by the way, now that we have a decent internationalized
framework, one of the annoyances of Unicode is character normalization,
i.e. the fact that a sequence of Unicode characters must sometimes be
remapped to a single one. The I18N working group at W3C is pushing hard
for "early" normalization [1], i.e. making sure that most APIs only
ever see normalized content.
  Can you tell me/us a bit about this issue? Is there anything in place,
and should we make a decision about it? It affects a number of things,
such as string searches and comparisons, which are otherwise a real pain.
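
  To make the problem concrete, here is a minimal sketch of what goes
wrong without normalization. It is written in Python with the standard
unicodedata module purely for illustration; it is not GNOME or Pango
code. The helper name nfc() and the sample strings are hypothetical, and
picking Normalization Form C as the target form is an assumption made
for this example.

```python
import unicodedata

# Two encodings of the same visible word ("cafe" with an acute on the e):
composed = "caf\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def nfc(text):
    """Return text in Normalization Form C (hypothetical helper name)."""
    return unicodedata.normalize("NFC", text)


# Comparing raw code point sequences treats the two spellings as different.
raw_equal = composed == decomposed

# After normalizing both sides, the comparison matches.
normalized_equal = nfc(composed) == nfc(decomposed)

# Substring search has the same problem: the precomposed character is not
# found in the decomposed string until that string is normalized.
raw_found = "\u00e9" in decomposed
normalized_found = "\u00e9" in nfc(decomposed)
```

  With "early" normalization, the nfc() step would happen once, where
text enters the system, so that the search and compare code could work
on the strings directly.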

Daniel

[1] http://www.w3.org/TR/charmod/#IDAAC0R

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


