Re: UTF-8



On Wed, 10-07-2002 at 12:28, Damien Donlon - Sun Ireland - Solaris
Software - Localisation Engineer wrote:
> [...]
> What to do ?
> _________________________
> 
> I think the following things ought to be done :
> 
> [1] Identify the usage limitations of UTF-8 for some translation
>     teams and how they can be eliminated ( the limitations, not
>     the translation teams ;-) )

Agreed.

> [2] Create a tool that can check whether a file is UTF-8 encoded.
>     The tool should not be dependent on simply reading a charset field
>     within the file to see whether it says UTF-8 but by analysing the
>     byte stream. Does such a tool exist already within the community?

file(1)
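
Where file(1) is not available, the byte-stream check itself is small.
A minimal sketch in Python, assuming the file fits in memory; the names
is_valid_utf8 and check_file are only illustrative, not an existing
tool:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte stream decodes as strict UTF-8."""
    try:
        # Strict decoding rejects malformed sequences instead of
        # silently replacing them.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> bool:
    """Read a file as raw bytes and validate it, ignoring any charset header."""
    with open(path, "rb") as handle:
        return is_valid_utf8(handle.read())
```

Calling check_file("es.po") on a translation file (the file name is
hypothetical) returns False as soon as any byte sequence is not
well-formed UTF-8, whatever the charset field claims.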

>     I think it may be impossible to distinguish between UTF-8 and 8859-1
>     if no character is outside the 0-127 range. Can anyone confirm? Is
>     this a big problem in identifying UTF-8 encoded files?

8859-1 uses the 0-255 range, AFAIK.
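
On the quoted question, a rough sketch of what a checker can and cannot
tell apart, assuming Python and its standard codecs; the function name
classify and its return labels are made up for illustration:

```python
def classify(data: bytes) -> str:
    """Sort a byte stream into one of three rough categories."""
    if all(byte < 0x80 for byte in data):
        # Only bytes 0-127: identical in ASCII, UTF-8 and 8859-1,
        # so the encoding cannot be told apart (and need not be).
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that are not well-formed UTF-8: some 8-bit
        # encoding, possibly 8859-1.
        return "not-utf-8"
    return "utf-8"


# A file limited to 0-127 decodes to the same text either way.
sample = b"msgid \"hello\""
assert sample.decode("utf-8") == sample.decode("iso-8859-1")
```

The check only works in one direction: any byte sequence decodes as
8859-1 without error, so a tool can prove a file is not UTF-8 but
cannot prove it is 8859-1.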

>     The tool would be provided to translation teams to check their files
>     prior to the cvs commit.

file(1) and the header check script; nothing else, I guess.

-- 
German Poo Caaman~o
mailto:gpoo@ubiobio.cl
http://www.ubiobio.cl/~gpoo/chilelindo.html
«There are 10 kinds of people: those who understand binary and those who don't.»



