Re: [Rhythmbox-devel] UTF-8 issues still present



On Sun, 2004-01-11 at 02:57, Chris Petersen wrote:
> > This is characteristic of a UTF-8 string that was treated as being
> > iso8859-1 encoded (in UTF-8, the non-ASCII latin1 characters are coded
> > on 2 bytes, and in iso8859-1, 1 character is always 1 byte long).
> 
> Yeah, I realized this much.  I'm just used to applications being smart
> enough to detect "bad" utf8 characters and convert them from latin1, or
> to detect "good" utf8 characters and leave them alone.


sorry, but imho there is no such thing as a "bad" latin1 character...
latin1 means iso-8859-1, and in practice it assigns a character to each
of the 256 byte values (control codes included)... so basically every
byte array is a valid latin1 encoded string => it's impossible to detect
with 100% certainty that a string is NOT latin1. there can be some
heuristics, but that's all.
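
to illustrate the point, here is a rough sketch in Python. the function
name guess_decode is made up for this example, and this is only one
possible heuristic (try strict utf8 first, fall back to latin1), not
what rhythmbox or any particular application actually does:

```python
def guess_decode(raw):
    """Hypothetical heuristic: try strict UTF-8, fall back to latin1.

    Returns a (text, encoding_name) pair.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte 0x00-0xFF to a code point,
        # so this fallback can never raise.
        return raw.decode("latin-1"), "latin-1"


# Every possible byte value decodes as latin1 without error.
all_bytes = bytes(range(256))
assert len(all_bytes.decode("latin-1")) == 256

# "é" in UTF-8 is the two bytes C3 A9; read as latin1 they turn
# into two characters ("Ã©"), which is the garbling discussed above.
utf8_bytes = "\u00e9".encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")
```

note that the heuristic can still guess wrong: the bytes C3 A9 are
valid utf8 for "é", but they are equally valid latin1 for "Ã©", and
nothing in the bytes themselves says which one was meant.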

bye,
gabor



