Re: [Rhythmbox-devel] UTF-8 issues still present



On Sun, 2004-01-11 at 11:06, Christophe Fergeau wrote:
> > > 
> > > Yeah, I realized this much.  Just used to applications being smart
> > > enough to detect "bad" utf8 characters and convert them from latin1, or
> > > "good" utf8 characters and not doing anything with them.
> > 
> > 
> > sorry, but imho there is no "bad" latin1 character ... latin1 means
> > iso-8859-1, and it defines 256 characters... so basically every byte
> > array is a valid latin1 encoded string => it's impossible to detect
> > with 100% certainty that something is NOT latin1. there can be some
> > heuristics, but that's all.
> > 
> 
> Things are generally done the other way round, and I think that's what
> Chris meant: check if the input string is valid UTF-8, if it's not,
> assume it's ISO8859-1 (or encoded in the user locale, or whatever).

That is correct.  This is what I've found many apps do, and it's what I
do with my own utf8<->latin1 conversion scripts.  Technically, valid
utf8 text is always also valid latin1 (every byte sequence is), but real
latin1 text with high-bit characters will very rarely happen to form
valid utf8 sequences.  And invalid utf8 will almost always be one of the
iso8859 (and most likely latin) variants.
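
For what it's worth, here is a rough Python sketch of that check-then-fall-back
approach.  The function name is made up for illustration, and falling back to
latin1 (rather than the user's locale encoding) is just an assumption:

```python
def decode_tag_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw tag bytes: try strict UTF-8 first, else use the fallback."""
    try:
        # Strict decoding rejects any byte sequence that is not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # With the default fallback (latin1) every byte maps to a character,
        # so this second decode cannot fail.
        return raw.decode(fallback)


print(decode_tag_text(b"caf\xc3\xa9"))  # UTF-8 encoded input
print(decode_tag_text(b"caf\xe9"))      # latin1 encoded input
```

Both calls should give the same string; pure ASCII input passes through the
utf8 branch unchanged.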

Anyway, from what I've seen, only the monkey media stuff seems to
ACTUALLY support writing of the utf8 flag in id3v2.4 tags, and versions
before that didn't officially support utf-8 encodings.  So without perl
or id3lib support, it looks like I should just go back to using latin1
tags in my mp3 files (since ogg ones seem plenty happy as utf8).
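
For reference, the "utf8 flag" is the text encoding byte at the start of an
id3v2 text frame body; value 0x03 (utf8) only exists from v2.4 onward.  A
minimal sketch of reading it, assuming the frame header has already been
stripped off (the names here are my own, not from any tagging library):

```python
# Text encoding byte values as defined by the ID3v2 specs.
ID3_ENCODINGS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",     # UTF-16 with BOM
    0x02: "utf-16-be",  # UTF-16BE without BOM, v2.4 only
    0x03: "utf-8",      # v2.4 only
}


def decode_text_frame_body(body: bytes) -> str:
    """Decode a text frame body: one encoding byte followed by the text."""
    codec = ID3_ENCODINGS[body[0]]
    text = body[1:].decode(codec)
    # Drop any trailing NUL terminator(s) left after decoding.
    return text.rstrip("\x00")


print(decode_text_frame_body(b"\x03caf\xc3\xa9"))   # utf8 frame (v2.4)
print(decode_text_frame_body(b"\x00caf\xe9\x00"))   # latin1 frame
```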

-- 
Chris Petersen
Programmer / Web Designer
Silicon Mechanics:  http://www.siliconmechanics.com/
Blade Servers:      http://www.siliconmechanics.com/c292/blade-server.php
1U Servers:         http://www.siliconmechanics.com/c272/1u-server.php




