Re: [Rhythmbox-devel] UTF-8 issues still present

On Sun, 2004-01-11 at 20:14, Chris Petersen wrote:
> On Sun, 2004-01-11 at 11:06, Christophe Fergeau wrote:
> > > > 
> > > > Yeah, I realized this much.  Just used to applications being smart
> > > > enough to detect "bad" utf8 characters and convert them from latin1, or
> > > > "good" utf8 characters and not doing anything with them.
> > > 
> > > 
> > > sorry, but imho there is no "bad" latin1 character ... latin1 means
> > > iso-8859-1, and it defines 256 characters... so basically every byte
> > > array is a valid latin1 encoded string => it's impossible to detect
> > > with 100% certainty that it's NOT latin1. there can be some
> > > heuristics, but that's all.
> > > 
> > 
> > Things are generally done the other way round, and I think that's what
> > Chris meant: check if the input string is valid UTF-8, if it's not,
> > assume it's ISO8859-1 (or encoded in the user locale, or whatever).
> That is correct.  This is what I've found many apps do, and is what I do
> with my own utf8<->latin1 conversion scripts.  Technically, "correct"
> utf8 text *could* also be correct latin1, but it's very unlikely.  But

Any ascii string will be both correct UTF-8 and correct latin1.
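
For what it's worth, the "try UTF-8 first, fall back otherwise" check
discussed above could be sketched roughly like this in Python. This is
only an illustration, not what Rhythmbox or any tag library actually
does; the function name decode_tag_text and the default fallback of
iso-8859-1 are my own assumptions (the fallback could just as well be
the user's locale encoding).

```python
def decode_tag_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw tag bytes: strict UTF-8 first, then a fallback encoding.

    Hypothetical helper for illustration; `fallback` is an assumption and
    could be the user's locale encoding instead.
    """
    try:
        # bytes.decode is strict by default, so invalid UTF-8 raises.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this cannot fail
        # for the default fallback.
        return raw.decode(fallback)


# Pure ASCII reads the same either way; the other two inputs are the
# UTF-8 and latin1 encodings of the same word.
for sample in (b"abc", b"caf\xc3\xa9", b"caf\xe9"):
    print(sample, "->", decode_tag_text(sample))
```

The ambiguous case is latin1 text that happens to be valid UTF-8 too:
it will be read as UTF-8, which is the "very unlikely" situation
mentioned in the quoted text.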

> incorrect utf8 will almost always be one of the iso8859 (and most likely
> latin) variants.
> Anyway, from what I've seen, only the monkey media stuff seems to
> ACTUALLY support writing of the utf8 flag in id3v2.4 tags, and versions

Any decent id3 tag handler will be able to say whether a tag is in UTF-8
or another encoding. I used cantus2 myself for that.

> before that didn't officially support utf-8 encodings.  So without perl

v1 tags don't have any encoding information, so they usually are in the
user locale's native encoding.
All the v2 versions have an encoding byte on each text frame that says
whether it is ISO-8859-1 or Unicode; Unicode means 16-bit Unicode in
v2.2 and v2.3, and v2.4 adds UTF-8 (search for Unicode in
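
To make the encoding marker concrete: as far as I know, an ID3v2.4 text
frame body starts with one byte that selects the encoding of the text
that follows. A rough Python sketch of reading it is below. The names
ID3V2_TEXT_ENCODINGS and decode_text_frame_body are made up for this
example, and it only handles a bare frame body (no frame header, no
unsynchronisation, no multi-value strings).

```python
# Encoding byte values as I understand them from the ID3v2.4 spec;
# 0x02 and 0x03 do not exist in earlier v2 versions.
ID3V2_TEXT_ENCODINGS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",     # UTF-16 with a byte order mark
    0x02: "utf-16-be",  # UTF-16 big-endian, no byte order mark
    0x03: "utf-8",
}


def decode_text_frame_body(body: bytes) -> str:
    """Decode a text frame body: one encoding byte, then the text.

    Hypothetical helper for illustration only.
    """
    if not body:
        return ""
    codec = ID3V2_TEXT_ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown text encoding byte: {body[0]:#04x}")
    # Drop any trailing NUL terminator after decoding.
    return body[1:].decode(codec).rstrip("\x00")


print(decode_text_frame_body(b"\x03caf\xc3\xa9"))  # UTF-8 frame
print(decode_text_frame_body(b"\x00caf\xe9"))      # ISO-8859-1 frame
```

v1 tags have no such byte, so for those a reader still has to guess.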

> or id3lib support, it looks like I should just go back to using latin1
> tags in my mp3 files (since ogg ones seem plenty happy as utf8).


