Re: [Rhythmbox-devel] UTF-8 issues still present
- From: Bastien Nocera <hadess hadess net>
- To: Chris Petersen <lists forevermore net>
- Cc: Rhythmbox Dev <rhythmbox-devel gnome org>
- Subject: Re: [Rhythmbox-devel] UTF-8 issues still present
- Date: Sun, 11 Jan 2004 20:40:35 +0000
On Sun, 2004-01-11 at 20:14, Chris Petersen wrote:
> On Sun, 2004-01-11 at 11:06, Christophe Fergeau wrote:
> > > >
> > > > Yeah, I realized this much. Just used to applications being smart
> > > > enough to detect "bad" utf8 characters and convert them from latin1, or
> > > > "good" utf8 characters and not doing anything with them.
> > >
> > >
> > > sorry, but imho there is no "bad" latin1 character... latin1 means
> > > iso-8859-1, and it defines 256 characters, so basically every byte
> > > array is a valid latin1 encoded string => it's impossible to detect
> > > with 100% certainty that a string is NOT latin1. There can be some
> > > heuristics, but that's all.
> > >
> >
> > Things are generally done the other way round, and I think that's what
> > Chris meant: check if the input string is valid UTF-8, if it's not,
> > assume it's ISO8859-1 (or encoded in the user locale, or whatever).
>
> That is correct. This is what I've found many apps do, and is what I do
> with my own utf8<->latin1 conversion scripts. Technically, "correct"
> utf8 text *could* also be correct latin1, but it's very unlikely.
Any ASCII string will be both correct UTF-8 and correct latin1.
> But incorrect utf8 will almost always be one of the iso8859 (and most
> likely latin) variants.
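For illustration, a minimal Python sketch of the "check for valid UTF-8, otherwise assume ISO-8859-1" approach described above. It assumes ISO-8859-1 as the fallback (the user's locale encoding could be passed instead); `decode_tag_text` is a hypothetical name, not part of any tagging library.

```python
def decode_tag_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Return raw decoded as UTF-8 if it is valid UTF-8, else decoded with fallback.

    Hypothetical helper for illustration; not part of any tagging library.
    """
    try:
        # Strict UTF-8 decoding raises on any malformed byte sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # With the default ISO-8859-1 fallback this cannot fail: every byte
        # 0x00-0xFF maps to a code point. Other fallbacks (e.g. a locale
        # encoding) may still raise.
        return raw.decode(fallback)


# Valid UTF-8 is kept as-is; a lone Latin-1 0xE9 is not valid UTF-8, so it falls back.
print(decode_tag_text("café".encode("utf-8")))       # café
print(decode_tag_text("café".encode("iso-8859-1")))  # café
```

The guess can only go wrong one way: Latin-1 text whose bytes happen to form valid UTF-8, such as "Ã©" (bytes C3 A9), comes back as "é". That is the "very unlikely" case. Pure ASCII decodes identically under both encodings, so it never matters which branch is taken.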
>
> Anyway, from what I've seen, only the monkey media stuff seems to
> ACTUALLY support writing of the utf8 flag in id3v2.4 tags,
Any decent id3 tag handler will be able to say whether a tag is in UTF-8
or another encoding. I used cantus2 myself for that.
> and versions before that didn't officially support utf-8 encodings.
v1 tags don't have any encoding information, so they are usually in the
user locale's native encoding.
All the v2 versions have a flag that says whether the text is ISO-8859-1
or Unicode (search for Unicode in http://www.id3.org/id3v2-00.txt).
> So without perl or id3lib support, it looks like I should just go back
> to using latin1 tags in my mp3 files (since ogg ones seem plenty happy
> as utf8).
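To show what that flag looks like to a tag reader, here is a minimal Python sketch that picks a decoder from the first byte of an ID3v2 text frame body (e.g. TIT2, or TT2 in v2.2). The byte values follow the ID3v2 specs: in v2.2 and v2.3, $00 is ISO-8859-1 and $01 is Unicode (UCS-2 with a byte-order mark); v2.4 adds $02 UTF-16BE and $03 UTF-8. `decode_text_frame` and `TEXT_ENCODINGS` are hypothetical names, and real frames (multiple NUL-separated values in v2.4, unsynchronisation, compression) need more handling. v1 tags have no such byte, so for them you are back to guessing as above.

```python
# Hypothetical table: ID3v2 text-encoding byte -> Python codec, per major version.
# v2.2/v2.3: $00 ISO-8859-1, $01 Unicode (UCS-2 with a byte-order mark).
# v2.4 adds $02 UTF-16BE (no BOM) and $03 UTF-8.
_V23 = {0x00: "iso-8859-1", 0x01: "utf-16"}
TEXT_ENCODINGS = {
    2: _V23,
    3: _V23,
    4: {**_V23, 0x02: "utf-16-be", 0x03: "utf-8"},
}


def decode_text_frame(body: bytes, major_version: int) -> str:
    """Decode an ID3v2 text frame body (hypothetical helper, simplified)."""
    if not body:
        return ""
    encoding_byte, payload = body[0], body[1:]
    try:
        codec = TEXT_ENCODINGS[major_version][encoding_byte]
    except KeyError:
        raise ValueError(
            f"encoding byte {encoding_byte:#04x} is not defined for ID3v2.{major_version}"
        ) from None
    # "utf-16" reads the BOM to choose the byte order; BMP-only UCS-2 text
    # decodes the same way as UTF-16.
    text = payload.decode(codec)
    # Strings may carry NUL terminators; drop them.
    return text.rstrip("\x00")


# A v2.3 frame flagged ISO-8859-1 and a v2.4 frame flagged UTF-8:
print(decode_text_frame(b"\x00caf\xe9", 3))          # café
print(decode_text_frame(b"\x03caf\xc3\xa9\x00", 4))  # café
```

Because the flag is explicit, no heuristic is needed for v2 tags; a UTF-8 flag ($03) on a v2.3 tag is rejected rather than guessed at.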
Cheers
---
Bastien Nocera <hadess@hadess.net>
Remember the 3 golden rules: 1. It was like that when I got here. 2. I
didn't do it. 3. (To your Boss) I like your style.