Re: [Rhythmbox-devel] UTF-8 issues still present

> Any ascii string will be both correct UTF-8 and correct latin1.

Funny.  I've found all kinds of documentation about checking for
malformed utf8 strings.  Just look at perl's Encode documentation.
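
To make the point concrete, here is a minimal sketch of such a check,
written in Python rather than with perl's Encode.  The function name
is_valid_utf8 is made up for illustration.  It relies only on the
strict UTF-8 decoder raising an error on malformed input:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


ascii_bytes = b"Rhythmbox"
latin1_bytes = "Café".encode("latin-1")  # b'Caf\xe9'
utf8_bytes = "Café".encode("utf-8")      # b'Caf\xc3\xa9'

print(is_valid_utf8(ascii_bytes))   # pure ASCII is valid UTF-8
print(is_valid_utf8(latin1_bytes))  # a lone 0xE9 byte is malformed UTF-8
print(is_valid_utf8(utf8_bytes))    # properly encoded UTF-8
```

Pure ascii passes, which agrees with the quoted claim, but latin1 text
with accented characters generally fails, so the string contents can
often tell the two encodings apart.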

> Any decent id3 tag handler will be able to say whether a tag is in UTF-8
> or another encoding. I used cantus2 myself for that.

Yes.  easytag can happily edit/display utf8 tags, as can my perl
scripts.  But without that utf8 flag in the 2.4 tag (which I can't seem
to find a way to set either with perl or with id3lib's API - the 2.4
spec doesn't say WHERE the flag goes, at the byte level, within the tag
data, so I can't fix/update any of the perl libraries, either), players
still continue to display the tags as if they were raw ascii (or more
likely, latin1).
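
For reference, as far as the ID3v2.4 frames document describes it, the
encoding marker is not a flag in the tag header but the first byte of
each text frame's body ($00 latin1, $01 UTF-16 with BOM, $02 UTF-16BE,
$03 UTF-8), following the 10-byte frame header.  The sketch below is
only an illustration under that reading.  The helper names synchsafe
and build_text_frame are hypothetical, not part of id3lib or any perl
module, and it builds a single frame, not a complete tag:

```python
# Text-encoding byte values for ID3v2.4 text frames.
ENC_LATIN1 = 0x00     # ISO-8859-1
ENC_UTF16_BOM = 0x01  # UTF-16 with byte order mark
ENC_UTF16BE = 0x02    # UTF-16BE without BOM (added in 2.4)
ENC_UTF8 = 0x03       # UTF-8 (added in 2.4)


def synchsafe(n: int) -> bytes:
    """Encode n as a 4-byte synchsafe integer (7 useful bits per byte)."""
    if not 0 <= n < (1 << 28):
        raise ValueError("size out of range for a synchsafe integer")
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))


def build_text_frame(frame_id: str, text: str) -> bytes:
    """Build one ID3v2.4 text frame (e.g. TIT2) holding UTF-8 text."""
    # Frame body: one encoding byte, then the encoded string.
    body = bytes([ENC_UTF8]) + text.encode("utf-8")
    # Frame header: 4-byte ID, 4-byte synchsafe size, 2 flag bytes (all clear).
    header = frame_id.encode("ascii") + synchsafe(len(body)) + b"\x00\x00"
    return header + body


frame = build_text_frame("TIT2", "Café")
print(frame[10])  # the encoding byte sits right after the 10-byte header
```

A tagging library that writes $00 here while storing UTF-8 bytes would
produce exactly the symptom described above: players that trust the
byte will render the text as latin1.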

> v1 tags don't have any encoding information, so they usually are in the
> user locale's native encoding.
> All the v2 versions have a flag that says it should be either UTF-8 or
> ISO-8859-1 (search for Unicode in

Actually, according to the document you pointed out, tags are latin1,
not "locale".  If they happen to work in the locale's encoding, that's
nonstandard.  Latin1 was probably chosen intentionally to make
diacritic characters portable across different locale setups.

Unicode, yes; utf8, no.  The utf8 flag was only added in id3v2.4 -
versions 2.3 and earlier only support full unicode or latin1.

Anyway, this thread has gotten rather off-topic for this list.  I now
know why my utf8 tags aren't showing up properly in rhythmbox - rhythmbox
is only checking the flag, not the string contents, which seems to
follow the id3v2.4 spec, albeit different from how other programs seem
to work.  So I'll leave it as it stands, and either continue looking for
ways to fix my tags (I'm open to ideas), or convert them back to latin1.
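
One possible approach to the content check, sketched in Python under
the assumption that the tag text has already been read and decoded as
latin1: re-encode it to the original bytes and see whether those bytes
are valid UTF-8.  The function name repair_mislabelled is hypothetical,
and this is a heuristic, not what rhythmbox or any other player is
known to do:

```python
def repair_mislabelled(text: str) -> str:
    """Re-decode text that looks like UTF-8 bytes read as Latin-1.

    Returns the input unchanged when it does not fit that pattern.
    """
    try:
        raw = text.encode("latin-1")  # recover the original bytes
    except UnicodeEncodeError:
        return text  # has characters outside Latin-1, so not this mix-up
    try:
        return raw.decode("utf-8")  # succeeds only for well-formed UTF-8
    except UnicodeDecodeError:
        return text  # genuinely Latin-1 text, leave it alone


# Simulate a UTF-8 tag that a player displayed as Latin-1.
garbled = "Café".encode("utf-8").decode("latin-1")
print(garbled)
print(repair_mislabelled(garbled))
```

Plain ascii and real latin1 text with accents come back unchanged.  The
heuristic can misfire on the rare latin1 string whose bytes also happen
to form valid UTF-8, so it is safer to review changes before rewriting
tags in bulk.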

Chris Petersen
Programmer / Web Designer
Silicon Mechanics
