Re: [Banshee-List] utf8 validation



On Sat, 2005-11-19 at 13:33 +0100, Martin Probst wrote:
> > I won't comment about the patch, but i think it's not a great idea to
> > support a broken mode for id3V1 tags. They were not meant to hold utf8
> > chars, only iso-8859-1, why not stick to the rule ?
> 
> Was that standardised? Anyway ASCII is a proper subset of UTF-8, and
> it's AFAIK very unlikely to mistake an ASCII based single byte charset
> for UTF-8 when it isn't. So checking if it is valid UTF-8 and using that
> if it fits will work for 90% of the tags as they are ASCII anyways, and
> I personally have not seen a single German ISO-8859-1(5) string that
> would also be valid UTF-8 if it's not ASCII anyways. 
> 
> Meaning I think that's a very valid choice, it will benefit UTF-8 users
> and probably only give a very minor performance hit for ISO-8859-1
> users. I like it :-)

OK, I see the following blurb in the v1 test suite:

> extra
>     Tests that test additional capabilities in the ID3 reader. The
>     charset of ID3 isn't formally defined, so both ISO-8859-1
>     capability as well as UTF-8 capability is tested. Also, some
>     readers detect URL:s in the comment field, so this is also tested.

However, most people and tagging libraries use iso-8859, which is thus a de facto standard.

Will people who use iso-8859-1 with characters in the high-bit range
(bytes of the form 1xxxxxxx, i.e. 0x80-0xFF, which lie outside ASCII)
be affected if their iso-8859-1 text is first checked as UTF-8?

See also http://www.htmlhelp.com/reference/charset/latin1.gif for the
chars in latin1
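
To make the question concrete, here is a rough Python sketch of the
"validate as UTF-8, otherwise fall back to iso-8859-1" idea discussed
above. It is not the patch under discussion; the helper name
decode_id3v1_field and the NUL-padding handling are my own assumptions
for illustration.

```python
def decode_id3v1_field(raw: bytes) -> str:
    """Hypothetical helper: use UTF-8 if the bytes validate, else ISO-8859-1."""
    # Assumption: unused space in the fixed-size field is NUL-padded.
    raw = raw.rstrip(b"\x00")
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value maps to a character in ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1")


name = "M\u00f6tley Cr\u00fce"  # contains o-umlaut and u-umlaut

# A lone high byte followed by ASCII is not valid UTF-8, so the
# ISO-8859-1 fallback is taken and the text is preserved.
print(decode_id3v1_field(name.encode("iso-8859-1")))

# Genuine UTF-8 input validates and is decoded as UTF-8.
print(decode_id3v1_field(name.encode("utf-8")))

# The ambiguous case: bytes 0xC3 0xA9 are two characters in ISO-8859-1
# (A-tilde, copyright sign) but also a valid UTF-8 encoding of e-acute.
# The sketch picks the UTF-8 reading.
print(decode_id3v1_field(b"\xc3\xa9"))
```

So a high-bit iso-8859-1 string is only misread when its high bytes
happen to form well-formed UTF-8 sequences (a lead byte followed by the
right number of 10xxxxxx bytes), as in the last case.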

Raf



