Hi list

To reignite this discussion now that I've finished my exams...

I posted this on Simos' blog a while back, but the discussion there
seems to have died off, so I'll repost here.

UTF-8 is designed so that subsequences are unambiguous. You won't get
a byte less than 0x80 in any part of a multi-byte sequence. Bytes
0x00-0x7F map directly to 7-bit ASCII.
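
To make the point above concrete, here is a minimal sketch in C. The
names utf8_classify and utf8_count_code_points are made up for this
mail and are not from GLib or any other library. It classifies bytes
by their high bits only (it does not reject overlong or otherwise
invalid sequences) and counts code points by skipping continuation
bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classify one byte of a UTF-8 stream by its
 * high bits only. This is a bit-pattern check, not a validator. */
enum utf8_byte_kind { UTF8_ASCII, UTF8_CONTINUATION, UTF8_LEAD, UTF8_INVALID };

static enum utf8_byte_kind utf8_classify(unsigned char b)
{
    if (b < 0x80) return UTF8_ASCII;         /* 0xxxxxxx: plain ASCII       */
    if (b < 0xC0) return UTF8_CONTINUATION;  /* 10xxxxxx: inside a sequence */
    if (b < 0xF8) return UTF8_LEAD;          /* 11xxxxxx: starts a sequence */
    return UTF8_INVALID;                     /* 0xF8-0xFF: not a UTF-8 byte pattern */
}

/* Hypothetical helper: count code points in a NUL-terminated string
 * by counting every byte that is not a continuation byte. */
static size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (utf8_classify((unsigned char)*s) != UTF8_CONTINUATION)
            n++;
    return n;
}
```

Because every byte of a multi-byte sequence is 0x80 or above, a
byte-oriented search such as strchr(s, 'H') can only ever match a real
ASCII 'H', never the middle of another character. strlen() still
counts bytes rather than characters, which is the one thing to keep in
mind.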

Some people are worried about string functions breaking. I really
don't see how this is the case, seeing as we're doing g_some_function
(_("Some ASCII string")), and that string is replaced with a UTF-8
string at runtime anyway.

Does anyone have any actual proof of UTF-8 in our translatable strings
breaking C?

Somebody said that any byte with the MSB set (i.e. 0x80-0xFF) will
cause some compilers to break. Is this true? Can they be fixed? And if
not, do we have to support them?
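
For what it's worth, if some compiler really does choke on high bytes,
one possible fallback (an assumption on my part, not something anyone
has proposed in this thread) is to spell the non-ASCII characters with
hex escapes so the source file itself stays 7-bit clean. A rough
sketch, where is_seven_bit_clean and quoted_save are names I made up
for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if no byte in buf has the MSB set. */
static int is_seven_bit_clean(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        if ((unsigned char)buf[i] >= 0x80)
            return 0;
    return 1;
}

/* U+201C and U+201D (directional double quotes) around "Save".
 * The literal is split into adjacent pieces so that a hex escape can
 * never swallow a following character that happens to be a hex digit. */
static const char quoted_save[] = "\xe2\x80\x9c" "Save" "\xe2\x80\x9d";
```

The source text of that literal contains only ASCII, while the
compiled string holds the same ten bytes that the UTF-8 form would.
The obvious cost is readability, so I would only consider it if a
broken compiler actually turns up.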

If we can come to an agreement, I will write a Live page giving
guidelines on how to use directional quotation marks, for those who
may be unfamiliar with typing them, etc.

