Re: Issues with Gtk2 dialogs and UTF8 data.



On Mon, Dec 20, 2010 at 1:06 PM, Tadej Borovšak <tadeboro gmail com> wrote:
> binmode $input, ':utf8';

That's crucial here. I suspect the following:

- your script has "use utf8;", hence all string literals containing non-ASCII
  characters have the internal UTF-8 flag turned ON, hence using such a
  string in a Gtk2::Label works as expected

- your file is indeed UTF-8 encoded, but you didn't open it in :utf8 mode
  as above; hence a string read from the file has the internal UTF-8 flag
  turned OFF (even though it contains the "correct" byte sequences)

- concatenating a string with the UTF-8 flag on with another with the flag off
  causes the non-UTF-8 string to be "upgraded to UTF-8", which means:

  - its bytes are interpreted as being in Latin-1 (!) and all non-ASCII bytes
    are converted to corresponding UTF-8 multibyte sequences

  - the UTF-8 flag is turned on

- e.g. a UTF-8 encoded "LATIN SMALL LETTER U WITH DIAERESIS"
  (U+00FC) will be interpreted as two Latin-1 chars 0xC3 and 0xBC,
  which represent code points U+00C3 "LATIN CAPITAL LETTER A WITH TILDE"
  and U+00BC "VULGAR FRACTION ONE QUARTER", respectively; the label then
  shows "Ã¼" instead of "ü"
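To see the upgrade happen outside Gtk2, here is a minimal Perl sketch. It makes assumptions about your script rather than reproducing it: the UTF-8 file is simulated with an in-memory filehandle, the core Encode module's encode()/decode() are used to produce and fix the raw bytes, and codepoints() is a hypothetical helper that only prints what each string contains. It also shows two ways out: decoding the bytes explicitly, or putting a decoding layer on the handle. I used ':encoding(UTF-8)' rather than ':utf8' because it validates the input; ':utf8' as in Tadej's line should behave the same on well-formed data.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show each character of a string as U+XXXX.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

# What "use utf8; my $label = 'ü';" gives you: one character,
# U+00FC, with the internal UTF-8 flag on.
my $literal = "\x{FC}";
utf8::upgrade($literal);

# What a UTF-8 file yields when read WITHOUT a decoding layer:
# two bytes, 0xC3 0xBC, with the UTF-8 flag off.
my $raw = encode('UTF-8', "\x{FC}");

# The concatenation upgrades a copy of $raw, reading its bytes as Latin-1.
my $broken = $literal . $raw;

# Fix 1: decode the bytes explicitly before mixing them in.
my $fixed = $literal . decode('UTF-8', $raw);

# Fix 2: put a decoding layer on the handle (an in-memory handle
# stands in for the real file here).
open(my $input, '<', \$raw) or die "open: $!";
binmode($input, ':encoding(UTF-8)');
my $line = <$input>;
close($input);
my $fixed_too = $literal . $line;

print 'broken: ', codepoints($broken), "\n";     # U+00FC U+00C3 U+00BC
print 'fixed:  ', codepoints($fixed), "\n";      # U+00FC U+00FC
print 'layer:  ', codepoints($fixed_too), "\n";  # U+00FC U+00FC
```

In a real script the second fix is usually the simpler one: set the layer once when opening the file and every line read from it is already a character string.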

Cheers, Roderich


