Wrong interpretation of a UTF-8 character if on a 0x8000 boundary

From: ilias iliadis <simsonbike-bugs yahoo gr>
To: "mc-devel gnome org" <mc-devel gnome org>
Subject: Wrong interpretation of a UTF-8 character if on a 0x8000 boundary
Date: Mon, 25 Jul 2011 13:20:56 +0100 (BST)

First, the versions (from two different PCs):

mc -V
GNU Midnight Commander 4.7.0
Virtual File System: tarfs, extfs, cpiofs, ftpfs, fish
With builtin Editor
Using system-installed S-Lang library with terminfo database
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With internationalization support
With multiple codepages support
Data types: char 8 int 32 long 64 void * 64 off_t 64 ecs_char 8


mc -V
GNU Midnight Commander 4.7.0.9
Virtual File System: tarfs, extfs, cpiofs, ftpfs, fish, undelfs
With builtin Editor
Using system-installed S-Lang library with terminfo database
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With support for X11 events
With internationalization support
With multiple codepages support
Data types: char 8 int 32 long 32 void * 32 off_t 64 ecs_char 8

Problem:
In a text file, if a multi-byte UTF-8 character (a code point above 0xFF, such as GREEK SMALL LETTER TAU, encoded as 0xCF 0x84) straddles a 0x8000 boundary, it is misinterpreted as two separate characters and displayed incorrectly in view mode (F3) as two dots (..).
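
One possible explanation, which this report does not confirm, is that the data is decoded in fixed-size blocks, so the two bytes of the character end up in different blocks. The Python sketch below only illustrates that general effect; the 0x8000 block size and both function names are assumptions made for the example and are not taken from the mc sources.

```python
import codecs

BOUNDARY = 0x8000  # block size assumed for illustration only


def decode_chunks_naively(data: bytes, size: int = BOUNDARY) -> str:
    """Decode every fixed-size block on its own.

    A multi-byte sequence split across two blocks cannot be decoded,
    so each half is replaced by U+FFFD.
    """
    return "".join(
        data[i:i + size].decode("utf-8", errors="replace")
        for i in range(0, len(data), size)
    )


def decode_chunks_incrementally(data: bytes, size: int = BOUNDARY) -> str:
    """Decode block by block, carrying incomplete trailing bytes over."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(data[i:i + size]) for i in range(0, len(data), size)]
    parts.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return "".join(parts)


# 0xCF lands at offset 0x7FFF and 0x84 at offset 0x8000.
sample = b"a" * (BOUNDARY - 1) + "τ".encode("utf-8") + b"b"
naive = decode_chunks_naively(sample)
careful = decode_chunks_incrementally(sample)
```

Here `naive` contains two replacement characters where the tau should be, which matches the "two characters instead of one" symptom, while `careful` keeps the tau intact.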
Attached is a file split off from el.wiktionary.
You can see the difference by opening the unzipped text file from the attachment in gedit and in mc: at position 0x8000, mc shows ".." instead of "τ".
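
For anyone without the attachment, a similar test file can probably be generated. The sketch below is a hypothetical reproduction, not the attached file: it assumes that "position 0x8000" means the lead byte 0xCF sits at offset 0x7FFF and the continuation byte 0x84 at offset 0x8000, and the file name and filler text are made up.

```python
from pathlib import Path

BOUNDARY = 0x8000            # offset taken from the report
TAU = "τ".encode("utf-8")    # b"\xcf\x84"


def build_repro(boundary: int = BOUNDARY) -> bytes:
    """Return bytes in which the two bytes of tau straddle `boundary`."""
    line = b"filler text\n"   # arbitrary ASCII padding (assumption)
    prefix_len = boundary - 1  # lead byte must be the last byte before the boundary
    prefix = (line * (prefix_len // len(line) + 1))[:prefix_len]
    return prefix + TAU + b" <- tau straddles the boundary\n"


def write_repro(path: str = "repro-8000.txt") -> Path:
    """Write the test data to `path` (hypothetical file name)."""
    target = Path(path)
    target.write_bytes(build_repro())
    return target
```

Calling `write_repro()` and opening the result with F3 in an affected mc version would be expected to show the same ".." at 0x8000, but that has not been verified here.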
Thanks

Attachment: testmc.zip
Description: Zip archive

Follow-Ups:
- Re: Wrong interpretation of a UTF-8 character if on a 0x8000 boundary
  - From: Andrew Borodin
