Re: utf8 patch for mc, slang 2 version

On Tue, Sep 20, 2005 at 10:11:28PM +0200, Bálint Kardos wrote:

> But even with all patches and stuff, I see the following Unicode glitches:
> - the utf-8 chars are not displayed in the dir list (on Ubuntu, everything is OK)
> for ÉÁŰŐÚÖÜÓ I see EAUOUOUO (upper, lowercase all wrong)
> - the files/dirs that contain the unicode chars, are still not
> properly aligned to the grids
> What could cause Darwin to behave so unpredictably?
> In the filesystem, there's another error:
> if you do 'ls', the alignment of the columns after the unicode chars
> is broken as well.

Unices typically use the NFC representation of accented characters, while
MacOS uses NFD (at least for filenames; I don't know about file contents).
In NFC each accented character has its own "composed" value, that is, a
single Unicode code point, which is usually stored as two (maybe three)
bytes in UTF-8. NFD decomposes the character into two code points: first
the unaccented letter, followed by a combining accent on its own. For an
accented Latin letter, its UTF-8 representation hence takes three bytes
(one for the unaccented letter and two more for the accent).
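
To make the byte counts concrete, here is a small Python sketch using the
standard unicodedata module. It has nothing to do with mc or slang
themselves; it only shows the two representations of the same letter.

```python
import unicodedata

# "É" as a single precomposed code point (the NFC form)
composed = "\u00c9"
# The same letter decomposed: "E" followed by U+0301 COMBINING ACUTE ACCENT
decomposed = unicodedata.normalize("NFD", composed)

nfc_bytes = composed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")

print(len(composed), len(nfc_bytes))      # code points and bytes in NFC
print(len(decomposed), len(nfd_bytes))    # code points and bytes in NFD

# Both forms denote the same text; normalizing either way converts them.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The NFC form is one code point in two bytes, the NFD form is two code
points in three bytes, and the two strings do not compare equal unless one
of them is normalized first.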

There are different levels of Unicode support specified. I guess supporting
NFD requires a higher level of conformance, since handling combining
characters is a harder job than handling the precomposed characters of NFC.
I bet mc's UTF-8 patch only supports NFC.

