mc and utf-8 again but different



Hello,

In April I announced that I had chosen mc and UTF-8 as the topic of my bachelor's thesis. Now I would like to present my results.

I started with the utf8-patch and tried to add support for changing the encoding in the vfs. I added a new prefix, "#enc:", to do this. First I implemented it as a vfs_class, but there were problems with links. Then I edited the mc_* functions of the vfs directly, and that works.
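To illustrate the idea of the prefix, here is a minimal sketch in C. It is not code from the patches: the function name path_get_encoding, the example paths and the rule that the encoding name runs up to the next '/' are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ENC_PREFIX "#enc:"

/* Hypothetical helper, not taken from the patches: find the last
 * "#enc:NAME" element in path and copy NAME into out.
 * Returns 1 if a name was found and fits into out, 0 otherwise. */
int
path_get_encoding (const char *path, char *out, size_t out_size)
{
    const char *found = NULL;
    const char *p;
    size_t len;

    assert (path != NULL && out != NULL && out_size > 0);

    /* remember the last occurrence, so a later element wins */
    for (p = strstr (path, ENC_PREFIX); p != NULL;
         p = strstr (p + 1, ENC_PREFIX))
        found = p;

    out[0] = '\0';
    if (found == NULL)
        return 0;

    found += strlen (ENC_PREFIX);
    /* assumption: the name ends at the next '/' or at the end */
    len = strcspn (found, "/");
    if (len == 0 || len >= out_size)
        return 0;

    memcpy (out, found, len);
    out[len] = '\0';
    return 1;
}
```

With this sketch, a path such as "/home/user/#enc:iso-8859-2/docs" would yield "iso-8859-2", and a path without the prefix would leave the encoding unchanged.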

I decided that it would be nice if the whole of mc worked in UTF-8 everywhere, so that only one kind of string function would be needed. But I had not considered that the localization is not always in UTF-8. I created slightly mad functions that convert the localization to UTF-8. With them mc supported every localization (... that I tested); only regular expressions were broken in non-UTF-8 encodings. I continued with editing view. In view I changed the reading, displaying and caching functions. (http://www.fi.muni.cz/~xbenes5/projects/mc/mc-test.tar.gz is the last version of the "UTF-8 always" variant.)

But when I saw my edits in mc, I changed my mind. I rejected the "UTF-8 everywhere" idea and checked out the newest version of mc. I designed an API for strings (I had anticipated it before, so it was no big problem) and made variants for ASCII, 8-bit encodings and UTF-8 (and possibly other encodings that support backward reading). I imported the good ideas from the previous attempt and created a final set of 30 patches. Each patch has a short comment in mc-utf8.txt. The utf8-patch does not appear in my patches.
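One way to picture such a string API is a table of function pointers per encoding family, chosen once from the terminal codeset. The sketch below is only an illustration under that assumption: the struct layout, the name str_choose_class and the function signatures are hypothetical, and only the 8-bit and UTF-8 variants are shown. The UTF-8 functions assume valid input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table of operations per encoding family (layout is an assumption). */
struct str_class
{
    const char *name;
    const char *(*next_char) (const char *s);
    const char *(*prev_char) (const char *s);   /* s must not be the start */
    size_t (*length) (const char *s);           /* length in characters */
};

/* 8-bit variant: one byte is one character */
static const char *b8_next (const char *s) { return s + 1; }
static const char *b8_prev (const char *s) { return s - 1; }
static size_t b8_length (const char *s) { return strlen (s); }

/* UTF-8 variant: continuation bytes look like 10xxxxxx, which is
 * what makes reading backward possible */
static int utf8_is_cont (unsigned char c) { return (c & 0xC0) == 0x80; }

static const char *utf8_next (const char *s)
{
    s++;
    while (utf8_is_cont ((unsigned char) *s))
        s++;
    return s;
}

static const char *utf8_prev (const char *s)
{
    s--;
    while (utf8_is_cont ((unsigned char) *s))
        s--;
    return s;
}

static size_t utf8_length (const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (!utf8_is_cont ((unsigned char) *s))
            n++;                    /* count only lead bytes */
    return n;
}

static const struct str_class str_8bit = { "8bit", b8_next, b8_prev, b8_length };
static const struct str_class str_utf8 = { "utf-8", utf8_next, utf8_prev, utf8_length };

/* Hypothetical selector: pick the table from a codeset name. */
const struct str_class *
str_choose_class (const char *codeset)
{
    assert (codeset != NULL);
    if (strcmp (codeset, "UTF-8") == 0 || strcmp (codeset, "utf-8") == 0)
        return &str_utf8;
    return &str_8bit;
}
```

The rest of the program would then call through the chosen table and never needs to know which encoding is active.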

separate patches - http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8.tar.gz
all together in one patch - http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8-all.tar.gz
applied to the cvs version of mc - http://www.fi.muni.cz/~xbenes5/projects/mc/mc-complete.tar.gz

Problems:

invalid strings - I chose a defensive approach: no invalid strings are loaded into mc. Only the panels can handle invalid file names. I'm not sure that I found every place where invalid strings can appear. API functions like str_next_char, str_prev_char and str_length do not support invalid strings. Invalid strings are supported by the str_term_* functions (formatting for drawing on screen).
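A defensive check of this kind could look like the following sketch for the UTF-8 case. The function name utf8_is_valid is hypothetical and the code is not taken from the patches; it only shows what "invalid" means for UTF-8 (stray or missing continuation bytes, overlong forms, surrogates, values above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check, not from the patches: returns 1 if str is
 * well-formed UTF-8, 0 otherwise. */
int
utf8_is_valid (const char *str)
{
    const unsigned char *s = (const unsigned char *) str;

    assert (str != NULL);

    while (*s != '\0')
    {
        unsigned long cp;
        int extra, i;

        if (*s < 0x80)
        {
            s++;                        /* plain ASCII byte */
            continue;
        }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else
            return 0;                   /* stray continuation or bad lead byte */

        s++;
        for (i = 0; i < extra; i++, s++)
        {
            /* a '\0' here means a truncated sequence; it fails this test too */
            if ((*s & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* reject overlong encodings */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800)
            || (extra == 3 && cp < 0x10000))
            return 0;
        /* reject surrogates and values beyond Unicode */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
    }
    return 1;
}
```

A string that fails such a check would be rejected or converted before it reaches functions that assume valid input.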

message_handler should support multibyte characters. Right now WInput must keep its own buffer for multibyte characters, which is not ideal. I did not modify the message handler, because that is a huge change and is not strictly needed, but it would be better. (It would also make multibyte hot-keys possible, but I don't know whether anyone would use them.)
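The kind of buffer meant here can be sketched as follows: bytes arrive one at a time and are collected until a whole UTF-8 character is available. This is an illustration only; struct mb_buffer and mb_buffer_feed are hypothetical names and do not describe the actual WInput code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-widget buffer for one partially typed character. */
struct mb_buffer
{
    unsigned char bytes[4];
    size_t have;                /* bytes collected so far */
    size_t need;                /* total bytes of the current character */
};

/* Feed one byte.  Returns the byte count (1..4) when bytes[] holds a
 * complete character, 0 if more bytes are needed, and -1 if the byte
 * does not fit; in that case the pending bytes and the offending byte
 * are dropped. */
int
mb_buffer_feed (struct mb_buffer *b, unsigned char c)
{
    assert (b != NULL);

    if (b->have == 0)
    {
        /* the lead byte tells how long the character is */
        if (c < 0x80) b->need = 1;
        else if ((c & 0xE0) == 0xC0) b->need = 2;
        else if ((c & 0xF0) == 0xE0) b->need = 3;
        else if ((c & 0xF8) == 0xF0) b->need = 4;
        else
            return -1;
    }
    else if ((c & 0xC0) != 0x80)
    {
        b->have = 0;            /* expected a continuation byte */
        return -1;
    }

    b->bytes[b->have++] = c;
    if (b->have < b->need)
        return 0;

    b->have = 0;                /* ready for the next character */
    return (int) b->need;
}
```

If the message handler delivered whole characters instead of single bytes, a widget would not need this state at all.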

I hope it will work for someone; I am now trying to use it instead of my distribution's default mc.

And now there is room for questions.

Rostilav Beneš

(Sorry about the typing mistakes etc., I should go to sleep.)

