mc and utf-8 again but different



Hello,

In April I announced that I had chosen mc and UTF-8 as the topic of my bachelor's thesis. Now I would like to present my results.
I started with the utf8-patch and tried to add support for changing the
encoding in the vfs. I added a new prefix, "#enc:", to do this. First I
implemented it as a vfs_class, but there were problems with links. Then I
edited the mc_* functions of the vfs directly, and that works.
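
As an illustration of the "#enc:" prefix, here is a minimal sketch of how an encoding name could be extracted from a path. It is not taken from the patches: the helper name path_find_encoding, the rule that "#enc:" must start a path component, and the rule that the last occurrence wins are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ENC_PREFIX "#enc:"

/* Hypothetical helper (not from the patches): copy the encoding named by the
   last "#enc:" path component into out.  Returns 1 on success, 0 if no such
   component exists, the name is empty, or it does not fit into out. */
int
path_find_encoding (const char *path, char *out, size_t out_size)
{
    const char *p;
    const char *last = NULL;
    size_t len;

    assert (path != NULL && out != NULL && out_size > 0);

    /* Assumption: the prefix only counts at the start of a path component. */
    for (p = strstr (path, ENC_PREFIX); p != NULL; p = strstr (p + 1, ENC_PREFIX))
        if (p == path || p[-1] == '/')
            last = p;

    if (last == NULL)
        return 0;

    last += strlen (ENC_PREFIX);
    len = strcspn (last, "/");  /* the name ends at the next '/' or at the end */
    if (len == 0 || len >= out_size)
        return 0;

    memcpy (out, last, len);
    out[len] = '\0';
    return 1;
}
```

With this sketch, a path such as "/home/user/#enc:iso-8859-2/docs" would yield "iso-8859-2", and a path without the prefix would yield nothing.
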
I decided that it would be nice if the whole of mc worked in UTF-8
everywhere, so that only one kind of string function would be needed. But
I had not considered that the localization is not always in UTF-8, so I
wrote some slightly mad functions that convert the localization to UTF-8.
With them, mc supported every localization (... that I tested). Only
regular expressions were broken in non-UTF-8 encodings.
I continued by editing the viewer, where I changed the reading, displaying
and caching functions.
(The last version of this "UTF-8 everywhere" variant:
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-test.tar.gz)
But when I saw my edits in mc, I changed my mind. I rejected the "UTF-8
everywhere" idea and checked out the newest version of mc. I designed an
API for strings (I had anticipated this earlier, so it was not a big
problem) and implemented variants for ASCII, 8-bit encodings and UTF-8
(other encodings that support backward reading could be added). I imported
the good ideas from the previous attempt and created a final set of 30
patches. Each patch has a short comment in mc-utf8.txt. The utf8-patch is
not part of my patches.
separate patches -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8.tar.gz
all together in one patch -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8-all.tar.gz
and applied to the cvs version of mc -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-complete.tar.gz
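
To show what a string API with one variant per encoding can look like, here is a minimal sketch based on a table of function pointers. It is only an illustration: struct str_ops, the function names and the signatures are hypothetical and are not taken from the patches. It also assumes valid, NUL-terminated strings. The UTF-8 variant shows why backward reading is possible: continuation bytes always have the form 10xxxxxx, so the start of the previous character can be found by skipping them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-encoding operations table; names and signatures are
   illustrative only and are not taken from the patches. */
struct str_ops
{
    const char *(*next_char) (const char *s);
    const char *(*prev_char) (const char *s);
    size_t (*length) (const char *s);
};

/* 8-bit encodings (and ASCII): one byte per character. */
const char *eightbit_next (const char *s) { return s + 1; }
const char *eightbit_prev (const char *s) { return s - 1; }

size_t
eightbit_length (const char *s)
{
    size_t n = 0;

    assert (s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/* UTF-8: continuation bytes have the bit pattern 10xxxxxx. */
static int
is_cont (char c)
{
    return ((unsigned char) c & 0xC0) == 0x80;
}

/* The caller must not call this at the terminating NUL. */
const char *
utf8_next (const char *s)
{
    s++;
    while (is_cont (*s))
        s++;
    return s;
}

/* The caller must not call this at the start of the string. */
const char *
utf8_prev (const char *s)
{
    s--;
    while (is_cont (*s))
        s--;
    return s;
}

/* Counts characters by counting the bytes that are not continuation bytes. */
size_t
utf8_length (const char *s)
{
    size_t n = 0;

    assert (s != NULL);
    for (; *s != '\0'; s++)
        if (!is_cont (*s))
            n++;
    return n;
}

const struct str_ops str_ops_8bit = { eightbit_next, eightbit_prev, eightbit_length };
const struct str_ops str_ops_utf8 = { utf8_next, utf8_prev, utf8_length };
```

Code that works with strings would then pick one table according to the terminal encoding and call, for example, ops->length (text) without caring which variant is behind it.
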
Problems:

invalid strings - I chose a defensive approach: no invalid strings are loaded into mc. Only the panels can handle invalid file names. I am not sure that I have found every place where invalid strings can appear. API functions like str_next_char, str_prev_char and str_length do not support invalid strings. Only the str_term_* functions (formatting for drawing on the screen) support them.
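
A defensive check of this kind could be sketched as follows for the UTF-8 case. This is not the code from the patches: utf8_is_valid is a hypothetical name, and the sketch simply applies the usual well-formedness rules for UTF-8 (correct lead and continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patches): returns 1 if the
   NUL-terminated string str is well-formed UTF-8, otherwise 0. */
int
utf8_is_valid (const char *str)
{
    const unsigned char *s = (const unsigned char *) str;

    assert (str != NULL);

    while (*s != '\0')
    {
        unsigned long cp, min;
        int extra, i;

        if (*s < 0x80)
        {
            s++;                /* plain ASCII byte */
            continue;
        }
        else if ((*s & 0xE0) == 0xC0) { extra = 1; cp = *s & 0x1F; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { extra = 2; cp = *s & 0x0F; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { extra = 3; cp = *s & 0x07; min = 0x10000; }
        else
            return 0;           /* stray continuation byte or invalid lead byte */

        s++;
        for (i = 0; i < extra; i++, s++)
        {
            /* A NUL here means the sequence is truncated; it fails this test too. */
            if ((*s & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*s & 0x3F);
        }

        if (cp < min)
            return 0;           /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;           /* beyond the Unicode range */
    }
    return 1;
}
```

A string that fails such a check would be rejected or converted before it reaches functions like str_next_char, which assume valid input.
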
message_handler should support multibyte characters. Currently WInput has
to keep its own buffer for multibyte characters, which is not ideal. I did
not modify the message handler, because that would be a huge change and is
not strictly needed, although it would be better. (It would also make
multibyte hot-keys possible, but I don't know whether anyone would use
them.)
I hope that it will work for someone. I am now trying to use it instead of
my distribution's default mc.
And now it is time for questions.

Rostilav Beneš

(Sorry about the typing mistakes etc.; I should go to sleep.)

