Re: CR/LF translation



Hello!

> I changed mc to support CR/LF translation, patch is attached.
> It is very useful for cygwin and also useful under unixes if it is
> necessary to edit files in dos encoding.
> There is some old code in mc for CR/LF translation, but it is slow and based
> on FILE stdio.h interface, whereas editor now uses  open/read/write
> interface.
> I wrote new code, which
> 1. reads file into buffer

This is unacceptable.  As of now, the editor never loads the whole file -
it uses buffers.  The viewer, however, tries to mmap() the whole file and,
if mmap() fails, loads the whole file into memory.

Loading the whole file is known to use too many resources.  Besides, it
imposes a 4 GB limit on the file size on 32-bit systems.  Several people
have complained about this limit in the viewer, so there would be
complaints about the editor too.
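
To show that the translation itself does not need the whole file in
memory, here is a minimal sketch in C.  It converts CR LF to LF one chunk
at a time and carries a single flag across chunk boundaries, so a CR at
the end of one buffer and an LF at the start of the next are still
treated as a pair.  The names crlf_state, crlf_to_lf and crlf_flush are
invented for this example and are not mc code; how this would hook into
the editor's buffers is left open.

```c
#include <stddef.h>

/* Hypothetical state carried between chunks: set when the previous
   byte was a CR that has not been written out yet. */
struct crlf_state {
    int pending_cr;
};

/* Translate CR LF to LF in src[0..len) and store the result in dst.
   dst must have room for len + 1 bytes, because a CR held back from
   the previous chunk may be emitted here.  A CR that is not followed
   by LF is kept as is.  Returns the number of bytes written. */
size_t crlf_to_lf(struct crlf_state *st, const char *src, size_t len,
                  char *dst)
{
    size_t out = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        char c = src[i];

        if (st->pending_cr) {
            st->pending_cr = 0;
            if (c != '\n')
                dst[out++] = '\r';      /* lone CR: keep it */
        }
        if (c == '\r')
            st->pending_cr = 1;         /* decide on the next byte */
        else
            dst[out++] = c;
    }
    return out;
}

/* Call once after the last chunk: writes a trailing lone CR, if any.
   Returns the number of bytes written (0 or 1). */
size_t crlf_flush(struct crlf_state *st, char *dst)
{
    if (!st->pending_cr)
        return 0;
    st->pending_cr = 0;
    dst[0] = '\r';
    return 1;
}
```

The caller would zero the state once per file, feed each buffer through
crlf_to_lf() as it is read, and call crlf_flush() at end of file.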

Your code suggests that if sizeof(int) is 4 or more, any file can be
loaded.  That's not true.  Firstly, int is irrelevant here.  What's
relevant is the size of a pointer to char.  Nobody forces you to use int
to index memory - you can use long, which is wide enough to hold a
pointer on typical Unix platforms.

Secondly, even long is 32-bit on 32-bit platforms, which imposes a 4 GB
limit on the files.  You can reach any point in the file using the off_t
type, which becomes a 64-bit integer if large file support is enabled.
However, you cannot load the whole file and work with it in memory,
because you would have to use ordinary pointers, and those are still
32 bits wide.
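
For illustration, this is roughly what reading through off_t looks like.
It is only a sketch: read_window is a made-up helper, not something in
mc, and it assumes a system that provides pread() and honors the
_FILE_OFFSET_BITS macro (as glibc does).  The file position is an off_t,
while the buffer is small and addressed with ordinary pointers.

```c
#define _FILE_OFFSET_BITS 64    /* request a 64-bit off_t where supported */

#include <sys/types.h>
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper: read up to len bytes starting at file offset
   pos into buf.  Returns the number of bytes read (less than len only
   at end of file) or -1 on error. */
ssize_t read_window(int fd, off_t pos, char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done, pos + (off_t) done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

Only the window has to fit into the address space, so the file itself
can be larger than anything a 32-bit pointer could index.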

> 4. status line now shows type of file

The type doesn't fit on the screen on 80x25 terminals.  Also, having
"UNIX" on screen almost all the time I'm editing something is not nice -
"UNIX" is a trademark.  "LF" would be better.

> 5. while saving file translate it back to dos , if it was dos There is
> an assumption that we are running on some decent 32 bit hardware because
> we are reading whole file into buffer for further translation. If 16 bit
> hardware detected, we just don't do translation.

Please find a definition of 32-bit hardware somewhere.

> The second part of the patch fixes problems with CR in case if we are
> not using translation or if we have CR embedded in text . It works like
> vi with dos files:  it shows CR as ^M. The original mc code handles CR
> very badly, it completely messes up files in dos encoding.

Could you please explain it in detail?  I'm not aware of any problems.  
Maybe this part can be applied separately?

> Third part of patch fixes problem with dos text files on volumes mounted
> in text mode under cygwin. This problem is actually very dangerous
> because mc corrupts files in this case. Solution is to open all files in
> binary mode and do CR/LF translation. There was one instance of opening
> file with some strange MY_O_TEXT flag, I had to remove it because
> otherwise it may conflict with O_BINARY flag.

Yes, MY_O_TEXT should be removed.

I think it should be possible to set binary mode by default for the whole
process.  You can find this line in main.c, which is used for EMX
(actually, for RSX, because we no longer support OS/2):

_fmode = O_BINARY;

Maybe there is something similar for Cygwin?
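
If no process-wide switch turns out to be available, a per-call fallback
is the usual idiom of defining O_BINARY as 0 where the system headers
lack it and adding it to every open().  This is only a sketch;
open_binary is a name invented here, not an existing mc function, and it
does not answer the question about Cygwin.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

/* On systems without text/binary distinction the flag does not exist;
   defining it as 0 turns it into a no-op. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical wrapper: open path with the given flags, always in
   binary mode.  The mode argument only matters if flags has O_CREAT. */
int open_binary(const char *path, int flags)
{
    return open(path, flags | O_BINARY, 0666);
}
```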

-- 
Regards,
Pavel Roskin



