Re[3]: [PATCH] edit.c, init_dynamic_edit_buffers (was Re[2]: CR/LF translation)



I want to add more on this..

PR> Looking at the size of the new code, I start wondering if it's really
PR> worth the trouble to write all that just to skip CR characters from input.  
PR> Do you expect any other positive effects from your patch other than CRLF
PR> suppport?

This as I said in my previous message can be resolved to some degree.

There are some other interesting points that need attention though -
the most important is the last one.

1. Scanninng every chunk for '\r' and '\n' even if the file is with UNIX line
endings.

   I think that this is not such a big deal though. We can control it through
   some flag which enables the translation. I had a comment in the source
   that I should think about it.

2. The current code, when rearranging the edit->buffers2 arrays, moves
the contents of all members aroung.

This one can be a real showstopper - I imagine doing a second move-around
of a 4 GB file can cause a lot of slowdown and it is really unnecessary.
My current implementation needs ajusting for sure - there is a comment
in the code that this approach is _UGLY_ ( one of your favourite :) ).
There is no need to move the contents of all the buffers but only the first
one and the last one..

..and I have just made the necessary changes in my local copy and it
works fine. :)

3. The overhead added by the new code when it translates a file

I can do some testing and see what kind of slowdown should be expected.
For sure there will be a slowdown :(

I tried to make the code efficient i.e. using (optimized) library routines
to move data in large chunks vs simple character copying, but you have to
keep in mind that it needs to handle more than just simple translation. It
must take care of the fact that it is processing data in chunks and that it
is changing the input data. Here is what it does more in addition to translation:

 o handles input from multiple buffers as it is one
 
 o keeps track of the free space in the current output buffer
 
 o switches the ouput buffer to which the translated data is stored when
   there is no more free space

 o keep tracks of the number of translated buffers

So, yes, there is code and ,yes, there is overhead.

I haven't investigated if there is a another place in the code, which is
better suited to host translation code, because it seems that the
overhead will be smaller if the data is translated on input once and
for all.

I don't have a strong argument for including this patch anymore, having
said that. I only provided it and I'm going to support it if included
- but I can't make a decision. If it is a frequent task for the users of
the MC editor to edit multi[mega|giga]byte files I would even vote against
that patch.

It should be useful as part of a better support for the DOS/UNIX line
edings issue. And I am not talking just for Cygwin here - I don't see
this a Cygwin only code.

Thanks! :)

Pavel Tsekov





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]