Re: FISH randomly hangs in the middle of a big copy for sh://host/



Hi,

I reported the problem below 5 years ago, and it's still going on. It's
happening only rarely, and never happens if I use sshfs and copy from a
mountpoint via mc.
It also happens consistently on the same files and as far as I can tell,
in the same place (i.e. if a file hangs, I need to kill fish, make a new
connection, and if I copy it again, it hangs again in the same place).
The only change from below is that I now have a newer mc:
GNU Midnight Commander 4.8.17
Built with GLib 2.48.0
Using the S-Lang library with terminfo database
With builtin Editor
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With support for X11 events
With internationalization support
With multiple codepages support
Virtual File Systems: cpiofs, tarfs, sfs, extfs, ext2undelfs, ftpfs, sftpfs, fish
Data types: char: 8; int: 32; long: 32; void *: 32; size_t: 32; off_t: 64;

All I can tell is that it's not a random network problem, because it
seems reproducible on some specific files, but it's rare (maybe 1% of files or so).


Original Email below:

Could you tell me how I can best file this bug in a way that you can 
find out what's wrong?
This has happened with mc over multiple versions and years, currently
I have 4.8.12. Only interesting thing is linux/64bit with 32bit userland.

Maybe one file out of 20, mostly big files (over a gigabyte), copying 
from sh://host/path/file hangs.

On the other side, if I strace the perl code launched via FISH, I see
data flowing from the file to the pipe, even 2h after the UI has hung on
the client making the copy.
read(3, "\356\'\r#mU\304\v((\320\324`\337\311\35\332\350\331\177"..., 8192) = 8192
write(1, "\1\305*\213\220\244z\371\370\2~\340\277\231/\364\235e!"..., 8192) = 8192
read(3, "\27\202S\35E\241G\302S\302\fj\37v0K\306z\276\260B\350\263"..., 8192) = 8192
write(1, "}\350oR\262\212j\\\277\313\1\177\254\36\213\237-\227\21"..., 8192) = 8192
read(3, "\344\261\272\32\35\2675\5\366\326c$-\'\305\313V\3770\313"..., 8192) = 8192

I'm not sure where that data is going since pipes don't have unlimited
buffering.
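
To illustrate the point about pipe buffering, here is a minimal Python sketch
that writes into a pipe nobody reads until the kernel refuses more data. The
function name fill_pipe is made up for this example, and the capacity it
reports depends on the kernel (commonly 64 KiB on Linux). It only shows that
a writer must stall once the reader stops draining; it says nothing about
where the data goes in mc.

```python
import os


def fill_pipe(chunk_size=8192):
    """Write to an unread pipe until it is full; return bytes accepted."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode, so a full pipe raises instead of hanging this script.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        # The pipe is full. A blocking writer (like the remote perl helper
        # writing to its stdout) would simply stall at this point.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("pipe accepted", fill_pipe(), "bytes before filling up")
```

Since the remote writes keep succeeding two hours in, something downstream
must still be consuming the bytes.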

On the client, I have this:
|-bash(15452)---mc(6558)-+-bash(6560)
|                        `-ssh(7305)

kill -STOP 7305 on the client causes the strace perl on the other machine 
to stop flowing.

I'm not too sure where that data is going, process 7305 says:
ssh     7305 root    0r  FIFO      0,8        63158081 pipe
ssh     7305 root    1w  FIFO      0,8        63158082 pipe
ssh     7305 root    2w   CHR      1,3            1028 /dev/null
ssh     7305 root    3u  IPv4 63159122             TCP client:43163->server:ssh (ESTABLISHED)
ssh     7305 root    4r  FIFO      0,8        63158081 pipe
ssh     7305 root    5w  FIFO      0,8        63158082 pipe
ssh     7305 root    6w   CHR      1,3            1028 /dev/null

strace shows ssh is reading from FH 3, but I can't tell where it's
shoving the data:
read(3, "\0\0@\20\2365,\224\304hx$\312M\251\262\17\236D\352\354I\211\212\213+\253\3161\200@\\"..., 8192) = 1448
clock_gettime(CLOCK_MONOTONIC, {548105, 51688985}) = 0
clock_gettime(CLOCK_MONOTONIC, {548105, 51787475}) = 0
select(7, [3 4], [], NULL, NULL)        = 1 (in [3])
clock_gettime(CLOCK_MONOTONIC, {548105, 51971095}) = 0
read(3, "\33{\303\3122\237!!\206\216u\321\275\265N\341\220\264\221G\6\266\227 \314\212\377\r\371\177\247\36"..., 8192) = 1448
clock_gettime(CLOCK_MONOTONIC, {548105, 53168084}) = 0
clock_gettime(CLOCK_MONOTONIC, {548105, 53254841}) = 0
select(7, [3 4], [], NULL, NULL)        = 1 (in [3])
clock_gettime(CLOCK_MONOTONIC, {548105, 53439342}) = 0
read(3, "Bp=\304;WI\\x\271\26Cy\214\245\330\336*\270\35\3507\225pw\226\316\225\220\322\300\241"..., 8192) = 1448
clock_gettime(CLOCK_MONOTONIC, {548105, 53637911}) = 0
clock_gettime(CLOCK_MONOTONIC, {548105, 53723783}) = 0

Now, the parent process, mc, pid 6558, seems to be receiving data.
If I kill -STOP 6558, I can see the data flow stop.

strace of that pid shows only reading from a pipe, only one character at
a time?
read(10, "+", 1)                        = 1
read(10, "\316", 1)                     = 1
read(10, "\222", 1)                     = 1
read(10, "\253", 1)                     = 1
read(10, "\216", 1)                     = 1
read(10, "\317", 1)                     = 1
read(10, "\316", 1)                     = 1
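
For comparison, one-byte read() calls like these are what a reader looks like
under strace when it scans for a line terminator one character at a time. The
sketch below is only an illustration of that pattern in Python. Whether mc's
FISH code is actually in such a loop here is an assumption, not something
verified against the mc source, and read_line_bytewise is a hypothetical name.

```python
import os


def read_line_bytewise(fd, limit=4096):
    """Read one byte per read() call until newline, EOF, or `limit` bytes."""
    buf = bytearray()
    while len(buf) < limit:
        byte = os.read(fd, 1)   # shows up in strace as read(fd, "...", 1) = 1
        if not byte:            # EOF
            break
        if byte == b"\n":       # end of line, stop without storing it
            break
        buf += byte
    return bytes(buf)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"status line\n\x2b\xce\x92")
    os.close(w)
    print(read_line_bytewise(r))  # the text line
    print(read_line_bytewise(r))  # binary tail, consumed byte by byte
    os.close(r)
```

If a reader like this is handed file data instead of a text line, it would
consume the stream without writing anything to the destination, which would
match data flowing in while the file does not grow.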

I can't see where mc is shoving that data; the UI is not showing any
progress.

The destination file shown in lsof shows no progress/size change:
mc      6558 root    8w   REG   0,42 1186279729    25828 /mnt/dshelf1/file

So I have no idea where mc is putting that data if it is flowing in
(albeit one character at a time?) and not flowing out according to lsof.
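
One more way to check whether the destination descriptor is moving at all is
to sample its file offset over time. On Linux, /proc/<pid>/fdinfo/<fd> has a
"pos:" line with the current offset. Below is a minimal sketch, assuming a
Linux /proc and permission to read the mc process's entries. The helper names
parse_fdinfo_pos and sample_offsets are made up for this example.

```python
import time


def parse_fdinfo_pos(text):
    """Return the integer after 'pos:' in /proc/<pid>/fdinfo/<fd> content."""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "pos":
            return int(value.strip())
    raise ValueError("no pos field found")


def sample_offsets(pid, fd, samples=5, interval=1.0):
    """Read the file offset of `fd` in process `pid` several times."""
    path = f"/proc/{pid}/fdinfo/{fd}"
    offsets = []
    for _ in range(samples):
        with open(path) as f:
            offsets.append(parse_fdinfo_pos(f.read()))
        time.sleep(interval)
    return offsets


if __name__ == "__main__":
    # pid 6558 and fd 8 are taken from the lsof line above.
    print(sample_offsets(6558, 8))
```

A list of identical values would confirm that mc is not writing to the
destination while it keeps reading from the pipe.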

Can you suggest what other data I should gather to help file a bug that's
actionable?

gargamel:~# mc --version
GNU Midnight Commander 4.8.12
Built with GLib 2.38.2
Using the S-Lang library with terminfo database
With builtin Editor
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With support for X11 events
With internationalization support
With multiple codepages support
Virtual File Systems: cpiofs, tarfs, sfs, extfs, ext2undelfs, ftpfs, sftpfs, fish
Data types: char: 8; int: 32; long: 32; void *: 32; size_t: 32; off_t: 64;

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

