[Evolution-hackers] RH 7.3 filter / threading bug

From: Michael Meeks <michael ximian com>
To: notzed ximian com
Cc: evolution <evolution-hackers ximian com>, JP Rosevear <jpr ximian com>, Dan Winship <danw ximian com>
Subject: [Evolution-hackers] RH 7.3 filter / threading bug
Date: Wed, 26 Nov 2003 05:26:30 -0800

Hi there,

	So - since Mark could repeat this easily, I spent a little while
looking at it, and stracing the output; with Dan's help we hacked up a
solution, the bug is pretty simple in fact:

	During the thread helper setup we do:

pipe([26, 27])        = 0
clone(child_stack=0x8186b50, 
      flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND) = 7240

	Then whenever various thread ops happen we get:

[pid  7231] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  7231] write(27, "\240\300\352 \0\0\0\0\0\0\0\0\   ", 148) = 148
[pid  7231] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  7231] rt_sigsuspend([] <unfinished ...>

	And the thread blocks until we get a:

[pid  7240] kill(7231, SIGRTMIN <unfinished ...>
[pid  7231]  --- SIGRTMIN (Real-time signal 0) ---
[pid  7231] <... rt_sigsuspend resumed> ) = -1 EINTR 
[pid  7231] ...

	Which then continues processing. Which works just fine. Then we come to
this code fragment camel/camel-filter-driver.c (pipe_to_system):

	if (!(pid = fork ())) {
		/* child process */
...
		maxfd = sysconf (_SC_OPEN_MAX);
		if (maxfd > 0) {
			for (fd = 0; fd < maxfd; fd++) {
				if (<<fd != STDSOMETHING>>)
					close (fd);
			}
		}
...
		execvp (argv[0]->value.string, (char **) args->pdata);
		
	By the time we hit the execvp the pipe we need to talk to the main
thread worker has been closed; and we do a write to the pipe:

[pid  7252] close(1022) = -1 EBADF
[pid  7252] close(1023) = -1 EBADF
[pid  7252] write(27, "\340K\223C\2\0\0\...", 148) = -1 EBADF
[pid  7252] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  7252] rt_sigsuspend([] <unfinished ...>

	And of course - since we closed socket 27 the worker thread never gets
woken up to do its stuff / send us a signal to wake us up :-) so we hang
forever.

	So - the obvious fix (which we tested and works fine) is to do:

@@ -695,7 +695,7 @@
 		if (maxfd > 0) {
 			for (fd = 0; fd < maxfd; fd++) {
 				if (fd != STDIN_FILENO && fd != STDOUT_FILENO && fd !=
STDERR_FILENO)
-					close (fd);
+					fcntl (fd, F_SETFD, FD_CLOEXEC);
 			}
 		}
 		
	Which makes life good. My feeling is this bug should persist on systems
without the new futex stuff which bins the worker thread ie. RH 8 too
(but perhaps by some fluke we escape it).

	I believe - that it's a trendy thing to bash-glib, so I had a dig in
there, and would you believe it g_spawn does:

      open_max = sysconf (_SC_OPEN_MAX);
      for (i = 3; i < open_max; i++)
        set_cloexec (i);

	ie. when we switch to g_spawn there won't be a problem ;-)

	A quick grep for _SC_OPEN_MAX reveals that the same problem code is cut
and pasted [ URGH - why ? ] in HEAD and stable in a frightening number
of places; at first glance every one of them does it wrong:

./mail/em-junk-filter.c:108
./mail/mail-config.c:1426
./camel/camel-filter-driver.c:694
./camel/camel-filter-search.c:532
./camel/camel-stream-process.c:206
./camel/camel-process.c:81
./camel/camel-gpg-context.c:583

	It would be good to fix those.

	In addition, examining the code [ and being no expert on how the camel
stream stuff works ] it seems that in camel-filter-driver.c
(pipe_to_system) we do effectively:

	... fork-helper ...
	... synchronous blocking write of whole file ...
	... flush ...
	... synchronous blocking read of reply ...

	This approach is (naturally) doomed to failure for anything that
returns more than a socket-buffer's worth of data, and takes more than a
socket buffer's worth - since it will block on output while we're still
trying to write to it: is that a bug ? [ I appreciate perhaps for
several cases this is ok - since we're just looking for a status report
or single line answer but ... ].

	I ask, since Mark's test case was to use 'cat' as the filter script -
and you can envisage cases that produce lots of random output I suppose.
Has that aspect also been cut / pasted ?

	Anyhow, it was an 'interesting' bug to have to untangle (with Dan's
help) :-) the thread helper thing really sucks; and I guess none of use
have RH 7.3 systems handy to test on these days.

	HTH,

		Michael.

-- 
 michael ximian com  <><, Pseudo Engineer, itinerant idiot

Follow-Ups:
- Re: [Evolution-hackers] RH 7.3 filter / threading bug
  - From: Not Zed

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]