Re: GChildWatchSource take three



On Wed, 2003-07-30 at 14:16, Jonathan Blandford wrote:
> Hi Owen,
> 
> Here's the latest version of my patch.  I added more to the docs as well
> as fixing other things discussed with Alex.  The big question I have is
> how best to handle ECHILD situations.  Currently, I'm passing -1 as the
> status.  As an alternative, I can just not dispatch those sources as
> well.

Commented on below with a few other things; despite the length of
the comments, it looks basically pretty solid; most of this stuff
is details, and the stuff that isn't details shouldn't be hard
to fix.

Regards,
						Owen


=====
@@ -39,7 +39,9 @@
 #include "glib.h"
 #include "gthreadinit.h"
 #include <sys/types.h>
+#include <sys/wait.h>
 #include <time.h>
+#include <fcntl.h>
=====

All the signal/waipid() stuff likely needs to be #ifdef G_OS_UNIX
guarded. I don't really know how this stuff is going to play
out on other processes. g_spawn_* stuffs a handle into an int,
which I assume isnt' going to work on the x86_64 version 
of Windows...

What we might be able to do on Windows is make up our own 
local pids that are shared between this code and GSpawn*
and don't work on anything else.

=====
@@ -221,8 +239,22 @@ static gboolean g_idle_dispatch    (GSou
 				    gpointer     user_data);
 
 G_LOCK_DEFINE_STATIC (main_loop);
+G_LOCK_DEFINE_STATIC (main_context_list);
+
=====

I'd like to see the main_context_list lock defined right
next to main_context_list. Also, there needs to be a comments
about partial ordering between; the locks:

 main_loop
 main_context_list
 context->mutex

Something like:

 /* The main_loop lock cannot be acquired while either the
  * main_context_list lock is held, or the mutex for any
  * individual main loop is held.
  */

 /* The main_context_list lock cannot be acquired while
   * the mutex for any individual main loop is held.
  */

The second comment is in fact *not* satisfied right now, so
you can have the deadlock:

 Thread A)
  g_main_context_unref (context);
   Acquires context->mutex
   tries to acquire main context list lock to remove 
    itself

 Helper thread)
   tries to acquire main context list lock
   Tries to acquire context->mutex to wake the thread up

So, you need to move the removal from the main context
list lock *after* unlocking the context's lock, with
a comment like:

 /* The only other outstanding reference to the context
  * at this point is main context_list, so we can unlock
  * the context now. The main_context_list lock will
  * serialize the next step with child_watch_helper_thread().
  */
===
+GSourceFuncs g_child_watch_funcs =
+{
+  g_child_watch_prepare,
+  g_child_watch_check,
+  g_child_watch_dispatch,
+  NULL
+};
===

The reason that the other sets of functions were exported publically 
was for g_source_set_closure, see comment in the header file
and gobject/gsourceclosure.[ch]. So, either make it static
or (better) add support to gsourceclosure.

===
+/* called when initializing threads and the child_watch source */
+static void
+add_pipes_to_main_contexts (void)
====

", and when intializing the child_watch source when single
  threaded"

or something like that ... mentioning that we are single
threaded in both would be good.

====
+void
+_g_main_thread_init ()
+{
+  add_pipes_to_main_contexts ();
+}
====

Don't you need to do the SINGLE => THREADED initialization
here? If you don't do it here, then a child watched
added before g_thread_init() won't fire correctly 
after it.
 

====
@@ -2173,8 +2229,8 @@ g_main_context_check (GMainContext *cont
   if (!context->poll_waiting)
     {
 #ifndef G_OS_WIN32
-      gchar c;
-      read (context->wake_up_pipe[0], &c, 1);
+      gchar a;
+      read (context->wake_up_pipe[0], &a, 1);
 #endif
     }
   else
====

Stray change.

====
+   * the context in main_context_about_to_poll in the expectation that
the
+   * signal handler can deal with it.  We expect that no code in
prepare or
+   * query tries to send data to the wake up pipe, and that we can
safely write
+   * a byte down it. */
====

*/ goes on next line by GLib style.
	  

===
+      waitpid_status = waitpid (child_watch_source->pid, &child_status,
WNOHANG);
+      if (waitpid_status == child_watch_source->pid)
+	{
+	  child_watch_source->child_status = child_status;
+	  child_watch_source->child_exited = TRUE;
+	}
+      else if (waitpid_status == -1)
+	{
+	  /* errno can be EINVAL or ECHILD.  I'm not sure what EINVAL could
mean
+	   * here. */
+	  child_watch_source->child_status = -1;
+	  child_watch_source->child_exited = TRUE;
===

EINVAL means that the arguments you passed to waitpid() (the flags) 
are not valid. I don't think you should handle it other than
perhaps asserting that doesn't happen.

ECHILD indicates a programmatic error, and you should g_warning()
and probably avoid dispatching. Anyways, I believe a status of -1 
could be a legitimate exit status, so you can't use it as a flag value.

I can't find anything in POSIX that forbids a return of EINTR
from a waitpid() with WNOHANG though it would be strange to
have that, so you should probably do:

 do {
   waitpid_status = waitpid (child_watch_source->pid, &child_status,
WNOHANG);
 } while (waitpid_status == (pid_t)-1 && errno == EINTR);

===
+static gboolean
+g_child_watch_prepare (GSource *source,
+		       gint    *timeout)
+{
+  GChildWatchSource *child_watch_source;
+  *timeout = -1;
+
+  child_watch_source = (GChildWatchSource *) source;
+
+  check_for_child_exited (source);
+
+  return child_watch_source->child_exited;
===

This is sort of missing the point of why I suggested
adding ->child_exited, I was imagining:

 if (!child_watch_source->child_exited)
  check_for_child_exited (source);

 return child_watch_source->child_exited;

to avoid ever calling waitpid() on an exited source and 
getting ECHILD. But then again, you are right that once
a source returns TRUE from prepare() or check() we
set the G_SOURCE_READY and never call the functions
again until we've dispatched the source. So, we might
as well do:

 return check_for_child_exited (source);

as you had originally.

===
+static gboolean 
+g_child_watch_check (GSource  *source)
+{
+  GChildWatchSource *child_watch_source;
+
+  child_watch_source = (GChildWatchSource *) source;
+
+  return child_watch_source->child_exited;
+}
====

This should call check_for_child_exited (source), it's
almost free (because of the count) and there's no 
reason to wait until the next iteration of the loop before
noticing the child has exited.

===
+static void
+g_child_watch_signal_handler (int signum)
+{
[...]
+      if (main_context_about_to_poll)
+	{
+	  g_main_context_wakeup_unlocked (main_context_about_to_poll);
+	}
+    }
===

This doens't work on several aspects:

 A) g_main_context_wakeup_unlocked() is a noop
    ifndef G_THREADS_ENABLED
 
 B) g_main_context_wakeup_unlocked() checks
     if (g_thread_supported() && context->poll_waiting)
    so it's a no-op when you call it here in the
    single threaded case.

 C) g_main_context_wakeup_unlocked() isn't signal safe. In the
    sequence

===
ifdef G_THREADS_ENABLED
  if (!context->poll_waiting)         <= A
    {
#ifndef G_OS_WIN32
      gchar c;
      read (context->wake_up_pipe[0], &c, 1);
#endif
    }
  else
    context->poll_waiting = FALSE;   <= B
===
    
   if your signal handler runs between A and B, you
   leak a byte into the pipe, and your main loop will
   spin at 100% cpu because it gets left there. I think you 
   can fix this by doing:

  if (context->wake_up_rec.revents & G_IO_IN)
    {
      /* Read a bunch of bytes from the pipe */
    }
  context->poll_waiting = FALSE;

  And then use a custom wake up function in your
  signal handler that doesn't look at g_thread_supported().

===  
+static void
+g_child_watch_source_init_single (void)
+{
+  g_assert (! g_thread_supported());
====

Extra space after '!'

====
+      gchar b[20];
+      GSList *list;
+
+      read (child_watch_wake_up_pipe[0], b, 20);
====

sizeof(b) better than 20, and I'd like to see a comment explaining
that the 20 is just some arbitrary number, and why we 
are reading a bunch of bytes here. I'd probably
also use a slightly bigger value like 64 or 128. Thread stack
stack sizes can be small, but they aren't *that* small.
A power of 2 will make it clearer that it is just a random
number you picked. When you tell a hacker, "pick a number",
they don't say "20". ;-)

===
+  if (pipe (child_watch_wake_up_pipe) < 0)
+    g_error ("Cannot create wake up pipe: %s\n", g_strerror (errno));
+  fcntl (child_watch_wake_up_pipe[1], F_SETFL, O_NONBLOCK | fcntl
(child_watch_wake_up_pipe[1], F_GETFL));
+
+  /* We create a helper thread that polls on the wakeup pipe
indefinitely */
+  if (g_thread_create_full (child_watch_helper_thread, NULL, 0, FALSE,
FALSE, G_THREAD_PRIORITY_NORMAL, &error) == NULL)
+    g_error ("Cannot create a thread to monitor child exit status:
%s\n", error->message);
====

While I'm usually the last person to worry about overlong 
lines in my code, this section could definitely use a few more
line returns, or maybe even a temporary variable for the F_GETFL
result.

===
+/**
+ * g_child_watch_source_init:
+ * @void: 
+ * 
+ * Initializes a signal handler so that #GChildWatchSource can monitor
children.
+ * In most programs it is not necessary to call this function, as the
sources
+ * will initialize themselves appropriately.  The one time it can be
useful is
+ * when an external SIGCHLD handler that calls waitpid() is already
installed.
+ * In this case, there is a potential race between the child exiting
and the
+ * source installing its own handler.
+ * 
+ **/
+void
+g_child_watch_source_init (void)
===

I don't think making this public is a good idea; the precondition
for using this code is that you aren't using SIGCHLD yourself.
If you are, you can remove your handler before calling
g_child_watch_source_new(), we don't have to clutter the API
for that.

===
+  if (g_thread_supported())
+    {
+      if (child_watch_init_state == CHILD_WATCH_UNINITIALIZED)
+	g_child_watch_source_init_multi_threaded ();
+      else if (child_watch_init_state ==
CHILD_WATCH_INITIALIZED_SINGLE)
+	g_child_watch_source_init_promote_single_to_threaded ();
===

See above - you need to go to THREADED at g_thread_init() time,
so this can just be a 

 g_assert (child_watch_init_state != CHILD_WATCH_INITIALIZED_SINGLE);

===
+ * Additionally, as glib installs a SIGCHLD signal handler, system
calls can be
+ * interrupted and should be checked for EINTR.
===

I think that some more expansion on this sentence is needed here; likely
people reading these docs will not be experienced Unix programmers. On
the other hand, since we are always writing to the wakeup pipe, and
we control SIGCHLD, we might as well use sigaction()/SA_RESTART 
for our signal handler and make this pretty much a non-issue.

I suppose if we want to handle old sysv systems without sigaction(), 
we need to include the comment anyways, but that's pretty much
an edge-condition these days.

===
+ * The process id passed in must be a positive pid, .  If the child did
not exist, the source will be dispatched with
+ * a status of -1.
===

In GLib, if we "and must point to an existing pid", we don't document
what happens if it doesn't; the typical thing would be a g_warning().

===
+ *  Also, when multiple sources of the same pid are added, only
+ * one of them will be called.
===

"for the same pid", but better, forbid this. (With the current code,
it's not true that only one will be called, one will get called
with the correct status, the other with a status of -1)

===
+ * The source will not initially be associated with any #GMainContext
and must
+ * be added to one with g_source_attach() before it will be executed. 
Unlike
+ * other sources, any callback associated with it will only be called
at most
+ * once and no return value is expected.
===

===
+ * <note><para> Some thread implementations are buggy and do not
correctly
+ * notify the process when the child of a defunct thread exits.  In
these cases,
+ * there isn't much that can be done to catch the child
exiting.</para></note>
===

I don't believe that this note describes the problem well; as I
understood
the conversation, it should be:
 
 <note><para> Some thread implementations are buggy and one thread 
  cannot call waitpid() on a child created in a different thread; if
  you are using this functionality in a threaded program, you may
  need to structure your program so that child watches are always
  added to a GMainContext running in the same thread as where
  the child was created. </note></para>

===
+/**
+ * g_child_watch_add_full:
+ * @priority: the priority of the idle source. Typically this will be
in the
                                   ^^^^^^^^^^^
===

child watch

===
+ * Sets a function to be called when the child indicated by pid exits,
at a
+ * default priority, #G_PRIORITY_DEFAULT.  
===

delete the text about the priority here; since the priority is passed
in to add_full()

===
+ * See g_child_watch_source_new() for
+ * additional comments regarding this source.
===

What source? The word source hasn't been used in this function
description
yet. "for further information" would be fine.

===
+ * Sets a function to be called when the child indicated by pid exits,
at a
+ * default priority, #G_PRIORITY_DEFAULT.  This way, a childs exit
status can be
+ * gotten b
====

Something missing here.

===
+ *  gint pid = fork ();
+ *  if (pid > 0)
+ *    g_child_watch_add (pid, function, data);
+ *  else
+ *    {
+ *      /&ast; The child process  &ast;/
+ *      sleep (1);
+ *      _exit (0);
+ *    }
===

I'd note checking for errors from fork().

 if (pid > 0)
   {
   }
 else if (pid == 0) /* child */
   {
   }
 else 
   /* handle failure of fork() */

===
+ * g_main_loop_run (g_main_loop_new (NULL, FALSE));
===

Leaks the loop object, and doesnt' really illustrate why you'd use
GChildWatch (why not just call waitpid() here...) probably best
just omitted.

===
+ * See g_child_watch_source_new() for additional comments regarding
this source.
===

Same comment above about "this source".


===
 /* Miscellaneous functions
  */
 void g_get_current_time		        (GTimeVal	*result);
-
 /* ============== Compat main loop stuff ================== */
===

Stray change.

===
+/* Idles, child watchers and timeouts */
===

"child watches" would be more consistent with our terminology.

Couple of comments about test case:

 - I'd make the exit on fork() failure verbose on general principle.
 - The test case should be quiet on success, and should check
   that status is set correctly, and if not, exit with a non-zero
   exit status.
 - Rather than #define TEST_THREAD, I'd run first the non-threaded
   case, and then if G_THREADS_ENABLED, g_threads_init() and 
   run the threaded case. Or you might want to even g_threads_init()
   and create the other threads after creating a child but 
   before waiting for it to test that.





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]