Re: [gnet] Problems with GNet and GIOChannels...



 Hello Tim,

   Many thanks for the reply.  I've been asking related
questions on this mailing list and the GTK mailing lists
for a few weeks now, and this is the first time anyone
has bothered to respond.  I was about to lose hope. ;^)

On Fri, 2004-08-20 at 04:59, Tim Müller wrote:
> On Friday 20 August 2004 01:46, James Wiggs wrote:
> 
> Hi James,
> 
> I don't have the slightest clue what exactly goes wrong in your case, but I've 
> got some comments nevertheless. 
> 
> 
> > 5) The FeedCommand thread
> >   - Maintains a TCP socket created using the GNet GTcpSocket API that
> >    connects to the hardware data receiver.  Commands/requests are sent
> >    via this socket, and explicit reply data is read from it.
> 
> Have you tried GConn instead? Not sure if GConn is still marked as 
> experimental, but I've made good experiences with it, and the API is _much_ 
> simpler than GTcpSocket.
> 

   I'm all for simpler, believe me, but not at the expense
of stability.  I can't emphasize enough how important it is
that these processes be as solid as granite.  Can any of the
GNet developers chime in with an opinion on how stable GConn
is right now?  I chose GTcpSocket specifically because GConn
is labelled in the documentation as "experimental."

> >    Each thread maintains an asynchronous message queue for receiving
> > messages from other threads.  The main loop for each thread consists
> > of a loop like this (note that I have omitted a lot of debugging/log
> > code so that this post isn't even longer):
> >
> >   -----
> > while( 1 ) {
> >   maxiter = 10;
> >   while( maxiter && (Msg = g_async_queue_try_pop( MyQueue )) ) {
> >     /* Handle the message if there is one */
> >     maxiter--;
> >   }
> >
> >   maxiter = 10;
> >   if( g_main_context_pending( MyContext ) && maxiter ) {
> >     /* Deal with callbacks, etc. */
> >     g_main_context_iteration( MyContext, FALSE );
> >     maxiter--;
> >   }
> >
> >   g_usleep( 1000 );  /* Pause one millisecond, then loop */
> > }
> >   -----
> 
> This looks a bit messed up to me, especially the first 'maxiter' part. Why 
> don't you use g_async_queue_timed_pop() instead?
> 
> Personally, I'd get rid of the GAsyncQueue stuff altogether though. You have 
> your own separate GLib main context for this thread, right? So instead of 
> doing g_async_queue_push() in the other thread to send a message to this 
> thread, I'd rather do something like:
> 
>      GSource *idlesrc;
>      idlesrc = g_idle_source_new();
>      g_source_set_callback (idlesrc, process_data_func, data, NULL);
>      g_source_attach (idlesrc, MyContext);
> 
> That way process_data_func() is called with data pretty much as soon as 
> possible in the context of MyContext, and you don't have to poll the async 
> queue any longer (or do the sleep etc.), but you can just create a main loop 
> for MyContext and run it until it is not required any longer.

   My programming here reflects my past experience writing
parallel numerical analysis codes for massively parallel non-
shared memory machines.  I'm very comfortable with the concept
of processes passing messages to each other, so I chose that
methodology for coordinating between processes.  Not saying
your solution isn't better, just that it isn't a natural way
of modeling the problem for me.  There will be a *lot* of 
these messages; I figure the overhead of maintaining the AMQ
would be less than setting up this idle callback and then
inserting it into another thread's context.  There would have
to be locking implemented on that thread's context, as well,
right?

   We'd be looking at 200-400 messages per second during the
market day, roughly 100-150 KBytes/second of data.  The idea
in the first loop is to try to prevent messages from piling
up; you can quickly yank messages out of the queue and handle
them up to some reasonable limit (which can be tuned), then
go and handle any callbacks that have been triggered, then
yield the CPU.  It seems to work pretty well; logging output
clearly shows things being "handled" properly and with good
efficiency.  How much overhead is really involved in setting
up an idle callback and then inserting it into the context of
another thread?
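
   For what it's worth, here is roughly what I have in mind -- just a
sketch, with the limit of 10 and the names MyQueue/MyContext/
handle_message() standing in for the real ones, and with the
callback-dispatch half written as a while() so the limit actually
applies (unlike the if() in my paste above):

   -----
while( !shutdown_requested ) {
  /* Drain up to 'maxiter' queued messages so they can't pile up */
  maxiter = 10;
  while( maxiter-- && (Msg = g_async_queue_try_pop( MyQueue )) ) {
    handle_message( Msg );                      /* application code */
  }

  /* Service up to 'maxiter' pending event sources without blocking */
  maxiter = 10;
  while( maxiter-- && g_main_context_pending( MyContext ) ) {
    g_main_context_iteration( MyContext, FALSE );
  }

  g_usleep( 1000 );  /* yield for ~1ms, then loop */
}
   -----

   I may also experiment with g_async_queue_timed_pop() in place of
the try_pop()/g_usleep() pair, as you suggested.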

> 
> >   source = g_io_create_watch( SendServerSocketIOC, G_IO_IN );
>    ......
> >    My problem is that the socket connection SendSocket keeps shutting
> > down unexpectedly.  The code will work fine for minutes or even hours,
> > but eventually the Forward_Data_Packets stops being able to write to
> > the socket and OutputQueue starts filling up with data.  The client
> > on the other end of the socket eventually closes its end of the socket
> > and tries to reconnect, but gets a "connection refused" error message
> > when it does so.  I have branches of logic in the server that are
> > supposed to get called when the socket conditions G_IO_ERR, G_IO_HUP
> > or G_IO_NVAL come up, which is supposed to close the SendSocket and
> > re-register the callback that accepts socket connections, but it
> > seems like it never gets called.  
> 
> If you have one callback for  G_IO_IN | G_IO_ERR | G_IO_HUP, are you checking 
> the conditions bitwise, ie.:
> 
>  if ((cond & G_IO_IN))
>  {
>    ... get data ... and continue below even if you got some ...
>  }
> 
>  if ((cond & (G_IO_HUP|G_IO_ERR)))
>  {
>    ... socket closed for some reason ...
>  }
> 
> ?

   I actually have a separate callback for G_IO_ERR|G_IO_HUP|G_IO_NVAL.
I found some references to problems with a write to a broken socket
raising SIGPIPE, which will *kill* the process if you don't have a
handler (or SIG_IGN) set up for it; with the signal ignored, the write
supposedly just fails with EPIPE instead.  Do you know if this is
correct?  I don't see it explained that way in the man pages...
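
   In case it is, this is the sort of thing I was planning to add at
startup -- just a sketch, on the assumption that the SIGPIPE story is
right: ignore the signal so writing to a dead socket fails with errno
set to EPIPE instead of taking the whole process down.

   -----
#include <signal.h>
#include <string.h>

static void
ignore_sigpipe( void )
{
  struct sigaction sa;

  memset( &sa, 0, sizeof(sa) );
  sa.sa_handler = SIG_IGN;          /* writes now fail with EPIPE */
  sigaction( SIGPIPE, &sa, NULL );
}
   -----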

> I also found in the past that the exact details of how a closed 
> socket/descriptor is communicated vary a bit from system to system. Sometimes 
> I'd get a G_IO_IN condition and then foo_read() would return 0 chars as 
> indication of a closed socket (well, in that case a pipe). My memory is a 
> little bit murky though, so this might just as well be complete rubbish ;)

   The big hole in my programming experience is network sockets, and
I'm working furiously to build up my knowledge in that area.  It
sometimes seems to be a black art.  World + dog has put out primers
on TCP sockets, but the details of error handling are always glossed
over.  Mostly it's just "log a message and exit on error".  I can't
*do* that.  When this thing is up and running in production, many
thousands of dollars will be riding on it working PERFECTLY, ALL THE
TIME.  It has to identify problems and handle them gracefully, and
do so more or less instantly.  I just wish I could find some better
reference materials on handling error conditions in network socket
programming.  Any suggestions?
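
   Your note about a read returning 0 chars is helpful, though.  If I
follow you, the read callback itself needs to treat a zero-byte/EOF
read as a hang-up rather than waiting for G_IO_HUP to show up.
Roughly like this -- just a sketch, with handle_packet() and
close_and_relisten() standing in for my real code:

   -----
static gboolean
read_callback( GIOChannel *channel, GIOCondition cond, gpointer data )
{
  gchar     buf[4096];
  gsize     nread = 0;
  GIOStatus status;

  if( cond & G_IO_IN ) {
    status = g_io_channel_read_chars( channel, buf, sizeof(buf),
                                      &nread, NULL );
    if( status == G_IO_STATUS_NORMAL )
      handle_packet( buf, nread );
    else if( status == G_IO_STATUS_EOF )
      cond |= G_IO_HUP;   /* zero-byte read: peer closed the socket */
    else if( status == G_IO_STATUS_ERROR )
      cond |= G_IO_ERR;
  }

  if( cond & (G_IO_HUP | G_IO_ERR | G_IO_NVAL) ) {
    close_and_relisten( data );
    return FALSE;         /* remove this watch */
  }

  return TRUE;
}
   -----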

> 
> >    I can post more detailed code samples if necessary, but this note
> > is already twice as long as I intended.  Happy to respond to any follow
> > up questions!
> 
> More useful than 'code samples' would be a minimal test program. Kind of 'run 
> this on computer A, run this on computer B, and watch it fail after a while'.

   I'd love to do this, but the problem is the code is intimately
tied to the hardware receiver, so no test program could really show
the full context.  I don't know how to get around that problem.  I'm
going to be rewriting some parts of the code over the next few days
and testing it again on Monday.  Hopefully I will have a better grip
on things then.

> Cheers
>  -Tim

thanks again,
Jim



