Re: [gamin] gam-poll.c rewrite



On Fri, Jul 01, 2005 at 01:17:32PM +0100, Neal H. Walfield wrote:
> >   Okay, good. But this breaks for me and this is a huge patch.
> > The dnotify.py regression test breaks and nearly all subsequent
> > ones.
> 
> I ran the tests suite and reported all the failures I saw in my email.
> I realize now that I have not been configuring with
> --enable-debug-api.  When I do this, I see the same failures as you.
> See below for an explanation.

  okay,

> > Not breaking the regression tests on dnotify is really a hard condition
> > to apply any patch at this stage unless we can understand and justify why:
[...]
> > Maybe it's just the debugging of kernel monitoring status which got
> > broken, but it looks like your patches changes the behaviour of the kernel monitoring
> > a lot, even in the simplest case. there are way too many dnotify event sent
> > which mean way too many transition of state for the dnotify watches, and this
> > occurs even on the simplest case, so in practice I think this is a serious
> > breakage of the dnotify back-end.
> 
> Actually, the current implementation doesn't do enough monitoring.
> Consider this example:
[...]
> The current code misses the deleted event because we don't monitor the
> parent directory.  (You see the same behavioir with
> s/watch_directory/watch-file/.)  So this is actually where most of the
> extra events are coming from: we need to monitor the parent of a
> directory for unlink events.  Currently, the code watchs every
> ancestor

  That is a huge change that I didn't see discussed at all !
It maybe the right thing but IMHO changes a lot ! For example
suddenly monitoring a resource may lead to monitoring a very 
busy directory, or put a monitor onto a kind of filesystem you
really don't want to match. Trivial example

   monitor /tmp/app-$user/lock

which is likely to be done by a number of applications.
suddently you force to monitor /tmp, which is one of the busiest
directory on any system. Do that on a machine with 100 users
and I'm sure to get a flood of bug reports from pissed-off customers
because there is hundred of gam_servers eating all the CPU on the
system.

 Please discuss changes like that before putting them in the code !

> even though you only need to watch the parent if the
> directory in question is empty.  However, that is non-trivial and may
> involve some races (if the last file is unlinked and then the
> directory as well we might miss the chance to set up the monitor for
> the parent) so I deferred.

  If a file or directory is monitored, it gets removed, then I would
expect a kernel event, then the stat() associated will show the resource
had disapeared.

> >   So as is that patch which is nearly 100KB doesn't pass. Could you chunk it
> > into digestible pieces with a well identified purpose ? It will allow me to
> > review them and help pinpoint where the problem is being introduced.
> 
> This patch really makes a single major change: it tries to handle the
> MON_ flags more intelligently.  These changes are deeply penetrating
> and intimately linked with the kernel back end and how MON_WRONG_TYPE
> functions.  I admit that the patch does include some refactoring of
> code, documentation improvements as well as a few algorithmic shifts
> e.g. from O(N) algorithms to O(1) and the consolidation of
> gam_exclude_check.  Hence, there is some low hanging fruit here and
> with a bit of work, I think I could make the patch about 30% smaller.
> Honestly, I think it will be difficult to break up the bulk of the
> patch and if I did manage to separate say the MON_WRONG_TYPE
> functionality from the rest, there would still be a strong dependency
> between the two (in both directions).
> 
> I understand that the patch is large but the changes are by their
> nature deep. I have a number of changes which depend on this patch
> which I haven't included so as to make this patch as small as
> possible.  Specifically, polling files and directories if the kernel
> back end is unable to monitor a node, e.g. if there are no more file
> descriptors available or a permission check fails; introducing back
> end capabilities so that back ends can use only what they actually
> need from the polling code, e.g. GAM_WATCH_LINK and GAM_WATCH_UNLINK
> to indicate that the back end can see directory link and unlink events
> and thus the poll back end needn't worry about this and
> GAM_WATCH_CONTENTS if the back end can see changes in files in the
> directory.
> 
> Where do you want to go from here?

  I want to see discussions of features *before* you code them !
You are doing radical design changes, and there is not a single exchange
of mails about them before hand. You can't work in isolation, I admit
that the mail-list being down for 2 days (but that was documented/planned)
doesn't help, but it's a process problem.

  How to go from there:
    - I don't think I will accept a patch monitoring all parent directories.
      That mean we can't even run reliable regression tests anymore, and
      while I like the work you are doing, being able to make regression
      tests is even more crucial.
    - If doing just the parent directory monitoring, then it would be
      possible to fix the dnotify*.py and flood*.py regression tests
      That's a patch I may accept after checking behaviour on the overall
      desktop on a couple of OS versions, but that would be a gamin 0.2xx
      new release, I will not consider this an incremental release of 0.1
      
  and in general compile and work with all debug switched on, that will avoid
a lot of small back and forth exchanges and help fixing problems as they
arise, modify the configure.in to force this in your environment, that's
what I did.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]