Re: [gamin] New inotify backend checked in



On Wed, Aug 03, 2005 at 12:26:22PM -0400, John McCutchan wrote:
> On Wed, 2005-08-03 at 12:20 -0400, John McCutchan wrote:
> > On Wed, 2005-08-03 at 11:15 -0400, Daniel Veillard wrote:
> > > It then must be implemented at the user level.
> > > It is not acceptable to argue about a specific problem in Dnotify support
> > > to just cancel this fundamental property. inotify would not need to
> > > maintain a tree of stat() info but one per cancelled kernel monitor.
> > 
> > Keeping a stat() tree for each cancelled kernel monitor isn't as easy as
> > it sounds. That is a very racey operation. It would be easy to miss
> > events in between your last inotify event and the scan of the directory.
> 
> Replying to myself,
> 
> I'm not saying that I wouldn't want this if we can show that it really
> is useful. I'd just like to see some real justification (ie benchmark
> numbers) showing that we do need to provide it. Performance is excellent
> for me without any gamin supplied flow control.

  Okay, let me dump some of the reasons why I'm afraid removing
flow control at the user level is the wrong way to go.
  You tested it, but only for activity on a single file growing fast.
To me this is a micro-benchmark, and making design decisions
based on micro-benchmarks is a tempting but dangerous pitfall.
  Let's say you have a system with a large number of files and directories,
like, hum, rpmfind.net. You want to do "something" when files change.
You want to know when the change is done and what files (in the rpmfind
case, what directories) are affected. Locality of access holds there
too, which means one subset will change rapidly, then another subset
will change, etc.
  If you remove flow control, in my case I'm going to get zillions of events
as rsync mirrors sets of files. What I really want to know from gamin
is when they are done, i.e. something like a 10s timeout after the last
change to a monitored directory. I couldn't care less about the N events
per second per modified file at a given time. Worse, those events will
generate many context switches, which on a server are what generates load.
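
The "10s timeout after any change" idea above can be sketched roughly as
follows. This is only an illustration of user-level coalescing, not gamin
code: the class name QuietPeriodCoalescer, its methods and the explicit
timestamps are all hypothetical, chosen to keep the example self-contained.

```python
from typing import Dict, List


class QuietPeriodCoalescer:
    """Collapse bursts of change events into one report per directory.

    Hypothetical helper: a directory is reported only once no event has
    been seen for it during `quiet_period` seconds.
    """

    def __init__(self, quiet_period: float = 10.0) -> None:
        self.quiet_period = quiet_period
        # directory -> timestamp of the most recent raw event seen for it
        self._last_change: Dict[str, float] = {}

    def record(self, directory: str, now: float) -> None:
        """Note a raw event; repeated events only push the deadline back."""
        self._last_change[directory] = now

    def flush(self, now: float) -> List[str]:
        """Return (and forget) directories that have been quiet long enough."""
        done = sorted(
            d for d, last in self._last_change.items()
            if now - last >= self.quiet_period
        )
        for d in done:
            del self._last_change[d]
        return done
```

A caller would feed every raw kernel event to record() and call flush()
from a periodic timer, so a client sees one "directory settled" report per
burst instead of N events per second per file.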
  I think if I were to switch rpmfind.net to kernel-based update
reporting, then 1/ I would need inotify, as dnotify would explode the
number of open fds, and 2/ I would want flow control at the user level,
since the kernel doesn't have a way to limit reporting to just the events
I need (well, especially through gamin).
  I think this applies to an awful lot of server usage (NNTP, SMTP,
even HTTP to regenerate from modified templates). I think if you were
to switch beagle to gamin you would have to either extend the API or
add flow control at the user level, otherwise the kernel is just
going to drop events. Of course it's hard to benchmark correctly, because
correctness is the #1 factor. I believe in getting the architecture
right first, and only then benchmarking and optimizing, not the other
way around ;-)

   "Lies, damn lies and microbenchmarks"

Daniel
-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


