Re: [gamin] New inotify backend checked in



On Fri, Aug 05, 2005 at 10:38:06AM -0400, John McCutchan wrote:
> On Thu, 2005-08-04 at 15:23 -0400, Daniel Veillard wrote:
> The current sleep time is 2 milliseconds, but we will sleep up to 5
> times in a row if there aren't enough events. After 5 sleeps (or 10
> milliseconds) we will service the events, however few are available.

  That's the heuristic I asked about.
  Okay.
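
  For reference, a rough sketch of what that batching heuristic could look
like in the backend (the helpers gam_inotify_pending_events() and
gam_inotify_dispatch() are made-up names for illustration, not the actual
gamin symbols):

/* Illustrative only: batch inotify events, sleeping up to 5 times
 * for 2 ms each before dispatching whatever has arrived. */
#include <unistd.h>

#define SLEEP_US      2000  /* 2 milliseconds per sleep     */
#define MAX_SLEEPS    5     /* at most 5 sleeps, i.e. 10 ms */
#define ENOUGH_EVENTS 32    /* arbitrary "enough" threshold */

/* hypothetical helpers standing in for the real backend code */
extern int  gam_inotify_pending_events(void);
extern void gam_inotify_dispatch(void);

static void gam_inotify_service(void)
{
    int sleeps = 0;

    while (gam_inotify_pending_events() < ENOUGH_EVENTS &&
           sleeps < MAX_SLEEPS) {
        usleep(SLEEP_US);
        sleeps++;
    }

    /* after 5 sleeps (10 ms) service the events, however few */
    gam_inotify_dispatch();
}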

> > 
> > > First off, I don't think gamin is the right choice for this kind of
> > > workload. And you wouldn't need flow control if you were talking directly to
> > > inotify, since you can just ask for only IN_CLOSE_WRITE events. Also,
> > > I'd like to see how the gam_server (using inotify) handles this kind of
> > > load. I have a feeling that the performance would be better than
> > > expected.
> > 
> >   The only expectation is the comparison with flow control as present in
> > the current released version. I don't think performance would be better than
> > with user-level flow control.
> 
> I'm pretty sure it would be. I have a script that just loops doing this:
> 
> while true; do
> 	for i in `seq 1 1000`; do
> 		touch dog$i
> 		mv dog$i cat$i
> 		touch fox$i
> 		mv fox$i dog$i
> 		rm cat$i
> 		rm fox$i
> 	done
> done

  What are you watching?

> The kernel can't do any flow control with these events, because there
> aren't any pairs of identical events.

  If you are watching the directory, yes, you should be able to do flow
control. And in my rpmfind.net example I stated I would watch only the
directories.
  Ideally, with flow control the user process would get woken up 4 times
until busy status is asserted, then once per second of activity until the
script is finished. What do you get with inotify without flow control
on this?
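
  To make that expectation concrete, here is a minimal sketch of the kind of
user-level flow control I have in mind for one watched directory (the 4-wakeup
threshold, the one-second period and the structure fields are illustrative,
not gamin's actual code):

/* Illustrative user-level flow control for one watched directory:
 * the first few raw notifications wake the client directly, then
 * busy status is asserted and events are coalesced to one per second.
 * (Clearing the busy flag when activity stops is left to a periodic
 * check and omitted here.) */
#include <stdbool.h>
#include <time.h>

#define BUSY_THRESHOLD 4   /* wakeups before busy status is asserted */

struct dir_watch {
    int    recent_events;  /* raw kernel events seen so far        */
    bool   busy;           /* true once flow control has kicked in */
    time_t last_emit;      /* when we last woke the client up      */
};

/* called for every raw kernel event on the directory;
 * returns true if the client should be woken up now */
static bool should_emit(struct dir_watch *w)
{
    time_t now = time(NULL);

    if (!w->busy) {
        if (++w->recent_events >= BUSY_THRESHOLD)
            w->busy = true;              /* assert busy status    */
        w->last_emit = now;
        return true;                     /* first few events pass */
    }

    /* busy: coalesce, wake the client at most once per second */
    if (now - w->last_emit >= 1) {
        w->last_emit = now;
        return true;
    }
    return false;
}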

> I run this script in 4 folders with nautilus windows open for each of
> the folders. The CPU load of gam_server is not even on the map.

  I don't care about Nautilus watching in my case; Nautilus watches
all directories and all files within them, i.e. not the kind of
workload I suggested in my example.

> There are only 2 processes using noticeable amounts of CPU, 30%
> nautilus, and 30% Xorg. This is an extreme example, but it shows that
> gamin with the inotify backend is not a bottleneck.

  This is an interesting result. How frequently are you passing events
to Nautilus in that scenario?

> Daniel, could you try to use inotify+gamin on rpmfind.net ?

  No, it runs RHEL, and it's very unlikely I would reboot it into a
non-released and unsupported kernel, sorry :-\

> > > >   I think this applies to an awful lot of server usage (NNTP, SMTP,
> > > > even HTTP to regenerate from modified templates), I think if you were
> > > > to switch beagle to gamin you would have to either extend the API or
> > > > add flow control at the user level, otherwise the kernel is just 
> > > > gonna drop events. 
> > > 
> > > Beagle doesn't use any flow control at all. The kernel will queue up to
> > > 16384 events per inotify instance. That is a ton. 
> > 
> >   I will trade my cold (pageable) extra stat data in user space for
> > your hot (cache-wise) pinned kernel memory events. It's a tradeoff, and
> > I'm not sure we are on the right side of it.
> > 
> 
> It's not cold/pageable data, because you are walking it once a second.
> It's going to stay pinned in memory. 

  No, we are not walking all the stat info every second, only the entries
marked as busy at a given point in time, which I would expect to be
a tiny fraction based on locality of access.

> Also, the events are a tiny 32 bytes. With a full event queue (16
> THOUSAND events), only 512K of memory is being used. Now, a stat buffer
> is 88 bytes, plus all the other data in GamNode. Let me add it up
> roughly,
> 
> path -> full path, so probably around 64 bytes
> subs -> pointer, 4 bytes
> node -> pointer, 4 bytes
> is_dir -> 1 byte
> flags -> 4 bytes
> poll_time -> 4 bytes
> mon_type -> 4 bytes
> checks -> 4 bytes
> pflags -> 4 bytes
> lasttime -> 4 bytes
> sbuf -> 88 bytes
> ---------------------
> 185 bytes
> 
> So keeping the stat data around takes about 6 times as much room as an
> inotify event in the queue.

  But this is pageable data, which is not accessed if the corresponding
files are unchanged.
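
  For reference, the back-of-the-envelope arithmetic behind those two
figures, taking the 32 bytes per queued event and the ~185 bytes per
GamNode above at face value:

/* Back-of-the-envelope comparison of a full kernel queue of pinned
 * 32-byte events versus the same number of ~185-byte pageable
 * GamNode entries in user space (figures taken from this thread). */
#include <stdio.h>

int main(void)
{
    const long queue_len  = 16384; /* max queued inotify events      */
    const long event_size = 32;    /* bytes per queued event (above) */
    const long node_size  = 185;   /* rough per-GamNode estimate     */

    printf("kernel queue: %ld KB, pinned\n",
           queue_len * event_size / 1024);      /* 512 KB   */
    printf("user nodes  : %ld KB, pageable\n",
           queue_len * node_size / 1024);       /* ~2960 KB */
    printf("per entry   : %.1fx\n",
           (double)node_size / event_size);     /* ~5.8x    */
    return 0;
}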

> >   I can understand that, but how are you gonna gather that workload feedback?
> > It's gonna take a while before people even test a kernel with inotify
> > for this kind of workload.
> > 
> 
> Well, once we have a release out, people will start to use it.

  The problem is the kernel: people don't switch to the kernel of the day
for this kind of use.

> > > Besides, we can save TONS of memory by going this route. Right now
> > > memory is much scarcer than CPU.
> > 
> >   Eeek, that depends who you talk to; don't generalize. And we didn't
> > try to optimize the stat data at all. dnotify is horrible for that because it
> > forces us to maintain the full tree and directory children; on inotify it
> > would be just one stat buffer per busy resource in a hash table, way cheaper!
> 
> I don't think that's much of a generalization. Look at all the
> performance talk surrounding gnome. They talk about hitting the disk,
> and memory usage, not CPU usage. 

   I'm talking about servers, and you answer with GNOME and Nautilus.

> Yes, if we decide to support a busy poll fallback when using inotify, it
> would be MUCH cheaper than the dnotify one was. But I'm still not convinced
> that it is needed.

   I gave the rationale for it. The point is that if it isn't needed it won't
be used, but the implementation is ultra cheap, especially on inotify. The
problem is that within 3 months you won't get any feedback about those use cases.
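
  To illustrate how cheap that can be, here is a minimal sketch of the
"one stat buffer per busy resource in a hash table" idea (it assumes glib,
which gamin already uses; the function names and the once-per-second policy
are illustrative, not the actual gam_server code):

/* Sketch of the "stat data per busy resource in a hash table" idea:
 * only paths currently marked busy are kept here and re-stat()ed once
 * per second; everything idle costs nothing.  Illustrative code, not
 * the gam_server implementation. */
#include <glib.h>
#include <sys/stat.h>

static GHashTable *busy;   /* path (char *) -> last struct stat */

static void busy_table_init(void)
{
    busy = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, g_free);
}

/* called when flow control decides a resource is busy */
static void mark_busy(const char *path)
{
    struct stat *sb = g_new0(struct stat, 1);

    if (stat(path, sb) == 0)
        g_hash_table_insert(busy, g_strdup(path), sb);
    else
        g_free(sb);
}

/* GHRFunc run once per second: emit a change if the stat data moved,
 * drop the entry (and leave busy state) if nothing changed */
static gboolean poll_one(gpointer key, gpointer value, gpointer user_data)
{
    const char *path = key;
    struct stat *old = value, cur;

    if (stat(path, &cur) != 0)
        return TRUE;                       /* gone: remove from table */

    if (cur.st_mtime != old->st_mtime || cur.st_size != old->st_size) {
        *old = cur;                        /* remember the new state  */
        /* ...emit one coalesced changed event to the client here... */
        return FALSE;                      /* still busy, keep entry  */
    }
    return TRUE;                           /* idle again, drop entry  */
}

static void poll_busy_resources(void)
{
    g_hash_table_foreach_remove(busy, poll_one, NULL);
}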

Daniel


-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


