Re: [gamin] New inotify backend checked in



On Fri, 2005-08-05 at 11:04 -0400, Daniel Veillard wrote:
> On Fri, Aug 05, 2005 at 10:38:06AM -0400, John McCutchan wrote:
> > On Thu, 2005-08-04 at 15:23 -0400, Daniel Veillard wrote:
> 
> > > 
> > > > First off, I don't think gamin is the right choice for this kind of work
> > > > load. And you wouldn't need flow control if you were talking directly to
> > > > inotify, since you can just ask for only IN_CLOSE_WRITE events. Also,
> > > > I'd like to see how the gam_server (using inotify) handles this kind of
> > > > load. I have a feeling that the performance would be better than
> > > > expected.
> > > 
> > >   The only expectation is the comparison with flow control as present in
> > > current released version. I don't think performances would be better than
> > > user level based flow control.
> > 
> > I'm pretty sure it would be. I have a script that just loops doing this:
> > 
> > while true; do
> > 	for i in `seq 1 1000`; do
> > 		touch dog$i
> > 		mv dog$i cat$i
> > 		touch fox$i
> > 		mv fox$i dog$i
> > 		rm cat$i
> > 		rm fox$i
> > 	done
> > done
> 
>   What are you watching ?
> 

Nautilus is watching the directory the script is being run in.

> > The kernel can't do any flow control with these events, because there
> > aren't any pairs of identical events.
> 
>   If you are watching the directory, yes you should be able to do flow
> control. 

I said the kernel can't do any flow control. User space could provide
flow control.

> And in my rpmfind.net example I stated I would watch only the
> directories.
>   Ideally with flow control the user process would get woken up
> 4 times until busy status is asserted, then once per second of activity until
> the script is finished. What do you get with inotify without flow control
> on this ?
> 

The process would get woken up more often in the inotify case. Inotify
would fall somewhere between raw dnotify and gamin's dnotify busy flow
control.

> > I run this script in 4 folders with nautilus windows open for each of
> > the folders. The CPU load of gam_server is not even on the map.
> 
>   I don't care about Nautilus watching in my case, nautilus watches
> all directories and all files within the directory, i.e. not the kind of
> workload I suggested in my example.

The point is that Nautilus does a lot of work for events like creating
and removing a file. And it watches not only the directories but the
files too, so there is a lot more inotify traffic. Yet gam+inotify is
still not the bottleneck in any way.

> 
> > There are only 2 processes using noticeable amounts of CPU, 30%
> > nautilus, and 30% Xorg. This is an extreme example, but it shows that
> > gamin with the inotify backend is not a bottleneck.
> 
>   This is an interesting result, how frequently are you passing events
> to nautilus in that scenario ?
> 

Roughly, every 25-50 milliseconds.

> > > > >   I think this applies to an awful lot of server usage (NNTP, SMTP,
> > > > > even HTTP to regenerate from modified templates), I think if you were
> > > > > to switch beagle to gamin you would have to either extend the API or
> > > > > add flow control at the user level, otherwise the kernel is just 
> > > > > gonna drop events. 
> > > > 
> > > > Beagle doesn't use any flow control at all. The kernel will queue up to
> > > > 16384 events per inotify instance. That is a ton. 
> > > 
> > >   I will trade my cold (pageable) extra stat data in user space for
> > > your hot (cache-wise) kernel-memory pinned events. It's a tradeoff,
> > > and I'm not sure we are on the right side of it.
> > > 
> > 
> > It's not cold/pageable data, because you are walking it once a second.
> > It's going to stay pinned in memory. 
> 
>   No, we are not walking all the stat info every second. Only those which
> are stated as busy at a given point in time, which I would expect to be
> a tiny fraction based on locality of access.

That is true in the dnotify case, but in the inotify case it would be hot.

> 
> > Also, the events are a tiny 32 bytes. With a full event queue (16
> > THOUSAND events), only 512K of memory is being used. Now, a stat buffer
> > is 88 bytes, plus all the other data in GamNode. Let me add it up
> > roughly,
> > 
> > path -> full path, so probably around 64 Bytes
> > subs -> pointer, 4 bytes
> > node -> pointer, 4 bytes
> > is_dir -> 1 byte
> > flags -> 4 bytes
> > poll_time -> 4 bytes
> > mon_type -> 4 bytes
> > checks -> 4 bytes
> > pflags -> 4 bytes
> > lasttime -> 4 bytes
> > sbuf -> 88 bytes
> > ---------------------
> > 185 bytes
> > 
> > So, keeping the stat data around takes about six times as much room as an
> > inotify event in the queue. 
> 
>   but this is pageable data, which is not accessed if the corresponding
> files are unchanged.

Not with inotify.
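
For reference, spelled out as a struct, the estimate above works out
roughly like this (field names follow the quoted GamNode description; the
sizes assume a 32-bit build and a ~64-byte path, so this is an
illustration, not a measurement of gamin's actual definition):

	/* Rough per-file bookkeeping being compared above -- an estimate,
	 * not gamin's real struct layout. */
	#include <sys/stat.h>

	struct gam_node_estimate {
		char        *path;      /* ~64 bytes of full-path data  */
		void        *subs;      /*   4 bytes (pointer)          */
		void        *node;      /*   4 bytes (pointer)          */
		char         is_dir;    /*   1 byte                     */
		int          flags;     /*   4 bytes                    */
		int          poll_time; /*   4 bytes                    */
		int          mon_type;  /*   4 bytes                    */
		int          checks;    /*   4 bytes                    */
		int          pflags;    /*   4 bytes                    */
		int          lasttime;  /*   4 bytes                    */
		struct stat  sbuf;      /*  88 bytes on 32-bit Linux    */
	};	/* ~185 bytes per watched file, versus 32 bytes per
		 * queued inotify event */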

> 
> > >   I can understand that, but how are you gonna seek that workload feedback ?
> > > It's gonna take a while before people even test a kernel with inotify
> > > for this kind of workloads.
> > > 
> > 
> > Well, once we have a release out. People will start to use it.
> 
>   The problem is the kernel. People don't switch to the kernel of the day
> on this kind of use.

The people who are going to provide useful feedback are the people who
will be rolling the kernel of the day. And it's not going to be the
kernel of the day, it's going to be the official 2.6.13.

> 
> > > > Besides, we can save TONS of memory by going this route. Right now
> > > > memory is much scarcer than CPU.
> > > 
> > >   Eeek, depends who you talk to, don't generalize really. And we didn't
> > > try to optimize the stat at all. dnotify is horrible for that because it
> > > is forced to maintain the full tree and directory children; on inotify it
> > > would be just stat data per busy resource in a hash table, way cheaper!
> > 
> > I don't think that's much of a generalization. Look at all the
> > performance talk surrounding gnome. They talk about hitting the disk,
> > and memory usage, not CPU usage. 
> 
>    I'm talking servers and you answer Gnome and Nautilus.
> 

I'm not talking specifically about Gnome or Nautilus. The bottlenecks
in applications today are memory and disk bound, not CPU bound.

As an aside, I don't think much server software uses FAM currently, and
I don't think it should start. Raw inotify is a much better fit,
especially for your example: you can ask inotify to tell you only when a
file that was open for writing has been closed, avoiding the storm of
modify events. FAM doesn't even let the application set an event mask.
It could also be argued that some applications will want to do their own
flow control rather than have events hidden from them.
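
To make that concrete, here is a minimal sketch of what that looks like
against raw inotify (this assumes the sys/inotify.h wrapper is available;
the directory path is just an example):

	/* Watch a directory and wake up only on IN_CLOSE_WRITE, i.e. when
	 * a file that was open for writing is closed.  Modify events for
	 * files still being written never reach user space at all. */
	#include <sys/inotify.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096];
		int fd = inotify_init();
		if (fd < 0)
			return 1;

		if (inotify_add_watch(fd, "/var/spool/incoming",
				      IN_CLOSE_WRITE) < 0)
			return 1;

		for (;;) {
			ssize_t len = read(fd, buf, sizeof(buf));
			ssize_t i = 0;
			while (i < len) {
				struct inotify_event *ev =
					(struct inotify_event *) &buf[i];
				if (ev->len)
					printf("finished writing: %s\n",
					       ev->name);
				i += sizeof(*ev) + ev->len;
			}
		}
	}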

Let's keep things real and admit that the majority of gamin's users are
desktop users.

> > Yes, if we decide to support a busy poll fallback when using inotify, it
> > would be MUCH cheaper than the dnotify was. But I'm still not convinced
> > that it is needed.
> 
>    I gave rationale for it, point is that if it isn't needed it won't
> be used, but implementation is ultra cheap especially on inotify. The problem
> is that within 3 months you won't get any feedback about those use cases.

Yes, and I don't think it's actually worth it. And it will get used,
because gamin will detect busy directories automatically, and disable
inotify. Look, we have all the code necessary to hook this into inotify.
I just want to wait and see if it's REALLY needed. 
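
For what it's worth, the detection side doesn't have to be fancy. A
purely illustrative sketch of the kind of rate-based check being
discussed (not gamin's actual code; the threshold is made up):

	#include <time.h>

	#define BUSY_EVENTS_PER_SEC 50	/* hypothetical threshold */

	struct dir_state {
		time_t window_start;	/* start of current 1-second window */
		int    events;		/* events seen in that window       */
		int    busy;		/* 1 => stop relaying, poll instead */
	};

	/* Called for every inotify event on the directory; once the rate
	 * crosses the threshold, the caller would switch that directory
	 * from inotify delivery to periodic polling. */
	static void on_event(struct dir_state *d, time_t now)
	{
		if (now != d->window_start) {
			d->window_start = now;
			d->events = 0;
		}
		if (++d->events > BUSY_EVENTS_PER_SEC)
			d->busy = 1;
	}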

-- 
John McCutchan <ttb tentacle dhs org>


