Re: [gamin] New inotify backend checked in



On Thu, 2005-08-04 at 15:23 -0400, Daniel Veillard wrote:
> On Thu, Aug 04, 2005 at 02:47:10PM -0400, John McCutchan wrote:
> > USER: gam_server blocks on inotify_device_fd
> > 
> > KERNEL: event is queued on inotify_device_fd
> > 
> > USER: gam_server wakes up, sees that only 1 event has been queued,
> > decides to go back to sleep
> 
>   Hum, sounds like a heuristic, what are your criteria?

Inotify supports the FIONREAD ioctl, which returns the number of bytes
that could be read from the file descriptor. We assume that the real
size of an inotify event is sizeof(struct inotify_event) + 16 bytes
(an average file name length), so we can calculate with good accuracy
how many events are in the queue.
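
Roughly like this (a sketch, not the actual gam_server code; the
16-byte average name length is just an assumption):

#include <sys/ioctl.h>
#include <sys/inotify.h>

#define AVG_NAME_LEN 16   /* assumed average file name length */

/* Estimate how many events are waiting on inotify_fd,
 * or return -1 if the FIONREAD ioctl fails. */
static int estimated_pending_events (int inotify_fd)
{
    int bytes = 0;

    if (ioctl (inotify_fd, FIONREAD, &bytes) < 0)
        return -1;

    /* Each event is a fixed header plus a variable-length name;
     * assume the name averages AVG_NAME_LEN bytes. */
    return bytes / (int)(sizeof (struct inotify_event) + AVG_NAME_LEN);
}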

> If that event is that a file was created on the Desktop, why do you delay,
> how long do you delay ?

We delay because if we didn't, we would start to serialize all
filesystem access. When a process is blocked on an fd and the fd
becomes readable, the process gets woken up. So if we emptied the
event queue after each event, we would get a ping-pong between
gam_server and every process writing to paths watched by gamin.

Because we delay, we can skip thousands of context switches. When the
fd first becomes readable, we get woken up, check the number of events
available, and go back to sleep unless it is large. The next time an
inotify event is queued, we won't be woken up; we will sleep until our
sleep time is over.

The current sleep time is 2 milliseconds, and we will sleep up to 5
times in a row if there aren't enough events. After 5 sleeps (10
milliseconds total) we service the events, however few are available.
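
In pseudo-C, the loop looks something like this (a sketch only; the
threshold and the helper names are made up, not the real gam_server
code):

#include <poll.h>
#include <unistd.h>

#define SLEEP_MS        2   /* per-nap sleep time from above */
#define MAX_SLEEPS      5   /* service events after at most 5 naps */
#define EVENT_THRESHOLD 32  /* assumed "large enough" queue depth */

int  estimated_pending_events (int fd);  /* FIONREAD sketch above */
void service_events (int fd);            /* hypothetical: read() and dispatch */

static void wait_then_service (int inotify_fd)
{
    struct pollfd pfd = { inotify_fd, POLLIN, 0 };
    int naps = 0;

    /* Block until at least one event has been queued. */
    poll (&pfd, 1, -1);

    /* Nap in 2 ms steps until the queue looks large enough, or we
     * have napped 5 times (10 ms), then drain whatever is there. */
    while (naps < MAX_SLEEPS &&
           estimated_pending_events (inotify_fd) < EVENT_THRESHOLD) {
        usleep (SLEEP_MS * 1000);
        naps++;
    }

    service_events (inotify_fd);
}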

> 
> > First off, I don't think gamin is the right choice for this kind of work
> > load. And you wouldn't need flow control if you were talking directly to
> > inotify, since you can just ask for only IN_CLOSE_WRITE events. Also,
> > I'd like to see how the gam_server (using inotify) handles this kind of
> > load. I have a feeling that the performance would be better than
> > expected.
> 
>   The only expectation is the comparison with the flow control present
> in the current released version. I don't think performance would be
> better than user-level flow control.

I'm pretty sure it would be. I have a script that just loops doing this:

while true; do
	for i in `seq 1 1000`; do
		touch dog$i
		mv dog$i cat$i
		touch fox$i
		mv fox$i dog$i
		rm cat$i
		rm dog$i
	done
done

The kernel can't do any flow control (coalescing) on this stream,
because no two consecutive events are identical.
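
Roughly how the kernel-side coalescing works, as I understand it
(simplified sketch, not the actual kernel code):

#include <string.h>

struct queued_event {
    int          wd;      /* watch descriptor */
    unsigned int mask;    /* IN_CREATE, IN_MOVED_FROM, ... */
    unsigned int cookie;  /* pairs IN_MOVED_FROM / IN_MOVED_TO */
    char         name[256];
};

/* A new event can only be merged with the last one in the queue if
 * every field matches, which never happens with the alternating
 * touch/mv/rm pattern above. */
static int can_coalesce (const struct queued_event *last,
                         const struct queued_event *next)
{
    return last->wd == next->wd &&
           last->mask == next->mask &&
           last->cookie == next->cookie &&
           strcmp (last->name, next->name) == 0;
}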

I ran this script in 4 folders, with a nautilus window open on each
folder. The CPU load of gam_server doesn't even register.

Only 2 processes use a noticeable amount of CPU: nautilus at 30% and
Xorg at 30%. This is an extreme example, but it shows that gamin with
the inotify backend is not a bottleneck.


Daniel, could you try using inotify+gamin on rpmfind.net? Just as an
experiment, so we could see some real-world data.

> 
> > >   I think this applies to an awful lot of server usage (NNTP, SMTP,
> > > even HTTP to regenerate from modified templates), I think if you were
> > > to switch beagle to gamin you would have to either extend the API or
> > > add flow control at the user level, otherwise the kernel is just 
> > > gonna drop events. 
> > 
> > Beagle doesn't use any flow control at all. The kernel will queue up to
> > 16384 events per inotify instance. That is a ton. 
> 
>   I will trade my cold (pageable) extra stat data in user space for
> your hot (cache-wise) kernel memory pinned events. It's a tradeoff,
> and I'm not sure we are on the right side of it.
> 

It's not cold/pageable data, because you are walking it once a second.
It's going to stay pinned in memory. 

The kernel events aren't going to be in the cache, because they are
only touched when they are queued (unavoidable) and when they are
actually sent to user space (which we try to do as infrequently as
possible). So they will quickly be evicted from the cache, and won't
return until they are actually used.

Also, the events are a tiny 32 bytes. With a full event queue (16
thousand events), only 512K of memory is being used. Now, a stat
buffer is 88 bytes, plus all the other data in a GamNode. Let me add
it up roughly:

path -> full path, so probably around 64 bytes
subs -> pointer, 4 bytes
node -> pointer, 4 bytes
is_dir -> 1 byte
flags -> 4 bytes
poll_time -> 4 bytes
mon_type -> 4 bytes
checks -> 4 bytes
pflags -> 4 bytes
lasttime -> 4 bytes
sbuf -> 88 bytes
---------------------
~185 bytes

So keeping the stat data around takes roughly six times as much room
as an inotify event in the queue.
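
Just for illustration, the same back-of-the-envelope comparison in C
(the sizes above assume 32-bit pointers and an 88-byte struct stat;
this is not the real GamNode definition from gamin):

#include <stdio.h>
#include <sys/stat.h>

/* Rough stand-in for the per-path bookkeeping listed above. */
struct rough_gam_node {
    char        *path;       /* plus ~64 bytes of heap for the string */
    void        *subs;
    void        *node;
    char         is_dir;
    int          flags;
    int          poll_time;
    int          mon_type;
    int          checks;
    int          pflags;
    int          lasttime;
    struct stat  sbuf;
};

int main (void)
{
    /* Compare against the ~32 bytes per queued in-kernel event. */
    printf ("rough GamNode: %zu bytes + ~64-byte path string\n",
            sizeof (struct rough_gam_node));
    return 0;
}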

> > > Of course it's hard to benchmark correctly because
> > > correctness is #1 factor. I believe first in getting the architecture
> > > right, and only then benchmark and optimize, not the way around ;-)
> > 
> > I think that we should wait until we can find a load that causes a
> > problem before we add 'fallback to poll' flow control. We have all the
> > code, it is trivial to hook it back into the inotify backend. I'd just
> > like to see a real case where the new backend causes a performance
> > problem. 
> 
>   I can understand that, but how are you gonna get that workload
> feedback? It's gonna take a while before people even test a kernel
> with inotify for this kind of workload.
> 

Well, once we have a release out, people will start to use it.

> > Besides, we can save TONS of memory by going this route. Right now
> > memory is much scarcer than CPU.
> 
>   Eeek, depends who you talk to, don't generalize really. And we didn't
> try to optimize the stat side at all. dnotify is horrible for that
> because it forced us to maintain the full tree and directory children;
> with inotify it would be just stat data per busy resource in a hash
> table, way cheaper!

I don't think that's much of a generalization. Look at all the
performance talk surrounding GNOME: people talk about hitting the disk
and about memory usage, not CPU usage.

Yes, if we decide to support a busy-poll fallback when using inotify,
it would be MUCH cheaper than the dnotify one was. But I'm still not
convinced that it is needed.
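
If we ever do add it, I'd expect the core to look something like this
(a sketch only; it uses GLib since gamin already does, and the names
are made up):

#include <glib.h>
#include <sys/stat.h>

/* Maps a busy path (char *) to the struct stat from its last check. */
static GHashTable *busy_paths;

static void
recheck_busy_path (gpointer key, gpointer value, gpointer user_data)
{
    const char *path = key;
    struct stat *last = value;
    struct stat now;

    if (stat (path, &now) == 0 && now.st_mtime != last->st_mtime) {
        /* emit a changed event to the subscribers here */
        *last = now;
    }
}

/* Run from a periodic GLib timeout while paths are flow controlled. */
static gboolean
poll_busy_paths (gpointer user_data)
{
    g_hash_table_foreach (busy_paths, recheck_busy_path, NULL);
    return TRUE;  /* keep the timeout source alive */
}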


-- 
John McCutchan <ttb tentacle dhs org>


