Re: [Tracker] [Re: tracker in F16]
- From: Martyn Russell <martyn lanedo com>
- To: Lennart Poettering <mztabzr 0pointer de>
- Cc: tracker-list gnome org
- Subject: Re: [Tracker] [Re: tracker in F16]
- Date: Tue, 20 Sep 2011 09:49:57 +0100
On 19/09/11 20:06, Lennart Poettering wrote:
Martyn, Jürg,
Hello Lennart.
There's a discussion going on on fedora-desktop about Tracker. To me it
appears the main longstanding issues in Tracker have not been fixed (see
attachment). Maybe you can comment on that?
Yes.
First of all, let me thank you for taking an interest here and we will
try to cover any history here and why we're in this state. All the
things you mention below are known and considered. We just haven't made
the step to fix anything in Kernel space yet and we will explain why.
Re: tracker in F16.eml
Subject: Re: tracker in F16
From: Lennart Poettering <mzerqung 0pointer de>
Date: 19/09/11 17:00
To: Discussions about development for the Fedora desktop <desktop lists fedoraproject org>
On Mon, 19.09.11 00:38, Lennart Poettering (mzerqung 0pointer de) wrote:
> > Having used tracker on by default for a bit now it seems that after the
> > initial crawling, which is an expensive operation, I didn't notice any
> > particular increase in resource usage. With removable devices indexing
> > off, this should ideally be a one-shot operation.
>
> I'd be thankful if the impact the crawling has on the running system
> could be minimized via SCHED_IDLE and IOPRIO_CLASS_IDLE. gnome bz
> #659422.
Hmm, so investigating this further with strace it appears to me that
tracker is trying to make use of the kernel in a way it shouldn't. Or to
put this another way: our infrastructure (the Linux kernel) isn't ready
for tracker yet.
I would say that's fair. If you refer to the SCHED_IDLE in terms of not
using the Kernel the way we should, I am not actually sure if that's a
Linux kernel issue at all or just that we need more than SCHED_IDLE can
provide. We should retry things here. In reality, the embedded devices
we've been using Tracker on suffered severely with this so we scrapped it.
One thing is clear, the point is almost moot if the later issues here
are fixed (regarding monitoring). But you still have the initial index,
you're right.
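
For reference, a minimal sketch of what requesting SCHED_IDLE and
IOPRIO_CLASS_IDLE looks like on Linux. This is not Tracker's code; the
helper names become_idle() and ioprio_value() are made up for
illustration. glibc ships no ioprio_set() wrapper, so the sketch goes
through syscall() and copies the constant values from the kernel headers.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* SCHED_IDLE is only exposed by <sched.h> with _GNU_SOURCE defined;
 * 5 is its value on Linux. */
#ifndef SCHED_IDLE
#define SCHED_IDLE 5
#endif

/* glibc has no ioprio_set() wrapper, so these mirror the kernel's values. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_CLASS_IDLE  3
#define IOPRIO_WHO_PROCESS 1

/* Pack an I/O scheduling class and a priority level into one ioprio value. */
static int ioprio_value(int io_class, int level)
{
    return (io_class << IOPRIO_CLASS_SHIFT) | level;
}

/* Hypothetical helper: request idle CPU and idle I/O scheduling for the
 * calling thread. Returns 0 if both requests succeeded, -1 if either failed. */
static int become_idle(void)
{
    struct sched_param param = { .sched_priority = 0 };
    int rc = 0;

    /* pid 0 means "the caller"; SCHED_IDLE requires priority 0. */
    if (sched_setscheduler(0, SCHED_IDLE, &param) != 0)
        rc = -1;

    /* who 0 with IOPRIO_WHO_PROCESS also means "the caller". */
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                ioprio_value(IOPRIO_CLASS_IDLE, 0)) != 0)
        rc = -1;

    return rc;
}
```

A crawler would call become_idle() once before starting the initial
index. Note that the I/O class is only honoured by I/O schedulers that
implement priorities, so this alone may not be enough on every system.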
The problems here have been known for a long time, but afaik nothing has
been done about them yet to make things ready for tracker. I am not sure
we should enable tracker before these fundamental issues are fixed. I am
completely fine with enabling stuff that has known
I don't think it's wise to disable Tracker because of that. It's clear
that if mis-configured it's very easy to burn CPU cycles and eat I/O. We
have tried to mitigate this, but recent F16 changes that I have heard
about, point to a combination of our default configuration being not so
good (which we've changed in 0.12.1) and GVFS representing /home as a
removable media (also bad for Tracker). As commented here:
https://bugzilla.gnome.org/show_bug.cgi?id=659025#c8
bugs because that's how you get them fixed. But in this case
here I fear that the basics just don't exist and hence tracker is
fundamentally built on infrastructure that is borked. Every time people
have looked into enabling tracker (or beagle for that matter) these
issues showed up, and every single time nothing happened about them, and
I am not really seeing why these issues stopped mattering now.
To be more explicit:
a) tracker uses inotify recursively and creates a massive number of
watches due to that. That is both ugly and doesn't scale. Tracker
apparently tries to not take up the full pool of inotify handles the
system provides, but that won't help if you have more than one user on
Yes, this sucks enormously. When we have a clean shutdown (i.e. no work
is being done and we know we're up to date), we can start up with n
thousand directories being monitored in < 60 seconds with thousands of
directories being crawled under cold cache. Again this depends a little
on the hardware and the number of directories, but it's damn fast. It
also shouldn't be that noticeable. Aleksander has had some ideas about
how to avoid this entirely, but then it's a question of user space vs
kernel space. I will let him comment further here.
the system. The solution here should probably be fanotify, which allows
proper recursive file system watches. So far fanotify has been
accessible to root only, which is presumably why tracker doesn't use
it. However, the solution here cannot be to work around that fact by
using inotify, but must be to invest the necessary kernel work to make
fanotify useful from unprivileged processes.
Yes, we were looking at Eric Paris' work with high expectations, but
from what we've discussed internally as a team, it doesn't look to
improve our situation nearly enough and really would just assist us
(from what I remember). Again Aleksander can comment on this.
b) We still don't have a way to detect offline modification of
directories. That means detecting changes to the home directory made
offline is very expensive. btrfs now has hooks to improve the situation,
but ext4 still hasn't. Does tracker at least use the btrfs hooks? (btrfs
provides a log of changes to userspace, which can be used for that. Another
solution is recursive directory change timestamps).
No we don't use btrfs hooks. To be honest, I didn't know about them and
that's certainly something we can improve. So thank you for the suggestion
there.
As you eloquently point out, it doesn't fix other file systems though :/
Recursive directory change timestamps have also been considered. That
doesn't fix the issue for people that use USB keys or FAT file systems
where you can't count on that (which is why we just reindex entire mount
points recursively to make sure we get everything - and hence why
Emmanuele Bassi's /home directory would be completely indexed regardless
of any mtime checks).
I'd really prefer if we could fix these fundamental issues before we
enable tracker. To me it appears here as if we are trying to make the
second step before the first.
[ And there are a couple of other things I'd like to see changed. For
example, I am pretty sure that tracker's open() calls to files should
not be considered accesses in regards to access time. O_NOATIME should
be used here, which would reduce the amount of disk writes
We have an internal API tracker_file_open() for this and we use it in a
number of extractors (though not all):
$ git grep tracker_file_open -- *.c
src/libtracker-common/tracker-file-utils.c:tracker_file_open (const gchar *path,
src/tracker-extract/tracker-extract-jpeg.c: f = tracker_file_open (filename, "rb", FALSE);
src/tracker-extract/tracker-extract-png.c: f = tracker_file_open (filename, "r", FALSE);
src/tracker-extract/tracker-extract-ps.c: f = tracker_file_open (filename, "r", TRUE);
src/tracker-extract/tracker-extract-vorbis.c: f = tracker_file_open (filename, "r", FALSE);
--
We could check, but IIRC, most other extractors use APIs to open files
which don't allow this - e.g. GIF files use DGifOpenFileName().
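
For readers unfamiliar with O_NOATIME, a minimal sketch of the general
pattern follows. The helper names open_noatime() and fopen_noatime() are
hypothetical and this is not the tracker_file_open() source. The kernel
rejects O_NOATIME with EPERM when the caller does not own the file, so
the sketch assumes a fallback to a plain open is acceptable in that case.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* O_NOATIME is Linux-specific and needs _GNU_SOURCE; where it is not
 * available, degrade to a flag that does nothing. */
#ifndef O_NOATIME
#define O_NOATIME 0
#endif

/* Hypothetical helper: open `path` read-only without updating its access
 * time where possible. Returns a file descriptor, or -1 with errno set. */
static int open_noatime(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOATIME);

    if (fd == -1 && errno == EPERM) {
        /* Not the file's owner: retry and accept the atime update. */
        fd = open(path, O_RDONLY);
    }
    return fd;
}

/* Same idea for code that wants a stdio stream instead of a descriptor. */
static FILE *fopen_noatime(const char *path)
{
    int fd = open_noatime(path);
    if (fd == -1)
        return NULL;

    FILE *stream = fdopen(fd, "r");
    if (stream == NULL)
        close(fd);              /* do not leak the descriptor */
    return stream;
}
```

This only helps extractors that open the file themselves. Where a
library insists on opening the file by name, the flag cannot be passed
through unless that library also offers a way to accept an already-open
descriptor or stream.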
substantially. Also, tracker appears to BSD lock all files it
accesses. That looks quite borked. Which other tool is it synchronizing
against here? This looks insecure to do (because the files are often
accessible to others), and since these locks are advisory only there
needs to be a strict protocol followed by everybody else accessing these
files, which I guarantee you there isn't since these are basically all
the user's files. Moreover it appears tracker is mixing BSD and POSIX
locks, which is dangerous due to ABBA, in particular when used on NFS
directories, which will just end up in total chaos since Linux is so
stupid to "upgrade" BSD locks on NFS shares to POSIX locks on the
way. In any case you should NEVER EVER use POSIX locking, since it is
completely borked anyway. The locking must go. I also see a massive
amount of futex calls in strace, i.e. probably some mutexes thrown in
the mix to make the locking problems even more interesting, which makes
my fingernails roll up, since they apparently are congested all the
time? ]
For now I will assume you mean calls to flock() here and we only do that
in tracker_file_lock().
While the code we have in place may be improved (and I welcome any
suggestions you have there), we're actually not using it any more and
that API could be removed AFAICS:
$ git grep tracker_file_lock .
src/libtracker-common/tracker-file-utils.c:tracker_file_lock (GFile *file)
src/libtracker-common/tracker-file-utils.h:gboolean tracker_file_lock (GFile *file);
--
Thanks Lennart,
--
Regards,
Martyn
Founder and CEO of Lanedo GmbH.