Re: [Tracker] [systemd-devel] How to use cgroups for Tracker?



On Tue, 21.10.14 12:03, Martyn Russell (martyn lanedo com) wrote:

What precisely are you setting with sched_setscheduler() and ioprio_set()?

https://git.gnome.org/browse/tracker/tree/src/libtracker-common/tracker-sched.c#n29

and

https://git.gnome.org/browse/tracker/tree/src/libtracker-common/tracker-ioprio.c#n131

Looks OK.
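
For illustration, a minimal sketch of what that pattern boils down to (this
is not the Tracker code linked above; the ioprio constants are the kernel's
ABI values and error handling is reduced to perror()):

  /* Rough sketch: drop CPU and IO priority so the work only runs
   * when the machine is otherwise idle. */
  #define _GNU_SOURCE          /* for SCHED_IDLE in glibc's <sched.h> */
  #include <sched.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* ioprio_set() has no glibc wrapper, so define the bits we need. */
  #define IOPRIO_CLASS_IDLE   3
  #define IOPRIO_WHO_PROCESS  1
  #define IOPRIO_CLASS_SHIFT  13
  #define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

  int main(void)
  {
      struct sched_param param = { .sched_priority = 0 };

      /* SCHED_IDLE: only get the CPU when nobody else wants it. */
      if (sched_setscheduler(0, SCHED_IDLE, &param) < 0)
          perror("sched_setscheduler");

      /* Idle IO class: only issue IO when the disk is otherwise idle. */
      if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                  IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) < 0)
          perror("ioprio_set");

      /* ... do the batch work here ... */
      return 0;
  }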

is much better, but some people really want Tracker to
  a) only perform when they're away from their computer OR
  b) be completely unnoticeable.

Now, we can do a) without cgroups, but I believe b) would be better done
using cgroups.

Well, it will always be noticeable since it generates IO, and hence it will
always show up in top/iotop. If you want it to not interfere with other
things being done on the system, use ioprio_set()/sched_setscheduler() to
mark things as batch jobs.

You're right. When we get reports of 100% CPU use, I am not so concerned if
the system is not doing anything else - but I suspect what leads people to
notice the 100% situation in the first place is that their UI stops
responding.

Hmm. I would always have assumed that tracker is strictly IO-bound,
not CPU-bound, hence 100% sounds suspicious to me. What precisely is
tracker doing there that it needs to crunch that much data? Just
extracting some meta-data from a set of files doesn't sound like
something CPU intensive.

Why don't the kernel API calls pointed out above suffice?

I think it depends on the API and who you talk to.

For me, the APIs work quite well. However, we still get bug reports. I find
this quite hard to quantify personally, because the filesystems, hardware and
version of Tracker all come into play and can make quite a difference.

What precisely are the bug reports about?

Often 100% CPU use.
Sometimes high memory use (but that's mostly related to tracker-extract).
Disk use occasionally when someone is indexing a crazy data set.

The latest bug report:
https://bugzilla.gnome.org/show_bug.cgi?id=676713

Well, looking at that bug it appears to me that this happens because you are
using inotify for something it shouldn't be used for: recursively watching
an entire directory subtree. If you fake recursive fs watching by
recursively adding all subdirs to the inotify watch, then of course you eat
a lot of CPU.

The appropriate fix for this is to not make use of inotify this way,
which might mean fixing the kernel to provide recursive subscription
to fs events for unprivileged processes. Sorry if that's
disappointing, but no amount of cgroups can work around that.

Don't try to work around limitations of kernel APIs by implementing
inherently non-scalable algorithms in userspace. I mean, you
implemented something that scales O(n), where n is the number of
dirs. That's what you need to fix; there's no way around that. Just
looking for magic wands in cgroups and scheduling facilities to make
an algorithm that fundamentally scales badly acceptable is not going
to work.
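
To make the scaling point concrete, faking a recursive watch means something
like the following (an illustrative sketch only): one inotify_add_watch()
call, and one watch descriptor held by the kernel, for every directory in
the subtree, so both setup time and kernel state grow linearly with the
number of directories.

  #define _XOPEN_SOURCE 700    /* for nftw() */
  #include <ftw.h>
  #include <stdio.h>
  #include <sys/inotify.h>
  #include <sys/stat.h>

  static int inotify_fd;

  /* nftw() callback: add a watch for every directory we encounter. */
  static int add_watch(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
  {
      (void) sb; (void) ftwbuf;

      if (typeflag == FTW_D &&
          inotify_add_watch(inotify_fd, path,
                            IN_CREATE | IN_DELETE | IN_MOVE | IN_MODIFY) < 0)
          perror(path);

      return 0; /* keep walking: one watch per directory, O(n) overall */
  }

  int main(int argc, char **argv)
  {
      if (argc < 2)
          return 1;

      inotify_fd = inotify_init1(IN_NONBLOCK);
      if (inotify_fd < 0) {
          perror("inotify_init1");
          return 1;
      }

      /* Every directory in the subtree gets its own watch descriptor,
       * and newly created subdirectories have to be added as they appear. */
      return nftw(argv[1], add_watch, 64, FTW_PHYS);
  }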

The one API that doesn't really work for us is setrlimit(), mainly because
we have to guess the memory threshold (which we never get right) and we get
a lot of SIGABRTs that get reported as bugs. I suppose we could catch
SIGABRT and exit gracefully, but lately we've agreed (as a team) that if an
extractor or library we depend on uses 2GB of memory and brings a smaller
system to its knees, it's a bug and we should just fix the
extractor/library, not try to compensate for it. Sadly, there are always
bugs of this sort, and that's precisely why tracker-extract is a separate
process.

What functionality of setrlimit() are you precisely using?

https://git.gnome.org/browse/tracker/tree/src/libtracker-common/tracker-os-dependant-unix.c?h=tracker-1.2#n289

This is misleading, as RLIMIT_AS and RLIMIT_DATA limit address space,
not actual memory usage. In particular, limiting RLIMIT_AS like this
is actively wrong, as it just breaks mmap(). I mean, this is the
beauty of modern memory management: you can set up mappings, and they
are relatively cheap and are only backed by real RAM once you actually
access them. Really, you should not limit RLIMIT_AS; it doesn't do what
you think it does.
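
A tiny demonstration of that point, with numbers picked arbitrarily: under a
modest RLIMIT_AS cap, a large mapping fails with ENOMEM even though it would
dirty almost no actual RAM until written to.

  #define _DEFAULT_SOURCE      /* for MAP_ANONYMOUS */
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/resource.h>

  int main(void)
  {
      /* Cap the address space at 256 MB (an arbitrary value). */
      struct rlimit rl = { .rlim_cur = 256UL << 20, .rlim_max = 256UL << 20 };
      if (setrlimit(RLIMIT_AS, &rl) < 0)
          perror("setrlimit");

      /* An anonymous 1 GB mapping costs essentially no RAM until it is
       * written to, but it still fails under the address-space cap. */
      void *p = mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
          perror("mmap"); /* expected: Cannot allocate memory */

      return 0;
  }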

That, coupled with bug reports like this one, where a PDF with no text takes
over 200 seconds to extract and uses 2GB of memory:
https://bugs.freedesktop.org/show_bug.cgi?id=85196

2GB of address space, or actual dirty memory?

That leads to SIGABRT and a bug report against Tracker, which we were not
handling because, really, the file itself (or the library used to open it)
should be fixed; tracker-extract was also restarting itself until recently
(which is a bug that needs fixing).

I guess we could use RLIMIT_CPU and handle SIGXCPU, but I have no idea what
limit (in seconds) to use in the first place.

Well, I am pretty sure that if an extractor you fork off takes more
than 10s you should kill it and skip the file. That's a pretty safe bet.
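
Something along these lines, for illustration (the extractor command name
below is a placeholder, and the 10s value is just the suggestion above):
fork the extractor per file, poll for it to finish, and SIGKILL it once the
deadline has passed, then move on to the next file.

  #define _POSIX_C_SOURCE 200809L
  #include <signal.h>
  #include <stdio.h>
  #include <sys/wait.h>
  #include <time.h>
  #include <unistd.h>

  #define EXTRACT_TIMEOUT_SEC 10   /* the 10s suggested above */

  static int extract_with_timeout(const char *file)
  {
      pid_t pid = fork();
      if (pid < 0)
          return -1;

      if (pid == 0) {
          /* Child: run the actual extractor; the command is a placeholder. */
          execlp("my-extractor", "my-extractor", file, (char *) NULL);
          _exit(127);
      }

      time_t deadline = time(NULL) + EXTRACT_TIMEOUT_SEC;
      for (;;) {
          int status;
          if (waitpid(pid, &status, WNOHANG) == pid)
              return WIFEXITED(status) ? WEXITSTATUS(status) : -1;

          if (time(NULL) >= deadline) {
              kill(pid, SIGKILL);          /* give up on this file */
              waitpid(pid, &status, 0);    /* reap the child */
              fprintf(stderr, "%s: extractor timed out, skipping\n", file);
              return -1;
          }

          struct timespec ts = { 0, 100000000 };   /* poll every 100ms */
          nanosleep(&ts, NULL);
      }
  }

  int main(int argc, char **argv)
  {
      return argc > 1 ? (extract_with_timeout(argv[1]) < 0) : 1;
  }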

Lennart

-- 
Lennart Poettering, Red Hat

