Re: [Tracker] [systemd-devel] How to use cgroups for Tracker?



On Tue, 21.10.14 14:48, Martyn Russell (martyn lanedo com) wrote:

Hmm. I would always have assumed that tracker is strictly IO-bound,
not CPU-bound, hence 100% sounds suspicious to me. What precisely is
tracker doing there that it needs to crunch that much data? Just
extracting some meta-data from a set of files doesn't sound like
something CPU intensive.

Tracker does quite a number of things that could require some level of
processing power.

Just some off the top of my head:

- Parsing words in any language in large quantities of text
- Unaccenting words
- Unicode normalization
- Case folding
- Stemming

and more.

That all doesn't sound too excessive in CPU except if you index whole
encyclopedias... But in this case I'd really time bound things: stop
processing each file after 500ms or so of time spent on them.

It also depends on which process or binary you're talking about, but
extractors (like the one using poppler for PDFs) can easily require a LOT of
processing power to handle complex PDFs. We only care about the text
usually, but that's not always under our control unless we write our own
extractor.

Well, it certainly sounds like a great chance to work together with
the poppler folks to figure out a way to only hand you the text. But
either way, given the variable quality of the extractors it really
sounds as if you want to individually run them out-of-process and then
kill them after a fixed time limit of 500ms and continue with the next
one.
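
To illustrate the out-of-process idea above, here is a minimal sketch
in Python. It is not Tracker's actual code: the function name
run_extractor and the way an extractor is invoked (a command line that
prints text on stdout) are assumptions made for the example. Only the
500ms budget comes from the thread.

```python
import subprocess
import sys

def run_extractor(argv, timeout_s=0.5):
    """Run one (hypothetical) extractor command in its own process.

    Returns the extractor's stdout as text, or None if it failed or
    exceeded the time budget. subprocess.run() kills the child and
    waits for it when the timeout expires.
    """
    try:
        done = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # too slow: give up on this file, move to the next
    return done.stdout if done.returncode == 0 else None

if __name__ == "__main__":
    # Stand-in "extractors": one quick, one that never finishes in time.
    quick = [sys.executable, "-c", "print('some text')"]
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    for cmd in (quick, slow):
        print(run_extractor(cmd, timeout_s=2.0))
```

A wall-clock timeout is the simplest choice here. A CPU-time budget
(RLIMIT_CPU, for instance) would be a different trade-off and is not
what this sketch does.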

Well, looking at that bug it appears to me that this is caused because
you try to use inotify for something it shouldn't be used for: to
recursively watch an entire directory subtree. If you fake recursive
fs watching by recursively adding all subdirs to the inotify watch
then of course you eat a lot of CPU.

In our experience, watching a tree and seeing changes in that tree through
inotify is not the expensive part (unless your currency is FDs). It does
depend on what operations are taking place.

Well, the bug report you linked suggests it is an inotify add loop
that is the culprit here...

The appropriate fix for this is to not make use of inotify this way,
which might mean fixing the kernel to provide recursive subscription
to fs events for unprivileged processes. Sorry if that's
disappointing, but no amount of cgroups can work around that.

Not at all. Actually, I really would like something like that in the kernel
and user space has been asking for a while :)

Well, I think I said this before: it's quite possible that the Linux
kernel is not quite ready for something like Tracker. Apparently
nobody is working to make it ready for Tracker though. So either you
have to do the work yourself or find somebody to fix the missing bits
(which would be to fix fanotify for unprivileged clients).

Don't try to work around limitations of kernel APIs by implementing
inherently unscalable algorithms in userspace. I mean, you
implemented something that scales O(n) with n the number of
dirs. That's what you need to fix, there's no way around that. Just
looking for magic wands in cgroups and scheduling facilities to make
an algorithm that fundamentally scales badly acceptable is not going
to work.
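
To make the O(n) point concrete: inotify watches are per directory, so
faking a recursive watch means walking the tree and adding one watch
for every directory in it. The sketch below only counts how many
watches such a walk would need. It does not call inotify at all, and
the function name count_watches_needed is made up for the example.

```python
import os

def count_watches_needed(root):
    """Count the directories under root, root included.

    A fake-recursive inotify setup needs one watch per directory, so
    this is also the number of inotify_add_watch() calls the setup
    walk would make. os.walk() does not follow symlinks by default.
    """
    return sum(1 for _dirpath, _dirs, _files in os.walk(root))

if __name__ == "__main__":
    import sys
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(count_watches_needed(root), "watches needed for", root)
```

Running it against a home directory gives a feel for how the setup
cost and the per-user watch budget
(/proc/sys/fs/inotify/max_user_watches) grow with the size of the
tree.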

OK.

Could I ask one more favour from you Lennart, could you possibly reply on
the bug report where your fellow RedHat-ers :) suggest using cgroups?

https://bugzilla.gnome.org/show_bug.cgi?id=737663#c6

Well, just link this thread there really, I think that should be enough...

This is misleading, as RLIMIT_AS and RLIMIT_DATA limit address space,
not actual memory usage. In particular, limiting RLIMIT_AS like this
is actively wrong as this just breaks mmap(). I mean, this is the
beauty of modern memory management: you can set up mappings, and they
are relatively cheap and are only backed by real RAM as soon as you
access them. Really, you should not limit RLIMIT_AS, it's not for what
you think it is.
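
The RLIMIT_AS point can be tried out with a small sketch like the one
below, assuming Linux and Python's resource and mmap modules. A child
process lowers its own RLIMIT_AS and then asks for an anonymous
mapping that it never touches, so the mapping would cost almost no
real RAM. The helper name mmap_under_limit and the sizes used are
picked for illustration only.

```python
import subprocess
import sys

MiB = 1024 * 1024

# Runs in a child process so the limit never affects the caller.
CHILD = r"""
import mmap, resource, sys
limit, size = int(sys.argv[1]), int(sys.argv[2])
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)   # the soft limit may not exceed the hard one
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    # Anonymous private mapping; its pages are never read or written.
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    m.close()
    print("ok")
except OSError:
    print("refused")
"""

def mmap_under_limit(limit_bytes, map_bytes):
    """Return True if an untouched mapping of map_bytes can be created
    in a process whose RLIMIT_AS soft limit is limit_bytes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(map_bytes)],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "ok"

if __name__ == "__main__":
    print("64 MiB map, 512 MiB limit:", mmap_under_limit(512 * MiB, 64 * MiB))
    print("1 GiB map, 512 MiB limit:", mmap_under_limit(512 * MiB, 1024 * MiB))
```

The second mapping should be refused even though none of its pages
were ever accessed: the limit counts address space, not the memory
actually in use.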

When would you use this functionality? I struggle to see a use case.

Well RLIMIT_AS is certainly not useful for your purpose, that's true.

As it turns out both RLIMIT_DATA and RLIMIT_RSS are NOPs these days.

Which basically means that the memory cgroup controller is the only
technology that would come close, but there you'd still have a problem
finding a value to initialize its limit to (also, as mentioned, this
isn't open to unprivileged processes just now).

Lennart

-- 
Lennart Poettering, Red Hat

