Re: [Tracker] Use of fork() in tracker-extract-pdf



Hi Martyn,

Am 16.08.2013 um 13:21 schrieb Martyn Russell <martyn lanedo com>:
I just stumbled across this one, didn't notice before. A 20 seconds
watchdog timeout setup in

src/tracker-extract/tracker-controller.c:metadata_data_new().

I have a feeling that doesn't work which is why I wrote the fork() stuff...

In reality, this is actually what tracker-extract was built to do,
so ...

It seems in reality the 20 s watchdog timer is setup too early! Afair
it is not setup before the actual extraction is started, but already
at the moment the extraction event is received.

Ah that would explain why I think it doesn't work, CCd Carlos since IIRC he wrote the code here.

As I'm still somewhat uncomfortable with the whole dbus stuff, I couldn't easily find the appropriate place 
where the watchdog timer registration should be moved to and was hoping that one of you would be able to.


Now if several extraction requests a received and all of these are
files (e.g. PDFs) that take long to extract, the watchdog timers for
requests still on the pending queue are already expiring, I see this
sequence of events in the debug log:



12 Aug 2013, 11:55:49: Tracker: <--- [12|0]
handle_method_call_get_metadata_fast
(uri:'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf',  mime:'application/pdf', 
index_fd:0)



Extraction requests 1-11 are still being processes, so it takes some
time before this one gets it shot here:

12 Aug 2013, 11:56:06: Tracker: Dispatching
'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'  in main thread

Unfortunately this one takes more then 4 seconds, so the watchdog
killer steps in:

12 Aug 2013, 11:56:10: Tracker-Critical **: Extraction task for
'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'  went rogue and took more than 20 
seconds. Forcing exit.

...and tracker-extract gets restarted.

Increasing WATCHDOG_TIMEOUT from 20 to a large value will workaround
this, but a proper fix would be to setup the timer in
src/tracker-extract/tracker-extract.c:dispatch_task_cb() or similar.

I think the issue was that many people thought 20 seconds or even 10 seconds was quite long to wait for 
extraction of a file ...

I guess it depends on the file and user but generally speaking, that's also why we have limits on other 
extractions, like the maximum size of text we will extract from a text file.

Maybe this should be a configurable option for ALL extractions?

A configurable would be nice of course and having several different timeouts in different places somewhat 
cumbersome.

Now that you point out the 20 second timeout thing, I definitely think we need to fix the situation and 
reduce the number of places we have these timing checks.

+1


Afaict, the right design would involve an exec() in the child
and using some other IPC channel. I'll happily volunteer.

Yea, so we are actually calling exit() in the child. See:

extract_content_child_process()

What has the exit() call to do with this? Afaict that's completely
unrelated to the issue of using fork() in programs using glib.

"On Unix, the GLib mainloop is incompatible with fork(). Any program using the mainloop must either exec() 
or exit() from the child without returning to the mainloop."

We do call exit() from the child process and we don't use the GLib main loop from the child processs. That 
was my point.

I was afraid that that would be your point. :)
IIrc one of the underlying issue that after a fork the behavior of locks and mutexes held by the forking 
process at that moment is undefined in the child, so _any_ glib function that uses any of these locks results 
in undefined behavior.

I guess this demonstrates the effect of both issues I outlined: 1)
pid 1794 is deadlocked in a mutex that is probably still held by the
(now non existent) tracker-extract parent 2) the parent of 1794 can't
kill it off because it was itself killed by the 20 seconds watchdog
timer

It does appear as if you're right indeed. The more I think of it, the more I think the fork() code I added 
shouldn't be necessary and the timeout code should be fixed to work properly so we can avoid this whole 
problem.

+1

Thanks for your time and effort!

-Ralph

-- 
Ralph Böhme <rb netafp com>
Netatalk Developer | Support | Services
Curslacker Deich 254, 21039 Hamburg, Germany
http://www.netafp.com/

Attachment: smime.p7s
Description: S/MIME cryptographic signature



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]