Re: [Tracker] Use of fork() in tracker-extract-pdf



On 11/08/13 09:24, Ralph Böhme wrote:
Hi

Hello Ralph,

over the last days I've been testing Tracker 0.16.2 with a large set
of real world test data which are mostly PDFs from various sources.
From the start I ran into an issue that tracker-extract repeatedly
got stuck in one PDF or another.

Looking at the Tracker debug logs, process stack back-traces and
other debug data, it seemed the tracker-extract parent and child
processes were deadlocked somewhere while sending data from the
child to the parent via the pipe IPC fd.

Um... parent (with a glib mainloop), child, glib, fork() ?

<https://developer.gnome.org/glib/2.37/glib-The-Main-Event-Loop.html>

 "On Unix, the GLib mainloop is incompatible with fork(). Any program
using the mainloop must either exec() or exit() from the child
without returning to the mainloop."

Hmm...

As I'm only really at the beginning of an in-depth analysis, I can't
say for sure that the hangs I see are caused by this, but knowing
there seems to exist a fundamental design flaw in tracker-extract-pdf,
I'm asking for thoughts on this.

So we use the parent/child setup because some PDFs take a REALLY long time to process and we have a 10 second window for them to be indexed. After that, we kill the child process and return. We did this because we didn't want to kill the tracker-extract process all the time. In reality, being killed is actually what tracker-extract was built for, so ...
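
As a rough illustration of the setup described here (not the actual tracker-extract code), a parent can fork a worker, drain the pipe while it waits, and kill the child once the deadline passes. The names run_with_timeout, quick_worker and stuck_worker are made up for this sketch, and it uses plain poll() instead of a GLib main loop. The parent keeps reading even after its buffer is full, on the assumption that a child blocked on a full pipe is one possible source of a hang like the one reported.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds, used to track the deadline. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Hypothetical helper: run work() in a forked child and collect what it
 * writes to the pipe. Returns the number of bytes stored in buf, or -1 on
 * error or if the child did not finish within timeout_ms. */
ssize_t run_with_timeout(void (*work)(int fd), int timeout_ms,
                         char *buf, size_t buflen)
{
    assert(buf != NULL || buflen == 0);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: do the work, then leave without returning to the caller. */
        close(fds[0]);
        work(fds[1]);
        close(fds[1]);
        _exit(0);
    }

    /* Parent: close the write end so EOF is seen when the child exits. */
    close(fds[1]);
    size_t used = 0;
    int failed = 0;
    long deadline = now_ms() + timeout_ms;

    for (;;) {
        long left = deadline - now_ms();
        if (left <= 0) { failed = 1; break; }          /* deadline passed */

        struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
        int r = poll(&pfd, 1, (int)left);
        if (r == -1 && errno == EINTR) continue;
        if (r <= 0) { failed = 1; break; }             /* timeout or error */

        char chunk[4096];
        ssize_t n = read(fds[0], chunk, sizeof chunk);
        if (n == -1 && errno == EINTR) continue;
        if (n == -1) { failed = 1; break; }
        if (n == 0) break;                             /* EOF: child is done */

        /* Keep what fits; keep draining so the child never blocks. */
        size_t room = buflen - used;
        size_t take = (size_t)n < room ? (size_t)n : room;
        memcpy(buf + used, chunk, take);
        used += take;
    }

    close(fds[0]);
    if (failed)
        kill(pid, SIGKILL);
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR) { }
    return failed ? -1 : (ssize_t)used;
}

/* Example workers for the sketch: one finishes at once, one never does. */
static void quick_worker(int fd)
{
    if (write(fd, "hello", 5) != 5)
        _exit(1);
}

static void stuck_worker(int fd)
{
    (void)fd;
    for (;;)
        pause();
}
```

Calling run_with_timeout(quick_worker, 2000, buf, sizeof buf) should return the five bytes the child wrote, while run_with_timeout(stuck_worker, 200, buf, sizeof buf) should kill the child and return -1 after roughly 200 ms.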

Afaict, the right design would involve an exec() in the child and
using some other IPC channel. I'll happily volunteer.

Yeah, so we are actually calling exit() in the child. See:

  extract_content_child_process()
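
For comparison, here is a minimal sketch of the exec()-based alternative, assuming the extraction work lives in a separate helper program that writes its result to stdout. spawn_and_read is a hypothetical name and no such helper binary is described in this thread; the point is only that the child replaces its process image right after fork(), so it never touches state inherited from the parent's main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] as a separate program and capture its
 * stdout. Returns the number of bytes stored in buf, or -1 if the program
 * could not be run or exited with a non-zero status. */
ssize_t spawn_and_read(char *const argv[], char *buf, size_t buflen)
{
    assert(argv != NULL && argv[0] != NULL);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then exec immediately. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) == -1)
                _exit(127);
            close(fds[1]);
        }
        execvp(argv[0], argv);
        _exit(127);                      /* only reached if exec failed */
    }

    /* Parent: read until the child closes its end of the pipe. */
    close(fds[1]);
    size_t used = 0;
    int failed = 0;
    for (;;) {
        char chunk[4096];
        ssize_t n = read(fds[0], chunk, sizeof chunk);
        if (n == -1 && errno == EINTR) continue;
        if (n == -1) { failed = 1; break; }
        if (n == 0) break;
        size_t room = buflen - used;
        size_t take = (size_t)n < room ? (size_t)n : room;
        memcpy(buf + used, chunk, take);
        used += take;
    }
    close(fds[0]);

    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);

    if (w == -1 || failed || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)used;
}
```

A timeout could be layered on top with poll() and kill(), in the same way as the existing 10 second window; it is left out here to keep the exec() part visible.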

Thoughts?

Are you sure it isn't a difficult PDF taking too long?

--
Regards,
Martyn

Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell

