Re: [Tracker] [Patch] New CVS version



Laurent Aguerreche wrote:
Le mardi 22 aoÃt 2006 Ã 10:39 +0100, Jamie McCracken a Ãcrit :
Gergan Penkov wrote:
and trackerd does not exit, I haven't waited more than 5-10 minutes though.
Rats - I thought I sussed this one!

if it takes more then a few seconds to exit then its a deadlock on a mutex. I guess I should unlock them all as a precaution when exiting. Please try latest cvs again (I cannot replicate your problem mind you)

I found a case where it doesn't exit smoothly.

What I found:

in a thread, called thread1, tracker_log() is called to write "Watching
directory %s (total watches = %d)":
- g_print() to STDOUT, Ok;
- then,
  1/ lock()      <<====== important
  2/ open() on ~/.Tracker/tracker.log
- *BOOM*, user does a Ctrl+C, so this thread won't continue as
signal_handler() has been called and replaces prior code...
- now, tracker_log() is called to write "Received termination signal %d
so now exiting."
- g_print() to STDOUT, Ok;
- then,
  1/ lock()
  of course, it will wait for ever since this mutex is already locked
and won't be released...

At the same time, other threads will want to log something so they will
be locked.



Great thanks for that - it sure was a subtle one!

The right thing to do here is not to tracker_log anything until the second phase of the shutdown. (the resumption of code after the sig handler will allow the code to continue to unlock the mutex).




I propose a new patch!


I introduce a new thread dedicated to signal trapping and only to that.
It starts at soon as possible and has to treat any pending signals with
sigwait(), but only one at time (a mutex is used). These signals are
then filtered in signal_handler().
So when a signal is sent, a SIGINT for instance, only this thread will
treat it and the others won't be disturbed.

Now that I'm sure that only do_cleanup() will call g_main_loop_quit(), I
can use this call to cleanly quit the main Glib loop and continue the
code after g_main_loop_run() in main() to have only one way to terminate
Tracker. I added a g_thread_join() to be sure that do_cleanup()
finished. Is not really useful except to do things more cleanly.
I also added a field to tracker structure to know if Tracker did an
error or not (tracker.is_exit_success).

I also removed the delayed call to do_cleanup() in signal_handler() for
SIGINT or SIGTERM but it made Tracker to crash very often... I found
that it was due to a share on the same connection to MySQL at two
concurrent queries (call to tracker_db_clear_temp() in do_cleanup()). So
I had a mutex in start_watching() to know when it finishes and I put
call to tracker_db_clear_temp() after any work on file polling.


With this patch, Tracker works correctly almost all the time but still
not always...

using a new thread dedicated to signal handling is a bit expensive for lessor machines or mobiles.

I wont say no to this but it would be a last resort (IE if tracker cannot easily be shut down without this)

The current stuff works most of the time too and the two phase shutdown allows code to resume where it left off prior to being interrupted by the signal. This is important as it will prevent corruption due to half a record being written to the DB as a result of it being interrupted and then terminated without resuming.

If possible (and practical) I would like to get the two phase shutdown perfectly working as its almost there.



I got a deadlock at: g_mutex_lock (tracker->files_signal_mutex); in
do_cleanup().
In GDB, I can see that there are only two threads:
1/ main thread which called g_main_loop_run();
2/ thread which is waiting to lock tracker->files_signal_mutex.


The file process thread will either be sleeping (in which case g_cond_wait will unlock the mutex) or busy processing stuff with that mutex locked.

The only way the file process thread will deadlock is if it cannot be put to sleep which means it must be deadlocking during the check phase (on the check mutex).

I cant see why though? We unlock the check mutex in do_cleanup and the only other place its locked is in the notifications but these are then immediately unlocked.

Any thoughts on this would be most appreciated as I cant easily replicate it.


It is sure that tracker_notify_file_data_available() hasn't be called.

We dont want it to be called if tracker is shutting down so it should return without notifying


I wonder if the problem is in make_pending_file() in tracker-db.c
because there are lines:


        if (!mime) {
                if (tracker->is_running) {
                        tracker_exec_proc  (db_con, "InsertPendingFile", 6, str_file_id,
str_action, str_counter, uri,  "unknown", str_is_directory);
                } else {
                        return;
                }
        } else {
                if (tracker->is_running) {
                        tracker_exec_proc  (db_con, "InsertPendingFile", 6, str_file_id,
str_action, str_counter, uri,  mime, str_is_directory);
                } else {
                        return;
                }
        }

        //tracker_log ("inserting pending file for %s with action %s", uri,
tracker_actions[action]);

        /* signal respective thread that data is available and awake it if its
asleep */
        if (action == TRACKER_ACTION_EXTRACT_METADATA) {
                tracker_notify_meta_data_available ();  
        } else {
                tracker_notify_file_data_available ();
        }

So, if Tracker isn't running, neither
tracker_notify_meta_data_available() nor
tracker_notify_file_data_available() will be called...



yes thats right - we dont want to wake up the file process thread. We want the exact opposite IE to put it to sleep.

If the threads are sleeping or can be made to sleep then we can *guarantee* their clean exit in the second phase of the shutdown process.

Thanks anyway for looking into this.

--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]