[Tracker] Proposal to improve tracker-miner-fs "up-to-date" check performance

When tracker starts up, it checks whether the entries in the DB are up to date.
The current logic is: for each file, tracker-miner-fs makes at least one D-Bus call to tracker-store, which 
executes a query. 
This is inefficient, since both D-Bus calls and queries are expensive. (You can see the calls with dbus-monitor.)

Here are two proposals to improve the performance.

1) Skip checks for ignored files:

In function crawler_check_directory_cb (tracker-miner-fs.c), there are two checks:
  should_check = should_check_file (fs, file, TRUE);
  should_change_index = should_change_index_for_file (fs, file);
As I understand it, if "should_check_file" returns FALSE, the result of "should_change_index_for_file" is 
meaningless, since we do not process such files (see function "should_process_file"). So we can use the same 
logic as in "should_process_file" to handle it: 

  if (should_check) {
    should_change_index = should_change_index_for_file (fs, file);
  } else {
    should_change_index = FALSE;
  }

With this improvement, we can skip the checks for files like ~/.cache/*, ~/.config/*, etc.

2) Reduce dbus calls and queries:

(1) At the beginning, execute one query to get all the <uri, fileLastModified> pairs and put them in a hash 
table.
(2) For each file, look up its uri in the hash table:
        if it is there,
            compare the modification time of the file with the fileLastModified value from the hash table;
            if the values are equal,
                the entry is up to date.
    A query is only required when the uri is not in the hash table or the times do not match. 

(3) There is another issue in the current implementation: 
the uri for "Directory" files has a form like "urn:software-category", not "file:///" (see 
"miner_applications_process_file_cb" in tracker-miner-applications.c). So we should convert the uri format 
before looking it up in the hash table. 

(4) Free the hash table when the miner finishes. 
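
To make steps (1), (2) and (4) concrete, here is a minimal sketch in plain C. It is only an illustration, not a patch: MtimeCache and the mtime_cache_* functions are hypothetical names that do not exist in tracker, and the small chained table stands in for whatever tracker would really use (most likely a GHashTable). Filling the table from the single startup query and the uri conversion from step (3) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MTIME_CACHE_BUCKETS 1024

/* One <uri, fileLastModified> pair, chained per bucket. */
typedef struct MtimeEntry {
    char *uri;
    uint64_t mtime;
    struct MtimeEntry *next;
} MtimeEntry;

/* Hypothetical cache type; not part of tracker. */
typedef struct {
    MtimeEntry *buckets[MTIME_CACHE_BUCKETS];
} MtimeCache;

/* djb2 string hash, reduced to a bucket index. */
static size_t hash_uri(const char *uri) {
    size_t h = 5381;
    for (; *uri != '\0'; uri++)
        h = h * 33 + (unsigned char)*uri;
    return h % MTIME_CACHE_BUCKETS;
}

/* Step (1): create the table; the caller fills it from one query. */
MtimeCache *mtime_cache_new(void) {
    return calloc(1, sizeof(MtimeCache));
}

/* Store one pair; returns false if memory allocation fails. */
bool mtime_cache_insert(MtimeCache *cache, const char *uri, uint64_t mtime) {
    assert(cache != NULL && uri != NULL);
    MtimeEntry *e = malloc(sizeof *e);
    if (e == NULL)
        return false;
    size_t len = strlen(uri) + 1;
    e->uri = malloc(len);
    if (e->uri == NULL) {
        free(e);
        return false;
    }
    memcpy(e->uri, uri, len);
    e->mtime = mtime;
    size_t b = hash_uri(uri);
    e->next = cache->buckets[b];
    cache->buckets[b] = e;
    return true;
}

/* Step (2): true only if the uri is cached and the times are equal.
 * A false result means the caller still has to query the store. */
bool mtime_cache_is_up_to_date(const MtimeCache *cache, const char *uri,
                               uint64_t file_mtime) {
    assert(cache != NULL && uri != NULL);
    const MtimeEntry *e = cache->buckets[hash_uri(uri)];
    for (; e != NULL; e = e->next) {
        if (strcmp(e->uri, uri) == 0)
            return e->mtime == file_mtime;
    }
    return false;
}

/* Step (4): release everything when the miner finishes. */
void mtime_cache_free(MtimeCache *cache) {
    if (cache == NULL)
        return;
    for (size_t i = 0; i < MTIME_CACHE_BUCKETS; i++) {
        MtimeEntry *e = cache->buckets[i];
        while (e != NULL) {
            MtimeEntry *next = e->next;
            free(e->uri);
            free(e);
            e = next;
        }
    }
    free(cache);
}
```

With something like this, the per-file D-Bus call is only needed when mtime_cache_is_up_to_date returns false, i.e. for new or modified files.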

In most cases there are few or no changes, so with this improvement tracker startup should become much faster.

