Re: [Tracker] Proposal to improve tracker-miner-fs "up-to-date" check performance



Carlos Garnacho wrote:

As Philip said, we should take memory usage into account as well, and
keeping a hash table entry for every known item is not going to be nice...
TrackerCrawler guarantees that any directory will be processed after
its parent folder, and that all the items in a directory will be processed
together, so we can very probably do this on a per-folder basis.
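A minimal sketch of the per-folder idea, assuming the miner can fetch the URIs and modification times of one folder's already-indexed children in a single query. Only the current folder's children are kept in memory, and the cache is replaced when the crawler moves on to the next folder. The names (FolderCache, folder_cache_load, and so on) are hypothetical, and a sorted array searched with bsearch() stands in for a GHashTable so the example needs only the C standard library.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* One already-indexed child of the current folder (hypothetical layout). */
typedef struct {
    char   *uri;
    time_t  mtime;
} KnownItem;

/* Holds only the children of the folder currently being crawled. */
typedef struct {
    char      *folder_uri;
    KnownItem *items;
    size_t     n_items;
} FolderCache;

static char *dup_string (const char *s)
{
    size_t len = strlen (s) + 1;
    char *copy = malloc (len);
    if (copy != NULL)
        memcpy (copy, s, len);
    return copy;
}

static int compare_items (const void *a, const void *b)
{
    return strcmp (((const KnownItem *) a)->uri, ((const KnownItem *) b)->uri);
}

/* Frees everything cached for the previous folder. */
void folder_cache_clear (FolderCache *cache)
{
    for (size_t i = 0; i < cache->n_items; i++)
        free (cache->items[i].uri);
    free (cache->items);
    free (cache->folder_uri);
    cache->items = NULL;
    cache->folder_uri = NULL;
    cache->n_items = 0;
}

/* Called when the crawler enters a new folder. uris/mtimes stand in for
 * the result of one per-folder query. Returns 0 on allocation failure. */
int folder_cache_load (FolderCache *cache, const char *folder_uri,
                       const char *const *uris, const time_t *mtimes, size_t n)
{
    folder_cache_clear (cache);
    cache->folder_uri = dup_string (folder_uri);
    cache->items = n > 0 ? calloc (n, sizeof (KnownItem)) : NULL;
    if (cache->folder_uri == NULL || (n > 0 && cache->items == NULL))
        return 0;
    for (size_t i = 0; i < n; i++) {
        cache->items[i].uri = dup_string (uris[i]);
        cache->items[i].mtime = mtimes[i];
        cache->n_items = i + 1;   /* lets clear() free whatever was copied */
        if (cache->items[i].uri == NULL)
            return 0;
    }
    if (n > 0)
        qsort (cache->items, n, sizeof (KnownItem), compare_items);
    return 1;
}

/* Returns 1 if uri was already indexed with the same mtime, else 0. */
int folder_cache_is_up_to_date (const FolderCache *cache, const char *uri,
                                time_t mtime)
{
    if (cache->n_items == 0)
        return 0;
    KnownItem key = { (char *) uri, 0 };
    const KnownItem *hit = bsearch (&key, cache->items, cache->n_items,
                                    sizeof (KnownItem), compare_items);
    return hit != NULL && hit->mtime == mtime;
}
```

Memory then scales with the largest single folder rather than with the whole index, which is what the crawler's ordering guarantee makes possible.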

Agreed. Combining Philip's suggestion and yours, I would prefer the following logic:

(1) Get the total count of items with SPARQL's COUNT:
      if count > 1000
          do a per-folder query with OFFSET and LIMIT
      else
          get all items at once
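The decision in (1) could be sketched as below. The 1000-item threshold comes from the proposal; the COUNT query in the comment, the per-folder query text, and the use of nfo:belongsToContainer and nfo:fileLastModified as the predicates for this lookup are assumptions about how it might be written, not what tracker-miner-fs actually sends. ORDER BY is included because OFFSET/LIMIT paging is only stable over a fixed ordering.

```c
#include <stdio.h>
#include <string.h>

/* Threshold taken from the proposal above. */
#define ALL_AT_ONCE_MAX 1000

typedef enum {
    FETCH_ALL_AT_ONCE,
    FETCH_PER_FOLDER_PAGED
} FetchStrategy;

/* count would come from a query such as (assumed)
 *   SELECT COUNT(?f) WHERE { ?f a nie:DataObject }  */
FetchStrategy choose_fetch_strategy (long count)
{
    return count > ALL_AT_ONCE_MAX ? FETCH_PER_FOLDER_PAGED : FETCH_ALL_AT_ONCE;
}

/* Builds one page of a hypothetical per-folder query. folder_url is
 * assumed to be already escaped for a SPARQL string literal.
 * Returns 1 if the whole query fit into buf, 0 otherwise. */
int build_folder_page_query (char *buf, size_t len, const char *folder_url,
                             long limit, long offset)
{
    int n = snprintf (buf, len,
                      "SELECT ?url ?mtime WHERE { "
                      "?f nie:url ?url ; nfo:fileLastModified ?mtime ; "
                      "nfo:belongsToContainer ?p . ?p nie:url \"%s\" } "
                      "ORDER BY ?url LIMIT %ld OFFSET %ld",
                      folder_url, limit, offset);
    return n >= 0 && (size_t) n < len;
}

/* Number of pages needed to read n_items with pages of size limit. */
long page_count (long n_items, long limit)
{
    return (n_items + limit - 1) / limit;
}
```

A folder with 250 known children and a page size of 100 would then be read with OFFSET 0, 100 and 200.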

On most systems, such as netbooks or handsets, there are not that many items.

(3) There is another issue in the current implementation: the URLs of
"Directory" files have the form "urn:software-category:..." rather than
"file:///..." (see miner_applications_process_file_cb in
tracker-miner-applications.c). So we should convert the URI format
before looking items up in the hash table.
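One way to do the conversion, assuming the prefix is exactly "urn:software-category:" (as seen in the D-Bus log below) and that the suffix is the percent-escaped file path. The function name is hypothetical. The decoded path is not re-escaped, so paths containing spaces or non-ASCII characters would not match what g_file_get_uri() returns; a real implementation would likely re-escape them, for example with GLib's g_filename_to_uri().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Prefix as it appears in the D-Bus log (assumed to be fixed). */
#define SOFTWARE_CATEGORY_PREFIX "urn:software-category:"
#define FILE_SCHEME "file://"

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value (int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower (c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Turns "urn:software-category:%2Fusr%2F..." into "file:///usr/...".
 * Returns a malloc'ed string the caller must free, or NULL if uri lacks
 * the prefix or contains a malformed %XX escape. */
char *software_category_urn_to_file_uri (const char *uri)
{
    size_t prefix_len = strlen (SOFTWARE_CATEGORY_PREFIX);
    if (strncmp (uri, SOFTWARE_CATEGORY_PREFIX, prefix_len) != 0)
        return NULL;

    const char *src = uri + prefix_len;
    /* Decoding never makes the path longer, so this size is enough. */
    char *out = malloc (strlen (FILE_SCHEME) + strlen (src) + 1);
    if (out == NULL)
        return NULL;

    memcpy (out, FILE_SCHEME, strlen (FILE_SCHEME));
    char *dst = out + strlen (FILE_SCHEME);
    while (*src != '\0') {
        if (*src == '%') {
            int hi = hex_value ((unsigned char) src[1]);
            /* Only look at src[2] if src[1] was valid (not the terminator). */
            int lo = hi < 0 ? -1 : hex_value ((unsigned char) src[2]);
            if (lo < 0) {
                free (out);
                return NULL;
            }
            *dst++ = (char) (hi * 16 + lo);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return out;
}
```

With the URN from the log, the result is "file:///usr/share/desktop-directories/Utility.directory", which could then be used as the hash table key.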

I suggest you have a look at nie:url, which is meant to hold
application-readable URIs.

If nie:url is meant to hold application-readable URIs, there may be a bug in the current
implementation.
Here is the relevant code, followed by the D-Bus log:

static gboolean
miner_applications_process_file_cb (gpointer user_data)
{
        /* ... (declarations and earlier processing elided) ... */
        sparql = data->sparql;

        if (name && g_ascii_strcasecmp (type, "Directory") == 0) {
                /* Directories get a canonical URN instead of a file:// URI */
                gchar *canonical_uri = tracker_uri_printf_escaped (SOFTWARE_CATEGORY_URN_PREFIX "%s", path);
                uri = canonical_uri;
        } else if (name && g_ascii_strcasecmp (type, "Application") == 0) {
                uri = g_file_get_uri (data->file);
        } else if (name && g_str_has_suffix (type, "Applet")) {
                /* The URI of the InformationElement should be a UUID URN */
                uri = g_file_get_uri (data->file);
        }

        if (sparql && uri) {
                /* The URL of the DataObject; for a Directory this is the
                 * urn:software-category: URN built above */
                tracker_sparql_builder_predicate (sparql, "nie:url");
                tracker_sparql_builder_object_string (sparql, uri);
        }

        /* ... (rest of the function elided) ... */
}

D-Bus log:
method call sender=:1.41 -> dest=org.freedesktop.Tracker1 serial=148 path=/org/freedesktop/Tracker1/Resources; interface=org.freedesktop.Tracker1.Resources; member=BatchSparqlUpdate
   string "DROP GRAPH <file:///usr/share/desktop-directories/Utility.directory> INSERT INTO <urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory> {
<urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory> a nfo:SoftwareCategory .
<urn:theme-icon:applications-accessories> a nfo:Image .
<urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory> nfo:softwareCategoryIcon <urn:theme-icon:applications-accessories> ;
         a nfo:FileDataObject , nie:DataObject ;
         nie:title "Accessories" .
<urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory> nie:dataSource <urn:nepomuk:datasource:84f20000-1241-11de-8c30-0800200c9a66> ;
         nfo:fileName "Utility.directory" .
<file:///usr/share/desktop-directories/Utility.directory> a nfo:FileDataObject , nie:DataObject ;
         nie:url "urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory" .
<urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory> nie:isStoredAs <file:///usr/share/desktop-directories/Utility.directory> ;
         nfo:fileLastModified "2010-03-15T03:18:49Z" .
}

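As the code shows, the URI for a Directory is canonical_uri, and in the D-Bus log the resulting triple is
nie:url "urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory" .
attached to <file:///usr/share/desktop-directories/Utility.directory>, so the file's nie:url is a URN rather than an application-readable URI.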

The query "SELECT nie:url(?f) WHERE {?f a nie:DataObject}" returns two entries: an empty
string and urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory.
Please give it a try.
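A toy in-memory model of the two resources that the log above types as nie:DataObject (not Tracker's query engine) reproduces those two results. It suggests the empty string comes from the urn:software-category: resource, which is typed nie:DataObject but has no nie:url of its own, while the file:/// resource carries the URN as its nie:url, so none of the results is a file URI an application could open. The type and function names are hypothetical, and returning "" for a missing nie:url simply mirrors the behaviour reported above.

```c
#include <string.h>

/* Simplified view of a resource typed nie:DataObject: its subject and
 * its nie:url value (NULL when the log sets no nie:url for it). */
typedef struct {
    const char *subject;
    const char *nie_url;
} DataObject;

/* The two nie:DataObject resources inserted by the logged update. */
const DataObject logged_objects[] = {
    { "urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory",
      NULL },
    { "file:///usr/share/desktop-directories/Utility.directory",
      "urn:software-category:%2Fusr%2Fshare%2Fdesktop-directories%2FUtility.directory" },
};

/* Mimics one row of SELECT nie:url(?f) WHERE { ?f a nie:DataObject }. */
const char *nie_url_or_empty (const DataObject *obj)
{
    return obj->nie_url != NULL ? obj->nie_url : "";
}

/* Counts rows whose nie:url is a file:// URI an application could open. */
size_t count_file_urls (const DataObject *objs, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (strncmp (nie_url_or_empty (&objs[i]), "file://", 7) == 0)
            hits++;
    return hits;
}
```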

Thanks!
-Zhenqiang


