Re: [Tracker] Proposal to improve tracker-miner-fs "up-to-date" check performance



On Mon, 2010-03-29 at 22:44 +0800, Chen, Zhenqiang wrote:


2) Reduce dbus calls and queries:

(1) At the beginning, execute one query to get all the <url, fileLastModified> pairs and put them in a hash 
table.

Problem here is that for people with a huge amount of files, the URL
keys will consume a lot of memory. We have to scale on mobile devices
for some of Tracker's users so this ain't acceptable.

However, you can probably also fetch using SPARQL's OFFSET and LIMIT and
work in chunks of let's say 1000 files.
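
A rough sketch of what chunked fetching could look like, in Python for brevity. The names `build_chunk_query`, `iter_chunks` and the `run_query` callable are made up for illustration and are not Tracker API; `run_query` stands in for whatever actually sends the SPARQL to the store. The query assumes the pairs are reachable through nie:url and nfo:fileLastModified, and uses ORDER BY so that consecutive chunks don't overlap.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, str]  # (url, fileLastModified) as returned by the store

# Assumed query shape; adjust the properties to the real ontology.
QUERY_TEMPLATE = (
    "SELECT ?url ?mtime WHERE { "
    "?f nie:url ?url ; nfo:fileLastModified ?mtime "
    "} ORDER BY ?url OFFSET %d LIMIT %d"
)


def build_chunk_query(offset: int, limit: int = 1000) -> str:
    """Return the SPARQL text for one chunk of results."""
    return QUERY_TEMPLATE % (offset, limit)


def iter_chunks(run_query: Callable[[str], List[Row]],
                limit: int = 1000) -> Iterator[List[Row]]:
    """Yield result chunks of at most `limit` rows until the store runs dry.

    `run_query` is a hypothetical callable that executes a SPARQL string
    and returns a list of (url, mtime) rows.
    """
    offset = 0
    while True:
        rows = run_query(build_chunk_query(offset, limit))
        if rows:
            yield rows
        # A short (or empty) chunk means there is nothing left to fetch.
        if len(rows) < limit:
            break
        offset += limit
```

Only one chunk is held in memory at a time, so the peak memory use is bounded by the chunk size rather than by the number of files.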

(2) For each file, lookup the uri in the hash table, 
      if there is, 
          compare the time information of the file with the fileLastModified value from hash table,
          if the values are equal,
              The entry is up-to-date.
              
    A query is only required when the uri is not in the hash table or the times do not match.
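
The lookup in step (2) might be sketched like this, in Python for brevity. The function name `needs_query` and the use of integer timestamps are assumptions for illustration only; the real miner is written in C and would use its own hash table type.

```python
from typing import Dict


def needs_query(known_mtimes: Dict[str, int], uri: str, disk_mtime: int) -> bool:
    """Return True when the store has to be queried for this file.

    `known_mtimes` maps uri -> fileLastModified as loaded from the store
    (hypothetical layout). The entry is considered up to date only when
    the uri is present and the stored time equals the time on disk.
    """
    stored = known_mtimes.get(uri)
    if stored is None:
        # Not in the hash table: the file is unknown to the store.
        return True
    # In the hash table: a query is needed only if the times differ.
    return stored != disk_mtime


# Example usage with made-up values:
table = {"file:///home/user/a.txt": 1269873840}
unchanged = not needs_query(table, "file:///home/user/a.txt", 1269873840)
```

In the common case where nothing changed, every file takes the cheap path through the table and no dbus call is made.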

(3) There is another issue in the current implementation: 
the url for "Directory" files has a form like "urn:software-category", not "file:///" (see 
"miner_applications_process_file_cb" in tracker-miner-applications.c). So we should change the uri format 
before searching in the hash table. 

URL isn't the same as "the subject", nie:url should be a file:///, if
not then that's wrong indeed. But the subject of the resource can be
"urn:etc etc". Do you mean that nie:url is wrong?

(4) Free the hash table when the miner finishes. 

In most cases, there are few or no changes. With this improvement, tracker will become much faster.

Thanks!
-Zhenqiang


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be
