Re: [Tracker] Refactoring the filesystem miner



Hey hey,

On Thu, 2011-09-15 at 15:47 +0200, Carlos Garnacho wrote:
Hey,

On Wed, 2011-09-14 at 15:15 +0100, Martyn Russell wrote:
On 14/09/11 12:04, Carlos Garnacho wrote:
Hey hey,

Hi Carlos,

Lately I've been thinking about how to improve the TrackerMinerFS design and
performance, as it's a big piece of code that's getting too intricate at

Me too, I will come on to my thoughts later in this email.

places. It mainly has 2 roles that we should separate further:

   * Keeping track of what files to index (either fed through the crawler
or the dir monitors)
   * actually indexing them

For each of these 2 roles TrackerMinerFS maintains one cache (mtimes for
the first, URNs for the second) that's filled in per-directory as
processing goes, which introduces a latency directly related to how
scattered the data is in the FS.

So, my idea to improve this situation is to move the first role out
into a separate object that can carry out caching operations at a
higher level than folders (probably for entire configured
directories), and would hide the crawler and the monitor from the miner.
That way the miner would query in one go what it now does in scattered
chunks. Very rough testing seemed to show crawling is reduced to 30%-40%
of the original time, just ~2x the effort of only adding the directory
monitors.
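
To make the "one query instead of scattered chunks" idea concrete, here
is a minimal sketch in Python. It is not the TrackerMinerFS code:
FakeStore, mtimes_under(), changed_per_directory() and
changed_whole_root() are all hypothetical names, and the store is just an
in-memory dict standing in for whatever holds the indexed mtimes. It only
shows that both strategies find the same changed files while the second
one asks the store once per configured root rather than once per folder.

```python
import os


class FakeStore:
    """Hypothetical stand-in for the indexed store; counts queries made."""

    def __init__(self, mtimes):
        self._mtimes = dict(mtimes)  # path -> mtime recorded at index time
        self.queries = 0

    def mtimes_under(self, prefix, recursive):
        """Return stored mtimes for files in prefix (optionally below it)."""
        self.queries += 1
        found = {}
        for path, mtime in self._mtimes.items():
            parent = os.path.dirname(path)
            in_dir = parent == prefix
            below = recursive and parent.startswith(prefix + os.sep)
            if in_dir or below:
                found[path] = mtime
        return found


def changed_per_directory(root, store):
    """Per-directory caching: one store query for every folder crawled."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        cache = store.mtimes_under(dirpath, recursive=False)
        for name in files:
            path = os.path.join(dirpath, name)
            if cache.get(path) != os.stat(path).st_mtime_ns:
                changed.append(path)
    return sorted(changed)


def changed_whole_root(root, store):
    """Root-level caching: a single store query, then a plain crawl."""
    cache = store.mtimes_under(root, recursive=True)
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if cache.get(path) != os.stat(path).st_mtime_ns:
                changed.append(path)
    return sorted(changed)
```

With N folders under a root, the first function issues N queries and the
second issues one; how much that saves in practice depends on the real
store and disk, which this sketch does not model.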

Just to let the ML know, this is now implemented in the miner-fs-refactor
branch. Timings for mtime checks have improved quite a lot there,
whereas real indexing performance stays roughly the same, as expected.

  Intel i5, SSD disk.
  indexing 468 folders, 6609 files

                 master     miner-fs-refactor
  First index    16.65s     15.4s
  Mtime checks   2.36s      0.99s


  AMD Athlon, 5400rpm disk
  indexing 625 folders, 10959 files

                 master     miner-fs-refactor
  First index    178.2s     176.65s
  Mtime checks   12.39s     5.35s
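
For anyone who wants the ratios rather than the raw seconds, a small
Python snippet that derives them from the two tables above. The numbers
are copied from the tables; the speedup() helper and the dictionary
layout are just illustrative names.

```python
def speedup(master_seconds, refactor_seconds):
    """How many times faster the refactor branch is than master."""
    return master_seconds / refactor_seconds


# (master, miner-fs-refactor) timings in seconds, taken from the tables.
timings = {
    "Intel i5, SSD": {
        "First index": (16.65, 15.4),
        "Mtime checks": (2.36, 0.99),
    },
    "AMD Athlon, 5400rpm": {
        "First index": (178.2, 176.65),
        "Mtime checks": (12.39, 5.35),
    },
}

for machine, rows in timings.items():
    for label, (master, refactor) in rows.items():
        print(f"{machine:22s} {label:13s} {speedup(master, refactor):.2f}x")
```

Mtime checks come out at roughly 2.3x to 2.4x faster on both machines,
while first-index time moves by less than 10%.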

Cheers,
  Carlos




