[Tracker] hierarchical-indexing branch review



Hi all,

Carlos recently worked on the hierarchical-indexing branch to ensure parent directories are added to the database BEFORE files or child directories. This avoids additional SPARQL calls and caching that would otherwise be required when processing the crawled directories. So essentially this should improve indexing speed but it also fixes file duplicates which we have seen in some cases. See the bug:

  https://bugzilla.gnome.org/show_bug.cgi?id=610360

This is the review of that branch:

  http://git.gnome.org/browse/tracker/log/?h=hierarchical-indexing

--

1. The ::finished signal has now changed, the following are broken and have not been updated yet:

  - examples/
  - libtracker-miner/tracker-miner-manager
  - tracker-control/tracker-control


2. I would use a GSList in DirectoryProcessingData instead of a GList.


3. I would have named the signal ::crawled-directory since it is more consistent with the other signal naming.


4. No need for g_slice_new0() in enumerator_data_new(), g_slice_new() is good enough there.


5. No need for the cast in file_enumerate_next_cb() here:

+ ed = (EnumeratorData*) user_data;


6. Doesn't it make sense to close the enumeration before processing more data with enumerator_process_data()? Not sure here really since it is all async anyway.


7. In file_enumerator_next_cb() I would set parent AFTER the if (error) ... since it may not ever be used in the case of an error.


8. Probably doesn't matter so much, but item_queue_get_next_file() should return a typedef enum not a gint which happens to be an enum anyway.


9. Can we change the #if 0 in crawler_directory_crawled_cb() to #ifdef ENABLE_TREE_DEBUGGING or something meaningful? Unless we remove it?


10. Doesn't G_POST_ORDER make more sense when freeing the GNode tree in crawled_directory_data_free()?


11. Shouldn't item_query_exists() use a SPARQL function variable for performance on the OPTIONAL there?

--

Other than that, a really good bit of work Carlos, thanks.

--
Regards,
Martyn



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]