Hi there, This is a new version of the patch. Garnacho: You want to review the part where I'm changing your precious src/tracker-indexer/tracker-module-metadata.c at line 1016 in this diff. Ivan: Continue committing in the "turtle" branch. I'll cope by merging your changes into the next patch and using some SVN command-line tricks. Martyn: Re-review ;-). I commented inline (see lower) All you guys: Have fun
1. Please remember in the next patch to svn add the files which are new so people can compile with the applied patch.
Done this time
2. Use AM_CONDITIONAL in configure.ac: +if test x$have_raptor == "xyes"; then + AC_DEFINE(HAVE_RAPTOR, 1, [Raptor RDF parsers]) +fi Everywhere else does: AM_CONDITIONAL(HAVE_RAPTOR, test "$have_raptor" = "yes")
AM_CONDITIONAL is not the same as getting the define in config.h. There's also a discussion started about making libraptor mandatory. The conclusion of that discussion will influence this configure.ac magic. I left this as is (using AC_DEFINE)
3. Do we require a particular version of raptor?
Not at this moment. Also this is affected by the discussion about libraptor taking place. Not adapting the patch for this yet.
4. Do we need to check S.Enabled here and/or the Volumes table at all? Also, isn't it more logical to use "M" instead of "F" for the metadata [CUT SQLite stored statements]
This is under Ivan's supervision. This new patch has some updates about the things that I will in this reply annotate as Ivan's responsibility. Not everything being done by Ivan is already in this patch yet, though. Ivan is continuing with fixing the issues in the "turtle" branch. I will continue merging my merged checkout of trunk with his improvements. So far these three are the merge commands executed: svn merge -r 2555:2615 ../branches/turtle/ svn merge -r 2615:2680 ../branches/turtle/ svn merge -r 2680:2682 ../branches/turtle About this specific issue, I think I saw Ivan commit something about these queries in the turtle branch. I also noticed some differences in the original merge patch, and this new one, about these queries. I'll let Ivan reply if he wants to about this.
5. Is the notation "(out)" the same notation used in gtk-doc now? + * @mount_mount: (out): if @path is on a removable device, the mount point will + * be filled in here
I have removed the annotations for now. The GObject-Introspection's annotation format has not yet been finalized. I have added a humanly explained comment about how to deal with @mount_point.
6. This needs fixing in tracker-hal.c, also it doesn't say if it returns TRUE or FALSE, usually that's a good idea:
+ * Returns Whether or not @path is on a known removable device + * + * Returns: Whether or not @path is on a known removable devic
Fixed in the "Returns" annotation. Same here, I am not yet using the GObject-Introspection annotation syntax. I think it would be better if we make one huge patch adding annotations everywhere, than to do it sporadically API per API.
6. This doesn't need to be cast: + udi = (const gchar*) key;
Fixed
7. Indentation here is wrong and we could use g_strcmp0() instead so we don't have to check mp for NULL too: + if (mp && strcmp (mp, path) != 0) { + if (g_strrstr (path, mp)) {
Done
8. This looks like a memory leak, volume is not freed: if (available) *available = libhal_volume_is_mounted (volume); break;
It's not. Volume is freed. Double checked this. Maybe the original patch didn't have it. This now one, in any case, has it. + libhal_volume_free (volume);
10. The changes to tracker_data_metadata_insert() actually look like they fix a potential leak in TRUNK, i.e. inserting the same thing twice. Nice catch.
Sure. Carefully review all those, though. I went over all users of the APIs to make sure they are still all validly using it. But feel free to add some extra eyes to that process.
11. This looks like it would cause a crasher now that we don't own all data sent to libtracker-data/tracker-data-metadata.c, I don't think we should be freeing the list there or its contents: - g_return_if_fail (TRACKER_IS_FIELD (field)); + if (!field) { + g_warning ("Field name '%s' has isn't described in the ontology", field_name); + g_list_foreach (list, (GFunc) g_free, NULL); + g_list_free (list); + return; + } +
This was indeed different in the Turtle branch and has indeed been refactored in Trunk. I have merge-rewritten these functions to act the way they are expected to act as Trunk got refactored to.
12. Same as #11 for tracker_data_metadata_append_to_list() where we free "value".
Agreed, fixed too.
13. Just wondering about the name here, shouldn't it be GetMetadataBackup? I mean, the Turtle file is just a metadata backup right, I don't quite know how the "Embedded" part comes into it. + result_set = tracker_data_manager_exec_proc (iface, + "GetEmbeddedMetadataBackup", + NULL);
This is under supervision of Ivan. I don't think it is fixed in this new version of the patch yet. I'll let Ivan reply and handle this issue.
14. Please put _ALL_ structures and typedefs at the top of the file: +/* TODO: URI branch path -> uri */ + +typedef struct { + TrackerService *service; + guint32 iid_value; + TrackerLanguage *language; + TrackerConfig *config; +} ForeachInMetadataInfo; +
Fixed
15. Please put all static variables at the top of the file: + static TrackerConfig *config = NULL; + static TrackerLanguage *language = NULL;
Fixed
16. When linning up variables don't add extra spaces: +void +tracker_data_delete_service (const gchar *path, + const gchar *rdf_type) +{ + TrackerService *service; + const gchar *service_type; + guint32 service_id;
Fixed
17. I wonder if we should recursively delete here even if there is no top level service ID found - in case there is ever something below it left in the database. Also remember, we changed this at the code camp in Helsinki, I am still going through the patch to see if this is a conflict at all: + if (service_id != 0) { + tracker_data_update_delete_service (service, service_id); + tracker_data_update_delete_service_recursively (service, (gchar *) path); + tracker_data_update_delete_all_metadata (service, service_id); + }
Carlos replied that this is logically correct
18. Can we format these arguments according to the coding style please: +set_metadata (TrackerField *field, gpointer value, ForeachInMetadataInfo *info)
Fixed
19. Every time I look at this, I think it is crack, that's why I added the comment. I know we do this in the indexer too, but I wonder if we should change this to something sensible with a reason too: + /* Throttle indexer, value 9 is from older code, why 9? */ + throttle = tracker_config_get_throttle (info->config); + if (throttle > 9) { + tracker_throttle (info->config, throttle * 100); + }
Sure, if the merge happens before the fix then thw two locations. I think we should keep the logic in this patch because what is functionally being added is basically the exact same. Except that it's coming from the Turtle file instead of from the FS scanner. I try to keep patches about functionality A separated from patches that fixes obscure weird hard coded B things. If trunk fixes this before this merge takes place, then let me know and I'll adapt this one to the fixed situation too. Of course. If it gets fixed after merge, we just fix the two locations. It's not that this is a flood of code here ;-) (just search for "why 9?").
20. I would have used for() instead of while() for clarity here: + GList *list; + + list = value; + + while (list) { + set_metadata (field, list->data, user_data); + list = list->next; + } i.e. for (list = value; list; list = list->next) set_metadata (field, list->data, user_data);
Fixed
21. I think the API should be a bit more uniform here, i.e. tracker_data_turtle_import_{start|stop}: + void tracker_data_start_turtle_import (void); + void tracker_data_stop_turtle_import (void);
Fixed
22. Instead of duplicating the config and language modules, can we not just call an initialize and termination function to do all this and wrap all static variables in a GStaticPrivate? To save memory and also if this is done more than once per invocation of the application it really makes sense. Right now I think TrackerConfig and TrackerLanguage are only ever created once per application. I would like to keep it that way if possible. +void +tracker_data_start_turtle_import (void) +{ + if (!config) + config = tracker_config_new (); + if (!language) + language = tracker_language_new (config); +}
Then please make TrackerConfig and TrackerLanguage singleton classes.
23. All public functions we need variable checking, e.g. tracker_data_delete_service() tracker_data_replace_service() need (for example): g_return_if_fail (path != NULL);
Fixed
24. We should use g_debug here not g_print, and the preprocessor comments should be on column 0 at the start of the line. + #ifdef TURTLE_DEBUG + g_print ("Q ERROR: %s\n", error->message); + #endif /* TURTLE_DEBUG */ + g_error_free (error);
Removed/Fixed/Adapted/Stuffed
25. Indentation: + GValue id_value = { 0, }; + GValue is_value = { 0, }; + gint iid_value, iis_value; + guint32 id;
Fixed
26. API wise, what is the difference between tracker_data_delete_service() and tracker_data_update_delete_service() both look like they should do the same thing except one is the uniform API which JÃrg worked on. That should be clearer I think. The same namespace should be used for the _replace_ variant too.
The namespace has been fixed ('update' is added to it) The difference is that the new one removes using the path or (soon when the path->URI work is done) using the URI. The first removes using the service-id. This service-id is outside of the scope of the Turtle code, therefore is the functionality moved into the database specific pieces and is the Turtle code just calling it that way.
27. We need spaces in here: + info->config = config?g_object_ref (config):NULL; + info->language = language?g_object_ref (language):NULL;
Fixed
28. Bloody good catch guys :) - if (file_iface) + if (file_iface) { g_object_unref (file_iface); - if (email_iface) + file_iface = NULL; + } + + if (email_iface) { g_object_unref (email_iface); - if (xesam_iface) + email_iface = NULL; + } + + if (xesam_iface) { g_object_unref (xesam_iface); + xesam_iface = NULL; + }
Np
29. More spaces we don't need :) +void +tracker_indexer_commit_transaction (TrackerIndexer *indexer) +{ + stop_transaction (indexer); + tracker_indexer_set_running (indexer, TRUE); + +} + +void +tracker_indexer_open_transaction (TrackerIndexer *indexer) +{ + tracker_indexer_set_running (indexer, FALSE); + start_transaction (indexer); +} +
Fixed
30. I would change tracker_indexer_commit_transaction() and tracker_indexer_open_transaction() to be tracker_indexer_transaction_*() instead.
Fixed
31. There is no need to set mount_point to NULL after freeing it if it is unused again.
Fixed
32. Shouldn't we do something in an "else" for the tracker_hal_path_is_on_removable_device() for "path", like we do for "other_path"? + if (tracker_hal_path_is_on_removable_device (indexer->private->hal, + path, + &mount_point, + NULL)) { + if (tracker_hal_path_is_on_removable_device (indexer->private->hal, + other_path, + NULL, + NULL)) { + tracker_removable_device_add_move (indexer, + mount_point, + path, + other_path, + tracker_service_get_name (service)); + } else { + tracker_removable_device_add_removal (indexer, + mount_point, + path, + tracker_service_get_name (service)); + } + }
Nope, the logic of that code is good
33. APIs like this really confuse me: tracker_removable_device_add_move() tracker_removable_device_add_removal() Can we come up with something a bit more intuitive?
Nope. It means "add a record about a move", "add a record about a remove". It's not the same as updating your database as the Turtle file is an always-append format. You append "that you removed something" and "that you moved something". You don't go into the file and remove something, nor do you go into the file and move something. You append that piece of info, in the form of an "event", to the file. That's the format. That's why it's called that way.
34. We don't need to allocate memory here: + values[0] = g_strdup (triple->object); + values[1] = NULL; ... + + g_free (values[0]);
Fixed
35. All request IDs are generated before anything else. This is for debugging purposes and so we know how many requests we have _REALLY_ received regardless of if there was some parameter check failure. This needs to be before any tracker_dbus_async_return_if_fail() calls: + request_id = tracker_dbus_get_next_request_id ();
Fixed
36. This normally comes up as a warning depending on compiler flags, "(void)" is needed: +static gchar * +get_turtle_userdata_backup_filename ()
Me fixed for Ivan
37. Checking private is NULL or not is pointless here because it is created in main() and never freed - I just noticed :) so I should fix that. Also, we should store this value in private, since we often get "finished" more than once from the indexer and recreating the string here would be unnecessary: + if (private) { + return g_build_filename (private->user_data_dir, + "tracker-userdata-backup.ttl", + NULL); + } else { + g_critical ("Directories not initialized"); + return NULL; + }
Under supervision of Ivan.
38. This should be fixed now (sending finished twice at the same time). But also you should know that "finished" is sent whenever the indexer finishes indexing ANY content from ANY file changes on the disk. So this solution is broken anyway. It will need fixing: + if (counter >= 2) {
Under supervision of Ivan.
39. I would rather we are a bit more descriptive for variable names, which callback_id this is, isn't obvious without searching further in the code, this is in main() for trackerd: + gulong callback_id;
Under supervision of Ivan.
40. More spacing issues: if (force_reindex) { + + gchar *turtle_file; + + turtle_file = get_turtle_userdata_backup_filename ();
Under supervision of Ivan.
41. Eeeek, what is this for? You are duplicating a lot of stuff here! The tracker_db_manager_init() is called just after this. The tracker_db_index_manager_init() is also called after this. The tracker_data_manager_init() is also called after this. As I understand it you just want to save the turtle file after an index right - specifically the initial index? If so, then why not just check the "initial index" state using tracker_status_get_is_first_time_index(). You could also use this in the "finished" callback. This code looks to me like it would break a lot of things and it shouldn't get committed: if (force_reindex) { + + gchar *turtle_file; + + turtle_file = get_turtle_userdata_backup_filename (); + + g_message ("Saving metadata in %s", turtle_file); + + /* Init the DB stack */ + tracker_db_manager_init (0, &is_first_time_index, TRUE); + + tracker_db_index_manager_init (0, + tracker_config_get_min_bucket_count (config), + tracker_config_get_max_bucket_count (config)); + + file_index = tracker_db_index_manager_get_index (TRACKER_DB_INDEX_FILE); + email_index = tracker_db_index_manager_get_index (TRACKER_DB_INDEX_EMAIL); + + tracker_data_manager_init (config, language, file_index, email_index); + + tracker_backup_save (turtle_file); + + /* Shutdown the DB stack */ + tracker_data_manager_shutdown (); + + tracker_db_index_manager_shutdown (); + tracker_db_manager_shutdown (); + + g_free (turtle_file); +
Under supervision of Ivan.
42. I don't think you want to do this ONLY on FORCE_REINDEX. Since that is rarely the case. Although I admit I am not entirely sure what org_freedesktop_Tracker_Indexer_restore_backup() is supposed to do from the API name. Does it write the Turtle file to the DB? If so, then why would you want to write metadata to the database AFTER it has just been written to with all the latest information from the indexer? If your intention (on the other hand) is to write initial data to the database on a reindex before the indexer has had a chance to index the real data it gets, then I think this is going about it the wrong way. + if (flags & TRACKER_DB_MANAGER_FORCE_REINDEX) { + g_debug ("Setting callback for crawling finish detection"); + callback_id = g_signal_connect (private->processor, "finished", + G_CALLBACK (crawling_finished_cb), + &callback_id); + } +
Under supervision of Ivan.
43. About tracker-turtle.h: a) Why is MetadataItem public, it looks unused in the header. b) Why doesn't it use a proper namespace? i.e. TrackerTurtleMetadataItem c) Why don't the callback typedefs have Tracker in the namespace? i.e. TrackerTurtleTripleCallback d) What happened to the coding style here? e) Include order is incorrect. f) Includes should have config.h g) G_BEGIN_DECLS and G_END_DECLS are needed. h) Why not use Glib types here? gint, gchar, etc. i) When I saw "stmt" in the code, I had to find out what it was, where it came from, etc. Namespacing issues there again need sortingout. j) I wouldn't include Jamie's name in the copyright since he didn't write it.
All fixed
44. About tracker-turtle.c: a) The "tracker-turtle.h" include should come after all external includes. b) There is no need to include the same files already included in the tracker-turtle.h. c) In tracker_turtle_open(), you just print a critical if not initialised. I would suggest you use g_return_val_if_fail () instead since. d) I would use g_fopen() not fopen() for compatibility across platforms. e) Is turtle->uri a uri or a path? Variable names should reflect this accurately: i.e. turtle->uri = raptor_new_uri ("/");
raptor_new_uri converts the string to a URI, so it's a URI and that makes the variable name crystal clear (uri is a URI, indeed).
f) Can we use proper variable names like "count" instead of 3 letter variable names that look awefully like another English word I dare not utter :) i.e. guint cnt; g) There are some serious code style issues there. h) I wouldn't include Jamie's name in the copyright since he didn't write it.
All fixed
45. About tracker-removable-device.h: a) Include order is incorrect. b) G_BEGIN_DECLS and G_END_DECLS are needed. c) I wouldn't include Jamie's name in the copyright since he didn't write it.
All fixed
d) tracker_removable_device_add_move() shouldn't this be tracker_removable_device_move()?
No, you are not moving a removable device. You are adding info about the moving of a file that is located on a removable device to the metadata of the removable device. Your API proposal would make it look like as if you re moving the removable device.
e) tracker_removable_device_add_removal() shouldn't this be tracker_removable_device_removal()?
Same. It has nothing to do with removing the removable device. This has to do with adding the removal of a file on the removable device to the metadata of the removable device.
46. About tracker-removable-device.c: a) The #ifdef HAVE_RAPTOR at the top of this file (I would expect) should go right to the end of the file since it is so high up, but it doesn't.
Uh? Reread the file. It's not because the ifdef is high-up that this means that the entire file is being shadowed away with the ifdef.
I think we should decide if we are making the whole file require that define or just parts of it and fix it accordingly.
Sure, this is being discussed on the ML atm. Leaving as-is.
b) Do you really need all those includes or is that a copy and paste from tracker-indexer.c :)
Fixed. Seems it was indeed a C/P from tracker-indexer.c.
c) Again all typedefs need proper namespacing including the Tracker name otherwise it is confusing as hell when you see it somewhere. d) Coding style needs fixing.
All fixed -- Philip Van Hoof, freelance software developer home: me at pvanhoof dot be gnome: pvanhoof at gnome dot org http://pvanhoof.be/blog http://codeminded.be
Attachment:
merge.diff
Description: Text Data