Re: [Tracker] Merging Turtle branch to Trunk



On 08/12/08 12:37, Philip Van Hoof wrote:
Ping

On Thu, 2008-12-04 at 14:16 +0100, Philip Van Hoof wrote:
This is the latest merge-patch for Turtle to Trunk.

Please review



Hi,

OK, so here are my review comments. Note, this patch doesn't compile for me, there are a lot of warnings/errors.

I also want to know if we should be _requiring_ Turtle or just having it as something optional? Because the patch seems to switch between both.



1. Please remember in the next patch to svn add the files which are new so people can compile with the applied patch.


2. Use AM_CONDITIONAL in configure.ac:

+if test x$have_raptor == "xyes"; then
+  AC_DEFINE(HAVE_RAPTOR, 1, [Raptor RDF parsers])
+fi

Everywhere else does:

  AM_CONDITIONAL(HAVE_RAPTOR, test "$have_raptor" = "yes")


3. Do we require a particular version of raptor?

4. Do we need to check S.Enabled here and/or the Volumes table at all?
Also, isn't it more logical to use "M" instead of "F" for the metadata table alias for:

+GetEmbeddedMetadataBackup SELECT S.Path || '/' || S.Name, T.TypeName, F.MetadataID, F.MetadataDisplay From Services S, ServiceMetadata F, ServiceTypes T WHERE (S.ID == F.ServiceID) AND (S.ServiceTypeID == T.TypeID) AND (F.MetadataID IN (SELECT ID From MetadataTypes WHERE Embedded = 0)) UNION SELECT S.Path || '/' || S.Name, T.TypeName, F.MetadataID, upper(F.MetadataValue) From Services S, ServiceNumericMetadata F, ServiceTypes T WHERE (S.ID == F.ServiceID) AND (S.ServiceTypeID == T.TypeID) AND (F.MetadataID IN (SELECT ID From MetadataTypes WHERE Embedded = 0)) UNION SELECT S.Path || '/' || S.Name, T.Typename, F.MetadataID, F.MetadataValue From Services S, ServiceKeywordMetadata F, ServiceTypes T WHERE (S.ID == F.ServiceID) AND (S.ServiceTypeID == T.TypeID) AND (F.MetadataID IN (SELECT ID From MetadataTypes WHERE Embedded = 0));


5. Is the notation "(out)" the same notation used in gtk-doc now?

+ * @mount_mount: (out): if @path is on a removable device, the mount point will
+ * be filled in here

6. This needs fixing in tracker-hal.c, also it doesn't say if it returns TRUE or FALSE, usually that's a good idea:

+ * Returns Whether or not @path is on a known removable device
+ *
+ * Returns: Whether or not @path is on a known removable devic


6. This doesn't need to be cast:

+               udi = (const gchar*) key;


7. Indentation here is wrong and we could use g_strcmp0() instead so we don't have to check mp for NULL too:

+               if (mp && strcmp (mp, path) != 0) {
+                 if (g_strrstr (path, mp)) {


8. This looks like a memory leak, volume is not freed:

                        if (available)
                                *available = libhal_volume_is_mounted (volume);
                        break;


9. Please don't add blank lines. You can omit these from diff files too by changing your ~/.subversion/config and adding:

  diff-cmd = /home/pvanhoof/bin/svndiff

and with the svndiff content being:

  /usr/bin/diff -upBb --label "${3}" ${6} --label "${5}" ${7}

to not diff whitespace changes like this:

+       g_hash_table_replace (metadata->table,
                             g_object_ref (field),
                             g_strdup (value));
+
 }


10. The changes to tracker_data_metadata_insert() actually look like they fix a potential leak in TRUNK, i.e. inserting the same thing twice. Nice catch.


11. This looks like it would cause a crasher now that we don't own all data sent to libtracker-data/tracker-data-metadata.c, I don't think we should be freeing the list there or its contents:

-       g_return_if_fail (TRACKER_IS_FIELD (field));
+       if (!field) {
+ g_warning ("Field name '%s' has isn't described in the ontology", field_name);
+               g_list_foreach (list, (GFunc) g_free, NULL);
+               g_list_free (list);
+               return;
+       }
+


12. Same as #11 for tracker_data_metadata_append_to_list() where we free "value".



13. Just wondering about the name here, shouldn't it be GetMetadataBackup? I mean, the Turtle file is just a metadata backup right, I don't quite know how the "Embedded" part comes into it.

+       result_set = tracker_data_manager_exec_proc (iface,
+                                                    "GetEmbeddedMetadataBackup",
+                                                    NULL);


14. Please put _ALL_ structures and typedefs at the top of the file:

+/* TODO: URI branch path -> uri */
+
+typedef struct {
+       TrackerService *service;
+       guint32 iid_value;
+       TrackerLanguage *language;
+       TrackerConfig *config;
+} ForeachInMetadataInfo;
+


15. Please put all static variables at the top of the file:

+ static TrackerConfig *config = NULL;
+ static TrackerLanguage *language = NULL;



16. When linning up variables don't add extra spaces:

+void
+tracker_data_delete_service (const gchar         *path,
+                            const gchar         *rdf_type)
+{
+       TrackerService      *service;
+       const gchar         *service_type;
+       guint32              service_id;


17. I wonder if we should recursively delete here even if there is no top level service ID found - in case there is ever something below it left in the database. Also remember, we changed this at the code camp in Helsinki, I am still going through the patch to see if this is a conflict at all:

+       if (service_id != 0) {
+               tracker_data_update_delete_service (service, service_id);
+               tracker_data_update_delete_service_recursively (service, (gchar *) path);
+               tracker_data_update_delete_all_metadata (service, service_id);
+       }


18. Can we format these arguments according to the coding style please:

+set_metadata (TrackerField *field, gpointer value, ForeachInMetadataInfo *info)


19. Every time I look at this, I think it is crack, that's why I added the comment. I know we do this in the indexer too, but I wonder if we should change this to something sensible with a reason too:

+       /* Throttle indexer, value 9 is from older code, why 9? */
+       throttle = tracker_config_get_throttle (info->config);
+       if (throttle > 9) {
+               tracker_throttle (info->config, throttle * 100);
+       }


20. I would have used for() instead of while() for clarity here:

+               GList *list;
+
+               list = value;
+
+               while (list) {
+                       set_metadata (field, list->data, user_data);
+                       list = list->next;
+               }

i.e.

  for (list = value; list; list = list->next)
         set_metadata (field, list->data, user_data);


21. I think the API should be a bit more uniform here, i.e. tracker_data_turtle_import_{start|stop}:

+ void     tracker_data_start_turtle_import               (void);
+ void     tracker_data_stop_turtle_import                (void);


22. Instead of duplicating the config and language modules, can we not just call an initialize and termination function to do all this and wrap all static variables in a GStaticPrivate? To save memory and also if this is done more than once per invocation of the application it really makes sense. Right now I think TrackerConfig and TrackerLanguage are only ever created once per application. I would like to keep it that way if possible.

+void
+tracker_data_start_turtle_import (void)
+{
+       if (!config)
+               config = tracker_config_new ();
+       if (!language)
+               language = tracker_language_new (config);
+}


23. All public functions we need variable checking, e.g.

  tracker_data_delete_service()
  tracker_data_replace_service()

need (for example):

  g_return_if_fail (path != NULL);


24. We should use g_debug here not g_print, and the preprocessor comments should be on column 0 at the start of the line.

+               #ifdef TURTLE_DEBUG
+               g_print ("Q ERROR: %s\n", error->message);
+               #endif /* TURTLE_DEBUG */
+               g_error_free (error);


25. Indentation:

+               GValue         id_value = { 0, };
+               GValue         is_value = { 0, };
+               gint           iid_value, iis_value;

+               guint32        id;


26. API wise, what is the difference between

  tracker_data_delete_service()

and

  tracker_data_update_delete_service()

both look like they should do the same thing except one is the uniform API which Jürg worked on. That should be clearer I think. The same namespace should be used for the _replace_ variant too.


27. We need spaces in here:

+               info->config = config?g_object_ref (config):NULL;
+               info->language = language?g_object_ref (language):NULL;


28. Bloody good catch guys :)

-       if (file_iface)
+       if (file_iface) {
                g_object_unref (file_iface);
-       if (email_iface)
+               file_iface = NULL;
+       }
+
+       if (email_iface) {
                g_object_unref (email_iface);
-       if (xesam_iface)
+               email_iface = NULL;
+       }
+
+       if (xesam_iface) {
                g_object_unref (xesam_iface);
+               xesam_iface = NULL;
+       }


29. More spaces we don't need :)

+void
+tracker_indexer_commit_transaction (TrackerIndexer *indexer)
+{
+       stop_transaction (indexer);
+       tracker_indexer_set_running (indexer, TRUE);
+
+}
+
+void
+tracker_indexer_open_transaction   (TrackerIndexer *indexer)
+{
+       tracker_indexer_set_running (indexer, FALSE);
+       start_transaction (indexer);
+}
+


30. I would change tracker_indexer_commit_transaction() and tracker_indexer_open_transaction() to be tracker_indexer_transaction_*() instead.


31. There is no need to set mount_point to NULL after freeing it if it is unused again.


32. Shouldn't we do something in an "else" for the tracker_hal_path_is_on_removable_device() for "path", like we do for "other_path"?

+       if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
+                                                    path,
+                                                    &mount_point,
+                                                    NULL)) {
+               if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
+                                                    other_path,
+                                                    NULL,
+                                                    NULL)) {
+                       tracker_removable_device_add_move (indexer,
+                                                          mount_point,
+                                                          path,
+                                                          other_path,
+                                                          tracker_service_get_name (service));
+               } else {
+                       tracker_removable_device_add_removal (indexer,
+                                                             mount_point,
+                                                             path,
+                                                             tracker_service_get_name (service));
+               }
+       }

33. APIs like this really confuse me:

  tracker_removable_device_add_move()
  tracker_removable_device_add_removal()

Can we come up with something a bit more intuitive?


34. We don't need to allocate memory here:

+       values[0] = g_strdup (triple->object);
+       values[1] = NULL;
...
+
+       g_free (values[0]);


35. All request IDs are generated before anything else. This is for debugging purposes and so we know how many requests we have _REALLY_ received regardless of if there was some parameter check failure. This needs to be before any tracker_dbus_async_return_if_fail() calls:

+       request_id = tracker_dbus_get_next_request_id ();


36. This normally comes up as a warning depending on compiler flags, "(void)" is needed:

+static gchar *
+get_turtle_userdata_backup_filename ()


37. Checking private is NULL or not is pointless here because it is created in main() and never freed - I just noticed :) so I should fix that. Also, we should store this value in private, since we often get "finished" more than once from the indexer and recreating the string here would be unnecessary:

+       if (private) {
+               return g_build_filename (private->user_data_dir,
+                                        "tracker-userdata-backup.ttl",
+                                        NULL);
+       } else {
+               g_critical ("Directories not initialized");
+               return NULL;
+       }


38. This should be fixed now (sending finished twice at the same time). But also you should know that "finished" is sent whenever the indexer finishes indexing ANY content from ANY file changes on the disk. So this solution is broken anyway. It will need fixing:

+       if (counter >= 2) {


39. I would rather we are a bit more descriptive for variable names, which callback_id this is, isn't obvious without searching further in the code, this is in main() for trackerd:

+       gulong                      callback_id;


40. More spacing issues:

        if (force_reindex) {
+
+               gchar              *turtle_file;
+
+               turtle_file = get_turtle_userdata_backup_filename ();


41. Eeeek, what is this for? You are duplicating a lot of stuff here! The tracker_db_manager_init() is called just after this. The tracker_db_index_manager_init() is also called after this. The tracker_data_manager_init() is also called after this. As I understand it you just want to save the turtle file after an index right - specifically the initial index? If so, then why not just check the "initial index" state using tracker_status_get_is_first_time_index(). You could also use this in the "finished" callback.

This code looks to me like it would break a lot of things and it shouldn't get committed:

        if (force_reindex) {
+
+               gchar              *turtle_file;
+
+               turtle_file = get_turtle_userdata_backup_filename ();
+
+               g_message ("Saving metadata in %s", turtle_file);
+
+               /* Init the DB stack */
+               tracker_db_manager_init (0, &is_first_time_index, TRUE);
+
+               tracker_db_index_manager_init (0,
+                                              tracker_config_get_min_bucket_count (config),
+                                              tracker_config_get_max_bucket_count (config));
+               
+               file_index = tracker_db_index_manager_get_index (TRACKER_DB_INDEX_FILE);
+ email_index = tracker_db_index_manager_get_index (TRACKER_DB_INDEX_EMAIL);
+               
+               tracker_data_manager_init (config, language, file_index, email_index);
+               
+               tracker_backup_save (turtle_file);
+
+               /* Shutdown the DB stack */
+               tracker_data_manager_shutdown ();
+               
+               tracker_db_index_manager_shutdown ();
+               tracker_db_manager_shutdown ();
+
+               g_free (turtle_file);
+


42. I don't think you want to do this ONLY on FORCE_REINDEX. Since that is rarely the case. Although I admit I am not entirely sure what org_freedesktop_Tracker_Indexer_restore_backup() is supposed to do from the API name. Does it write the Turtle file to the DB? If so, then why would you want to write metadata to the database AFTER it has just been written to with all the latest information from the indexer? If your intention (on the other hand) is to write initial data to the database on a reindex before the indexer has had a chance to index the real data it gets, then I think this is going about it the wrong way.

+       if (flags & TRACKER_DB_MANAGER_FORCE_REINDEX) {
+               g_debug ("Setting callback for crawling finish detection");
+               callback_id = g_signal_connect (private->processor, "finished",
+                                               G_CALLBACK (crawling_finished_cb),
+                                               &callback_id);
+       }
+


43. About tracker-turtle.h:

a) Why is MetadataItem public, it looks unused in the header.
b) Why doesn't it use a proper namespace?
   i.e. TrackerTurtleMetadataItem
c) Why don't the callback typedefs have Tracker in the namespace?
   i.e. TrackerTurtleTripleCallback
d) What happened to the coding style here?
e) Include order is incorrect.
f) Includes should have config.h
g) G_BEGIN_DECLS and G_END_DECLS are needed.
h) Why not use Glib types here? gint, gchar, etc.
i) When I saw "stmt" in the code, I had to find out what it was, where
   it came from, etc. Namespacing issues there again need sorting out.
j) I wouldn't include Jamie's name in the copyright since he didn't write it.


44. About tracker-turtle.c:

a) The "tracker-turtle.h" include should come after all external includes.
b) There is no need to include the same files already included in the tracker-turtle.h. c) In tracker_turtle_open(), you just print a critical if not initialised. I would suggest you use g_return_val_if_fail () instead since.
d) I would use g_fopen() not fopen() for compatibility across platforms.
e) Is turtle->uri a uri or a path? Variable names should reflect this accurately:
   i.e. turtle->uri = raptor_new_uri ("/");
f) Can we use proper variable names like "count" instead of 3 letter variable names that look awefully like another English word I dare not utter :)
   i.e. guint cnt;
g) There are some serious code style issues there.
h) I wouldn't include Jamie's name in the copyright since he didn't write it.


45. About tracker-removable-device.h:

a) Include order is incorrect.
b) G_BEGIN_DECLS and G_END_DECLS are needed.
c) I wouldn't include Jamie's name in the copyright since he didn't write it. d) tracker_removable_device_add_move() shouldn't this be tracker_removable_device_move()? e) tracker_removable_device_add_removal() shouldn't this be tracker_removable_device_removal()?


46. About tracker-removable-device.c:

a) The #ifdef HAVE_RAPTOR at the top of this file (I would expect) should go right to the end of the file since it is so high up, but it doesn't. I think we should decide if we are making the whole file require that define or just parts of it and fix it accordingly. b) Do you really need all those includes or is that a copy and paste from tracker-indexer.c :) c) Again all typedefs need proper namespacing including the Tracker name otherwise it is confusing as hell when you see it somewhere.
d) Coding style needs fixing.


Summary:

It looks to me like a lot of work is still needed to keep the code clean and consistent with the current code base. There are a couple of leaks which need fixing and a couple of issues which this patch fixes too. The major concern I have is about when the restore and backup are actually done. Unless I have misunderstood (which is entirely possible) it seems they are done at the wrong times, or at least could be done at better times.

This patch is a good start however. Once you have managed to fix the issues, send me the new patch and I will do a quick review.

I would like to see Carlos also review the parts where the tracker-indexer.c is involved. They look fine to me, but he might have an opinion to share.

Thanks for the hard work guys!

--
Regards,
Martyn



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]