Re: [Tracker] Merging Turtle branch to Trunk



On Tue, 2008-12-09 at 18:32 +0100, Philip Van Hoof wrote:
Extra note

This one is in the patch. I'm posting it a second time to catch
Garnacho's full attention:

That's conceptually a leak of the hashtable value if the module is
overwriting several times the same field, right, but I think that's more
a problem on the modules side if something like that happens. 

I was just unclear on the semantics if something like this happens, but
just replacing the old value as you do is fine IMHO, so all fine :)


Index: src/tracker-indexer/tracker-module-metadata.c
===================================================================
--- src/tracker-indexer/tracker-module-metadata.c     (revision 2683)
+++ src/tracker-indexer/tracker-module-metadata.c     (working copy)
@@ -188,13 +188,15 @@
              list = g_list_prepend (list, value);
              data = list;
      } else {
-             /* FIXME: warn if data already exists */
+             data = g_hash_table_lookup (metadata->table, field);
+             g_free (data);


## If you don't do this then inserting the same field twice will 
## leak the first

              data = value;
      }
 
-     g_hash_table_insert (metadata->table,
-                          g_object_ref (field),
-                          data);
+     g_hash_table_replace (metadata->table,
+                           g_object_ref (field),
+                           data);
+

## If you don't use _replace here then the GHashTable wont exec the
## GDestroyNotify. This means that you always leak the reference
## per each time that you add a string to a list-element.


      return TRUE;
 }

On Tue, 2008-12-09 at 18:17 +0100, Philip Van Hoof wrote:
Hi there,

This is a new version of the patch.

Garnacho: You want to review the part where I'm changing your precious 
src/tracker-indexer/tracker-module-metadata.c at line 1016 in this diff.

Ivan: Continue committing in the "turtle" branch. I'll cope by merging
your changes into the next patch and using some SVN command-line tricks.

Martyn: Re-review ;-). I commented inline (see lower)

All you guys: Have fun


1. Please remember in the next patch to svn add the files which are
new  so people can compile with the applied patch.

Done this time

2. Use AM_CONDITIONAL in configure.ac:

+if test x$have_raptor == "xyes"; then
+  AC_DEFINE(HAVE_RAPTOR, 1, [Raptor RDF parsers])
+fi

Everywhere else does:

   AM_CONDITIONAL(HAVE_RAPTOR, test "$have_raptor" = "yes")

AM_CONDITIONAL is not the same as getting the define in config.h.
There's also a discussion started about making libraptor mandatory. 

The conclusion of that discussion will influence this configure.ac
magic.

I left this as is (using AC_DEFINE)

3. Do we require a particular version of raptor?

Not at this moment. Also this is affected by the discussion about
libraptor taking place. Not adapting the patch for this yet.

4. Do we need to check S.Enabled here and/or the Volumes table at all?
Also, isn't it more logical to use "M" instead of "F" for the
metadata 

[CUT SQLite stored statements]

This is under Ivan's supervision. This new patch has some updates about
the things that I will in this reply annotate as Ivan's responsibility.

Not everything being done by Ivan is already in this patch yet, though.

Ivan is continuing with fixing the issues in the "turtle" branch. I will
continue merging my merged checkout of trunk with his improvements.

So far these three are the merge commands executed:

svn merge -r 2555:2615 ../branches/turtle/
svn merge -r 2615:2680 ../branches/turtle/
svn merge -r 2680:2682 ../branches/turtle

About this specific issue, I think I saw Ivan commit something about
these queries in the turtle branch. I also noticed some differences in
the original merge patch, and this new one, about these queries.

I'll let Ivan reply if he wants to about this.

5. Is the notation "(out)" the same notation used in gtk-doc now?

+ * @mount_mount: (out): if @path is on a removable device, the mount 
point will
+ * be filled in here

I have removed the annotations for now. The GObject-Introspection's
annotation format has not yet been finalized. I have added a humanly
explained comment about how to deal with @mount_point.

6. This needs fixing in tracker-hal.c, also it doesn't say if it
returns 
TRUE or FALSE, usually that's a good idea:

+ * Returns Whether or not @path is on a known removable device
+ *
+ * Returns: Whether or not @path is on a known removable devic

Fixed in the "Returns" annotation. Same here, I am not yet using the
GObject-Introspection annotation syntax. I think it would be better if
we make one huge patch adding annotations everywhere, than to do it
sporadically API per API.

6. This doesn't need to be cast:

+               udi = (const gchar*) key;

Fixed


7. Indentation here is wrong and we could use g_strcmp0() instead so
we 
don't have to check mp for NULL too:

+               if (mp && strcmp (mp, path) != 0) {
+                 if (g_strrstr (path, mp)) {


Done


8. This looks like a memory leak, volume is not freed:

                        if (available)
                                *available = libhal_volume_is_mounted
(volume);
                        break;

It's not. Volume is freed. Double checked this. Maybe the original patch
didn't have it. This now one, in any case, has it.

+ libhal_volume_free (volume);


10. The changes to tracker_data_metadata_insert() actually look like 
they fix a potential leak in TRUNK, i.e. inserting the same thing
twice. Nice catch.

Sure. Carefully review all those, though. I went over all users of the
APIs to make sure they are still all validly using it. But feel free to
add some extra eyes to that process.

11. This looks like it would cause a crasher now that we don't own all
data sent to libtracker-data/tracker-data-metadata.c, I don't think we
should be freeing the list there or its contents:

-       g_return_if_fail (TRACKER_IS_FIELD (field));
+       if (!field) {
+               g_warning ("Field name '%s' has isn't described in the
ontology", 
field_name);
+               g_list_foreach (list, (GFunc) g_free, NULL);
+               g_list_free (list);
+               return;
+       }
+

This was indeed different in the Turtle branch and has indeed been
refactored in Trunk. I have merge-rewritten these functions to act the
way they are expected to act as Trunk got refactored to.

12. Same as #11 for tracker_data_metadata_append_to_list() where we
free "value".

Agreed, fixed too.


13. Just wondering about the name here, shouldn't it be 
GetMetadataBackup? I mean, the Turtle file is just a metadata backup 
right, I don't quite know how the "Embedded" part comes into it.

+       result_set = tracker_data_manager_exec_proc (iface,
+
"GetEmbeddedMetadataBackup",
+                                                    NULL);

This is under supervision of Ivan. I don't think it is fixed in this new
version of the patch yet. I'll let Ivan reply and handle this issue.

14. Please put _ALL_ structures and typedefs at the top of the file:

+/* TODO: URI branch path -> uri */
+
+typedef struct {
+       TrackerService *service;
+       guint32 iid_value;
+       TrackerLanguage *language;
+       TrackerConfig *config;
+} ForeachInMetadataInfo;
+

Fixed

15. Please put all static variables at the top of the file:

+ static TrackerConfig *config = NULL;
+ static TrackerLanguage *language = NULL;

Fixed


16. When linning up variables don't add extra spaces:

+void
+tracker_data_delete_service (const gchar         *path,
+                            const gchar         *rdf_type)
+{
+       TrackerService      *service;
+       const gchar         *service_type;
+       guint32              service_id;

Fixed

17. I wonder if we should recursively delete here even if there is no 
top level service ID found - in case there is ever something below it 
left in the database. Also remember, we changed this at the code camp 
in Helsinki, I am still going through the patch to see if this is a 
conflict at all:

+       if (service_id != 0) {
+               tracker_data_update_delete_service (service, service_id);
+               tracker_data_update_delete_service_recursively (service, (gchar *) path);
+               tracker_data_update_delete_all_metadata (service, service_id);
+       }

Carlos replied that this is logically correct


18. Can we format these arguments according to the coding style
please:

+set_metadata (TrackerField *field, gpointer value, 
ForeachInMetadataInfo *info)

Fixed

19. Every time I look at this, I think it is crack, that's why I added
the comment. I know we do this in the indexer too, but I wonder if we 
should change this to something sensible with a reason too:

+       /* Throttle indexer, value 9 is from older code, why 9? */
+       throttle = tracker_config_get_throttle (info->config);
+       if (throttle > 9) {
+               tracker_throttle (info->config, throttle * 100);
+       }

Sure, if the merge happens before the fix then thw two locations.

I think we should keep the logic in this patch because what is
functionally being added is basically the exact same. Except that it's
coming from the Turtle file instead of from the FS scanner.

I try to keep patches about functionality A separated from patches that
fixes obscure weird hard coded B things.

If trunk fixes this before this merge takes place, then let me know and
I'll adapt this one to the fixed situation too. Of course.

If it gets fixed after merge, we just fix the two locations. It's not
that this is a flood of code here ;-) (just search for "why 9?").

20. I would have used for() instead of while() for clarity here:

+               GList *list;
+
+               list = value;
+
+               while (list) {
+                       set_metadata (field, list->data, user_data);
+                       list = list->next;
+               }

i.e.

   for (list = value; list; list = list->next)
          set_metadata (field, list->data, user_data);

Fixed


21. I think the API should be a bit more uniform here, i.e. 
tracker_data_turtle_import_{start|stop}:

+ void     tracker_data_start_turtle_import               (void);
+ void     tracker_data_stop_turtle_import                (void);


Fixed


22. Instead of duplicating the config and language modules, can we not
just call an initialize and termination function to do all this and
wrap  all static variables in a GStaticPrivate? To save memory and
also if  this is done more than once per invocation of the application
it really  makes sense. Right now I think TrackerConfig and
TrackerLanguage are  only ever created once per application. I would
like to keep it that way  if possible.

+void
+tracker_data_start_turtle_import (void)
+{
+       if (!config)
+               config = tracker_config_new ();
+       if (!language)
+               language = tracker_language_new (config);
+}

Then please make TrackerConfig and TrackerLanguage singleton classes.

23. All public functions we need variable checking, e.g.

   tracker_data_delete_service()
   tracker_data_replace_service()

need (for example):

   g_return_if_fail (path != NULL);

Fixed

24. We should use g_debug here not g_print, and the preprocessor 
comments should be on column 0 at the start of the line.

+               #ifdef TURTLE_DEBUG
+               g_print ("Q ERROR: %s\n", error->message);
+               #endif /* TURTLE_DEBUG */
+               g_error_free (error);

Removed/Fixed/Adapted/Stuffed

25. Indentation:

+               GValue         id_value = { 0, };
+               GValue         is_value = { 0, };
+               gint           iid_value, iis_value;

+               guint32        id;


Fixed

26. API wise, what is the difference between

   tracker_data_delete_service()

and

   tracker_data_update_delete_service()

both look like they should do the same thing except one is the uniform
API which JÃrg worked on. That should be clearer I think. The same
namespace should be used for the _replace_ variant too.

The namespace has been fixed ('update' is added to it)

The difference is that the new one removes using the path or (soon when
the path->URI work is done) using the URI. The first removes using the
service-id. This service-id is outside of the scope of the Turtle code,
therefore is the functionality moved into the database specific pieces
and is the Turtle code just calling it that way.


27. We need spaces in here:

+               info->config = config?g_object_ref (config):NULL;
+               info->language = language?g_object_ref
(language):NULL;

Fixed


28. Bloody good catch guys :)

-       if (file_iface)
+       if (file_iface) {
                g_object_unref (file_iface);
-       if (email_iface)
+               file_iface = NULL;
+       }
+
+       if (email_iface) {
                g_object_unref (email_iface);
-       if (xesam_iface)
+               email_iface = NULL;
+       }
+
+       if (xesam_iface) {
                g_object_unref (xesam_iface);
+               xesam_iface = NULL;
+       }

Np


29. More spaces we don't need :)

+void
+tracker_indexer_commit_transaction (TrackerIndexer *indexer)
+{
+       stop_transaction (indexer);
+       tracker_indexer_set_running (indexer, TRUE);
+
+}
+
+void
+tracker_indexer_open_transaction   (TrackerIndexer *indexer)
+{
+       tracker_indexer_set_running (indexer, FALSE);
+       start_transaction (indexer);
+}
+

Fixed


30. I would change tracker_indexer_commit_transaction() and 
tracker_indexer_open_transaction() to be
tracker_indexer_transaction_*() 
instead.

Fixed


31. There is no need to set mount_point to NULL after freeing it if it
is unused again.

Fixed


32. Shouldn't we do something in an "else" for the 
tracker_hal_path_is_on_removable_device() for "path", like we do for 
"other_path"?

+       if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
+                                                    path,
+                                                    &mount_point,
+                                                    NULL)) {
+               if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
+                                                    other_path,
+                                                    NULL,
+                                                    NULL)) {
+                       tracker_removable_device_add_move (indexer,
+                                                          mount_point,
+                                                          path,
+                                                          other_path,
+                                                          tracker_service_get_name (service));
+               } else {
+                       tracker_removable_device_add_removal (indexer,
+                                                             mount_point,
+                                                             path,
+                                                             tracker_service_get_name (service));
+               }
+       }

Nope, the logic of that code is good


33. APIs like this really confuse me:

   tracker_removable_device_add_move()
   tracker_removable_device_add_removal()

Can we come up with something a bit more intuitive?


Nope. It means "add a record about a move", "add a record about a
remove".

It's not the same as updating your database as the Turtle file is an
always-append format. You append "that you removed something" and "that
you moved something". You don't go into the file and remove something,
nor do you go into the file and move something. You append that piece of
info, in the form of an "event", to the file. That's the format.

That's why it's called that way.

34. We don't need to allocate memory here:

+       values[0] = g_strdup (triple->object);
+       values[1] = NULL;
...
+
+       g_free (values[0]);


Fixed

35. All request IDs are generated before anything else. This is for 
debugging purposes and so we know how many requests we have _REALLY_ 
received regardless of if there was some parameter check failure.
This 
needs to be before any tracker_dbus_async_return_if_fail() calls:

+       request_id = tracker_dbus_get_next_request_id ();


Fixed

36. This normally comes up as a warning depending on compiler flags, 
"(void)" is needed:

+static gchar *
+get_turtle_userdata_backup_filename ()

Me fixed for Ivan


37. Checking private is NULL or not is pointless here because it is 
created in main() and never freed - I just noticed :) so I should fix 
that. Also, we should store this value in private, since we often get 
"finished" more than once from the indexer and recreating the string 
here would be unnecessary:

+       if (private) {
+               return g_build_filename (private->user_data_dir,
+                                        "tracker-userdata-backup.ttl",
+                                        NULL);
+       } else {
+               g_critical ("Directories not initialized");
+               return NULL;
+       }

Under supervision of Ivan.

38. This should be fixed now (sending finished twice at the same time). 
But also you should know that "finished" is sent whenever the indexer 
finishes indexing ANY content from ANY file changes on the disk. So this 
solution is broken anyway. It will need fixing:

+       if (counter >= 2) {

Under supervision of Ivan.


39. I would rather we are a bit more descriptive for variable names, 
which callback_id this is, isn't obvious without searching further in 
the code, this is in main() for trackerd:

+       gulong                      callback_id;

Under supervision of Ivan.

40. More spacing issues:

        if (force_reindex) {
+
+               gchar              *turtle_file;
+
+               turtle_file = get_turtle_userdata_backup_filename ();

Under supervision of Ivan.


41. Eeeek, what is this for? You are duplicating a lot of stuff here! 
The tracker_db_manager_init() is called just after this. The 
tracker_db_index_manager_init() is also called after this. The 
tracker_data_manager_init() is also called after this. As I understand 
it you just want to save the turtle file after an index right - 
specifically the initial index? If so, then why not just check the 
"initial index" state using tracker_status_get_is_first_time_index(). 
You could also use this in the "finished" callback.

This code looks to me like it would break a lot of things and it 
shouldn't get committed:

        if (force_reindex) {
+
+               gchar              *turtle_file;
+
+               turtle_file = get_turtle_userdata_backup_filename ();
+
+               g_message ("Saving metadata in %s", turtle_file);
+
+               /* Init the DB stack */
+               tracker_db_manager_init (0, &is_first_time_index, TRUE);
+
+               tracker_db_index_manager_init (0,
+                                              tracker_config_get_min_bucket_count (config),
+                                              tracker_config_get_max_bucket_count (config));
+               
+               file_index = tracker_db_index_manager_get_index (TRACKER_DB_INDEX_FILE);
+               email_index = tracker_db_index_manager_get_index (TRACKER_DB_INDEX_EMAIL);
+               
+               tracker_data_manager_init (config, language, file_index, email_index);
+               
+               tracker_backup_save (turtle_file);
+
+               /* Shutdown the DB stack */
+               tracker_data_manager_shutdown ();
+               
+               tracker_db_index_manager_shutdown ();
+               tracker_db_manager_shutdown ();
+
+               g_free (turtle_file);
+

Under supervision of Ivan.


42. I don't think you want to do this ONLY on FORCE_REINDEX. Since that 
is rarely the case. Although I admit I am not entirely sure what 
org_freedesktop_Tracker_Indexer_restore_backup() is supposed to do from 
the API name. Does it write the Turtle file to the DB? If so, then why 
would you want to write metadata to the database AFTER it has just been 
written to with all the latest information from the indexer? If your 
intention (on the other hand) is to write initial data to the database 
on a reindex before the indexer has had a chance to index the real data 
it gets, then I think this is going about it the wrong way.

+       if (flags & TRACKER_DB_MANAGER_FORCE_REINDEX) {
+               g_debug ("Setting callback for crawling finish detection");
+               callback_id = g_signal_connect (private->processor, "finished",
+                                               G_CALLBACK (crawling_finished_cb),
+                                               &callback_id);
+       }
+

Under supervision of Ivan.


43. About tracker-turtle.h:

a) Why is MetadataItem public, it looks unused in the header.
b) Why doesn't it use a proper namespace?
    i.e. TrackerTurtleMetadataItem
c) Why don't the callback typedefs have Tracker in the namespace?
    i.e. TrackerTurtleTripleCallback
d) What happened to the coding style here?
e) Include order is incorrect.
f) Includes should have config.h
g) G_BEGIN_DECLS and G_END_DECLS are needed.
h) Why not use Glib types here? gint, gchar, etc.
i) When I saw "stmt" in the code, I had to find out what it was, where
    it came from, etc. Namespacing issues there again need sortingout.
j) I wouldn't include Jamie's name in the copyright since he didn't 
write it.

All fixed


44. About tracker-turtle.c:

a) The "tracker-turtle.h" include should come after all external
includes.
b) There is no need to include the same files already included in the 
tracker-turtle.h.
c) In tracker_turtle_open(), you just print a critical if not 
initialised. I would suggest you use g_return_val_if_fail () instead
since.
d) I would use g_fopen() not fopen() for compatibility across
platforms.
e) Is turtle->uri a uri or a path? Variable names should reflect this 
accurately:
    i.e. turtle->uri = raptor_new_uri ("/");

raptor_new_uri converts the string to a URI, so it's a URI and that
makes the variable name crystal clear (uri is a URI, indeed).

f) Can we use proper variable names like "count" instead of 3 letter 
variable names that look awefully like another English word I dare
not 
utter :)
    i.e. guint cnt;
g) There are some serious code style issues there.
h) I wouldn't include Jamie's name in the copyright since he didn't 
write it.

All fixed

45. About tracker-removable-device.h:

a) Include order is incorrect.
b) G_BEGIN_DECLS and G_END_DECLS are needed.
c) I wouldn't include Jamie's name in the copyright since he didn't 
write it.

All fixed

d) tracker_removable_device_add_move() shouldn't this be 
tracker_removable_device_move()?

No, you are not moving a removable device. You are adding info about the
moving of a file that is located on a removable device to the metadata
of the removable device. Your API proposal would make it look like as if
you re moving the removable device.

e) tracker_removable_device_add_removal() shouldn't this be 
tracker_removable_device_removal()?

Same. It has nothing to do with removing the removable device. This has
to do with adding the removal of a file on the removable device to the
metadata of the removable device.


46. About tracker-removable-device.c:

a) The #ifdef HAVE_RAPTOR at the top of this file (I would expect) 
should go right to the end of the file since it is so high up, but it 
doesn't. 

Uh? Reread the file. It's not because the ifdef is high-up that this
means that the entire file is being shadowed away with the ifdef.

I think we should decide if we are making the whole file 
require that define or just parts of it and fix it accordingly.

Sure, this is being discussed on the ML atm. Leaving as-is.

b) Do you really need all those includes or is that a copy and paste 
from tracker-indexer.c :)

Fixed. Seems it was indeed a C/P from tracker-indexer.c.

c) Again all typedefs need proper namespacing including the Tracker
name 
otherwise it is confusing as hell when you see it somewhere.
d) Coding style needs fixing.

All fixed






[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]