Re: [Tracker] Merging Turtle branch to Trunk
- From: Martyn Russell <martyn imendio com>
- To: Philip Van Hoof <spam pvanhoof be>
- Cc: Tracker mailing list <tracker-list gnome org>
- Subject: Re: [Tracker] Merging Turtle branch to Trunk
- Date: Tue, 09 Dec 2008 12:24:25 +0000
On 08/12/08 12:37, Philip Van Hoof wrote:
Ping
On Thu, 2008-12-04 at 14:16 +0100, Philip Van Hoof wrote:
This is the latest merge-patch for Turtle to Trunk.
Please review
Hi,
OK, so here are my review comments. Note, this patch doesn't compile for
me, there are a lot of warnings/errors.
I also want to know if we should be _requiring_ Turtle or just having it
as something optional? Because the patch seems to switch between both.
1. Please remember in the next patch to svn add the files which are new
so people can compile with the applied patch.
2. Use AM_CONDITIONAL in configure.ac:
+if test x$have_raptor == "xyes"; then
+ AC_DEFINE(HAVE_RAPTOR, 1, [Raptor RDF parsers])
+fi
Everywhere else does:
AM_CONDITIONAL(HAVE_RAPTOR, test "$have_raptor" = "yes")
3. Do we require a particular version of raptor?
4. Do we need to check S.Enabled here and/or the Volumes table at all?
Also, isn't it more logical to use "M" instead of "F" for the metadata
table alias for:
+GetEmbeddedMetadataBackup SELECT S.Path || '/' || S.Name,
T.TypeName, F.MetadataID, F.MetadataDisplay From Services S,
ServiceMetadata F, ServiceTypes T WHERE (S.ID == F.ServiceID) AND
(S.ServiceTypeID == T.TypeID) AND (F.MetadataID IN (SELECT ID From
MetadataTypes WHERE Embedded = 0)) UNION SELECT S.Path || '/' || S.Name,
T.TypeName, F.MetadataID, upper(F.MetadataValue) From Services S,
ServiceNumericMetadata F, ServiceTypes T WHERE (S.ID == F.ServiceID) AND
(S.ServiceTypeID == T.TypeID) AND (F.MetadataID IN (SELECT ID From
MetadataTypes WHERE Embedded = 0)) UNION SELECT S.Path || '/' || S.Name,
T.Typename, F.MetadataID, F.MetadataValue From Services S,
ServiceKeywordMetadata F, ServiceTypes T WHERE (S.ID == F.ServiceID) AND
(S.ServiceTypeID == T.TypeID) AND (F.MetadataID IN (SELECT ID From
MetadataTypes WHERE Embedded = 0));
5. Is the notation "(out)" the same notation used in gtk-doc now?
+ * @mount_mount: (out): if @path is on a removable device, the mount
point will
+ * be filled in here
6. This needs fixing in tracker-hal.c, also it doesn't say if it returns
TRUE or FALSE, usually that's a good idea:
+ * Returns Whether or not @path is on a known removable device
+ *
+ * Returns: Whether or not @path is on a known removable devic
6. This doesn't need to be cast:
+ udi = (const gchar*) key;
7. Indentation here is wrong and we could use g_strcmp0() instead so we
don't have to check mp for NULL too:
+ if (mp && strcmp (mp, path) != 0) {
+ if (g_strrstr (path, mp)) {
8. This looks like a memory leak, volume is not freed:
if (available)
*available = libhal_volume_is_mounted (volume);
break;
9. Please don't add blank lines. You can omit these from diff files too
by changing your ~/.subversion/config and adding:
diff-cmd = /home/pvanhoof/bin/svndiff
and with the svndiff content being:
/usr/bin/diff -upBb --label "${3}" ${6} --label "${5}" ${7}
to not diff whitespace changes like this:
+ g_hash_table_replace (metadata->table,
g_object_ref (field),
g_strdup (value));
+
}
10. The changes to tracker_data_metadata_insert() actually look like
they fix a potential leak in TRUNK, i.e. inserting the same thing twice.
Nice catch.
11. This looks like it would cause a crasher now that we don't own all
data sent to libtracker-data/tracker-data-metadata.c, I don't think we
should be freeing the list there or its contents:
- g_return_if_fail (TRACKER_IS_FIELD (field));
+ if (!field) {
+ g_warning ("Field name '%s' has isn't described in the ontology",
field_name);
+ g_list_foreach (list, (GFunc) g_free, NULL);
+ g_list_free (list);
+ return;
+ }
+
12. Same as #11 for tracker_data_metadata_append_to_list() where we free
"value".
13. Just wondering about the name here, shouldn't it be
GetMetadataBackup? I mean, the Turtle file is just a metadata backup
right, I don't quite know how the "Embedded" part comes into it.
+ result_set = tracker_data_manager_exec_proc (iface,
+ "GetEmbeddedMetadataBackup",
+ NULL);
14. Please put _ALL_ structures and typedefs at the top of the file:
+/* TODO: URI branch path -> uri */
+
+typedef struct {
+ TrackerService *service;
+ guint32 iid_value;
+ TrackerLanguage *language;
+ TrackerConfig *config;
+} ForeachInMetadataInfo;
+
15. Please put all static variables at the top of the file:
+ static TrackerConfig *config = NULL;
+ static TrackerLanguage *language = NULL;
16. When linning up variables don't add extra spaces:
+void
+tracker_data_delete_service (const gchar *path,
+ const gchar *rdf_type)
+{
+ TrackerService *service;
+ const gchar *service_type;
+ guint32 service_id;
17. I wonder if we should recursively delete here even if there is no
top level service ID found - in case there is ever something below it
left in the database. Also remember, we changed this at the code camp in
Helsinki, I am still going through the patch to see if this is a
conflict at all:
+ if (service_id != 0) {
+ tracker_data_update_delete_service (service, service_id);
+ tracker_data_update_delete_service_recursively (service, (gchar *) path);
+ tracker_data_update_delete_all_metadata (service, service_id);
+ }
18. Can we format these arguments according to the coding style please:
+set_metadata (TrackerField *field, gpointer value,
ForeachInMetadataInfo *info)
19. Every time I look at this, I think it is crack, that's why I added
the comment. I know we do this in the indexer too, but I wonder if we
should change this to something sensible with a reason too:
+ /* Throttle indexer, value 9 is from older code, why 9? */
+ throttle = tracker_config_get_throttle (info->config);
+ if (throttle > 9) {
+ tracker_throttle (info->config, throttle * 100);
+ }
20. I would have used for() instead of while() for clarity here:
+ GList *list;
+
+ list = value;
+
+ while (list) {
+ set_metadata (field, list->data, user_data);
+ list = list->next;
+ }
i.e.
for (list = value; list; list = list->next)
set_metadata (field, list->data, user_data);
21. I think the API should be a bit more uniform here, i.e.
tracker_data_turtle_import_{start|stop}:
+ void tracker_data_start_turtle_import (void);
+ void tracker_data_stop_turtle_import (void);
22. Instead of duplicating the config and language modules, can we not
just call an initialize and termination function to do all this and wrap
all static variables in a GStaticPrivate? To save memory and also if
this is done more than once per invocation of the application it really
makes sense. Right now I think TrackerConfig and TrackerLanguage are
only ever created once per application. I would like to keep it that way
if possible.
+void
+tracker_data_start_turtle_import (void)
+{
+ if (!config)
+ config = tracker_config_new ();
+ if (!language)
+ language = tracker_language_new (config);
+}
23. All public functions we need variable checking, e.g.
tracker_data_delete_service()
tracker_data_replace_service()
need (for example):
g_return_if_fail (path != NULL);
24. We should use g_debug here not g_print, and the preprocessor
comments should be on column 0 at the start of the line.
+ #ifdef TURTLE_DEBUG
+ g_print ("Q ERROR: %s\n", error->message);
+ #endif /* TURTLE_DEBUG */
+ g_error_free (error);
25. Indentation:
+ GValue id_value = { 0, };
+ GValue is_value = { 0, };
+ gint iid_value, iis_value;
+ guint32 id;
26. API wise, what is the difference between
tracker_data_delete_service()
and
tracker_data_update_delete_service()
both look like they should do the same thing except one is the uniform
API which Jürg worked on. That should be clearer I think. The same
namespace should be used for the _replace_ variant too.
27. We need spaces in here:
+ info->config = config?g_object_ref (config):NULL;
+ info->language = language?g_object_ref (language):NULL;
28. Bloody good catch guys :)
- if (file_iface)
+ if (file_iface) {
g_object_unref (file_iface);
- if (email_iface)
+ file_iface = NULL;
+ }
+
+ if (email_iface) {
g_object_unref (email_iface);
- if (xesam_iface)
+ email_iface = NULL;
+ }
+
+ if (xesam_iface) {
g_object_unref (xesam_iface);
+ xesam_iface = NULL;
+ }
29. More spaces we don't need :)
+void
+tracker_indexer_commit_transaction (TrackerIndexer *indexer)
+{
+ stop_transaction (indexer);
+ tracker_indexer_set_running (indexer, TRUE);
+
+}
+
+void
+tracker_indexer_open_transaction (TrackerIndexer *indexer)
+{
+ tracker_indexer_set_running (indexer, FALSE);
+ start_transaction (indexer);
+}
+
30. I would change tracker_indexer_commit_transaction() and
tracker_indexer_open_transaction() to be tracker_indexer_transaction_*()
instead.
31. There is no need to set mount_point to NULL after freeing it if it
is unused again.
32. Shouldn't we do something in an "else" for the
tracker_hal_path_is_on_removable_device() for "path", like we do for
"other_path"?
+ if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
+ path,
+ &mount_point,
+ NULL)) {
+ if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
+ other_path,
+ NULL,
+ NULL)) {
+ tracker_removable_device_add_move (indexer,
+ mount_point,
+ path,
+ other_path,
+ tracker_service_get_name (service));
+ } else {
+ tracker_removable_device_add_removal (indexer,
+ mount_point,
+ path,
+ tracker_service_get_name (service));
+ }
+ }
33. APIs like this really confuse me:
tracker_removable_device_add_move()
tracker_removable_device_add_removal()
Can we come up with something a bit more intuitive?
34. We don't need to allocate memory here:
+ values[0] = g_strdup (triple->object);
+ values[1] = NULL;
...
+
+ g_free (values[0]);
35. All request IDs are generated before anything else. This is for
debugging purposes and so we know how many requests we have _REALLY_
received regardless of if there was some parameter check failure. This
needs to be before any tracker_dbus_async_return_if_fail() calls:
+ request_id = tracker_dbus_get_next_request_id ();
36. This normally comes up as a warning depending on compiler flags,
"(void)" is needed:
+static gchar *
+get_turtle_userdata_backup_filename ()
37. Checking private is NULL or not is pointless here because it is
created in main() and never freed - I just noticed :) so I should fix
that. Also, we should store this value in private, since we often get
"finished" more than once from the indexer and recreating the string
here would be unnecessary:
+ if (private) {
+ return g_build_filename (private->user_data_dir,
+ "tracker-userdata-backup.ttl",
+ NULL);
+ } else {
+ g_critical ("Directories not initialized");
+ return NULL;
+ }
38. This should be fixed now (sending finished twice at the same time).
But also you should know that "finished" is sent whenever the indexer
finishes indexing ANY content from ANY file changes on the disk. So this
solution is broken anyway. It will need fixing:
+ if (counter >= 2) {
39. I would rather we are a bit more descriptive for variable names,
which callback_id this is, isn't obvious without searching further in
the code, this is in main() for trackerd:
+ gulong callback_id;
40. More spacing issues:
if (force_reindex) {
+
+ gchar *turtle_file;
+
+ turtle_file = get_turtle_userdata_backup_filename ();
41. Eeeek, what is this for? You are duplicating a lot of stuff here!
The tracker_db_manager_init() is called just after this. The
tracker_db_index_manager_init() is also called after this. The
tracker_data_manager_init() is also called after this. As I understand
it you just want to save the turtle file after an index right -
specifically the initial index? If so, then why not just check the
"initial index" state using tracker_status_get_is_first_time_index().
You could also use this in the "finished" callback.
This code looks to me like it would break a lot of things and it
shouldn't get committed:
if (force_reindex) {
+
+ gchar *turtle_file;
+
+ turtle_file = get_turtle_userdata_backup_filename ();
+
+ g_message ("Saving metadata in %s", turtle_file);
+
+ /* Init the DB stack */
+ tracker_db_manager_init (0, &is_first_time_index, TRUE);
+
+ tracker_db_index_manager_init (0,
+ tracker_config_get_min_bucket_count (config),
+ tracker_config_get_max_bucket_count (config));
+
+ file_index = tracker_db_index_manager_get_index (TRACKER_DB_INDEX_FILE);
+ email_index = tracker_db_index_manager_get_index
(TRACKER_DB_INDEX_EMAIL);
+
+ tracker_data_manager_init (config, language, file_index, email_index);
+
+ tracker_backup_save (turtle_file);
+
+ /* Shutdown the DB stack */
+ tracker_data_manager_shutdown ();
+
+ tracker_db_index_manager_shutdown ();
+ tracker_db_manager_shutdown ();
+
+ g_free (turtle_file);
+
42. I don't think you want to do this ONLY on FORCE_REINDEX. Since that
is rarely the case. Although I admit I am not entirely sure what
org_freedesktop_Tracker_Indexer_restore_backup() is supposed to do from
the API name. Does it write the Turtle file to the DB? If so, then why
would you want to write metadata to the database AFTER it has just been
written to with all the latest information from the indexer? If your
intention (on the other hand) is to write initial data to the database
on a reindex before the indexer has had a chance to index the real data
it gets, then I think this is going about it the wrong way.
+ if (flags & TRACKER_DB_MANAGER_FORCE_REINDEX) {
+ g_debug ("Setting callback for crawling finish detection");
+ callback_id = g_signal_connect (private->processor, "finished",
+ G_CALLBACK (crawling_finished_cb),
+ &callback_id);
+ }
+
43. About tracker-turtle.h:
a) Why is MetadataItem public, it looks unused in the header.
b) Why doesn't it use a proper namespace?
i.e. TrackerTurtleMetadataItem
c) Why don't the callback typedefs have Tracker in the namespace?
i.e. TrackerTurtleTripleCallback
d) What happened to the coding style here?
e) Include order is incorrect.
f) Includes should have config.h
g) G_BEGIN_DECLS and G_END_DECLS are needed.
h) Why not use Glib types here? gint, gchar, etc.
i) When I saw "stmt" in the code, I had to find out what it was, where
it came from, etc. Namespacing issues there again need sorting out.
j) I wouldn't include Jamie's name in the copyright since he didn't
write it.
44. About tracker-turtle.c:
a) The "tracker-turtle.h" include should come after all external includes.
b) There is no need to include the same files already included in the
tracker-turtle.h.
c) In tracker_turtle_open(), you just print a critical if not
initialised. I would suggest you use g_return_val_if_fail () instead since.
d) I would use g_fopen() not fopen() for compatibility across platforms.
e) Is turtle->uri a uri or a path? Variable names should reflect this
accurately:
i.e. turtle->uri = raptor_new_uri ("/");
f) Can we use proper variable names like "count" instead of 3 letter
variable names that look awefully like another English word I dare not
utter :)
i.e. guint cnt;
g) There are some serious code style issues there.
h) I wouldn't include Jamie's name in the copyright since he didn't
write it.
45. About tracker-removable-device.h:
a) Include order is incorrect.
b) G_BEGIN_DECLS and G_END_DECLS are needed.
c) I wouldn't include Jamie's name in the copyright since he didn't
write it.
d) tracker_removable_device_add_move() shouldn't this be
tracker_removable_device_move()?
e) tracker_removable_device_add_removal() shouldn't this be
tracker_removable_device_removal()?
46. About tracker-removable-device.c:
a) The #ifdef HAVE_RAPTOR at the top of this file (I would expect)
should go right to the end of the file since it is so high up, but it
doesn't. I think we should decide if we are making the whole file
require that define or just parts of it and fix it accordingly.
b) Do you really need all those includes or is that a copy and paste
from tracker-indexer.c :)
c) Again all typedefs need proper namespacing including the Tracker name
otherwise it is confusing as hell when you see it somewhere.
d) Coding style needs fixing.
Summary:
It looks to me like a lot of work is still needed to keep the code clean
and consistent with the current code base. There are a couple of leaks
which need fixing and a couple of issues which this patch fixes too. The
major concern I have is about when the restore and backup are actually
done. Unless I have misunderstood (which is entirely possible) it seems
they are done at the wrong times, or at least could be done at better times.
This patch is a good start however. Once you have managed to fix the
issues, send me the new patch and I will do a quick review.
I would like to see Carlos also review the parts where the
tracker-indexer.c is involved. They look fine to me, but he might have
an opinion to share.
Thanks for the hard work guys!
--
Regards,
Martyn
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]