Re: [Tracker] miner-fs: Placing monitors on directories takes way too much time



Hi there,

With this patch in mind, we started working on an API that allows
multiple inserts per D-Bus call while still returning one error per
query, so the queries don't all have to belong to the same transaction.
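
To illustrate the intended semantics (one call, many queries, one error
slot per query), here is a minimal model in plain C. It is not the Tracker
API: run_one() and run_batch() are hypothetical names, and the "store" is
a stand-in that simply rejects anything that does not start with INSERT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the store: accepts only queries that
 * start with "INSERT". Returns 0 on success, -1 on failure. */
static int
run_one (const char *query)
{
        return strncmp (query, "INSERT", 6) == 0 ? 0 : -1;
}

/* Runs every query of the batch on its own. A failing query does not
 * abort the others; errors[i] receives the result for queries[i].
 * Returns the number of failed queries. */
static size_t
run_batch (const char *const *queries, size_t n, int *errors)
{
        size_t i, failed = 0;

        assert (queries != NULL && errors != NULL);

        for (i = 0; i < n; i++) {
                errors[i] = run_one (queries[i]);
                if (errors[i] != 0)
                        failed++;
        }

        return failed;
}
```

The caller gets back an error array that is parallel to the query array,
so it can tell exactly which inserts failed without the whole batch being
rolled back.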

This experimental work is taking place in the branch multi-insert.

Today GNOME's git is down, so I have pushed the latest work on this
branch to gitorious.

The branch on gitorious:
http://meego.gitorious.org/tracker/tracker/commits/multi-insert

Later today or this week, once SSH access to GNOME's git is back, I'll
push the branch there too. We might of course also merge it all into
master.

Here are some test results:


pvanhoof lors:~/repos/gnome/tracker/master$ git push gitorious multi-insert 
Counting objects: 2459, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (1741/1741), done.
Writing objects: 100% (2211/2211), 378.62 KiB, done.
Total 2211 (delta 1776), reused 595 (delta 466)
=> Syncing Gitorious... [OK]
To git gitorious org:tracker/tracker.git
 * [new branch]      multi-insert -> multi-insert
pvanhoof lors:~/repos/gnome/tracker/master$ cd tests/functional-tests/
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ ./update-array-performance-test 
First run (first update then array)
Array: 0.103675, Update: 0.139094
Reversing run (first array then update)
Array: 0.290607, Update: 0.161749
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ ./update-array-performance-test 
First run (first update then array)
Array: 0.105920, Update: 0.137554
Reversing run (first array then update)
Array: 0.118785, Update: 0.130630
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ ./update-array-performance-test 
First run (first update then array)
Array: 0.108501, Update: 0.136524
Reversing run (first array then update)
Array: 0.117308, Update: 0.151192
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ ./update-array-performance-test 
First run (first update then array)
Array: 0.104705, Update: 0.138569
Reversing run (first array then update)
Array: 0.108777, Update: 0.134969
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ ./update-array-performance-test 
First run (first update then array)
Array: 0.105046, Update: 0.155692
Reversing run (first array then update)
Array: 0.114671, Update: 0.132269
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ ./update-array-performance-test 
First run (first update then array)
Array: 0.107001, Update: 0.139992
Reversing run (first array then update)
Array: 0.118955, Update: 0.133169
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ ./update-array-performance-test 
First run (first update then array)
Array: 0.106915, Update: 0.140673
Reversing run (first array then update)
Array: 0.192792, Update: 0.136646
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$ cat update-array-performance-test.c 
/*
 * Copyright (C) 2010, Codeminded BVBA <abustany gnome org>
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public
 * License along with this library; if not, write to the
 * Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
 * Boston, MA  02110-1301, USA.
 *
 * Copied from ../tracker-steroids/tracker-test.c
 */

#include <stdlib.h>
#include <string.h>

#include <tracker-bus.h>
#include <tracker-sparql.h>

typedef struct {
        GMainLoop *main_loop;
        const gchar *query;
        guint len, cur;
} AsyncData;

static TrackerSparqlConnection *connection;
#define MSIZE 90
#define TEST_STR "Brrr0092323"

static const gchar *queries[90] = {
            "INSERT { _:a0 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:a9 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:a11 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:b0 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b9 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b11 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:c0 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c9 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c12 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:d0 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d9 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d12 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:e0 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e9 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e11 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:f0 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f9 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f11 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:b1 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b8 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b13 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:c1 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c8 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c13 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:d1 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d8 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d14 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:e1 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e8 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e14 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:f1 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f8 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f15 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:b2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b7 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b15 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:c2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c7 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c15 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:d2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d7 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d16 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:e2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e7 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e16 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:f2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f7 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f17 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:b3 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b6 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b16 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:c3 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c6 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c18 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:d3 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d6 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d19 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:e3 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e6 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e20 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:f3 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f6 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f21 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:b4 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:b22 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:c4 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c23 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:d4 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d24 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:e4 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e24 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:f4 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f25 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:c5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:c26 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:d5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:d28 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:e5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:e29 a nmo:Message; nie:title '" TEST_STR "' }",
            "INSERT { _:f5 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f2 a nmo:Message; nie:title '" TEST_STR "' }", "INSERT { _:f33 a nmo:Message; nie:title '" TEST_STR "' }"
};

static void
async_update_array_callback (GObject      *source_object,
                             GAsyncResult *result,
                             gpointer      user_data)
{
        AsyncData *data = user_data;
        GPtrArray *errors;

        errors = tracker_sparql_connection_update_array_finish (connection, result);
        g_ptr_array_unref (errors);
        g_main_loop_quit (data->main_loop);
}


static void
test_tracker_sparql_update_array_async ()
{
        GMainLoop *main_loop;
        AsyncData *data;

        main_loop = g_main_loop_new (NULL, FALSE);

        data = g_slice_new (AsyncData);
        data->main_loop = main_loop;

        /* Cast here is because vala doesn't make const-char-** possible :( */
        tracker_sparql_connection_update_array_async (connection,
                                                      (char**) queries, MSIZE,
                                                      0, NULL,
                                                      async_update_array_callback,
                                                      data);

        g_main_loop_run (main_loop);

        g_slice_free (AsyncData, data);
        g_main_loop_unref (main_loop);

}

static void
async_update_callback (GObject      *source_object,
                       GAsyncResult *result,
                       gpointer      user_data)
{
        AsyncData *data = user_data;
        GError *error = NULL;

        data->cur++;

        tracker_sparql_connection_update_finish (connection, result, &error);
        if (error)
                g_error_free (error);

        if (data->cur == data->len)
                g_main_loop_quit (data->main_loop);
}

static void
test_tracker_sparql_update_async ()
{
        guint i;
        GMainLoop *main_loop;
        AsyncData *data;

        main_loop = g_main_loop_new (NULL, FALSE);

        data = g_slice_new (AsyncData);
        data->len = MSIZE;
        data->main_loop = main_loop;
        data->cur = 0;

        for (i = 0; i < data->len; i++) {
                tracker_sparql_connection_update_async (connection,
                                                        queries[i],
                                                        0, NULL,
                                                        async_update_callback,
                                                        data);
        }

        g_main_loop_run (main_loop);

        g_slice_free (AsyncData, data);
        g_main_loop_unref (main_loop);

}


gint
main (gint argc, gchar **argv)
{
        GTimer *array_t, *update_t;

        g_type_init ();

        /* do not require prior installation */
        g_setenv ("TRACKER_SPARQL_MODULE_PATH", "../../src/libtracker-bus/.libs", TRUE);

        connection = tracker_sparql_connection_get (NULL, NULL);

        g_print ("First run (first update then array)\n");

        tracker_sparql_connection_update (connection,
                                          "DELETE { ?r a rdfs:Resource } WHERE { ?r nie:title '" TEST_STR "' }",
                                          0, NULL, NULL);

        update_t = g_timer_new ();
        test_tracker_sparql_update_async ();
        g_timer_stop (update_t);

        tracker_sparql_connection_update (connection,
                                          "DELETE { ?r a rdfs:Resource } WHERE { ?r nie:title '" TEST_STR "' }",
                                          0, NULL, NULL);

        array_t = g_timer_new ();
        test_tracker_sparql_update_array_async ();
        g_timer_stop (array_t);

        tracker_sparql_connection_update (connection,
                                          "DELETE { ?r a rdfs:Resource } WHERE { ?r nie:title '" TEST_STR "' }",
                                          0, NULL, NULL);

        g_print ("Array: %f, Update: %f\n",
                 g_timer_elapsed (array_t, NULL),
                 g_timer_elapsed (update_t, NULL));

        g_print ("Reversing run (first array then update)\n");

        g_timer_destroy (array_t);
        g_timer_destroy (update_t);

        array_t = g_timer_new ();
        test_tracker_sparql_update_array_async ();
        g_timer_stop (array_t);

        tracker_sparql_connection_update (connection,
                                          "DELETE { ?r a rdfs:Resource } WHERE { ?r nie:title '" TEST_STR "' }",
                                          0, NULL, NULL);

        update_t = g_timer_new ();
        test_tracker_sparql_update_async ();
        g_timer_stop (update_t);

        tracker_sparql_connection_update (connection,
                                          "DELETE { ?r a rdfs:Resource } WHERE { ?r nie:title '" TEST_STR "' }",
                                          0, NULL, NULL);

        g_print ("Array: %f, Update: %f\n",
                 g_timer_elapsed (array_t, NULL),
                 g_timer_elapsed (update_t, NULL));

        g_timer_destroy (array_t);
        g_timer_destroy (update_t);
        g_object_unref (connection);

        return 0;
}
pvanhoof lors:~/repos/gnome/tracker/master/tests/functional-tests$
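
As a rough summary of the seven "first run" lines above, the following
sketch averages the two columns and computes their ratio. The numbers are
copied from the log; mean() and update_to_array_ratio() are helper names
made up for this example and are not part of the test program.

```c
#include <assert.h>
#include <stddef.h>

/* "First run" timings in seconds, copied from the log above. */
static const double array_first[7] = {
        0.103675, 0.105920, 0.108501, 0.104705, 0.105046, 0.107001, 0.106915
};
static const double update_first[7] = {
        0.139094, 0.137554, 0.136524, 0.138569, 0.155692, 0.139992, 0.140673
};

/* Arithmetic mean of n samples. */
static double
mean (const double *v, size_t n)
{
        double sum = 0.0;
        size_t i;

        assert (v != NULL && n > 0);

        for (i = 0; i < n; i++)
                sum += v[i];

        return sum / (double) n;
}

/* How much longer the one-call-per-query path takes than the array path. */
static double
update_to_array_ratio (void)
{
        return mean (update_first, 7) / mean (array_first, 7);
}
```

For these runs the means come out at roughly 0.106 s (array) and 0.141 s
(update), so the per-query path takes about a third longer. The reversed
runs are noisier (note the 0.290607 and 0.192792 array outliers), so I
would not read too much into them.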


On Thu, 2010-09-09 at 17:04 +0800, Chen, Zhenqiang wrote:
I will try d-bus-1.3 and latest tracker.

Oops, yeah, I meant 1.3.1, not 3.7. Also note that 1.3.0 is buggy, so
you will need 1.3.1. This should avoid quite a few memory copies
when indexing.


I tried the tracker git code (master, last updated Sep 7), but test
results show it is ~15% slower than tracker-0.9.16.
I also tried d-bus-1.3.1. It did not help in my case.

D-Bus is one of the bottlenecks. Tests show that 1000 consecutive INSERTs
will block D-Bus on my system.

To reduce the D-Bus overhead, I tried grouping the UPDATEs for the files
of a directory into one update. Tests show it is ~2x faster. Here are the logs:
(Note: among the 10693 files, 10654 are photos.)

tracker git master:
Finished mining in seconds:232.713859, total directories:354, total files:10693

tracker-0.9.16
Finished mining in seconds:200.689239, total directories:354, total files:10693

tracker-0.9.16 with group update (testing code segment is attached.)
Finished mining in seconds:92.813985, total directories:354, total files:10693

What do you think about the idea "grouping the updates for files of one dir into one update"? 
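
The grouping idea can be sketched as plain string concatenation: collect
the update statements for all files of a directory and send them as one
request. This is only an illustration, not the attached testing code;
group_updates() is a hypothetical helper, and it assumes the store accepts
several update statements in a single request string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins n update statements into one newly allocated string, each
 * followed by a newline. Returns NULL if allocation fails; the caller
 * frees the result with free(). */
static char *
group_updates (const char *const *updates, size_t n)
{
        size_t i, total = 1; /* room for the terminating NUL */
        char *out, *p;

        assert (updates != NULL || n == 0);

        for (i = 0; i < n; i++)
                total += strlen (updates[i]) + 1; /* +1 for '\n' */

        out = malloc (total);
        if (out == NULL)
                return NULL;

        p = out;
        for (i = 0; i < n; i++) {
                size_t len = strlen (updates[i]);

                memcpy (p, updates[i], len);
                p += len;
                *p++ = '\n';
        }
        *p = '\0';

        return out;
}
```

The trade-off is that a grouped request is likely to succeed or fail as a
whole, so one bad statement can take the rest of the directory with it.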

Thanks!
-Zhenqiang

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be
