Re: [Tracker] Database access abstraction



Jamie McCracken wrote:
On Mon, 2008-10-20 at 20:07 +0200, Jürg Billeter wrote:
Hi all,

hi juerg,

Hi :)

as a preparation to support decomposed database tables in Tracker, I've
been looking into how we could abstract database access so that the
database schema can change without affecting the code in trackerd and
tracker-indexer.

It's also likely to affect the stored procs quite heavily, and more so
than the code.

Not sure how you think it will affect stored procedures here?

From what I understand Juerg is talking about taking out all the SQL
construction and putting it in a common place. The SQL itself is
unlikely to change here (unless I misunderstood).

I don't think a full abstraction is necessary, as saving/updating data is
only done in a few places and this code should be shared more.

I think it is totally necessary. It has been on my TODO list for a long
time. Doing all that SQL construction in a common place is a superb idea
because then we know exactly where it is all done. Right now we have two
files (one in trackerd and one in the indexer) sure, but I like the idea
of calling an API from the indexer and/or the daemon which gets the
information and I know that if I want to change the way it works it is
all done completely outside of the trackerd and indexer implementations.
It also means we minimise any duplication of code/bugs/etc.

I'm proposing to introduce an additional (private) library that acts as
a high-level database interface, so it sits between libtracker-db and
trackerd/tracker-indexer. That library, let's call it libtracker-data
for now, is the only place where SQL queries get constructed.
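
A minimal sketch of what "the only place where SQL queries get constructed" could look like. Every name here (ExampleSchema, example_data_build_file_id_query, and the table and column names) is hypothetical and is not taken from Tracker's code. The point is only that callers ask for a query by intent, so a switch to decomposed tables changes this one function rather than trackerd or tracker-indexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: which physical layout the database uses. In a
 * decomposed layout each service type gets its own table. */
typedef enum {
    EXAMPLE_SCHEMA_SINGLE_TABLE,
    EXAMPLE_SCHEMA_DECOMPOSED
} ExampleSchema;

/* Accept only [A-Za-z0-9_]+ so a type name can be used as a table name. */
static int
example_is_safe_identifier (const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!isalnum ((unsigned char) *s) && *s != '_')
            return 0;
    }
    return 1;
}

/* Build the SQL that maps a path/name pair to a service ID.
 * Values are left as '?' placeholders to be bound by the caller.
 * Returns 0 on success, -1 on a bad type name or a too-small buffer. */
int
example_data_build_file_id_query (ExampleSchema  schema,
                                  const char    *service_type,
                                  char          *buf,
                                  size_t         buf_len)
{
    int n;

    assert (service_type != NULL && buf != NULL);

    if (!example_is_safe_identifier (service_type))
        return -1;

    if (schema == EXAMPLE_SCHEMA_DECOMPOSED) {
        /* Assumed layout: one table per service type. */
        n = snprintf (buf, buf_len,
                      "SELECT ID FROM \"%s\" WHERE Path = ? AND Name = ?",
                      service_type);
    } else {
        /* Assumed layout: one shared table for all services. */
        n = snprintf (buf, buf_len,
                      "SELECT ID FROM Services WHERE Path = ? AND Name = ?");
    }

    return (n >= 0 && (size_t) n < buf_len) ? 0 : -1;
}
```

A caller in the daemon or the indexer would pass a buffer and a service type and never mention a table name itself; the schema argument stands in for whatever mechanism the library would use to know its own layout.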


well I have been asking for a libtracker-metadata to host shared
metadata support between trackerd and tracker-indexer so to some degree
you will have my approval :)

Yea, it is all on the list - just need to do it. :)

Also, in the future I want to support direct access to SQLite via a
client lib so we can bypass dbus (and trackerd) for select queries where
speed is paramount and the volume of data is too big for dbus to handle
optimally (think "get all my 100,000 music tracks with metadata"). So this
library would have to handle all current query types and any future ones
(like SPARQL) - so you will have no problem from me for implementing that
support in a lib.

Hmm, I would like to see the difference it makes using DBus and if it
really is an issue. We have an API like this in DBus now which Philip
added - I really don't like the idea of people executing random SQL on
the databases. It can lead to much bigger problems. Philip stresses
this in the .xml file where we document this API. I think quite rightly
so too.

I'm not sure why you want to abstract *all* db access? I would have
thought indexer-specific requests can quite nicely remain where they are,
unless you have a good reason?

Why not?

It then becomes architecturally clear where all database abstraction
lives (libtracker-db) and where all SQL construction or SQL procedure
calls are kept (libtracker-data).

It also means there is less duplication, fewer bugs, it is much more
maintainable, etc, etc. As far as I can see, there are no disadvantages
here, only advantages.

My preference is for sharing more routines rather than abstracting them.

As a first step, we should probably just move relevant functions or
function parts from trackerd and tracker-indexer to the new library and
refactor and extend the library later. Looking at the current code, the
API of libtracker-data would be composed of the following parts:

 * Ontology/Schema API
   These functions don't query the actual data or metadata but only the
   ontology/schema and its mapping to the database layout. They are
   currently part of trackerd/tracker-db.c
     tracker_db_metadata_get_related_names
     tracker_db_metadata_get_table
     tracker_db_get_field_name
     tracker_db_get_metadata_field
     tracker_db_create_array_of_services
     tracker_db_xesam_get_metadata_names
     tracker_db_xesam_get_all_text_metadata_names
     tracker_db_xesam_get_service_names

 * Service/Metadata Query API
   These functions query information about a specific service/resource,
   for example, path to id mapping and metadata retrieval. They are
   currently part of trackerd/tracker-db.c and
   tracker-indexer/tracker-indexer-db.c
     tracker_db_metadata_get
     tracker_db_metadata_get_all
     tracker_db_metadata_get_array
     tracker_db_metadata_get_delimited
     tracker_db_get_all_metadata
     tracker_db_get_parsed_metadata
     tracker_db_get_unparsed_metadata
     tracker_db_get_property_values
     tracker_db_check_service
     tracker_db_get_service_type
     tracker_db_service_get_by_entity
     tracker_db_file_get_id
     tracker_db_file_get_id_as_string

 * Search and General Query API
   These functions perform arbitrary queries on the whole database. They
   are currently part of tracker-db.c, tracker-metadata.c,
   tracker-search.c, tracker-keywords.c, and tracker-files.c in trackerd
     tracker_db_search_text
     tracker_db_search_text_and_mime
     tracker_db_search_text_and_location
     tracker_db_search_text_and_mime_and_location
     tracker_db_live_search_start
     tracker_db_live_search_stop
     tracker_db_live_search_get_all_ids
     tracker_db_live_search_get_new_ids
     tracker_db_live_search_get_deleted_ids
     tracker_db_live_search_get_hit_data
     tracker_db_live_search_get_hit_count
     tracker_db_keywords_get_list
     tracker_db_files_get
     tracker_db_files_get_by_service
     tracker_db_files_get_by_mime
     tracker_db_create_event
     tracker_db_xesam_delete_handled_events
     tracker_data_get_unique_values
     tracker_data_get_sum
     tracker_data_get_count
     tracker_data_get_unique_values_with_count
     tracker_data_get_unique_values_with_count_and_sum
     tracker_data_get_metadata_for_files_in_folder
     tracker_data_keywords_search
     tracker_data_search_query
   The tracker_data_* functions are the database access parts of the
   D-Bus service methods. The actual D-Bus method implementations stay
   where they are, of course.

 * Update API
   These functions are used to modify data and metadata. They are only
   executed by the indexer and currently reside in
   tracker-indexer/tracker-indexer-db.c
     tracker_db_get_new_service_id
     tracker_db_create_service
     tracker_db_delete_service
     tracker_db_delete_service_recursively
     tracker_db_move_service
     tracker_db_increment_stats
     tracker_db_decrement_stats
     tracker_db_set_metadata
     tracker_db_delete_all_metadata
     tracker_db_delete_metadata
     tracker_db_set_text
     tracker_db_get_text
     tracker_db_delete_text
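
To illustrate the split described for the tracker_data_* functions in the Search and General Query group (database access in the library, the D-Bus method itself staying in trackerd), here is a rough sketch. Both function names are hypothetical, and an in-memory array stands in for the database so the example is self-contained; a real data-layer function would run a query through libtracker-db instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for stored rows: one MIME type per file. */
static const char *example_mime_rows[] = {
    "audio/mpeg", "image/png", "audio/mpeg"
};

/* Data layer (would live in the new library): knows how the data is
 * stored and how to query it, but nothing about D-Bus. */
size_t
example_data_get_count (const char *mime)
{
    size_t i, count = 0;
    size_t n_rows = sizeof example_mime_rows / sizeof example_mime_rows[0];

    assert (mime != NULL);

    for (i = 0; i < n_rows; i++) {
        if (strcmp (example_mime_rows[i], mime) == 0)
            count++;
    }
    return count;
}

/* Service layer (would stay in trackerd): validates the request and
 * hands the result back to the caller, but contains no SQL or storage
 * details. Returns 0 on success, -1 on invalid arguments. */
int
example_service_get_count (const char *mime, size_t *out_count)
{
    if (mime == NULL || mime[0] == '\0' || out_count == NULL)
        return -1;

    *out_count = example_data_get_count (mime);
    return 0;
}
```

With this shape, a schema change touches only example_data_get_count; the service-side function and its callers are unaffected.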

We also need to move trackerd/tracker-query-tree.c,
trackerd/tracker-rdf-query.c, trackerd/tracker-xesam-query.c, and
tracker-indexer/tracker-metadata.c to the new library as they generate
SQL queries or are used by other functions in libtracker-data.

Yea, right now tracker-query-tree.c is mostly for QDBM I think and the
_get_hit_count() API. Not sure how this will change with the new SQLite FTS.

I also think tracker-xesam-query.c and tracker-rdf-query.c are very
similar (if that's the XESAM file which does RDF construction). I can't
remember if we resolved the duplication here with tracker-rdf-query.c or
not. That's another TODO item I think :)

Any comments or suggestions about this, does the grouping seem sensible?
Please note that I don't know Tracker's code base very well yet, so I
might be missing or misunderstanding some things.

Looks good to me. You might want to break down the searching a little
more into live and non-live - but I guess you will see if that is needed
as you get on.

Good analysis Juerg!

-- 
Regards,
Martyn


