Re: [Tracker] Database access abstraction



On Mon, 2008-11-10 at 15:20 +0000, Martyn Russell wrote:
Jamie McCracken wrote:
On Mon, 2008-11-03 at 15:30 +0100, Jürg Billeter wrote:

What is your opinion on my proposal to introduce a TrackerService (or
TrackerResource) class - similar to the GFile interface in GIO - and use
that throughout the API where we currently use path, uri, or service id?


the big disadvantage of this is that we end up supporting and coding
more paths through the code and more SPs - not sure if it's worth it,
especially as in a lot of cases we will be supporting both URI and ID
params where we only ever use one

True.

We could actually just:

  typedef GFile TrackerResource;

For now just to keep it something internal which can change. That way
APIs stay the same and we do operations on the resource (or GFile if we
explicitly know it is a GFile).

I don't quite understand how this would help. Either we say it's always
a valid GFile, then we could directly use GFile in the API, however,
this might be problematic for e-mails and possibly other resource types.
Or we say it doesn't have to be a valid GFile, and then we won't be able
to use any g_file_* functionality, which means that we need our own
interface/class TrackerResource. Am I missing something?
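
To make the second option concrete, here is a minimal sketch in plain C of
what a separate TrackerResource interface could look like. Everything in it
is an assumption made for illustration: the struct layout, the function
names, and the example e-mail URI are hypothetical and do not come from the
Tracker code base, and a real version would more likely be a GObject
interface than a hand-written vtable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in Tracker. */
typedef struct TrackerResource TrackerResource;

/* Per-type operations; a GFile-backed type and an e-mail type can differ. */
typedef struct {
    int (*is_local_file)(const TrackerResource *res);
} TrackerResourceClass;

struct TrackerResource {
    const TrackerResourceClass *klass;
    const char *uri; /* borrowed pointer, not owned by the resource */
};

static int file_is_local(const TrackerResource *res)  { (void)res; return 1; }
static int email_is_local(const TrackerResource *res) { (void)res; return 0; }

static const TrackerResourceClass file_class  = { file_is_local };
static const TrackerResourceClass email_class = { email_is_local };

/* Constructors for the two illustrative resource types. */
static TrackerResource tracker_resource_for_file(const char *uri)
{
    TrackerResource res = { &file_class, uri };
    return res;
}

static TrackerResource tracker_resource_for_email(const char *uri)
{
    TrackerResource res = { &email_class, uri };
    return res;
}

/* Type-independent accessors that the rest of the API would use. */
static const char *tracker_resource_get_uri(const TrackerResource *res)
{
    assert(res != NULL);
    return res->uri;
}

static int tracker_resource_is_local_file(const TrackerResource *res)
{
    assert(res != NULL);
    return res->klass->is_local_file(res);
}
```

Callers would only go through the tracker_resource_* accessors, so g_file_*
functionality would be reachable only for resources that report themselves
as local files.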

The indexing calls will all be ID-based, except for the call to check
whether a file is up to date, which is obviously URI-driven, and the
call to get the ID for a URI

I just had some crazy idea for a second there about perhaps having a
unique ID (like an MD5SUM) for each path which is the service ID in the
database (instead of auto incrementing it) - so you can always get the
ID easily without needing extra DB lookups. Am I on crack or is this
worth investigating? Ignore me if it is :)

I don't think that this would improve the situation a lot compared to
just using URIs. As far as I can tell, we would have to store the hashes
as strings, as reducing them to 64-bit integers is probably not sensible
due to possible collisions (and even with 128-bit MD5 we'd have to think
of a way to handle collisions). And if we store IDs as strings, we lose
the performance benefit.
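
For a rough sense of the collision risk, the standard birthday approximation
gives p ~= n(n-1) / (2 * 2^bits) for n hashed paths. The sketch below
evaluates it in plain C. It assumes hash values are uniformly distributed,
is only meaningful while the result is small, and the function name is made
up for this example. It says nothing about how a collision would be handled
once it happens.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Birthday approximation for the chance of at least one collision among
 * n uniformly distributed hash values of the given width in bits.
 * Hypothetical helper, not part of Tracker.
 */
static double collision_estimate(uint64_t n, unsigned bits)
{
    assert(bits >= 1 && bits <= 128);

    if (n < 2)
        return 0.0; /* fewer than two items cannot collide */

    /* Number of distinct pairs: n(n-1)/2, computed in floating point. */
    double pairs = (double)n * (double)(n - 1) / 2.0;

    /* Size of the hash space: 2^bits, built up without needing <math.h>. */
    double space = 1.0;
    for (unsigned i = 0; i < bits; i++)
        space *= 2.0;

    double p = pairs / space;
    return p > 1.0 ? 1.0 : p; /* clamp: the approximation overshoots for large n */
}
```

Calling collision_estimate(n, 64) and collision_estimate(n, 128) for the
expected number of indexed paths shows how much the two widths differ.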

However, I'm not convinced that caching IDs in the indexer results in a
big performance difference as I'd assume that updating/inserting rows
takes a lot more time than querying the ID in a subselect. I haven't
performed any measurements, though, so I could be completely wrong. Has
this already been benchmarked earlier or is it possible that the ID
caching is not really necessary?

Jürg



