Re: [Tracker] Reviving the project, a first attempt

From: Philip Van Hoof <philip codeminded be>
To: Martyn Russell <martyn lanedo com>
Cc: Jos van den Oever <jos vandenoever info>, Tracker mailing list <tracker-list gnome org>
Subject: Re: [Tracker] Reviving the project, a first attempt
Date: Thu, 22 Nov 2012 12:21:10 +0100

On Thu, 2012-11-22 at 10:52 +0000, Martyn Russell wrote:

On 11/22/2012 09:01 AM, Philip Van Hoof wrote:

Hello Philip,


Hey Martyn!

Buried under its own weight of complexity the project is stifled. Why do
I think this?


The project isn't dead. I should point this out. It has clearly slowed 
down due to the change in funding.


Right, it's not dead, it's slowed down. I agree. You made several fine
releases and patches are being reviewed. Great job on that.

[CUT]

I always wondered, Jürg, where the name vstore came from? But it was a
fantastic branch and piece of work that you did. It clearly steered the
project in the direction of SPARQL and Nepomuk as ontology. Thanks.


I don't recall what the branch was about actually. At the height of our 
development, I was merging ca. 6 branches a week into master. Hard to 
keep up with all of them ;)


That branch was the SPARQL and Nepomuk stuff, the big redesign point
where the indexer got separated from the SPARQL endpoint. The branch got
removed apparently but I recall that vstore was its name.

[CUT]

This is not the case anymore. And I heard from developers of a new phone
OS being developed that Tracker is again used, that it was again a hard


Which one?


They have not officially announced this and I don't want the developers
who discussed it with me in private to feel sorry that they did. You'll
figure it out, Martyn ;-)

[CUT]

I would also like to thank our top contributors and the people who
worked on Qt based libraries built on top of libtracker-sparql for
spreading the truth about our team and Tracker. You guys know who you
are, I don't have to name you ;-)


Don't forget your input. You made quite a sizeable contribution and made 
quite some difference. ;)

And oh my God I'm writing so much text just to make a simple point ..


It's definitely not an imposter writing this then :P


Thanks for the nice compliments!

libtracker-extract's tracker_extract_client_get_metadata API is not
public enough because Tracker relies too heavily on the file system
miner. Today it is time to change this.


I agree that it's too heavily relying on the miner.

There is a good reason for this. The filesystem information, name, size, 
mtime, etc is all handled by the miner-fs. You could likely solve this 
issue by "chaining" extractors and have a basic file extractor which 
gets this information so the miner isn't doing it.


Yes, I agree.

This is the reason why miner-fs is injecting SPARQL, because it 
concatenates extractor specific SPARQL with file system general SPARQL.
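
The chaining idea above can be sketched roughly as follows. This is a
minimal illustration, not Tracker code: the function names are
hypothetical, the image extractor returns made-up values, and the choice
of properties (nfo:fileName, nfo:fileSize, nfo:fileLastModified) is an
assumption about how the file system part would be expressed.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def quote(value):
    """Escape a Python string as a SPARQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def basic_file_statements(path):
    """Generic file system facts, the part miner-fs supplies today."""
    st = os.stat(path)
    mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    return [
        "nfo:fileName " + quote(os.path.basename(path)),
        "nfo:fileSize %d" % st.st_size,
        "nfo:fileLastModified " + quote(mtime.strftime("%Y-%m-%dT%H:%M:%SZ")),
    ]


def fake_image_statements(path):
    """Hypothetical format-specific extractor; the values are made up."""
    return ["a nmm:Photo", "nfo:width 640", "nfo:height 480"]


def build_insert(path, extractors):
    """Run each extractor in order and merge the results into one INSERT."""
    statements = ["a nfo:FileDataObject"]
    for extract in extractors:
        statements.extend(extract(path))
    subject = "<%s>" % Path(path).resolve().as_uri()
    return "INSERT {\n  %s %s .\n}" % (subject, " ;\n    ".join(statements))
```

With the basic file extractor first in the chain, the caller gets one
complete update and the miner no longer has to concatenate anything:
build_insert(path, [basic_file_statements, fake_image_statements]).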


We always wanted to redesign this, as it was an extra IPC callback that
we could avoid. So improvement here would also be beneficial for
Tracker's FS miner in the form of less IPC overhead. It's win-win.

Phone builders want to rid themselves of file system mining. Instead
they want to let MTP daemons, who deal with incoming files, do the
processing and extraction of file meta data. They don't want to
configure with DConf or a GKeyFile to point to a directory where the MTP
daemon will write files, at all.


Right, there are different ways in which data can come in and we shouldn't 
restrict ourselves to the filesystem. That makes sense. But we don't. It 
just so happens the miner-fs is the main way people get data into 
tracker-store.


Right now. We can change this, but I think the right developers need to
be reactivated or at least provide a supportive role to a contributor
when or if somebody new starts working on this.

Instead they want their MTP daemon to use a simple API that will trigger
tracker-extract into extracting the file and then writing the SPARQL
INSERT to the SPARQL endpoint.

One of the things I would love to see happen (or add) is a command line 
option or way to inject SPARQL from tracker-extract into the store. We 
have a hack for this right now with tracker-control -f $FILE and a dbus 
API. The main problem with this is that the filesystem data is not there 
for files (which are the main use case in tracker right now).


Yes, this could certainly be part of such a redesign or even an
intermediate step towards it.

[CUT]

Tomorrow's phone builders might not even use a file system. Why would
inter app data sharing then necessarily depend on file system indexing?!


You know that the miner-fs doesn't have to be a daemon and can index on 
demand (instead of by inotify) right now in stable releases, right? The 
miner-fs can also be left out of the build with --disable-miner-fs (I think).


I know, but I don't think this is sufficient. An MTP daemon doesn't want
to call system(). They'll need deeper and better defined integration.

File system indexing is of course important, but only for users who need
it. Like a desktop. A desktop needs it. A phone might not need it. And
if it does, they understandably want to limit its use.

I would like to propose that we start by fully documenting
libtracker-extract and by changing tracker_extract_client_get_metadata's
API so that it is truly obvious to a platform builder, integrator or app
developer (of, for example, an MTP daemon) how to call it in order to
get the file's meta data inserted into tracker-store before the MTP
daemon has to write the file itself.


I was under the impression that it was already. If someone is paying for 
this or wants patch review, I am happy to step up.


Awesome

To make it possible to call this on a .tmp-XYZ file for a file that will
later be renamed to Girlfriend.JPEG in the DCIM folder of the phone.

Well, this isn't actually easy to solve even if you move away from 
miner-fs. If you're returning the full SPARQL including things like the 
file name, size, mtime, etc. then these details change. You either 
change the SPARQL and wait before injecting it to the store, or post 
process by updating the store details when it changes.


As a team we did a lot of things that were not easy to solve ;-)

You can't have it both ways. You either want the data early and have to 
cope with changes like the name changing OR you wait and have the data 
in its final (albeit maybe for a small time) state.

Yep
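
The "post process" option could look roughly like the sketch below: the
metadata goes in early under the temporary name, and a second update
fixes the name once the file is renamed. The helper name is hypothetical,
and both the SPARQL 1.1 DELETE/INSERT/WHERE form and the use of nie:url
and nfo:fileName are assumptions; the update dialect the store accepts
may differ.

```python
def quote(value):
    """Escape a Python string as a SPARQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rename_update(old_url, new_url, new_name):
    """Build an update that moves a stored file from old_url to new_url.

    Only the URL and the file name change; everything the extractor
    already inserted for the resource stays attached to it.
    """
    return (
        "DELETE { ?f nie:url %(old)s ; nfo:fileName ?name }\n"
        "INSERT { ?f nie:url %(new)s ; nfo:fileName %(name)s }\n"
        "WHERE  { ?f nie:url %(old)s ; nfo:fileName ?name }"
    ) % {"old": quote(old_url), "new": quote(new_url), "name": quote(new_name)}
```

For the example above this would be called as
rename_update("file:///DCIM/.tmp-XYZ", "file:///DCIM/Girlfriend.JPEG",
"Girlfriend.JPEG") once the rename has happened.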

Right now this ain't possible, because libtracker-extract is too focused
on being "just a tool library for the filesystem miner".


Well, I would say it's more that the miner-fs is _THE_ only one using 
it, so it's not so bad given that.


Agree

If you mean to suggest we separate this into a new project, I think that 
might be a good idea. Same for the miner-fs. Possibly for 
libtracker-sparql too? Some investigation would be needed, there are 
core libraries that we depend on in all cases and that might cause problems...


I don't think that separating or splitting the subprojects of Tracker is
needed right now, or a good idea right now. Long term it probably is.

One of the recent issues I've had with Tracker is, I can't find it on 
Google - I think Rob mentioned this way back at some GUADEC. The name is 
quite generic. I have been asked several times why we have so many 
things in the tree and if we can disable or split out things. I think 
Red Hat recently asked if we could do this, I am sure Debian maintainers 
have too.


Yes. Back then my opinion was that a rename was not needed and would at
that point in time hurt the project's team adhesion.

Today the situation is different and if all former team members and / or
a new group of contributors taking a lead role in the project agree,
then I think a rename (long term goal) would not be a bad idea.

Sadly, the name "Tracker" has been given a bad reputation for false
reasons. I think the Intel MeeGo attempt for example wrongly accused
Tracker of being a reason why Harmattan MeeGo didn't succeed.

Start of thread here:
http://lists.meego.com/pipermail/meego-architecture/2011-March/000081.html

This was my response:
http://lists.meego.com/pipermail/meego-architecture/2011-March/000113.html
https://mail.gnome.org/archives/tracker-list/2011-March/msg00033.html

A rename might undo that. I'm still not much of a fan of yielding to
reputation pressure from clueless people who, without doing much
investigation (like we did), make false statements.

It's not really the Linux way IMO to have everything in one monolithic 
module. So I wouldn't mind splitting things out.


I agree.

To make language bindings for it like for JS, Dalvik, MonoTouch, Qt.


That would be good. The API is quite small too, shouldn't take much effort.


Right

It ought to be a library for all application developers, just like how
libtracker-sparql is such a library: obvious in API, well documented,
suitable for wrapping it with for example a Qt layer and all that stuff.

:) interesting. There is a reason why it's not a library. We often have 
crashes for whatever reason.


Yes, I don't think tracker-extract should cease being a process. A
library that does IPC to tracker-extract is probably the right solution.
That or a strong warning that a no-extract-process libtracker-extract
can crash as it relies on a wide variety of libraries having to cope
with a wide variety of file formats.

A libtracker-extract could also be done like how libstreamanalyzer was
done, but I consider libstreamanalyzer's integration, adaptation and/or
merge with what is now the Tracker project a long term goal.

I'm adding Jos in CC. hey Jos, start of thread here:
https://mail.gnome.org/archives/tracker-list/2012-November/msg00009.html

Sometimes, it's just that the system library was updated and now our
extractor crashes. Sometimes, it's problematic files which cause crashes.
That's why we use a daemon/program to do extraction, because the people
using the extractor don't die. I think making this into a library
presents some interesting situations we would need to consider like that.


I fully agree.
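
The reason for keeping extraction in its own process can be illustrated
with a small sketch. It is not how tracker-extract is implemented: the
worker is a stand-in that deliberately aborts on files ending in ".bad",
the function name is hypothetical, and JSON over stdout stands in for
whatever IPC a real library would use.

```python
import json
import subprocess
import sys

# Stand-in for the extractor process. os.abort() simulates a crash
# inside a third-party parsing library on a problematic file.
WORKER = r'''
import json, os, sys
path = sys.argv[1]
if path.endswith(".bad"):
    os.abort()
print(json.dumps({"path": path, "title": "example"}))
'''


def extract_isolated(path, timeout=10.0):
    """Run the extractor in a child process; return None if it dies."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # a hung extractor is treated like a crashed one
    if proc.returncode != 0:
        return None  # the child crashed, the caller keeps running
    return json.loads(proc.stdout)
```

A caller such as an MTP daemon would get a result for a well-behaved
file and None for a file that takes the extractor down, without dying
itself.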

I think whoever starts with improving libtracker-extract in this
direction, perhaps by renaming, copying or refactoring to a new library
the API tracker_extract_client_get_metadata, will revive the project to
its original glory.


I don't really view the project as "losing" its glory. It's just 
slowed down, matured even you could say.


Yes ok. Still, it had more glory a few years ago. I think :-)

Kind regards,

Philip

-- 


Philip Van Hoof
Software developer
Codeminded BVBA - http://codeminded.be

Follow-Ups:
- Re: [Tracker] Reviving the project, a first attempt
  - From: Tshepang Lekhonkhobe

References:
- [Tracker] Reviving the project, a first attempt
  - From: Philip Van Hoof
- Re: [Tracker] Reviving the project, a first attempt
  - From: Martyn Russell
