Re: [Tracker] Reviving the project, a first attempt

From: Martyn Russell <martyn lanedo com>
To: Philip Van Hoof <philip codeminded be>
Cc: Tracker mailing list <tracker-list gnome org>
Subject: Re: [Tracker] Reviving the project, a first attempt
Date: Thu, 22 Nov 2012 10:52:49 +0000

On 11/22/2012 09:01 AM, Philip Van Hoof wrote:

Guys,


Hello Philip,

Buried under its own weight of complexity the project is stifled. Why do
I think this?

The project isn't dead. I should point this out. It has clearly slowed down due to the change in funding.

What we tried to achieve during the Maemo Fremantle and MeeGo Harmattan
years at Nokia was so complicated to sell and complicated to develop
that most of the lead developers who got involved in the development of
the project from vstore to master became what I call "expensive people"

* The vstore branch, find this in the mailing list:
  From: Jürg Billeter <j bitron ch>
  Subject: Re: [Tracker] vstore to master
  Date: Thu, 16 Apr 2009 18:15:27 +0200

I always wondered, Jürg, where the name vstore came from? But it was a
fantastic branch and piece of work that you did. It clearly steered the
project in the direction of SPARQL and Nepomuk as ontology. Thanks.

I don't recall what the branch was about actually. At the height of our development, I was merging ca. 6 branches a week into master. Hard to keep up with all of them ;)

The SPARQL and Miner components both became complex and at the same time
intertwined. I regret them being intertwined and I always wanted the
SPARQL endpoint tracker-store to be a different project than the Miner
projects and the Extractor project:

        Although I don't like the overkill design of Nepomuk-KDE, I like
        how Jos van den Oever kept libstreamanalyzer and strigi
        separate from the Nepomuk-KDE world.

Opinions in the team admittedly differed on that. I never raised my
voice on this issue because I thought that the team had to stay together
to suffer, as a team, the burden of delivering what we delivered for
Harmattan MeeGo: The adhesion between Lanedo, Codethink, Codeminded and
for the Qt libraries like qtcontacts-tracker OpenIsmus was too important
for delivering, that splitting up the project wasn't a worthwhile risk.

This is not the case anymore. And I heard from developers of a new phone
OS being developed that Tracker is again used, that it was again a hard


Which one?

sell internally, but I didn't expect anything less as our project was an
extremely hard sell within the Harmattan team at Nokia too, and that
they aren't dissatisfied with it. Even satisfied. Wow!

:)

I would also like to thank our top contributors and the people who
worked on Qt based libraries built on top of libtracker-sparql for
spreading the truth about our team and Tracker. You guys know who you
are, I don't have to name you ;-)

Don't forget your input. You made quite a sizeable contribution and made quite some difference. ;)

And oh my God I'm writing so much text just to make a simple point ..


It's definitely not an imposter writing this then :P

The API libtracker-extract's tracker_extract_client_get_metadata is not
public enough because Tracker relies too heavily on the file
system miner. Today it is time to change this.


I agree that it's too heavily relying on the miner.

There is a good reason for this. The filesystem information (name, size, mtime, etc.) is all handled by the miner-fs. You could likely solve this issue by "chaining" extractors and having a basic file extractor which gets this information, so the miner isn't doing it.

This is the reason why miner-fs is injecting the SPARQL: it concatenates the extractor-specific SPARQL with the general file system SPARQL.
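
To make the "chaining" idea concrete, here is a rough sketch only, not how miner-fs or tracker-extract are actually implemented. The function names (basic_file_extractor, placeholder_image_extractor, build_insert) are hypothetical, and the nfo: properties are used purely for illustration. Each extractor in the chain contributes statements about the same file, and the results are concatenated into one INSERT, so the general file information no longer has to come from the miner:

```python
import os
from pathlib import Path
from typing import Callable, List

# Hypothetical extractor type: takes a path, returns predicate/object pairs.
Extractor = Callable[[str], List[str]]


def sparql_string(value: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def basic_file_extractor(path: str) -> List[str]:
    """General file system data that miner-fs currently supplies itself."""
    st = os.stat(path)
    return [
        "a nfo:FileDataObject",
        f"nfo:fileName {sparql_string(os.path.basename(path))}",
        f"nfo:fileSize {st.st_size}",
    ]


def placeholder_image_extractor(path: str) -> List[str]:
    """Stand-in for a format-specific extractor; a real one parses the file."""
    return ["a nfo:Image"]


def build_insert(path: str, chain: List[Extractor]) -> str:
    """Run every extractor in the chain and concatenate into one INSERT."""
    subject = f"<{Path(path).resolve().as_uri()}>"
    statements: List[str] = []
    for extractor in chain:
        statements.extend(extractor(path))
    body = " ;\n    ".join(statements)
    return f"INSERT {{\n  {subject}\n    {body} .\n}}"
```

Calling build_insert(path, [basic_file_extractor, placeholder_image_extractor]) would give a single update string that a caller other than the miner could send to the store.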

Phone builders want to rid themselves of file system mining. Instead
they want to let MTP daemons, who deal with incoming files, do the
processing and extraction of file meta data. They don't want to
configure with DConf or a GKeyFile to point to a directory where the MTP
daemon will write files, at all.

Right, there are different ways in which data can come in and we shouldn't restrict ourselves to the filesystem. That makes sense. But we don't. It just so happens the miner-fs is the main way people get data into tracker-store.

Instead they want their MTP daemon to use a simple API that will trigger
tracker-extract into extracting the file and then writing the SPARQL
INSERT to the SPARQL endpoint.

One of the things I would love to see happen (or add) is a command line option or way to inject SPARQL from tracker-extract into the store. We have a hack for this right now with tracker-control -f $FILE and a D-Bus API. The main problem with this is that the filesystem data is not there for files (which are the main use case in Tracker right now).

I think we misdesigned this because we thought too much that the entire
world of Nepomuk, inter app data sharing and meta data in general
necessarily depends on file system indexing. This is just not true.


Things move on.

Tomorrow's phone builders might not even use a file system. Why would
inter app data sharing then necessarily depend on file system indexing?!

You know that the miner-fs doesn't have to be a daemon and can index on demand (instead of by inotify) right now in stable releases, right? The miner-fs is also configurable to not be built: --disable-miner-fs (I think).

File system indexing is of course important, but only for users who need
it. Like a desktop. A desktop needs it. A phone might not need it. And
if it does, they understandably want to limit its use.

I would like to propose to start with adapting libtracker-extract to be
fully documented, to change tracker_extract_client_get_metadata's API in
such a way that it is truly obvious for a platform builder, integrator
or app developer of for example a MTP daemon to call it in order to get
the file's meta data to be inserted into tracker-store before the MTP
daemon had to write the file itself.

I was under the impression that it was already. If someone is paying for this or wants patch review, I am happy to step up.

To make it possible to call this on a .tmp-XYZ file for a file that will
later be renamed to Girlfriend.JPEG in the DCIM folder of the phone.

Well, this isn't actually easy to solve even if you move away from miner-fs. If you're returning the full SPARQL including things like the file name, size, mtime, etc. then these details change. You either change the SPARQL and wait before injecting it into the store, or post-process by updating the store details when they change.

You can't have it both ways. You either want the data early and have to cope with changes like the name changing OR you wait and have the data in its final (albeit maybe for a small time) state.
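
As a sketch of the "post-process" option only: the update below uses the standard SPARQL 1.1 DELETE/INSERT/WHERE form, and whether tracker-store accepts exactly this form is an assumption on my part. The helper name rename_update and the example URLs are hypothetical. The idea is that the early INSERT goes in under the temporary name and a follow-up update swaps the name-dependent properties once the rename happens:

```python
def sparql_string(value: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rename_update(old_url: str, new_url: str, new_name: str) -> str:
    """Build an update that replaces the URL and file name of a resource.

    The resource is looked up by its old nie:url, so everything else that
    was extracted early (image size, tags, ...) stays untouched.
    """
    old = sparql_string(old_url)
    return (
        f"DELETE {{ ?f nie:url {old} ; nfo:fileName ?old_name }}\n"
        f"INSERT {{ ?f nie:url {sparql_string(new_url)} ; "
        f"nfo:fileName {sparql_string(new_name)} }}\n"
        f"WHERE  {{ ?f nie:url {old} ; nfo:fileName ?old_name }}"
    )
```

For the example from the original mail this would be called as rename_update("file:///DCIM/.tmp-XYZ", "file:///DCIM/Girlfriend.JPEG", "Girlfriend.JPEG"). The other option, waiting, needs no extra SPARQL at all: the caller simply holds the INSERT back until the final name is known.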

Right now this ain't possible, because libtracker-extract is too focused
on being "just a tool library for the filesystem miner".

Well, I would say it's more that the miner-fs is _THE_ only one using it, so it's not so bad given that.

If you mean to suggest we separate this into a new project, I think that might be a good idea. Same for the miner-fs. Possibly for libtracker-sparql too? Some investigation would be needed; there are core libraries that we depend on in all cases and they might cause problems...

One of the recent issues I've had with Tracker is, I can't find it on Google - I think Rob mentioned this way back at some GUADEC. The name is quite generic. I have been asked several times why we have so many things in the tree and if we can disable or split out things. I think Red Hat recently asked if we could do this, I am sure Debian maintainers have too.

It's not really the Linux way IMO to have everything in one monolithic module. So I wouldn't mind splitting things out.

To make language bindings for it like for JS, Dalvik, MonoTouch, Qt.


That would be good. The API is quite small too, shouldn't take much effort.

It ought to be a library for all application developers, just like how
libtracker-sparql is such a library: obvious in API, well documented,
suitable for wrapping it with for example a Qt layer and all that stuff.

:) interesting. There is a reason why it's not a library. We often have crashes for whatever reason. Sometimes, it's just that the system library was updated and now our extractor crashes. Sometimes, it's problematic files which cause crashes. That's why we use a daemon/program to do extraction: the processes using the extractor don't die with it. I think making this into a library presents some interesting situations like that which we would need to consider.
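
A minimal sketch of that isolation, assuming nothing about how tracker-extract itself is wired up: the helper extract_out_of_process is hypothetical and the "extractor" is just a small script run in a child process. If the child crashes or hangs, the caller gets None back and carries on:

```python
import subprocess
import sys
from typing import Optional


def extract_out_of_process(extractor_code: str, path: str,
                           timeout: float = 10.0) -> Optional[str]:
    """Run extractor code in a child process; return its output or None.

    A crash (non-zero exit or death by signal) or a hang in the child
    cannot take down the calling process.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", extractor_code, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # child hung; subprocess.run has already killed it
    if proc.returncode != 0:
        return None  # child crashed or reported failure
    return proc.stdout


# Two stand-in extractors: one behaves, one aborts like a buggy library would.
GOOD = "import sys; print('nfo:fileName', sys.argv[1])"
CRASHING = "import os; os.abort()"
```

With these, extract_out_of_process(GOOD, "a.jpg") returns the child's output, while extract_out_of_process(CRASHING, "a.jpg") returns None and the caller is still alive. A plain in-process library call has no such safety net, which is the situation we would need to consider.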

I think whoever starts with improving libtracker-extract in this
direction, perhaps by renaming, copying or refactoring to a new library
the API tracker_extract_client_get_metadata, will revive the project to
its original glory.

I don't really view the project as "losing" its glory. It's just slowed down, matured even, you could say.


--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.
