Re: [Tracker] Reviving the project, a first attempt



On 11/22/2012 09:01 AM, Philip Van Hoof wrote:
Guys,

Hello Philip,

Buried under its own weight of complexity the project is stifled. Why do
I think this?

The project isn't dead; I should point this out. It has clearly slowed down due to the change in funding.

What we tried to achieve during the Maemo Fremantle and MeeGo Harmattan
years at Nokia was so complicated to sell and complicated to develop
that most of the lead developers who got involved in the development of
the project from vstore to master became what I call "expensive people"

* The vstore branch, find this in the mailing list:
  From:    Jürg Billeter <j bitron ch>
  Subject: Re: [Tracker] vstore to master
  Date:    Thu, 16 Apr 2009 18:15:27 +0200

I always wondered, Jürg, where the name vstore came from? But it was a
fantastic branch and piece of work that you did. It clearly steered the
project in the direction of SPARQL and Nepomuk as ontology. Thanks.

I don't actually recall what the branch was about. At the height of our development, I was merging ca. 6 branches a week into master. Hard to keep up with all of them ;)

The SPARQL and Miner components both became complex and at the same time
intertwined. I regret them being intertwined and I always wanted the
SPARQL endpoint tracker-store to be a different project than the Miner
projects and the Extractor project:

        Although I don't like the overkill design of Nepomuk-KDE, I like
        how Jos van den Oever kept libstreamanalyzer and strigi
        separate from the Nepomuk-KDE world.

Opinions in the team admittedly differed on that. I never raised my
voice on this issue because I thought that the team had to stay together
to suffer, as a team, the burden of delivering what we delivered for
Harmattan MeeGo: The adhesion between Lanedo, Codethink, Codeminded and
for the Qt libraries like qtcontacts-tracker OpenIsmus was too important
for delivering, that splitting up the project wasn't a worthwhile risk.

This is not the case anymore. And I heard from developers of a new phone
OS being developed that Tracker is again used, that it was again a hard

Which one?

sell internally, but I didn't expect anything less as our project was an
extremely hard sell within the Harmattan team at Nokia too, and that
they aren't dissatisfied with it. Even satisfied. Wow!

:)

I would also like to thank our top contributors and the people who
worked on Qt based libraries built on top of libtracker-sparql for
spreading the truth about our team and Tracker. You guys know who you
are, I don't have to name you ;-)

Don't forget your own input. You made a sizeable contribution and quite a difference. ;)

And oh my God I'm writing so much text just to make a simple point ..

It's definitely not an imposter writing this then :P

The API libtracker-extract's tracker_extract_client_get_metadata is not
public enough because Tracker relies too heavily on the file
system miner. Today it is time to change this.

I agree that it relies too heavily on the miner.

There is a good reason for this. The filesystem information (name, size, mtime, etc.) is all handled by the miner-fs. You could likely solve this issue by "chaining" extractors and having a basic file extractor that gathers this information, so the miner isn't doing it.

This is why the miner-fs is the one injecting SPARQL: it concatenates the extractor-specific SPARQL with the general file system SPARQL.
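
To illustrate the "chaining" idea, here is a rough sketch in Python rather than in Tracker's own C. Nothing in it is existing Tracker API: file_extractor, image_extractor and chain are hypothetical names, and the generated SPARQL is only meant to show a basic file extractor contributing the name, size and mtime so that the miner would not have to.

```python
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List

# Hypothetical extractor shape: takes a path, returns predicate/object pairs.
Extractor = Callable[[str], List[str]]


def escape(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def file_extractor(path: str) -> List[str]:
    """Basic file extractor: the details the miner-fs supplies today."""
    st = os.stat(path)
    mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    return [
        "a nfo:FileDataObject",
        'nfo:fileName "%s"' % escape(os.path.basename(path)),
        "nfo:fileSize %d" % st.st_size,
        'nfo:fileLastModified "%s"' % mtime.strftime("%Y-%m-%dT%H:%M:%SZ"),
    ]


def image_extractor(path: str) -> List[str]:
    """Stand-in for a format-specific extractor (assumption: returns a type only)."""
    return ["a nmm:Photo"]


def chain(path: str, extractors: List[Extractor]) -> str:
    """Run each extractor in order and concatenate the results into one INSERT."""
    statements: List[str] = []
    for extractor in extractors:
        statements.extend(extractor(path))
    subject = "<%s>" % Path(path).resolve().as_uri()
    return "INSERT { %s %s }" % (subject, " ;\n  ".join(statements))
```

With something like chain(path, [file_extractor, image_extractor]), the caller gets one complete INSERT back and no longer needs the miner-fs to fill in the file details.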

Phone builders want to rid themselves of file system mining. Instead
they want to let MTP daemons, who deal with incoming files, do the
processing and extraction of file meta data. They don't want to
configure with DConf or a GKeyFile to point to a directory where the MTP
daemon will write files, at all.

Right, there are different ways in which data can come in and we shouldn't restrict ourselves to the filesystem. That makes sense. But we don't. It just so happens that the miner-fs is the main way people get data into tracker-store.

Instead they want their MTP daemon to use a simple API that will trigger
tracker-extract into extracting the file and then writing the SPARQL
INSERT to the SPARQL endpoint.

One of the things I would love to see happen (or add) is a command line option or some other way to inject SPARQL from tracker-extract into the store. We have a hack for this right now with tracker-control -f $FILE and a D-Bus API. The main problem with this is that the filesystem data is not there for files (which are the main use case in Tracker right now).

I think we misdesigned this because we thought too much that the entire
world of Nepomuk, inter app data sharing and meta data in general
necessarily depends on file system indexing. This is just not true.

Things move on.

Tomorrow's phone builders might not even use a file system. Why would
inter app data sharing then necessarily depend on file system indexing?!

You know that, right now in stable releases, the miner-fs doesn't have to be a daemon and can index on demand (instead of via inotify), right? The miner-fs can also be left out of the build with --disable-miner-fs (I think).

File system indexing is of course important, but only for users who need
it. Like a desktop. A desktop needs it. A phone might not need it. And
if it does, they understandably want to limit its use.

I would like to propose to start with adapting libtracker-extract to be
fully documented, to change tracker_extract_client_get_metadata's API in
such a way that it is truly obvious for a platform builder, integrator
or app developer of, for example, an MTP daemon to call it in order to get
the file's meta data to be inserted into tracker-store before the MTP
daemon had to write the file itself.

I was under the impression that it was already. If someone is paying for this or wants patch review, I am happy to step up.

To make it possible to call this on a .tmp-XYZ file for a file that will
later be renamed to Girlfriend.JPEG in the DCIM folder of the phone.

Well, this isn't actually easy to solve even if you move away from the miner-fs. If you're returning the full SPARQL, including things like the file name, size, mtime, etc., then these details change. You either change the SPARQL and wait before injecting it into the store, or post-process by updating the details in the store when they change.

You can't have it both ways. You either want the data early and have to cope with changes like the name changing, OR you wait and have the data in its final (albeit maybe only for a short time) state.
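
The trade-off can be sketched as two sequences of store updates. This is an illustration only: the helper names and the urn:example subject are made up, and the SPARQL has not been checked against tracker-store. It assumes the resource has a stable subject with nie:url and nfo:fileName as properties, so a rename only touches those two.

```python
from typing import List


def file_name(url: str) -> str:
    """Last path component of a URL, e.g. 'Girlfriend.JPEG'."""
    return url.rsplit("/", 1)[-1]


def insert_sparql(subject: str, url: str) -> str:
    """Insert the resource with its current location details."""
    return '''INSERT { <%s> a nfo:FileDataObject ; nie:url "%s" ; nfo:fileName "%s" }''' % (
        subject, url, file_name(url))


def rename_sparql(subject: str, new_url: str) -> str:
    """Post-process: drop the old location details, then write the new ones."""
    delete = "DELETE { <%s> nie:url ?u ; nfo:fileName ?n } WHERE { <%s> nie:url ?u ; nfo:fileName ?n }" % (
        subject, subject)
    insert = '''INSERT { <%s> nie:url "%s" ; nfo:fileName "%s" }''' % (
        subject, new_url, file_name(new_url))
    return delete + "\n" + insert


def early(subject: str, tmp_url: str, final_url: str) -> List[str]:
    """Data is available at once, but the store needs a second update on rename."""
    return [insert_sparql(subject, tmp_url), rename_sparql(subject, final_url)]


def late(subject: str, tmp_url: str, final_url: str) -> List[str]:
    """Wait for the rename; one update, but nothing is queryable until then."""
    return [insert_sparql(subject, final_url)]
```

Either way the caller has to know about the rename: early() pays with an extra update, late() pays with a delay.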

Right now this ain't possible, because libtracker-extract is too focused
on being "just a tool library for the filesystem miner".

Well, I would say it's more that the miner-fs is _THE_ only one using it, so it's not so bad given that.

If you mean to suggest we separate this into a new project, I think that might be a good idea. Same for the miner-fs. Possibly for libtracker-sparql too? Some investigation would be needed; there are core libraries that we depend on in all cases and that might cause problems...

One of the recent issues I've had with Tracker is that I can't find it on Google. I think Rob mentioned this way back at some GUADEC. The name is quite generic. I have been asked several times why we have so many things in the tree and if we can disable or split out things. I think Red Hat recently asked if we could do this, and I am sure Debian maintainers have too.

It's not really the Linux way IMO to have everything in one monolithic module. So I wouldn't mind splitting things out.

To make language bindings for it like for JS, Dalvik, MonoTouch, Qt.

That would be good. The API is quite small too, shouldn't take much effort.

It ought to be a library for all application developers, just like how
libtracker-sparql is such a library: obvious in API, well documented,
suitable for wrapping it with for example a Qt layer and all that stuff.

:) Interesting. There is a reason why it's not a library. We often have crashes for whatever reason. Sometimes it's just that a system library was updated and now our extractor crashes. Sometimes it's problematic files which cause crashes. That's why we use a daemon/program to do extraction: the processes using the extractor don't die with it. I think making this into a library presents some interesting situations like that which we would need to consider.
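
As a minimal sketch of why the extraction lives in a separate process, here is the general pattern in Python (not how tracker-extract itself is implemented). extract_isolated, GOOD and CRASHY are hypothetical names; the point is only that a child which aborts or hangs gives the caller an error to handle instead of taking the caller down.

```python
import subprocess
import sys
from typing import Optional


def extract_isolated(child_code: str, path: str, timeout: float = 10.0) -> Optional[str]:
    """Run extractor code in a child process.

    Returns the child's stdout, or None if the child crashed, exited
    with an error, or did not finish within the timeout.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout


# Stand-ins for a well-behaved extractor and one that dies on a bad file.
GOOD = "import sys; print('a nfo:FileDataObject')"
CRASHY = "import os; os.abort()"
```

A library version would need a similar boundary somewhere, otherwise a crash on one problematic file becomes a crash of the application that linked against it.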

I think whoever starts with improving libtracker-extract in this
direction, perhaps by renaming, copying or refactoring to a new library
the API tracker_extract_client_get_metadata, will revive the project to
its original glory.

I don't really view the project as "losing" its glory. It has just slowed down; matured, even, you could say.

--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.


