Re: [Tracker] Request for review, API allowing external software like MTP daemons to get metadata as correct sparql insert



On Tue, 2013-01-08 at 11:40 +0000, Martyn Russell wrote:
On 08/01/13 11:13, Philip Van Hoof wrote:
On Fri, 2013-01-04 at 15:20 +0000, Martyn Russell wrote:
On 03/01/13 21:30, Philip Van Hoof wrote:

Yes the separate update was/is the solution for Nokia's N9. The idea
here or now is that no separate update would be needed at all with this
solution: the MTP daemon does both the extraction and the initial sparql
insert in one pass.

And then the MTP daemon does a rename() call from tempfile to dest-url.
Completing the file transfer.

As long as what's on the disk is represented in the DB, that's fine. At 
the moment we don't have any cases (I know of) where the file is 
available in the DB but not on the disk (or visa versa).

Well no, to be ahead of 'the file is on the disk' is one of the ideas
behind 'better supporting MTP daemons' use-case'.

This can lead to incorrect results for applications using Tracker which 
we want to avoid.

They have tracker:available for that. When a removable media is
disconnected, the file is also not available on the disk. That's why we
have tracker:available.

That sparql insert can happen while the download takes place (allowing
the MTP daemon to insert protocol based metadata before file data is
transferred and available - for extraction).

There are a number of issues with this approach to be considered. For 
example:

1. How does anyone querying the information know when the file is 
ACTUALLY available? tracker:available?

Yes

2. What if you don't have all the information until it's downloaded and 
it's needed for querying ahead of time? This is usually the biggest 
problem here. Simple things like file size or even specific metadata can 
be important enough here.

You don't have all information, but you do have the protocol based
metadata. Often this is sufficient information to display, in certain
use-cases, ahead of the file being on disk.

15. I've not checked the whole on_fileinfo_received() part because it's
really complex in the miner-fs already and we were fixing problems there
for years related to parent existence and creation prior to insertion of
a child path. This worries me about this branch because I get the
impression from quick review that it's not really addressed fully. You
also seem to allow parent URNs to not exist before adding a child to the
database. That's not how we've done things in the miner-fs and this
really breaks what people using the DB can expect. Correct me if I am wrong.

Right, parent URNs aren't yet created with this solution and should
probably be.

This is a requirement I would say. It's a hard problem to solve well 
too, just speak to Sam about his recent work to improve this and 
recursive issues :)

Right.

Ideally Sam puts this in a shared library and not in miner-fs's binary,
so that this MTP support and what Sam is improving can use the same code
in a correct and nice way.

I guess putting INSERT queries upfront the metadata one needs to be
possible as well as getting a separate query for in case the MTP daemon
wants to do multiple passes. ie. First protocol based metadata insert,
which needs the parent URNs at that point and which doesn't yet need the
file content to be available, then extraction based metadata insert; for
which the file content must be available.

Not really. If that insert fails, then you're not guaranteeing the 
parent is inserted, which we do at the moment.

Why would the insert fail other than something being a bug that must be
fixed?

Separate queries for the parent are really the way to go here. Once 
that's done you have to handle failures and retries before the child can 
be added, etc. This is why the miner-fs has a lot of code in this area 
to handle this stuff. Ideally, it would be good to make use of that code 
here so we don't have to fix issues we've already fixed in the miner-fs.

And what this creates must also be compatible with what miner-fs will
otherwise create.

Yea I agree.

The idea here is that while a (large) file is being transferred,
protocol based metadata is already in the Nepomuk store available for
applications that are interested in tracker:available-false 'to be
deployed' data. ie. a DivX or DVD being transferred and a movie
application saying: "DVD with title XYZ with actors ABC and written by
DEF is being transferred to your phone (20% completed)". And After that
an extraction to get extra metadata that isn't included in the protocol
based metadata (and all that without needing miner-fs).

We did actually have support for this in the past with inotify and half 
downloaded files, but Nokia decided not to "notify" about half 
downloaded files after a while. I think this was an area which changed a 
few times due to requirements. In the end performance was one of the 
reasons (due to spamming of updates with everything else going on at the 
same time).

Right. Nokia's requirements might not be everybody else's requirements
though.

And with this miner-fs isn't involved. The MTP daemon has better control
than inotify over the file-transfer to know when to update and when not
to update.

BTW. the '20% completed' does not belong in Nepomuk and needs to be
communicated differently (it's application state information).

Kind regards,

Philip


-- 
Philip Van Hoof
Software developer
Codeminded BVBA - http://codeminded.be




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]