Re: [Tracker] Storing Metadata with files



On 25/05/10 16:37, Nikolaus Rath wrote:
On 05/25/2010 03:00 AM, Martyn Russell wrote:
What I would like to do is to make it possible to have a tighter
coupling between the indexed files and additional metadata about them.

In what way?

Basically I do not want metadata to be removed just because the indexed
file apparently does not exist at the same place anymore.

Ideally, tracker should e.g. detect if a file has just been renamed and
migrate the existing metadata.

We have been discussing this during the code camp. We have a proposed solution in mind to fix this and will be looking at that in the coming weeks.

Hmm, what causes this for you? That's not expected.

Well, I thought tracker was deliberately designed to do that. Imagine
this situation:

  - A file "letter_to_company" is added to the tracker archive folder
  - Additional metadata about "letter_to_company" is added to the
    tracker database with the dbus API
  - Someone wants to "clean up" the archive folder and moves
    "letter_to_company" into a "letters" subdirectory
  - Tracker indexes the "new" "letters/letter_to_company" with the
    metadata that's available in the file
  - Tracker removes all the metadata about the original
    "letter_to_company", because that file no longer exists
  - The metadata added via DBus is now lost irrevocably

Isn't that what's going to happen?

At the moment, yes. Also, there is some question as to how persistent the metadata (not added by Tracker but by 3rd parties) should be and how to manage out-of-date or no longer needed data.

Presumably if the file is moved to another directory we see a MOVE event and we should deal with that correctly. If we see events for the file being deleted and added later to another directory (i.e. we have no relationship between the two) then we have to assume the file is deleted and in that case it invariably makes sense to remove all metadata related to that. Files are generally a different case to other resources.
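To illustrate the difference between the two cases above, here is a minimal sketch in Python. It is not Tracker code: the MetadataStore class, its method names and the "user:tag" key are all hypothetical, and the store is just an in-memory dictionary keyed by path.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy in-memory store mapping path -> metadata (hypothetical, not Tracker's API)."""
    entries: dict = field(default_factory=dict)

    def on_moved(self, src: str, dst: str) -> None:
        # A single MOVE event relates the old path to the new one,
        # so all existing metadata can follow the file.
        if src in self.entries:
            self.entries[dst] = self.entries.pop(src)

    def on_deleted(self, path: str) -> None:
        # No known relationship to any other file: drop everything.
        self.entries.pop(path, None)

    def on_created(self, path: str, embedded: dict) -> None:
        # A newly seen file starts with only what can be extracted from it.
        self.entries[path] = dict(embedded)


def demo():
    initial = {"letter_to_company": {"title": "Letter", "user:tag": "important"}}

    # Case 1: one MOVE event, metadata is migrated.
    moved = MetadataStore({k: dict(v) for k, v in initial.items()})
    moved.on_moved("letter_to_company", "letters/letter_to_company")

    # Case 2: unrelated DELETE then CREATE, third-party metadata is lost.
    recreated = MetadataStore({k: dict(v) for k, v in initial.items()})
    recreated.on_deleted("letter_to_company")
    recreated.on_created("letters/letter_to_company", {"title": "Letter"})

    return moved.entries, recreated.entries
```

In the first case the "user:tag" value ends up under the new path; in the second only the embedded "title" is left, which is the loss described in the scenario above.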

Therefore I like the idea of storing the metadata in a separate file.

This is far from trivial and we have moved away from separate databases
since 0.6 (for the same type of data) for a number of reasons:

  * Speed
  * Maintainability
  * ...

I agree, conceptually, it makes sense, but the reality is this is much
harder to do. We do use a journal to back up all the data; this would
include your user metadata.

Hm. Interesting. So there is a permanent record of every file that has
ever been added to tracker? Or is this journal expired from time to time?

Currently the journal records all events so that the database can be recreated. There is a branch to compress the journal, but it is not yet merged to master.

How do I access this journal to e.g. obtain the metadata from a deleted
file?

You're not supposed to access it. It is there for us to reproduce the databases in the event of corruption. This is done automatically; no user interaction is required.
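As a rough illustration of the idea only (replaying recorded events to rebuild the database), here is a small Python sketch. The journal's real on-disk format is not described here, so the (operation, path, payload) tuples and the replay function are purely hypothetical.

```python
def replay(journal):
    """Rebuild a path -> metadata mapping from an append-only list of events."""
    db = {}
    for op, path, payload in journal:
        if op == "insert":
            # Later inserts for the same path add to or override earlier values.
            db.setdefault(path, {}).update(payload)
        elif op == "delete":
            db.pop(path, None)
        else:
            raise ValueError(f"unknown journal operation: {op!r}")
    return db


# Hypothetical journal contents, oldest event first.
journal = [
    ("insert", "letter_to_company", {"title": "Letter"}),
    ("insert", "letter_to_company", {"user:tag": "important"}),
    ("insert", "notes.txt", {"title": "Notes"}),
    ("delete", "notes.txt", None),
]
```

Replaying this list yields a single entry for "letter_to_company" carrying both the extracted and the user-added value, while "notes.txt" is absent because its delete event is replayed too.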

Hmm, a new extractor won't work here. To catch ALL files, you would need
to write a generic one and generic extractors are fallbacks for specific
ones at this point.

I was thinking about a specific extractor just for .xmp files which adds
the extracted metadata to the "real" file. Wouldn't that be possible?

Sure, assuming you're getting inotify events for .xmp files and the extractor knows how to associate those with the REAL file. But it goes against the principle of the "extractor", which is to _extract_, not to _writeback_.

So we don't recommend it.

Also, the extractor only gets the metadata for that file format, it
doesn't extract or insert the file metadata (size, name, etc).

I don't quite understand.

What I mean is that we just return _embedded_ data, not _all_ the metadata about a file. The miner-fs concatenates the file data (like size, mtime, etc) to the embedded data and sends it to tracker-store.
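A rough Python sketch of that split, for illustration only: the function names and dictionary keys below are made up, and the real miner-fs does not use Python dictionaries.

```python
import os


def file_metadata(path):
    """File-level data that the miner, not the extractor, is responsible for."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": st.st_size,
        "mtime": st.st_mtime,
    }


def combine(path, embedded):
    """Join file-level data with the embedded data returned by an extractor."""
    combined = file_metadata(path)
    combined.update(embedded)  # the extractor only contributes embedded fields
    return combined
```

Here combine(path, {"title": "Letter"}) would return the name, size and mtime of the file plus the embedded title, which is the combined record that would then be sent on to the store.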

We would definitely accept a patch to fix this :)

This sounds a little bit ambiguous :-). To be avoided but you'd accept a
patch?

I mean generally, if an idea makes sense we would accept patches for a fix or implementation of sorts.

--
Regards,
Martyn


