Re: [Tracker] Storing Metadata with files

On Tue, 2010-05-18 at 10:14 -0400, Nikolaus Rath wrote:
Martyn Russell writes:
You're the first person to ask for this AFAIK, so there has been no
requirement to do anything more than we have now.

I guess that means that I'll have to start coding in C again *sigh*.
What are your thoughts about accepting a patch for such functionality
into the official code base?

We would definitely accept a patch to fix this :)

What I would like to do is to make it possible to have a tighter
coupling between the indexed files and additional metadata about them.

In what way?

As I wrote a little while ago, I'm trying to adopt tracker for managing
a document archival system. That means that files are exclusively added
to the indexed paths by a dedicated program that asks the user for
additional metadata and then somehow makes sure that this metadata ends
up in the tracker database as well.

I see.

My first attempt was to just put the document into the indexed directory
and then directly add the metadata into tracker using e.g. the dbus API.
I don't like this solution very much for two reasons. First of all, I
have to poll tracker to determine when the new document has been indexed
and I can add the additional metadata. 

You should be able to rely on notifications to know when files are added
to the system. See:

Secondly, I don't feel
comfortable with metadata and document being separated that much. So far
I have the impression that tracker considers metadata to be mostly
transient in the sense that it can always be recovered from the file
itself, and in my case this would no longer be true. 

That's not how we consider it. User metadata is also important. It does
have certain restrictions with our current model though (such as > 1
application writing the same data about the same resource means the data
is overwritten each time - for example).
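
To illustrate the restriction mentioned above, here is a minimal sketch of
"last writer wins" for a single-valued property. It is only a toy model: the
names store and write_property, the application names, and the file URI are
all hypothetical, and this is not Tracker's storage code or API.

```python
# Toy model only (hypothetical names); not Tracker's actual storage or API.
store = {}

def write_property(store, resource, prop, value, app):
    """Write a single-valued property; a later write replaces the earlier one."""
    store[(resource, prop)] = (value, app)

uri = "file:///archive/invoice.pdf"  # hypothetical resource

# The archival tool writes a title supplied by the user ...
write_property(store, uri, "title", "Invoice 2010-05", "archiver")
# ... and a second application later writes the same property.
write_property(store, uri, "title", "invoice.pdf", "other-app")

# Only the second value survives; the user-supplied title is gone.
value, writer = store[(uri, "title")]
```

In this model nothing records that two applications disagreed, which is why
the journal matters for recovering user-supplied data.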

I don't have a
particular scenario in mind, but I feel that it's basically asking for
trouble if the simple act of renaming a file (or the indexed directory,
or temporarily changing the tracker index settings) would permanently
destroy all the associated metadata (even though no one is supposed to
do anything like that).

Hmm, what causes this for you? That's not expected.

Therefore I like the idea of storing the metadata in a separate file.

This is far from trivial and we have moved away from separate databases
since 0.6 (for the same type of data) for a number of reasons:

 * Speed
 * Maintainability
 * ...

I agree, conceptually, it makes sense, but the reality is this is much
harder to do. We do use a journal to back up all the data; this would
include your user metadata.

While this is not foolproof either, I still think that it significantly
decreases the chances of actually losing the metadata (although it may
become disassociated from the document). 

So does the journal. I think the journal is enough too.

So far it seems to me that the best approach to get this to work with
tracker would be to either extend the XMP sidecars extractor to extract
more information, or to add an entirely new extractor that reads a
tracker-specific separate metadata file. But maybe there is also an
entirely different way to achieve what I want?

Hmm, a new extractor won't work here. To catch ALL files, you would need
to write a generic one and generic extractors are fallbacks for specific
ones at this point.
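
The fallback behaviour described above can be sketched roughly as follows.
This is an assumed model of the dispatch, not the real tracker-extract code;
pick_extractor, extract_pdf and extract_generic are hypothetical names.

```python
# Assumed dispatch model (hypothetical names); not the real tracker-extract code.
def extract_pdf(path):
    """Stand-in for a format-specific extractor."""
    return {"extractor": "pdf", "path": path}

def extract_generic(path):
    """Stand-in for a generic extractor, e.g. one reading a sidecar file."""
    return {"extractor": "generic", "path": path}

def pick_extractor(mime_type, specific, generic):
    """Use the specific extractor for mime_type if one exists, else the generic one."""
    return specific.get(mime_type, generic)

specific = {"application/pdf": extract_pdf}

# A PDF is handled by the specific extractor, so the generic one never runs.
pdf_result = pick_extractor("application/pdf", specific, extract_generic)("a.pdf")
# Only a type without a specific extractor falls through to the generic one.
other_result = pick_extractor("application/x-unknown", specific, extract_generic)("b.bin")
```

Under this model a generic sidecar reader would be skipped for every file
type that already has a specific extractor, so it could not catch all files.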

Also, the extractor only gets the metadata for that file format, it
doesn't extract or insert the file metadata (size, name, etc). So we
would have to provide some solution for you to do this properly. I don't
think it makes sense either. This would mean much data duplication in
user space and that should be avoided where possible.

