Re: [Tracker] Storing Metadata with files

On Tue, 18 May 2010 10:14:10 -0400, Nikolaus Rath <Nikolaus rath org>
Martyn Russell <martyn-bhGbAngMcJvQT0dZR+AlfA public gmane org> writes:
There seems to be a lot of XMPs though. Are you talking about Does the
file have to have a special file name, or is the name of the
file part of the XMP data itself? Googling for "xmp tracker" or
"metatracker" was just as fruitless as searching for "xmp" on the
wiki, so I'm a bit at a loss here...

Yes we are talking about that, we even have some light support in
libtracker-extract. See:

Hmm. To me that looks like I'm actually quite restricted in what
metadata I can put into tracker using XMP sidecar files. Am I right
that I will *not* be able to e.g. insert user-defined tags using

Those functions are convenience functions for many extractors (to
avoid code duplication for one). There is nothing to say they can't be
extended, but generally, non-standard data is not represented there
for the moment.

You're the first person to ask for this AFAIK, so there has been no
requirement to do anything more than we have now.

I guess that means that I'll have to start coding in C again *sigh*.
What are your thoughts about accepting a patch for such functionality
into the official code base?

What I would like to do is to make it possible to have a tighter
coupling between the indexed files and additional metadata about them.

As I wrote a little while ago, I'm trying to adopt tracker for managing
a document archival system. That means that files are exclusively added
to the indexed paths by a dedicated program that asks the user for
additional metadata and then somehow makes sure that this metadata ends
up in the tracker database as well.

My first attempt was to just put the document into the indexed directory
and then directly add the metadata into tracker using e.g. the dbus API.
I don't like this solution very much for two reasons. First of all, I
have to poll tracker to determine when the new document has been indexed
and I can add the additional metadata.
You could also use class signals to avoid polling. Still, it's not very
practical. There is a wiki page for that though:
dunno if it helps.
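
For illustration, the "add the metadata once the file is indexed" step
could look roughly like the sketch below. It only builds the SPARQL
strings; sending them to tracker (D-Bus, tracker-sparql, ...) is left
out on purpose. The use of nie:url to find the file and of nao:Tag /
nao:prefLabel / nao:hasTag for tagging is an assumption about the
ontology in use, and the function names are made up for this example.

```python
def sparql_escape(text):
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def indexed_query(file_uri):
    """Query that returns a row only once the file has a resource.

    Assumes indexed files are reachable via nie:url.
    """
    return 'SELECT ?f WHERE { ?f nie:url "%s" }' % sparql_escape(file_uri)


def tag_update(file_uri, label):
    """Update that attaches a new tag to an already indexed file.

    Assumes the nao:Tag vocabulary. The WHERE clause makes this a
    no-op when the file has not been indexed yet.
    """
    return (
        'INSERT {\n'
        '  _:tag a nao:Tag ;\n'
        '        nao:prefLabel "%s" .\n'
        '  ?f nao:hasTag _:tag\n'
        '} WHERE {\n'
        '  ?f nie:url "%s"\n'
        '}' % (sparql_escape(label), sparql_escape(file_uri))
    )


if __name__ == "__main__":
    uri = "file:///archive/invoice.pdf"  # hypothetical path
    print(indexed_query(uri))
    print(tag_update(uri, "tax 2010"))
```

Note that this creates a fresh tag resource on every call, so a real
tool would first look up an existing tag with the same label.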

Secondly, I don't feel
comfortable with metadata and document being separated that much. So far
I have the impression that tracker considers metadata to be mostly
transient in the sense that it can always be recovered from the file
itself, and in my case this would no longer be true. I don't have a
particular scenario in mind, but I feel that it's basically asking for
trouble if the simple act of renaming a file (or the indexed directory,
or temporarily changing the tracker index settings) would permanently
destroy all the associated metadata (even though no one is supposed to
do anything like that).
Until we get graph support, it will be difficult to differentiate between
extracted metadata and user metadata. And resource removal will be a

Therefore I like the idea of storing the metadata in a separate file.
While this is not foolproof either, I still think that it significantly
decreases the chances of actually losing the metadata (although it may
become disassociated from the document). It also happens to be that this
is roughly the way the system works currently, only that swish-e is used for
indexing the metadata files and that they actually contain the entire
plain text of the document as well.

So far it seems to me that the best approach to get this to work with
tracker would be to either extend the XMP sidecars extractor to extract
more information, or to add an entirely new extractor that reads a
tracker-specific separate metadata file. But maybe there is also an
entirely different way to achieve what I want?
What you can do is also write a custom extractor module, that replaces
the stock XMP extractor (using the stock XMP extractor as a starting
point). That gives you freedom to use non-standard tags etc.
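
To make the idea a bit more concrete, here is a rough sketch of what
such an extractor would need to pull out of a sidecar. It is plain
Python using only the standard library, not an actual tracker extractor
module (those are written in C). The "archive" namespace and its
caseNumber property are invented for the example; dc:subject is the
standard Dublin Core keyword bag that XMP uses.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ARCHIVE = "http://example.org/archive/1.0/"  # hypothetical namespace

SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="%s">
  <rdf:Description rdf:about="" xmlns:dc="%s" xmlns:archive="%s">
   <dc:subject>
    <rdf:Bag>
     <rdf:li>invoice</rdf:li>
     <rdf:li>tax 2010</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <archive:caseNumber>A-17</archive:caseNumber>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>""" % (RDF, DC, ARCHIVE)


def read_sidecar(xmp_text):
    """Return standard keywords and custom properties from XMP text."""
    root = ET.fromstring(xmp_text)
    result = {"tags": [], "custom": {}}
    subject_path = "{%s}subject/{%s}Bag/{%s}li" % (DC, RDF, RDF)
    for desc in root.iter("{%s}Description" % RDF):
        # Standard part: dc:subject keywords.
        for li in desc.findall(subject_path):
            if li.text:
                result["tags"].append(li.text.strip())
        # Non-standard part: simple text properties in our own namespace.
        for child in desc:
            if child.tag.startswith("{%s}" % ARCHIVE) and child.text:
                name = child.tag.split("}", 1)[1]
                result["custom"][name] = child.text.strip()
    return result


if __name__ == "__main__":
    print(read_sidecar(SAMPLE))
```

A replacement extractor would then map the "custom" entries onto
whatever ontology properties the archive needs, which is exactly the
part the stock convenience functions do not cover.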




