Re: A question about filters



Joe Shaw wrote:
> Hi,
> 
> On 4/28/07, Johann Petrak <johann petrak chello at> wrote:
>> Is it possible to have multiple filters for a group of files
>> where each provides their own part of (meta)information about
>> the files?
> 
> Yeah, although this requires some coding.  Each item that's to be
> indexed by Beagle is called an "indexable".  There is a concept of
> "child indexables", which are generated by filters when processing an
> indexable.  Two examples of these are archives and email messages.
> The archive filter creates child indexables for each file contained in
> the archive.  The email filter creates child indexables for different
> parts of an email message (usually attachments).
> 
> You would need to create a filter which can parse whatever container
> format your data is in, and generate indexables for each chunk inside.
> 

Sorry, I think there was a misunderstanding; I probably expressed this
badly.
What I want is actually less complex: I simply want more than one
filter to run for certain types of files (or for all files processed by
Beagle).
For example, there is currently one filter for PDF files and another
for OpenOffice Writer documents. I have information in a database about
both types of files, so I want to run an additional filter on
both PDF and OO Writer documents that would *add* extra
metadata about these files to the Beagle index.
My question is whether this is possible, and if so, how.
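
To make the idea concrete, here is a minimal sketch in Python of what I mean by several filters each contributing part of the metadata for the same file. It is only a model of the concept and does not use Beagle's real filter API; the names content_filter, database_filter, collect_metadata and the DATABASE dict are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical type: a "filter" maps a file path to a dict of properties.
MetadataFilter = Callable[[str], Dict[str, str]]

# Stand-in for the external database that knows extra facts about files.
DATABASE: Dict[str, Dict[str, str]] = {
    "/home/johann/report.pdf": {"copyright holder": "ACME"},
}


def content_filter(path: str) -> Dict[str, str]:
    """Stand-in for the existing per-format filter (PDF, OO Writer)."""
    if path.endswith(".pdf"):
        return {"mimetype": "application/pdf"}
    return {"mimetype": "application/vnd.oasis.opendocument.text"}


def database_filter(path: str) -> Dict[str, str]:
    """Second filter: looks the file up in the database, may return nothing."""
    return dict(DATABASE.get(path, {}))


def collect_metadata(path: str, filters: List[MetadataFilter]) -> Dict[str, str]:
    """Run every filter on the same file and merge what they report.

    Later filters only *add* properties; they never overwrite a key
    that an earlier filter already set.
    """
    merged: Dict[str, str] = {}
    for run_filter in filters:
        for key, value in run_filter(path).items():
            merged.setdefault(key, value)
    return merged


record = collect_metadata(
    "/home/johann/report.pdf", [content_filter, database_filter]
)
```

The point is that both filters see the same file and the index entry ends up with the union of their properties, rather than the second filter producing a separate item.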

>> How would a user then be able to query for files using those
>> additional properties?
> 
> Beagle is all URI-based, so you would need to generate a unique URI
> for this.  The convention is usually the parent URI with an anchor on
> the end (file:///home/joe/foo.zip#bar.txt).  Searches which match this
> indexable will return this file, with its own metadata and its
> parent's metadata.  How that gets presented/opened to the user depends
> a lot on the UI.
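
For reference, the parent-URI-plus-anchor convention described above could be sketched like this in Python. This is only an illustration of the naming scheme, not Beagle code; child_uri and split_child_uri are hypothetical helper names, and percent-encoding the child name is my own assumption.

```python
from typing import Tuple
from urllib.parse import quote, unquote


def child_uri(parent_uri: str, child_name: str) -> str:
    """Build a child URI as parent URI + '#' + child name.

    The child name is percent-encoded (an assumption of this sketch)
    so that spaces or '#' characters in it cannot break the URI.
    """
    return f"{parent_uri}#{quote(child_name)}"


def split_child_uri(uri: str) -> Tuple[str, str]:
    """Split a child URI back into (parent URI, child name)."""
    parent, _, fragment = uri.partition("#")
    return parent, unquote(fragment)


uri = child_uri("file:///home/joe/foo.zip", "bar.txt")
# uri is "file:///home/joe/foo.zip#bar.txt"
```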

Again, I think my question was simpler. I was just wondering about this:
if I manage to add additional metadata about those PDF and OO Writer
files in the way described above (e.g. if I add a field "copyright
holder"), how would the end user query for that information
(i.e. how would the user find all documents that have a
specific copyright holder)?
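
In other words, the kind of lookup I have in mind is roughly the following, shown here as a small Python sketch over an in-memory dict. It says nothing about how Beagle actually exposes property queries (that is exactly what I am asking); the index layout, the sample URIs and the function find_by_property are all made up for illustration.

```python
from typing import Dict, List

# Hypothetical index: URI -> properties stored for that document.
INDEX: Dict[str, Dict[str, str]] = {
    "file:///home/johann/report.pdf": {"copyright holder": "ACME"},
    "file:///home/johann/thesis.odt": {"copyright holder": "Johann"},
    "file:///home/johann/notes.odt": {},
}


def find_by_property(index: Dict[str, Dict[str, str]], key: str, value: str) -> List[str]:
    """Return the URIs whose property `key` equals `value`, ignoring case.

    Documents that lack the property simply do not match.
    """
    wanted = value.casefold()
    return sorted(
        uri
        for uri, props in index.items()
        if key in props and props[key].casefold() == wanted
    )


matches = find_by_property(INDEX, "copyright holder", "acme")
```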

Cheers,
  Johann


