Re: Indexing License Metadata


On 6/29/07, Jason Kivlighn <jkivlighn gmail com> wrote:
> I'm interested in extending existing filters to extract license
> metadata.  The attached patch demonstrates extracting Creative Commons
> license metadata from SVGs, XMP embedded in images and PDFs, and OLE files.

This seems like a good idea.  The license info probably won't be
normalized across all of the different file types, though.  It might
make sense to normalize it while indexing, or in the user interfaces,
so that the license info is presented consistently.
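
To give a rough idea of what normalizing could look like, here is a
minimal standalone sketch in Python.  It is not Beagle code: the
function name normalize_license and the canonical form it picks (http
scheme, no "www.", lowercase path, trailing slash, no "legalcode" or
"deed" suffix) are assumptions made purely for illustration.

```python
from urllib.parse import urlsplit


def normalize_license(raw):
    """Return a canonical license URL, or None if raw is not a usable URL.

    Hypothetical helper: the canonical form is an illustrative choice,
    not something Beagle or Creative Commons prescribes.
    """
    if raw is None:
        return None
    parts = urlsplit(raw.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if not host or not parts.scheme:
        return None

    if host == "creativecommons.org":
        segments = [s.lower() for s in parts.path.split("/") if s]
        # Drop a trailing "legalcode" or "deed.xx" page name so that every
        # variant of a license link collapses to the same URL.
        if segments and (segments[-1] == "legalcode"
                         or segments[-1].startswith("deed")):
            segments.pop()
        path = ("/" + "/".join(segments) + "/") if segments else "/"
        return "http://" + host + path

    # Other licenses: only the scheme and host are normalized.
    return parts.scheme.lower() + "://" + host + parts.path
```

Each filter would run whatever string it extracted through something
like this before storing it, so the same license ends up with the same
value whether it came from an SVG, XMP, or an OLE file.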

> However, I don't know what to do with the licenses extracted.  Does a
> new property need to be added?  How do I get the license back out after
> indexing a file with a license?

Yeah, a new property is the way to go.  You'll have to add it for each
filter or backend that you modify, and I'd suggest using a consistent
key name across all of them (see above).

As long as you don't flag the property not to be stored, it will be
stored in the index and retrieved at search time.  You just use an
accessor on the result to get it, e.g.:

 hit ["fixme:license"];

(where hit is an object of type Hit)

