Re: beyond text Re: Attaching Meta-Data



On Thu, 2004-10-21 at 12:13 +0100, Srikant Jakilinki wrote:
> Of course, such things need not apply to photos only. When we use
> clipart (and the indexer knows which clipart has been used in a
> document by parsing the office formats), we can attach the metadata
> that the clipart carries (say, "idea" or "mobile-phone") to the
> document. I think it is called attribute transfer or something. Will
> check...
> 
Well, you got it right! If you look at the Word filter, for example,
you will see that we currently do not ignore such information. However,
more specific information of this kind will have to be extracted, and
it will give Beagle a *heuristic* element when searching.
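To make the attribute-transfer idea concrete, here is a minimal sketch, assuming the filter can report which clipart a document embeds and that some hypothetical table maps each clipart to its keywords. `CLIPART_TAGS` and `transfer_attributes` are illustrative names only, not part of Beagle's filter API:

```python
# Hypothetical tag table: clipart identifier -> keywords stored with that clipart.
CLIPART_TAGS = {
    "bulb.wmf": {"idea", "lightbulb"},
    "phone.wmf": {"mobile-phone", "telephone"},
}


def transfer_attributes(doc_keywords, embedded_clipart, clipart_tags=CLIPART_TAGS):
    """Return the document's keywords plus the tags of every clipart it embeds."""
    merged = set(doc_keywords)
    for name in embedded_clipart:
        # Clipart missing from the table contributes nothing rather than raising.
        merged |= clipart_tags.get(name, set())
    return merged


keywords = transfer_attributes({"report"}, ["bulb.wmf", "unknown.png"])
print(sorted(keywords))  # ['idea', 'lightbulb', 'report']
```

The merged set would then be stored with the document, so a search for "idea" could find it even though the word never appears in its text.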

> I am sorry, guys, if all this is a bit vague, but I am very interested
> in Dashboard/Beagle pushing the boundaries of indexing/searching with
> novel approaches rather than doing the regular stuff of writing
> filters for every known file format, etc.
We Beagle folks don't want the filters to be just *text-extracting*
programs/functions either; rather, we expect them to extract the
*attributes* of a file as well.

> Do not get me wrong: filters are necessary, but I would really like it
> if we could add more intelligence, like being able to extract images
> from PDF documents and then marking other PDF (or office) documents as
> similar if the images extracted from them are similar (content-wise)
> as well. For example, manuals or articles having logos. I think we
> should start to look beyond text...
> 
Nice thought! We will have to see whether images embedded in a
document, say a PDF, have their metadata preserved or not.

Having said that, how about supporting "thesaurus" search or
"dictionary" search in Beagle? I don't know whether Lucene supports
this or not, but it would be really powerful.
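Independent of whether Lucene offers this out of the box, a thesaurus search could be sketched as query expansion: each query term is widened to a group of synonyms, and a document matches if it contains at least one word from every group. This is a minimal sketch assuming a hypothetical `SYNONYMS` table; it does not use or describe any Lucene API:

```python
# Hypothetical thesaurus: term -> set of synonyms.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "picture": {"photo", "image"},
}


def expand_query(terms, synonyms=SYNONYMS):
    """Turn each query term into a group of acceptable alternatives."""
    return [{t} | synonyms.get(t, set()) for t in terms]


def matches(doc_tokens, groups):
    """A document matches when every group shares at least one token with it."""
    tokens = set(doc_tokens)
    return all(group & tokens for group in groups)


groups = expand_query(["car", "manual"])
print(matches(["automobile", "manual", "pdf"], groups))  # True
print(matches(["car", "brochure"], groups))  # False
```

In a real index the expanded groups would more likely be turned into OR clauses of the underlying query rather than checked document by document.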

V. Varadhan.



