Re: beyond text Re: Attaching Meta-Data
- From: Veerapuram Varadhan <vvaradhan novell com>
- To: Srikant Jakilinki <sriks dcs gla ac uk>
- Cc: Joe Shaw <joeshaw novell com>, dashboard-hackers gnome org, Julian Satchell <j satchell eris qinetiq com>
- Subject: Re: beyond text Re: Attaching Meta-Data
- Date: Thu, 21 Oct 2004 18:46:00 -0700
On Thu, 2004-10-21 at 12:13 +0100, Srikant Jakilinki wrote:
> Of course such things need not apply to photos only. When we use clipart
> (and the indexer knows what clipart has been used in a document by
> parsing the office formats), then we can attach-the-metadata that the
> clipart has (like say, "idea" or "mobile-phone") to the document. I
> think it is called attribute-transfer or something. Will check...
>
Well, you got it right! The Word filter, for example, already does not
ignore such information. We will, however, have to extract it in a more
specific form, and that kind of information is what will give Beagle a
*heuristic* feature when searching.
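
To make the attribute-transfer idea above concrete, here is a minimal
sketch in Python. It is not Beagle code: the Document class, the
CLIPART_KEYWORDS table, the file names and transfer_attributes() are all
hypothetical, and it assumes a filter has already listed the clipart
embedded in a document.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical indexed document; not a Beagle type."""
    path: str
    text: str
    attributes: set = field(default_factory=set)


# Assumed lookup table: clipart file name -> keywords shipped with the clipart.
CLIPART_KEYWORDS = {
    "lightbulb.wmf": {"idea"},
    "handset.wmf": {"mobile-phone"},
}


def transfer_attributes(doc, embedded_clipart):
    """Copy the keywords of each known embedded clipart onto the document."""
    for name in embedded_clipart:
        # Clipart without known keywords contributes nothing.
        doc.attributes |= CLIPART_KEYWORDS.get(name, set())
    return doc


doc = transfer_attributes(
    Document("plan.doc", "Q4 roadmap"),
    ["lightbulb.wmf", "unknown.wmf"],
)
```

A search for "idea" could then match plan.doc even though the word never
appears in its text.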
> I am sorry guys if all this is a bit vague but I am very interested in
> Dashboard/Beagle pushing the boundaries of indexing/searching with
> novel approaches rather than doing the regular stuff of writing filters
> for every known file format etc.
Even we beagles don't want the filters to be just *text-extracting*
programs/functions; rather, we expect them to extract the *attributes*
of the file as well.
> Do not get me wrong but filters are
> necessary but I would really like it if we could add more intelligence
> like being able to extract images from PDF documents and then marking
> all other PDF (or office) documents similar if the images extracted
> from these are similar (contentwise) as well. For example, manuals or
> articles having logos. I think we should start to look beyond text...
>
Nice thought! We will have to see whether the images buried inside a
document, say a PDF, have their meta-data preserved or not.
Having said that, how about supporting "thesaurus" search or
"dictionary" search in Beagle? I don't know whether Lucene supports
this or not, but it would be really powerful.
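
As a rough illustration of what thesaurus search could mean, here is a
minimal query-expansion sketch in Python. It does not use Lucene or
Beagle; the THESAURUS table, expand_query() and matches() are
hypothetical names, and a real thesaurus would come from an external
word list.

```python
# Assumed toy thesaurus: term -> set of synonyms.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "photo": {"picture", "image"},
}


def expand_query(terms):
    """Turn each query term into a group holding the term and its synonyms."""
    expanded = []
    for term in terms:
        key = term.lower()
        expanded.append([key] + sorted(THESAURUS.get(key, set())))
    return expanded


def matches(text, expanded):
    """True if every group has at least one word present in the text."""
    words = set(text.lower().split())
    return all(any(word in words for word in group) for group in expanded)
```

With this, a query for "car" would also match a document that only says
"automobile", which is the kind of behaviour I have in mind.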
V. Varadhan.