using Beagle's index Re: beyond text Re: Attaching Meta-Data
- From: Srikant Jakilinki <sriks dcs gla ac uk>
- To: Julian Satchell <j satchell eris qinetiq com>
- Cc: darryl vandorp ca, dashboard-hackers <dashboard-hackers gnome org>
- Subject: using Beagle's index Re: beyond text Re: Attaching Meta-Data
- Date: Wed, 27 Oct 2004 06:04:40 +0100
Yes, I agree with Julian. Synonym expansion has not been shown to
improve search effectiveness. But it is a worthy project, Veer, and we
could have a small checkbox that lets the user turn it on, rather than
having it enabled by default. Comments, Julian?
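To make the checkbox idea concrete, here is a rough sketch of opt-in
expansion. Everything in it is an assumption for illustration: the
SYNONYMS table is a hand-written stand-in for something like WordNet,
and expand_query is a made-up helper, not anything in Beagle.

```python
# Hypothetical synonym table; a real one might be derived from WordNet.
SYNONYMS = {
    "dispute": ["debate", "disagreement", "quarrel"],
}


def expand_query(terms, use_synonyms=False):
    """Return the terms to search for, expanded only if the user opted in."""
    expanded = []
    for term in terms:
        candidates = [term]
        if use_synonyms:
            # Append known synonyms after the original term.
            candidates += SYNONYMS.get(term.lower(), [])
        for candidate in candidates:
            # Keep order, drop duplicates.
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

With the flag off the query is passed through untouched, so the default
behaviour keeps its precision and the wider recall is only paid for
when the user asks for it.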
But here is yet another suggestion. We should use Beagle's index for
many other purposes, depending on the dimensions along which one
document is similar to another. For example, a document can be similar
to another in the dimension of text (same tokens). Or in the dimension
of metadata (same author). Or in the dimension of looks (screen
rendering of papers). Or in the dimension of how it sounds (signal
waveform). Or in terms of people (instant messaging, email CCs). Or in
terms of what other documents are included within it (clipart
insertion). When such a match happens, the cluster of similar documents
should have similar tokens "attached" to them, at least the top-ranking
tokens for that set of documents. In other words, the Beagle indexer
would not only extract the tokens from documents and place them in an
inverted index, but also do some crude matching. It should keep working
silently in the background, trying to make sense of all the chaos that
we generate by going about our tasks. No AI here, folks. Just simple
multi-dimensional
For example, suppose we have some photos and at some point we have
attached tags like "picnic" and "hiking" to them. All the photos in this
batch have similar specs (they look the same, or have close timestamps
or the same EXIF information). The tags should be shared among them.
Further, if similar photos (in terms of content or EXIF) come in later,
the Beagle indexer should catch them and tell the user which tags could
potentially be attached to the new ones. It is Beagle's index that
knows which documents have which tags/tokens. Let me try to make some
screenshots if this is not clear.
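In the meantime, a rough sketch of the photo case. This is only an
illustration under my own assumptions: the dictionaries stand in for
whatever the index really stores, and similar() and suggest_tags() are
hypothetical names, not Beagle functions. The match is deliberately
crude (same camera, timestamps within an hour).

```python
from collections import Counter


def similar(a, b, max_seconds=3600):
    """Crude match: same camera model and timestamps close together."""
    return (a["camera"] == b["camera"]
            and abs(a["timestamp"] - b["timestamp"]) <= max_seconds)


def suggest_tags(new_doc, indexed_docs, top_n=3):
    """Suggest the most common tags among already-indexed similar documents."""
    counts = Counter()
    for doc in indexed_docs:
        if similar(new_doc, doc):
            counts.update(doc["tags"])
    return [tag for tag, _ in counts.most_common(top_n)]


# Toy stand-in for the index (all values invented for the example).
indexed = [
    {"camera": "X", "timestamp": 1000, "tags": ["picnic", "hiking"]},
    {"camera": "X", "timestamp": 1500, "tags": ["picnic"]},
    {"camera": "Y", "timestamp": 1200, "tags": ["office"]},
]
new_photo = {"camera": "X", "timestamp": 2000, "tags": []}
suggestions = suggest_tags(new_photo, indexed)
```

The suggestions would only be offered to the user, not applied
automatically. Other dimensions (text, people, sound) would each need
their own similar() test.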
On Mon, 2004-10-25 at 13:05, Julian Satchell wrote:
> You are talking about synonym expansion.
> This needs more than a standard ispell/aspell type dictionary, as you
> need the semantic relationships of words. The only big, not-paid-for,
> dataset like this that I know of is WordNet, which is English; I think
> there are efforts to make similar sets for some European languages.
> It is only rarely useful; many search engines used it in the late 90s,
> but it is not now normally turned on. It mildly increases recall, but
> vastly reduces relevance; most searches return too many items anyway, so
> this is not often wanted.
> In many cases, a good thesaurus tool (a front end to WordNet?) will
> allow you to do the expansion yourself.
> On Mon, 2004-10-25 at 16:16 -0700, Veerapuram Varadhan wrote:
> > On Thu, 2004-10-21 at 10:05 -0500, darryl vandorp wrote:
> > > >
> > > > It would rock if it used aspell or some other dictionary that wasn't
> > > > online. Setting up a query driver for the google spell checker would
> > > > be a good start though.
> > > >
> > > > -- joe g.
> > > >
> > > >
> > > There's also the dict protocol; I don't know if there's a C#
> > > library for that somewhere.
> > >
> > > -darryl
> > hmmm.. I think I was not clear in my previous post. Well.. what I meant
> > by "Dictionary search" was:
> > * search the "beagle-backend" for the "synonyms" of the user entered
> > "query word".
> > For example: "There has been a serious debate/disagreement on a
> > particular feature being implemented in a tool and lot of mails, docs,
> > chat logs are available as a record. Now, if user wants to search on it
> > and he doesn't know the exact word but knows to the extent that 'there
> > was a debate/dispute'.".
> > In such scenarios, he can very well say "dispute" and select
> > "Dictionary search" and he gets a hit list that satisfies:
> > * Docs, mails, chat-logs, web pages, etc., that contain the keyword
> > "dispute" or "debate" or "disagreement" or "quarrel" or "argumentation"
> > or "discussion" etc.
> > Maybe the example that I stated above is not a good one, but I just
> > wanted to explain what I meant by "Dictionary" search.
> > Cheers,
> > V. Varadhan.
> > _______________________________________________
> > Dashboard-hackers mailing list
> > Dashboard-hackers gnome org
> > http://mail.gnome.org/mailman/listinfo/dashboard-hackers