Re: Cluechaining and Original Clues



Hi,

On Sat, 2003-12-20 at 05:13, Jim McDonald wrote:
> Hi,
>    I've been looking at cluechaining and have been trying to work out
> what it is meant to do, and what it does in today's implementation.

The idea behind cluechaining is to enhance the results of a query by
generating additional clues which are related to the original ones.
For example, say you received an instant message from me, and my screen
name is "FooBar".  "FooBar" isn't very useful by itself; you would
probably only get old IM logs as hits for that clue.  However, if we
have an address book backend which knows that "FooBar" corresponds to
"Joe Shaw", it can then chain a bunch of new clues which are relevant
to the current context, which is that you're chatting with me.  So with
that information, you'd probably get my blog, documents and photos from
me, and so on.

In fact, the screenshot on the dashboard page:

        http://www.nat.org/dashboard/rewrite.png
        
shows exactly this situation in action.
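
Just to make that concrete, a chaining backend might look something
like the sketch below.  This is purely illustrative Python, not the
actual dashboard API; the class names and the clue type strings are all
made up:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Clue:
        type: str    # e.g. "imscreenname" or "fullname"; types made up
        value: str

    class AddressBookChainer:
        """Chains a full-name clue off an IM screen name clue."""

        def __init__(self, contacts):
            self.contacts = contacts    # e.g. {"FooBar": "Joe Shaw"}

        def chain(self, cluepacket):
            new_clues = []
            for clue in cluepacket:
                if clue.type == "imscreenname" and \
                   clue.value in self.contacts:
                    new_clues.append(
                        Clue("fullname", self.contacts[clue.value]))
            # The original clues stay in the packet (more on that below).
            return cluepacket + new_clues

Backends like the blog, document, and photo ones would then match on
the chained "fullname" clue even though the original packet only had
the screen name.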

Unfortunately, I don't remember how it's all implemented today.  I
haven't worked on dashboard since the summer, and we tried a bunch of
different things for cluechaining.  It was one of the hardest parts
we've had to get right so far.

> It appears that the backends which generate new clues don't remove the
> old ones from the packets, which means that some backends will return
> the same information twice (once for the original clue in the original
> cluepacket, and again for the same information in the duplicate
> cluepacket).  Is this meant to be the way that this works, or should
> duplicate cluepackets from chainers remove the original clues?

I might be remembering wrong, but I am pretty sure they are supposed to
be there, since there's the possibility that a backend might not get
the original cluepacket but might instead only get the rewritten one.
We might have changed that, though.  It also allows backends to know
about the previous clues, so they can return cached information or just
ignore clues they know they've dealt with in the past.  There was also
a plan to add a maximum chain depth, but I don't know if that was ever
implemented.  I don't think it ever really came up, since we had
relatively few chaining backends.
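
To sketch what I mean (illustrative Python again, not the real code;
rewrite(), the chainer interface, and the depth cap value are all
invented):

    MAX_CHAIN_DEPTH = 3    # invented value; the cap may never have landed

    def rewrite(cluepacket, chainers, depth=0):
        """Run every chainer over the packet, keeping the original
        clues in the rewritten packet, until nothing new is chained
        or the depth cap is hit."""
        if depth >= MAX_CHAIN_DEPTH:
            return cluepacket
        seen = set(cluepacket)         # Clue is frozen, so hashable
        new_clues = []
        for chainer in chainers:
            for clue in chainer.chain(cluepacket):
                if clue not in seen:   # skip clues already in the packet
                    seen.add(clue)
                    new_clues.append(clue)
        if not new_clues:
            return cluepacket
        # Originals ride along, so a backend that only sees the
        # rewritten packet still gets them.
        return rewrite(cluepacket + new_clues, chainers, depth + 1)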

Of course, nothing about the design is set in stone, and if anyone
comes up with a better, more elegant way to deal with it, we'll run
with it.

As for duplicates, I think they're fine.  There was supposed to be a
result filtering class which would remove duplicates and cull results
with lower relevance.
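
Roughly, such a filter might look like this (Python sketch; the Result
shape, the dedup key, and the cutoff value are all made up):

    from dataclasses import dataclass

    @dataclass
    class Result:
        url: str           # assume hits carry a URL we can dedup on
        relevance: float

    def filter_results(results, min_relevance=0.5):
        """Drop duplicate hits, keeping the most relevant copy, and
        cull anything below the relevance cutoff."""
        best = {}
        for r in results:
            if r.relevance < min_relevance:
                continue
            if r.url not in best or r.relevance > best[r.url].relevance:
                best[r.url] = r
        return sorted(best.values(),
                      key=lambda r: r.relevance, reverse=True)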

I know it doesn't really answer your questions concretely, but I hope it
helps. :)

Joe

Oh, BTW, Nat is on vacation for three weeks, so he won't have access to
email and won't be able to do a release.  If I talk to him before he
gets back, I'll ask him about making one.



