Re: Cluechaining and Original Clues



> Hi all,
>
> Coming back to an oldish email because of some behaviour I've noticed
> today while playing with Dashboard.  At this stage when cluechaining
> occurs, the old Clues are thrown out and replaced by a brand new
> CluePacket containing only the new chained Clues.

Yep, see below for some more details and ideas.

> On Sun, 2003-12-21 at 05:41, Jim McDonald wrote:
>
>> Okay, so from that I'd say that the backends should either receive a
>> single cluepacket with all of the chained clues, or multiple
>> cluepackets each with their own separate set of clues.  The current
>> system of sending multiple cluepackets with overlapping sets of clues
>> seems to be redundant.
>
> Yes and no.  The reason for adding new clues to the CluePacket and
> resending it is if new information becomes available for a backend that
> uses several Clues.  An example of this is backend-foaf.cs
> which uses both the rdfurl and foafid Clues to find useful information.
>
> It just happens that at the moment they both are created by
> backend-htmlchainer while looking at content, but that doesn't have to
> be the case.
>
> I was just looking through the logs of running Dashboard, and noticed a
> possible option would be to find dates, and then look for Tasks and
> Calendar entries on that date for full_name and/or nicknames.
>
> Or files modified on that date which have matching keywords.

Yeah, I can see what you're after here, and I can see why it might be
useful.  I did run into issues with some cluechaining looping forever,
though, and with lots of duplicate results, which is why I ditched it in
the first place.  I also think that this is a less common requirement than
just iterating over the clues and doing something relatively simple and
standalone with each clue.

>> > Of course, there's nothing concrete about the design, and if anyone
>> > comes up with a better, more elegant way to deal with it, we'll run
>> > with it.
>> Well we can either make chainers separate from backends, where the
>> cluepacket goes through all chainers before it goes to the backends,
>> or we can leave things as they are but then make cluechainers strip
>> out the original clues.
>
> In my opinion cluechainers should only ever add Clues.  I actually like
> this idea of running chainers first.  But, what happens when chainers
> use the output of other chainers?
>
> Also, what about backend-addressbook?  It both chains new clues, and
> produces results.  Breaking it into two modules is an option, but
> there'll be a lot of shared code...

I've pretty much dismissed the idea of cluechainers running first, due to
the above issues.  I think this one goes down as a 'nice idea, unworkable
in practice'.

>> > As for duplicates, I think they're fine.  There was supposed to be a
>> > result filtering class which would remove duplicates and cull out
>> > results with lower relevances.
>> I still think that straight duplicates are a waste of time and
>> effort.  Most backends don't keep their own cache, so especially for
>> network-heavy backends this could be a severe waste of resources as we
>> up the number of cluechainers and backends.
>
> There are already checks to make sure that duplicate Clues aren't added
> to a CluePacket.  And there used to be a check to make sure that no
> duplicate HTML snippets were displayed - this doesn't appear to work
> anymore.

There are checks to ensure that the same resultset isn't run twice, but if
you put back the bit of code that keeps clues in chained cluepackets then
these checks won't work any more, as they will see the inputs as
different.  If the duplicate check isn't working without any changes to
the engine code then that's a bug.
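
To illustrate why carrying the original clues along defeats that kind of
check, here is a rough sketch.  It is Python purely for brevity and is not
the actual engine code: the Engine class, the should_run name and the idea
of keying the check on the whole set of clues are all assumptions made for
the example, as is the "url" clue type.

```python
class Engine:
    """Hypothetical engine that refuses to process the same input twice."""

    def __init__(self):
        self.seen = set()

    def should_run(self, clues):
        # Assumption: the check is keyed on the complete set of clues
        # in the packet, each clue being a (type, text) pair.
        key = frozenset(clues)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


rdf = ("rdfurl", "http://example.org/jim.rdf")

# Chained packets that contain only the new clue: the second arrival
# of the same clue is recognised and skipped.
engine = Engine()
new_only_first = engine.should_run([rdf])
new_only_second = engine.should_run([rdf])

# Chained packets that also keep their original clues: the same rdfurl
# clue turns up twice, but the packets differ, so both are processed.
engine = Engine()
kept_first = engine.should_run([("url", "http://example.org/a.html"), rdf])
kept_second = engine.should_run([("url", "http://example.org/b.html"), rdf])

print(new_only_first, new_only_second)  # True False
print(kept_first, kept_second)          # True True
```

In the second case a backend that only cares about the rdfurl clue ends up
doing the same work twice, which is the duplicate-results problem again.
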
> With regards to backends duplicating work, I think the base class
> should probably be extended to contain a cache.  That way those
> backends that will make use of the additional Clues can, those
> that don't, well, don't do any extra work.

Caches aren't a way of solving the issue, as you then never know whether
you're getting stale data.  It does make sense to have a cache available
to each backend, but considering how specialised the backends are
already, and what they might be doing in the future, it should be down to
the backend whether it wants to use a cache, rather than having this
forced upon it.

Having had a quick think about this, each cluepacket knows about its
parent, right?  So why not have a method in CluePacket that gets all
clues, including ones from its parent(s)?  That way any backend that
wants to do clever things with the complete list of original and
chained clues can, and the other backends that just take each clue
separately will avoid most duplicate work.
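
A minimal sketch of that idea follows, again in Python for brevity rather
than as real Dashboard code.  The Clue and CluePacket classes here are
stand-ins, and the parent attribute, the all_clues method name and the
"url" clue type are assumptions for the example; only the rdfurl and
foafid clue types come from this thread.

```python
class Clue:
    """Hypothetical stand-in for a Clue: a type plus a text value."""

    def __init__(self, clue_type, text):
        self.clue_type = clue_type
        self.text = text

    def key(self):
        return (self.clue_type, self.text)


class CluePacket:
    """Hypothetical packet that holds only its own clues plus a parent link."""

    def __init__(self, clues, parent=None):
        self.clues = list(clues)
        self.parent = parent

    def all_clues(self):
        """Return this packet's clues followed by those of its ancestors,
        skipping any clue that has already been collected."""
        seen = set()
        result = []
        packet = self
        while packet is not None:
            for clue in packet.clues:
                if clue.key() not in seen:
                    seen.add(clue.key())
                    result.append(clue)
            packet = packet.parent
        return result


original = CluePacket([Clue("url", "http://example.org/jim.html")])
chained = CluePacket(
    [Clue("rdfurl", "http://example.org/jim.rdf"), Clue("foafid", "jim")],
    parent=original,
)

# A simple backend iterates over the new clues only.
print([c.clue_type for c in chained.clues])        # ['rdfurl', 'foafid']
# A backend that wants the full picture asks for everything.
print([c.clue_type for c in chained.all_clues()])  # ['rdfurl', 'foafid', 'url']
```

The chained packet itself stays small, so any check that compares packet
contents keeps working, and only the backends that ask for the full list
pay for walking up the parents.
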
> Cheers!

Cheers,
Jim.
-- 
Jim McDonald - Jim mcdee net




