Re: Getting started with beagle



Hi,

On Thu, Feb 14, 2008 at 12:03 AM, Debajyoti Bera <dbera web gmail com> wrote:
>  > An architectural decision to be made, do we want to actually index the
>  > data off of every webservice, or just offer 'transparent' backends to
>  > query the existing query APIs for each service. I'm more for a local
>
>  A transparent "proxy" backend to query using webservice API (in beagle lingo,
>  a "QueryDriver") is fine for some kind of data but ideally a real backend
>  that fetches the data and indexes it ("backend") would be the best option.

Yeah, I agree.  If the data is reasonably small such that we can pull
it down and index it, and it doesn't violate the service's terms of
service, we should do that.  Searches will be quicker in most cases,
they'll work while offline, etc.

And the most important reason is that local desktop searches don't
leak out onto the internet.  This is a big part of why we have
"query domains" in Beagle; you don't necessarily want your searches
for "top secret plans to defeat Google and Facebook" to also go out
and hit Google's and Facebook's servers.  There's a privacy issue
here.  It's also one that the user interfaces don't currently
address, and fixing that would be a prerequisite to any such backend
going in.
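
To make the query-domain idea concrete, here is a rough sketch of how a
query tagged with domains could be kept away from backends that talk to
the network.  It is Python rather than Beagle's own C#, and the domain
names, the BACKENDS table and backends_for() are illustrative
assumptions, not Beagle's actual API.

```python
from enum import Flag, auto


class QueryDomain(Flag):
    # Assumed domain names for illustration; not taken from Beagle's source.
    LOCAL = auto()         # data on this machine
    SYSTEM = auto()        # system-wide indexes
    NEIGHBORHOOD = auto()  # other machines on the local network
    GLOBAL = auto()        # services out on the internet


# Hypothetical backend registry: backend name -> the domain it touches.
BACKENDS = {
    "Files": QueryDomain.LOCAL,
    "Mail": QueryDomain.LOCAL,
    "WebServiceProxy": QueryDomain.GLOBAL,
}


def backends_for(query_domains):
    """Return the names of backends a query with these domains may reach."""
    return sorted(
        name for name, domain in BACKENDS.items() if domain & query_domains
    )
```

With this layout, a query carrying only QueryDomain.LOCAL never reaches
the WebServiceProxy entry; the user would have to opt in by adding
QueryDomain.GLOBAL to the query.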

>  An out-of-process script will work but it is really not that complicated to do
>  this in process. All you have to do is create an IndexableGenerator and feed
>  indexables as asked in GetNextIndexable. Depending on how fast the data can
>  be accessed from the webservice, either download some 30/40 "indexables" from
>  the webservice in HasNextIndexable or use a separate thread to download them
>  and put in a shared queue from which GetNextIndexable will get them.

Yeah, and we can add additional controls in the scheduler if that is needed.
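
A rough sketch of the threaded variant described above, for anyone who
wants to see the shape of it.  This is Python, not real Beagle code
(Beagle backends are C#): only the HasNextIndexable/GetNextIndexable
names come from this thread, while WebServiceGenerator, fetch_page and
max_pending are hypothetical names chosen for the example.

```python
import queue
import threading

_EMPTY = object()  # nothing has been pulled from the queue yet
_DONE = object()   # sentinel: the downloader has nothing more to deliver


class WebServiceGenerator:
    """Hypothetical stand-in for an IndexableGenerator-style class."""

    def __init__(self, fetch_page, max_pending=40):
        # fetch_page(n) is an assumed callable returning a list of items
        # for page n, or an empty list once the service is exhausted.
        self._fetch_page = fetch_page
        # Bounded queue: the downloader blocks once max_pending items are
        # waiting, so it never runs far ahead of the indexer.
        self._pending = queue.Queue(maxsize=max_pending)
        self._next = _EMPTY
        threading.Thread(target=self._download, daemon=True).start()

    def _download(self):
        page = 0
        try:
            while True:
                items = self._fetch_page(page)
                if not items:
                    break
                for item in items:
                    self._pending.put(item)
                page += 1
        finally:
            # Always signal completion, even if fetch_page raised, so the
            # consumer does not wait forever.
            self._pending.put(_DONE)

    def has_next_indexable(self):
        if self._next is _EMPTY:
            self._next = self._pending.get()  # waits for the downloader
        return self._next is not _DONE

    def get_next_indexable(self):
        if not self.has_next_indexable():
            raise IndexError("no more indexables")
        item, self._next = self._next, _EMPTY
        return item
```

The caller loops on has_next_indexable() and pulls one item at a time
with get_next_indexable(); the bounded queue is what keeps the download
thread from fetching everything in one go.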

>  If you do it out of process, make sure you don't choke the internet by
>  downloading all 10K emails in one go, i.e. you can't ignore some kind of
>  scheduling.

Agreed.  You already have a scheduling system inside Beagle.  You're
better off using it than writing your own.  (Sometimes you can't avoid
it, like the Thunderbird backend running inside TBird itself.)

Joe

