Re: Browser history indexing.



On Mon, 2004-10-18 at 10:12 -0400, Dan Winship wrote:
> On Sat, 2004-10-16 at 01:54 -0400, Nat Friedman wrote:
> > Right now the way that we index your web history is to install a plugin
> > to Firefox or Epiphany and have the browser notify Beagle whenever the
> > user visits a page.  The browser then sends the entire HTML of the page
> > to Beagle.
> > 
> > This works fairly well -- although there are problems with pages that do
> > not load completely, since they never get indexed -- but it does require
> > that the user install this Beagle plugin in the browser, or that we do
> > it automatically for the user, and that it not get removed later.
> > 
> > An alternative is to just monitor the user's browser cache with inotify,
> > and index pages as they hit the cache.  You could check mtime or if
> > necessary cross-reference against the history.db to get the URL and time
> > the page was loaded.
> 
> The problem with working from the cache is that some pages may be
> uncached quickly, depending on the headers sent back from the HTTP
> server. E.g., cnn.com sets all of its pages to expire after 1 minute, so
> that you're constantly getting updated headlines and articles from them.
> So if Beagle was just indexing your cache dir, you would lose the
> ability to search through stuff you'd seen on cnn.com.
> 
> And some pages on some sites are set to not be cached at all (and it's
> entirely possible that Firefox doesn't bother to cache pages to disk if
> they're set to expire too soon as well), so tying indexing into the
> caching mechanism might not be doable even if you immediately copy pages
> out of the cache. (Then again, this is Mozilla, and there are probably
> hidden settings you could use to tweak this behavior...)
> 
> 
> Another possibility for indexing web history would be to use a local
> HTTP proxy (a la wwwoffle). But that would only work for http, not
> https, so it's probably no good.
> 
> -- Dan

I think not indexing https is probably highly desirable. You absolutely
don't want, say, your bank account details held in the clear in the
search system.
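A minimal sketch of that policy, assuming the indexer sees each page's URL before storing anything. `should_index` is a hypothetical helper for illustration, not part of Beagle or any browser plugin; it simply lets plain-http pages through and skips everything else, https included.

```python
from urllib.parse import urlsplit


def should_index(url: str) -> bool:
    """Hypothetical filter: return True only for plain-http URLs.

    Pages fetched over https (or any other scheme) are skipped, so their
    contents never end up in the search index in the clear.
    """
    scheme = urlsplit(url).scheme.lower()
    return scheme == "http"


if __name__ == "__main__":
    for u in ("http://www.cnn.com/", "https://bank.example.com/accounts"):
        print(u, should_index(u))
```

Whitelisting "http" rather than blacklisting "https" is a deliberate choice here: any scheme the filter does not recognise is left out of the index by default.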

Julian
> 
> _______________________________________________
> Dashboard-hackers mailing list
> Dashboard-hackers gnome org
> http://mail.gnome.org/mailman/listinfo/dashboard-hackers
> 
