Re: Browser history indexing.
- From: Julian Satchell <j satchell eris qinetiq com>
- To: Dan Winship <danw novell com>
- Cc: dashboard-hackers gnome org
- Subject: Re: Browser history indexing.
- Date: Mon, 18 Oct 2004 15:10:47 +0100
On Mon, 2004-10-18 at 10:12 -0400, Dan Winship wrote:
> On Sat, 2004-10-16 at 01:54 -0400, Nat Friedman wrote:
> > Right now the way that we index your web history is to install a plugin
> > to Firefox or Epiphany and have the browser notify Beagle whenever the
> > user visits a page. The browser then sends the entire HTML of the page
> > to Beagle.
> >
> > This works fairly well -- although there are problems with pages that do
> > not load completely, since they never get indexed -- but it does require
> > that the user install this Beagle plugin in the browser, or that we do
> > it automatically for the user, and that it not get removed later.
> >
> > An alternative is to just monitor the user's browser cache with inotify,
> > and index pages as they hit the cache. You could check mtime or if
> > necessary cross-reference against the history.db to get the URL and time
> > the page was loaded.
>
> The problem with working from the cache is that some pages may be
> uncached quickly, depending on the headers sent back from the HTTP
> server. E.g., cnn.com sets all of its pages to expire after 1 minute, so
> that you're constantly getting updated headlines and articles from them.
> So if Beagle was just indexing your cache dir, you would lose the
> ability to search through stuff you'd seen on cnn.com.
>
> And some pages on some sites are set to not be cached at all (and it's
> entirely possible that Firefox doesn't bother to cache pages to disk if
> they're set to expire too soon as well), so tying indexing into the
> caching mechanism might not be doable even if you immediately copy pages
> out of the cache. (Then again, this is mozilla, and there are probably
> hidden settings you could use to tweak this behavior...)
>
>
> Another possibility for indexing web history would be to use a local
> HTTP proxy (a la wwwoffle). But that would only work for http, not
> https, so it's probably no good.
>
> -- Dan
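To make the expiry problem concrete, here is a rough sketch of how one
might estimate, from the response headers, whether a page is likely to
disappear from the cache before a cache watcher gets to it. The function
names and the 300-second threshold are hypothetical, and how Firefox or
Epiphany actually decide what to write to disk is an assumption here,
not something checked against their code. A header such as
"Cache-Control: max-age=60" would correspond to the one-minute expiry
described above.

```python
import re


def cache_lifetime(headers):
    """Return the freshness lifetime in seconds from Cache-Control.

    Returns 0 if the response is marked no-store, the max-age value if
    one is present, and None if the header gives no explicit lifetime.
    """
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("cache-control", "").lower()
    directives = [d.strip() for d in value.split(",") if d.strip()]

    if "no-store" in directives:
        return 0
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return None


def likely_missed_by_cache_watcher(headers, min_lifetime=300):
    """Guess whether a cache-based indexer could miss this page.

    min_lifetime is an arbitrary illustrative threshold in seconds.
    """
    lifetime = cache_lifetime(headers)
    return lifetime is not None and lifetime < min_lifetime
```

Pages flagged this way are the ones a plugin-based approach would still
catch but a pure cache watcher might not.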
I think not indexing https is probably highly desirable. You absolutely
don't want, say, your bank account details held in the clear in the
search index.
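
Whichever capture mechanism is used, the same policy could be enforced
with a scheme check before a page is handed to the indexer. A minimal
sketch follows; should_index is a hypothetical helper, not part of
Beagle or of any browser plugin:

```python
from urllib.parse import urlsplit


def should_index(url):
    """Return True only for plain-http pages.

    Anything fetched over https (or any other scheme) is skipped, so
    that sensitive pages never reach the search index.
    """
    return urlsplit(url).scheme.lower() == "http"
```

With this check, should_index("http://cnn.com/") is true while
should_index("https://bank.example/account") is false.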
Julian