Re: Epiphany Extention Conf [Patch]

From: "D Bera" <dbera web gmail com>
To: "Kevin Kubasik" <kevin kubasik net>
Cc: Joe Shaw <joeshaw novell com>, dashboard-hackers <dashboard-hackers gnome org>
Subject: Re: Epiphany Extention Conf [Patch]
Date: Mon, 20 Mar 2006 22:54:11 -0500

> > > On a similar note, I am trying to figure out a better way to index web
> > > content, the current 'as-viewed' solution is a major resource hog and
> > > significantly slows the viewing of pages (especially some of the
> > > ajaxed ones like Gmail).
> >
> > It should be converted to using the inotify-based indexing service
> > method.  The Firefox extension was moved over some time ago.  It has
> > very little resource overhead and has the benefit of batching up changes
> > when you view web pages but the beagle daemon isn't running.
>
> Since FF 2.0 is using an Sqlite3 db to store all history/bookmark info
> as opposed to the current cryptic mork system, we could look into
> indexing from that.....
> >
> > If you wanted to change it to be off by default and somehow have an
> > "index this page" that would be a nice thing to have for both browsers'
> > extensions.

The FF bookmark format isn't based on mork; it's a bad derivative of XML.
Moreover, I don't think moving over to sqlite3 will help with anything.
The problem for the web history backends is as follows:

When a user views a page, it has to be indexed. The data is available
in two places: in the disk cache or in the browser itself.

The Konqueror webhist backend picks up cache files as they are saved
to the disk cache, so files that aren't saved to the cache aren't
indexed. Refreshing a static page doesn't incur any indexing cost, but
Konqueror chooses not to save some webpages in the disk cache, and
hence those pages never get indexed.

For Firefox, the plugin traverses the entire DOM document and writes
the document to a file, which is then indexed. Traversing the DOM can
be resource-heavy for complicated pages. However, the HTML filter then
has to parse the flat file and build the DOM from it again. As you see,
for complicated documents you have to pay the price of a DOM traversal
at least once anyway, in the HTML filter. In that respect the FF
backend isn't that bad: it just takes about twice the resources that
the HTML filter alone would.
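
To make the double cost concrete, here is a rough Python sketch. It is
not the actual extension or filter code: the tiny Node tree only stands
in for the browser DOM, and Node, serialize, CountingParser and
double_cost are names I made up for illustration. The point is just
that every element is visited once when the flat file is written and
once more when the filter re-parses it.

```python
from html.parser import HTMLParser


class Node:
    """Stand-in for a DOM element (hypothetical, for illustration only)."""

    def __init__(self, tag, children=None, text=""):
        self.tag = tag
        self.children = children or []
        self.text = text


def serialize(node, counter):
    """Traversal 1: walk the in-memory tree and flatten it to HTML text."""
    counter["visited"] += 1
    inner = node.text + "".join(serialize(c, counter) for c in node.children)
    return "<{0}>{1}</{0}>".format(node.tag, inner)


class CountingParser(HTMLParser):
    """Traversal 2: the filter re-parses the flat text and sees each element again."""

    def __init__(self):
        super().__init__()
        self.visited = 0

    def handle_starttag(self, tag, attrs):
        self.visited += 1


def double_cost(root):
    """Return (elements visited while writing, elements visited while re-parsing)."""
    counter = {"visited": 0}
    flat = serialize(root, counter)
    parser = CountingParser()
    parser.feed(flat)
    parser.close()
    return counter["visited"], parser.visited


page = Node("html", [Node("body", [Node("p", text="hello"), Node("p", text="world")])])
print(double_cost(page))
```

For this four-element page both counts are 4, so the work grows with
the size of the page on both sides.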

Currently the metadata is stored in a mork file, which the FF backend
ignores because it is hard to parse. When FF starts using sqlite to
store this information, it might contain interesting metadata such as
the referrer, the visit count, etc., but the actual HTML file will
still need to be retrieved either from memory or from the disk cache.
The memory approach takes a little more time. The disk approach takes
a bit less time, but the backend has to know which cache file
corresponds to which URL (at the very least). This information is, I
think, stored in the binary Cache/_CACHE_MAP_ file. As long as
history.dat (or the sqlite history.db) doesn't contain the mapping
between URL and cache file, performance can't be improved.
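
A small sketch of the lookup the disk approach would need, in Python.
Everything here is an assumption on my part: the history table and its
columns are a made-up schema (I don't know what the FF sqlite database
will actually look like), and cache_map is a hypothetical stand-in for
the URL-to-cache-file mapping that would have to come from somewhere,
with an invented cache file name.

```python
import sqlite3

# Hypothetical schema; the real Firefox history database may look nothing like this.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (url TEXT PRIMARY KEY, visit_count INTEGER, referrer TEXT)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        ("http://example.org/a", 3, "http://example.org/"),
        ("http://example.org/b", 1, "http://example.org/a"),
    ],
)

# Hypothetical URL -> cache file mapping with an invented file name.
# Only one of the two pages has a known cache file.
cache_map = {"http://example.org/a": "Cache/0A1B2C3D"}


def lookup(conn, cache_map, url):
    """Combine history metadata with the cache file for a URL, if either is known."""
    row = conn.execute(
        "SELECT visit_count, referrer FROM history WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None  # URL was never visited
    return {
        "url": url,
        "visit_count": row[0],
        "referrer": row[1],
        "cache_file": cache_map.get(url),  # None means no HTML to index from disk
    }


print(lookup(conn, cache_map, "http://example.org/a"))
print(lookup(conn, cache_map, "http://example.org/b"))
```

The second lookup returns the metadata but no cache file, which is the
situation described above: the history database alone tells the backend
that a page was visited, not where its HTML lives on disk.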

As Joe suggested, if someone can come up with an idea/implementation
of on-demand indexing of webpages, then it would make life easier.
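
One possible shape for this, as a rough Python sketch only: the browser
extension writes a page to a spool directory only when the user asks
for it, and the daemon drains that directory whenever it runs, in the
spirit of the indexing service method Joe described. The function
names, the JSON layout, the ".page" suffix and the dict standing in for
the real index are all my own assumptions, not anything beagle
provides.

```python
import json
import os
import tempfile
import time


def request_index(spool_dir, url, html):
    """Called only on an explicit "index this page"; writes one spool file."""
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"url": url, "queued_at": time.time(), "html": html}, f)
    final_path = tmp_path[: -len(".tmp")] + ".page"
    # Rename last, so the indexer never picks up a half-written file.
    os.rename(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir, index):
    """What the daemon might do at startup or on a directory-change event."""
    handled = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".page"):
            continue  # skip files that are still being written
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as f:
            entry = json.load(f)
        index[entry["url"]] = entry["html"]  # a dict stands in for the real index
        os.remove(path)
        handled += 1
    return handled


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        request_index(spool, "http://example.org/a", "<p>hello</p>")
        index = {}
        print(drain_spool(spool, index), sorted(index))
```

Pages the user never asks for cost nothing, and requests made while the
daemon is down simply wait in the spool directory until it comes back.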

- dBera

References:
- Epiphany Extention Conf [Patch]
  - From: Kevin Kubasik
- Re: Epiphany Extention Conf [Patch]
  - From: Joe Shaw
- Re: Epiphany Extention Conf [Patch]
  - From: Kevin Kubasik
