Re: Querying expense and weak hashtables

From: D Bera <dbera web gmail com>
To: Daniel Drake <dsd gentoo org>
Cc: dashboard-hackers gnome org
Subject: Re: Querying expense and weak hashtables
Date: Sat, 26 Nov 2005 12:08:28 -0500

> I've noted in the past that querying a lot is one way to bump up beagled's
> memory usage fairly easily.
>
> There's a discussion going on at the dotLucene forums about a fairly serious
> memory problem in Lucene's field caching:
>
> http://sourceforge.net/forum/forum.php?thread_id=1378460&forum_id=408004

That discussion has now come to an end. They found a few leaks in the
implementation of IndexSearcher and fixed them (without using
WeakHashMap).
Out of curiosity, I patched my local copy with those changes and
rebuilt beagle. The patch may have fixed some memory leaks, but
querying in beagle is still quite expensive - by an order of magnitude.

Some observations on querying against the File backend. I started
beagled and gave it a large enough filesystem root to index, then
waited until it finished indexing. Then I stopped the daemon and
started it again. For the rest of the time I didn't touch the files in
the root, so the memory growth can be attributed solely to querying.
I fired a query with only a few results - vmsize increased by 20 MB!
Subsequent queries raise the vmsize in the following pattern: if any
of the query results come from a part of the filesystem tree that has
not been reported before, vmsize increases by a few MB.
I suspect this is partly due to caching done when file/directory paths
are retrieved by the beagle-query-driver. The driver has to map the
uids from the hits to actual filesystem URIs. This lookup is fairly
expensive, so there is some caching involved. I have a hunch that this
cache keeps growing across queries. One way to fix this might be to
keep a cache per query and clear it when the query is over.
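
To make the per-query cache idea concrete, here is a rough sketch. It
is not beagle code: it is written in Python for brevity, and the names
QueryScopedResolver, run_query and lookup are all made up for
illustration. lookup stands in for whatever expensive uid-to-URI
mapping the driver really does. The only point is that the cache lives
exactly as long as one query and is emptied when the query ends.

```python
from typing import Callable, Dict, Iterable, List


class QueryScopedResolver:
    """Resolve hit uids to URIs, caching only for the lifetime of one query.

    Hypothetical sketch: `lookup` stands in for the expensive
    uid -> URI mapping; it is not a real beagle API.
    """

    def __init__(self, lookup: Callable[[str], str]) -> None:
        self._lookup = lookup
        self._cache: Dict[str, str] = {}

    def __enter__(self) -> "QueryScopedResolver":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Drop every cached entry once the query is over, so the cache
        # cannot grow from one query to the next.
        self._cache.clear()

    def resolve(self, uid: str) -> str:
        # Pay for the expensive lookup only once per uid within a query.
        if uid not in self._cache:
            self._cache[uid] = self._lookup(uid)
        return self._cache[uid]

    def cache_size(self) -> int:
        return len(self._cache)


def run_query(hit_uids: Iterable[str], lookup: Callable[[str], str]) -> List[str]:
    """Map the uids of one query's hits to URIs using a throwaway cache."""
    with QueryScopedResolver(lookup) as resolver:
        return [resolver.resolve(uid) for uid in hit_uids]
```

The trade-off is that a path shared by two queries gets looked up
twice, once per query. If that turns out to be too slow, a cache with
a fixed upper bound on its size would be another option.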

- d.

References:
- Querying expense and weak hashtables
  - From: Daniel Drake
