Re: Querying expense and weak hashtables



> I've noted in the past that querying a lot is one way to bump up beagled's
> memory usage fairly easily.
>
> There's a discussion going on at the dotLucene forums about a fairly serious
> memory problem in Lucene's field caching:
>
> http://sourceforge.net/forum/forum.php?thread_id=1378460&forum_id=408004

The discussion mentioned above has concluded. They found a few leaks in
the implementation of IndexSearcher and fixed them (without using
WeakHashMap).
Out of curiosity, I patched my local copy with those changes and rebuilt
beagle. The patch may have fixed some memory leaks, but querying in
beagle is still quite expensive - by an order of magnitude.

Some observations on querying against the File backend. I started
beagled and gave it a sufficiently large filesystem root to index, then
waited until it finished indexing. Then I stopped the daemon and started
it again. For the rest of the time I didn't touch the files under the
root - so the memory growth can be attributed solely to querying.
I fired a query with only a few results - vmsize increased by 20MB!
Subsequent queries raise the vmsize in the following pattern: if
any of the query results come from a part of the filesystem tree that
has not been reported before, vmsize increases by a few MB.
I suspect this is partly due to caching done when file/directory paths
are retrieved by the beagle-query-driver. The driver has to map the
uids from the hits to actual filesystem URIs. This lookup is fairly
expensive - so there is some caching involved. I have a hunch that this
cache keeps growing across queries. One way to fix this might be to
keep a cache per query - and clear it when the query is over.
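
To make the per-query cache idea concrete, here is a rough sketch in
Python (not beagle's actual C# code). Everything in it is an assumption
made for illustration: the PerQueryPathCache class, the run_query
helper, and the in-memory "store" that stands in for the expensive
uid-to-path lookup are all hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for the expensive lookup: uid -> (parent uid, name).
# This is NOT beagle's real data structure, only an illustration.
Store = Dict[str, Tuple[Optional[str], str]]


class PerQueryPathCache:
    """Caches uid -> path for the lifetime of a single query."""

    def __init__(self, store: Store) -> None:
        self._store = store
        self._paths: Dict[str, str] = {}
        self.lookups = 0  # number of (simulated) expensive store lookups

    def resolve(self, uid: str) -> str:
        cached = self._paths.get(uid)
        if cached is not None:
            return cached
        self.lookups += 1
        parent, name = self._store[uid]
        # Walk up the tree; parent directories are cached along the way.
        if parent is None:
            path = "/" + name
        else:
            path = self.resolve(parent) + "/" + name
        self._paths[uid] = path
        return path

    def __len__(self) -> int:
        return len(self._paths)

    def clear(self) -> None:
        self._paths.clear()


def run_query(store: Store, hit_uids: Iterable[str]) -> List[str]:
    """Resolve all hits of one query, then drop the cache."""
    cache = PerQueryPathCache(store)
    try:
        return [cache.resolve(uid) for uid in hit_uids]
    finally:
        # The cache never outlives the query, so it cannot grow
        # across queries.
        cache.clear()
```

Hits that share parent directories still benefit from the cache within
one query, but nothing is retained once the query is over. The
trade-off is that a later query touching the same directories has to
pay for the lookups again.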

- d.


