Re: [Tracker] Moving crawling to the indexer

From: Carlos Garnacho <carlos imendio com>
To: Martyn Russell <martyn imendio com>
Cc: Tracker-List <tracker-list gnome org>
Subject: Re: [Tracker] Moving crawling to the indexer
Date: Tue, 11 Nov 2008 15:11:12 +0100

Hi!

The main problem I see with keeping the status in a single place is that
the indexer can have reasons to pause for a while without the daemon (or
the apps on top of it) being notified, for example while it is flushing
data to disk. It's quite feasible that we end up adding more situations
like this.

So that pushes us to still keep certain aspects of the "indexing state"
in the indexer. IMHO the daemon certainly shouldn't control that: such
state is inherent to the indexer, and moving it out would also render
standalone testing useless :)
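
To illustrate the point, here is a rough sketch in C of how the indexer
could keep its own pause reasons and only report the publicly relevant
ones. Every name here (PauseReason, Indexer, indexer_set_pause, and so
on) is hypothetical and not Tracker's actual API; the split between
"internal" and "public" reasons is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pause reasons, stored as bit flags. */
typedef enum {
    PAUSE_USER      = 1 << 0,  /* public: paused manually by the user   */
    PAUSE_DISK_FULL = 1 << 1,  /* public: no disk space left            */
    PAUSE_BATTERY   = 1 << 2,  /* public: battery is low                */
    PAUSE_FLUSHING  = 1 << 3   /* internal: indexer is flushing to disk */
} PauseReason;

/* Reasons the daemon (and apps on top of it) never hear about. */
#define PAUSE_INTERNAL_MASK ((unsigned) PAUSE_FLUSHING)

typedef struct {
    unsigned reasons;  /* bitwise OR of the active PauseReason values */
} Indexer;

/* The indexer is paused while any reason, public or internal, is set. */
bool indexer_is_paused(const Indexer *ix)
{
    assert(ix != NULL);
    return ix->reasons != 0;
}

/* The subset of reasons that would be reported to the daemon. */
unsigned indexer_public_reasons(const Indexer *ix)
{
    assert(ix != NULL);
    return ix->reasons & ~PAUSE_INTERNAL_MASK;
}

/* Sets or clears one reason. Returns true only when the publicly
 * visible set changed, i.e. when the daemon would need to be told. */
bool indexer_set_pause(Indexer *ix, PauseReason reason, bool on)
{
    unsigned before = indexer_public_reasons(ix);

    if (on)
        ix->reasons |= (unsigned) reason;
    else
        ix->reasons &= ~(unsigned) reason;

    return indexer_public_reasons(ix) != before;
}
```

With this shape, setting PAUSE_FLUSHING pauses the indexer but
indexer_set_pause() returns false, so nothing would be sent over IPC.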

On Mon, 2008-11-10 at 16:45 +0000, Martyn Russell wrote:

Hi all,


PREFACE:
========

So I have been looking at one of the remaining issues we have before
releasing the next version of Tracker upstream. As Jamie outlined it, it is:

  1) enumerating and crawling directories needs to be done in the
  indexer (and pass directories to watch back to the daemon). Daemon can
  then run at nice 0 and normal ionice instead of nice 19, as the only
  cpu/io heavy ops will be searches and queries, which need to be as
  fast as possible


THE PROBLEM:
============

I have spent close to a week trying to do this and finding out how easy
it would be. It isn't as easy as I first thought.

To summarise, the problem is essentially where all the status handling
is done (and avoiding duplication of code) and the inter-process
communications.

After discussing this with Carlos this morning, we agreed to put it to
the mailing list to get some feedback about the way forward on this one.


THE IDEA:
=========

My idea was that the daemon, which is _ALWAYS_ running, should hold all
state information and the indexer shouldn't since it is temporary, not
always running. By state information, I mean:

- Are we in READONLY mode?
- Are we running?
- Are we indexing for the first time?
- Are we merging?
- Are we paused manually (by the user)?
- Are we paused for IO (due to Monitor events)?
- Is the disk full?
- Is the battery low?
- What state are we in generally (i.e. INDEXING, IDLE, etc).
- How many items have we processed?
- How long have we been indexing?
- How long before we are finished (estimated)?

There could be a "State" DBus interface to query, which would supersede
the current status APIs after the next release.

There would be 2 modules, one for all the state handling and monitoring
and one for the actual states the application can be in (INDEXING, IDLE,
etc). The latter could possibly be shared in libtracker-common (see below
where I discuss this more).
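
As a rough illustration of the state information listed above, the
sketch below collects it in one C record and derives the general state
and an estimated time remaining from it. All names are hypothetical and
not part of Tracker. Only INDEXING and IDLE are named in this mail; the
MERGING and PAUSED states, the items_total field and the linear
extrapolation used for the estimate are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* General states; only IDLE and INDEXING come from the text. */
typedef enum {
    STATE_IDLE,
    STATE_INDEXING,
    STATE_MERGING,
    STATE_PAUSED
} State;

/* One field per question in the list above. */
typedef struct {
    bool     read_only;
    bool     running;
    bool     first_time;        /* indexing for the first time?   */
    bool     merging;
    bool     paused_manually;   /* paused by the user             */
    bool     paused_for_io;     /* paused due to monitor events   */
    bool     disk_full;
    bool     battery_low;
    unsigned items_processed;
    unsigned items_total;       /* assumed to be known up front   */
    double   seconds_indexing;  /* how long we have been indexing */
} Status;

/* Derive the general state from the individual flags. */
State status_general_state(const Status *s)
{
    assert(s != NULL);
    if (!s->running)
        return STATE_IDLE;
    if (s->paused_manually || s->paused_for_io ||
        s->disk_full || s->battery_low)
        return STATE_PAUSED;
    if (s->merging)
        return STATE_MERGING;
    return STATE_INDEXING;
}

/* Estimated seconds left, by linear extrapolation from the rate so
 * far. Returns -1.0 when no estimate can be made yet. */
double status_seconds_remaining(const Status *s)
{
    assert(s != NULL);
    if (s->items_processed == 0 || s->items_total < s->items_processed)
        return -1.0;
    return s->seconds_indexing
         * (double) (s->items_total - s->items_processed)
         / (double) s->items_processed;
}
```

For example, 25 of 100 items processed in 10 seconds extrapolates to 30
seconds remaining.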


HOW IT WORKS NOW:
=================

All of these things are pushed up to the user in some form or another.
Right now they are split between the indexer and the daemon: the indexer
currently propagates some of them up, while others are already kept in
the daemon.

The indexer itself ALSO needs a state machine if we move the processor
into it, so that it knows whether it is crawling, indexing, merging,
idle, etc.
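
A minimal sketch of such an indexer-side state machine, in C, might look
like the following. The four states come from the paragraph above; the
names and the set of allowed transitions are assumptions for
illustration only, not an existing Tracker design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    IX_IDLE,
    IX_CRAWLING,
    IX_INDEXING,
    IX_MERGING,
    IX_N_STATES
} IndexerState;

/* allowed[from][to]: which transitions this sketch permits.
 * Entries not listed default to false. */
static const bool allowed[IX_N_STATES][IX_N_STATES] = {
    [IX_IDLE]     = { [IX_CRAWLING] = true, [IX_INDEXING] = true },
    [IX_CRAWLING] = { [IX_IDLE]     = true, [IX_INDEXING] = true },
    [IX_INDEXING] = { [IX_IDLE]     = true, [IX_MERGING]  = true },
    [IX_MERGING]  = { [IX_IDLE]     = true }
};

/* Moves to `next` if the transition is allowed and returns true.
 * Otherwise leaves *current untouched and returns false. */
bool indexer_transition(IndexerState *current, IndexerState next)
{
    assert(current != NULL);
    if (*current >= IX_N_STATES || next >= IX_N_STATES)
        return false;
    if (!allowed[*current][next])
        return false;
    *current = next;
    return true;
}
```

Keeping the transitions in a table makes it easy to see, and to change,
which moves are legal without touching the transition function.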


ADVANTAGES:
===========

- Not having to start the indexer just to find out that the disk is
full, only for it to exit or sit there idle for 300 seconds.
- Not having to start the indexer to realise the battery is low and we
won't be indexing anyway.
- State will all be in one place. This should minimise the amount of IPC
traffic to keep up to date with the indexer's reasoning for running or
not running and its state.


DISADVANTAGES:
==============

- The indexer wouldn't have any checks in place if it ran standalone;
it would depend on the daemon telling it to stop or pause if (say) the
disk space runs out or we are low on battery.
- The daemon would have to control when the indexer is
paused/stopped due to a few more states - that would need working on.
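
For comparison, a check that the indexer could run on its own, without
the daemon, is small. The sketch below uses the POSIX statvfs() call to
see whether free space has dropped below a threshold. The function
names, the percentage-based threshold and the return convention are all
hypothetical; this is not how Tracker does it today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/statvfs.h>

/* Pure decision: is the free share below min_free_percent?
 * A file system reporting zero blocks is treated as full. */
bool low_on_space(unsigned long long avail_blocks,
                  unsigned long long total_blocks,
                  unsigned min_free_percent)
{
    if (total_blocks == 0)
        return true;
    return avail_blocks * 100
         < (unsigned long long) min_free_percent * total_blocks;
}

/* Returns 1 if the file system holding `path` is low on space,
 * 0 if it is fine, and -1 if it could not be queried. */
int indexer_check_disk(const char *path, unsigned min_free_percent)
{
    struct statvfs st;

    assert(path != NULL);
    if (statvfs(path, &st) != 0)
        return -1;
    /* f_bavail: blocks available to unprivileged users. */
    return low_on_space(st.f_bavail, st.f_blocks, min_free_percent) ? 1 : 0;
}
```

A standalone indexer could call indexer_check_disk() on its data
directory before each batch and pause itself when it returns 1.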


OTHER IDEAS:
============

There has been some thought that we could have a TrackerState in
libtracker-common. I am not sure it really fits to put this somewhere
centrally. There are some shared states the indexer and daemon would
have, but not all are necessary for both. Having the daemon's states in
the common library might be good for the applet or other applications to
know what the status ID means when they use a DBus API (or perhaps the
libtracker API).


CONCLUSION:
===========

The work to put the crawler in the indexer is not going to be as quick
as I had thought. I think it would take about 2 or 3 weeks for 2 people
(Carlos and I) to implement, test and be happy with. For that, we would
start a new branch.


COMMENTS:
=========

What are people's thoughts about how we maintain state here?

-- 
Carlos Garnacho
Imendio AB - Expert solutions in GTK+
http://www.imendio.com

