Re: [Tracker] indexing time estimation is broken



On Wed, 2009-02-04 at 12:41 +0200, Tshepang Lekhonkhobe wrote:
On Wed, Feb 4, 2009 at 12:32 PM, Carlos Garnacho <carlos imendio com> wrote:
On Wed, 2009-02-04 at 11:53 +0200, Tshepang Lekhonkhobe wrote:
On Wed, Feb 4, 2009 at 11:29 AM, Martyn Russell <martyn imendio com> wrote:
Tshepang Lekhonkhobe wrote:

On Tue, Feb 3, 2009 at 11:04 PM, Martyn Russell <martyn imendio com>
wrote:

Tshepang Lekhonkhobe wrote:

Hi,

Hi :)

Take a few seconds and look at the indexing estimation output of
trackerd logging, and you'll find that y in "Indexed x/y" keeps changing,
when it really should remain the same once initial crawling is finished
(tried looking at the code but am too weak).

I have noticed this. But actually the statistics reported by the applet
are what you should be more interested in. They are correct. Carlos, any
chance we can quickly fix the logging here?

They are the same actually, unless I don't understand what you mean?

They are the same number, but the correct statistics are only reported
once the transaction commit has been done. So it is only wrong in the logs
until then. The applet always receives the correct number.

(let's try again)

The Y in X/Y represents the total number of items (indexed or not) and
should therefore not change (except during initial crawling, when items
are sent to the indexer). The X is the one that should continuously
change, but it tends to remain the same.

What am I missing?

tracker-indexer isn't immediately aware of all items left to index;
trackerd sends them in batches, which explains the Y growing as
tracker-indexer processes files and trackerd hands it more.

Y growth is understandable, but when it gets reduced, that's a problem.

Take into account that Y is just a rough estimate: there can be items
that aren't interesting at all to the indexer, and there are other items
that can contain indexable subelements, such as mail attachments, etc.
We can't know about these beforehand.

However, given the way we calculate total items based on already indexed
and remaining items, it will be less counterintuitive for sure if we
increase X for already known items as I proposed.
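
As a rough illustration of the calculation described above, the sketch
below assumes Y is computed as X plus the number of items still queued,
and that X only increases for items that were not already indexed. The
function name simulate_totals and the count_known flag are hypothetical,
not taken from the Tracker sources.

```python
def simulate_totals(items, count_known):
    """Return the reported total Y after each processed item.

    items: list of bools, True if the item was already indexed.
    count_known: if True, X is also increased for already indexed items.
    Assumption: Y = X (indexed so far) + items still queued.
    """
    indexed = 0
    remaining = len(items)
    totals = []
    for already_indexed in items:
        remaining -= 1  # the item leaves the queue either way
        if count_known or not already_indexed:
            indexed += 1
        totals.append(indexed + remaining)
    return totals


queue = [True, False, True, False]
print(simulate_totals(queue, count_known=False))  # [3, 3, 2, 2]
print(simulate_totals(queue, count_known=True))   # [4, 4, 4, 4]
```

Under these assumptions, Y shrinks each time an already indexed item is
skipped, and stays constant once X is increased for known items too.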


About the X: it doesn't really grow if the item was already indexed. I
agree that it's a bit misleading; I will look into changing it later.

It's not only misleading, but it tremendously skews the estimated time
remaining (since it affects the Y), making this feature nearly
meaningless.
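
To make the skew concrete, here is a minimal sketch that assumes the
time remaining is a simple linear extrapolation from elapsed time, X,
and the items still queued. The formula and the name estimate_remaining
are assumptions for illustration; the actual trackerd calculation may
differ.

```python
def estimate_remaining(elapsed_s, processed, remaining):
    """Extrapolate seconds left from the average time per counted item.

    Assumption: estimate = elapsed * remaining / processed.
    Returns None while nothing has been counted yet.
    """
    if processed == 0:
        return None
    return elapsed_s * remaining / processed


# 100 items handled in 60 s, 90 of them already indexed, 50 still queued.
print(estimate_remaining(60.0, 10, 50))   # X counts only new items: 300.0
print(estimate_remaining(60.0, 100, 50))  # X counts every handled item: 30.0
```

With X undercounted, the per-item cost looks ten times higher than it
was, so the estimate is inflated by the same factor.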





