Re: [Tracker] Indexers comparison



2007/1/18, Michal Pryc <Michal Pryc sun com>:
Luca Ferretti wrote:
> On Wed, 17/01/2007 at 11:15 +0000, Michal Pryc wrote:
>> Hello,
>>
>> I am sending a *small* comparison of the indexers. This might start some
>> discussion on many aspects of the indexing tools, leading to better ones.
>
> This is a really good job, thank you so much.
>
>> In the future it would be nice to see a common search query standard (a
>> lot of work has been done on it so far), and common indexing plugins
>> would also significantly increase the number of supported formats, so we
>> have plenty of work to do :-)
>
> BTW, could you put the test data set somewhere? It's about 830 MB, but
> it could be interesting to provide it for testing and development
> purposes, couldn't it?
>
> For example, Tracker's problems with indexing text files could be checked
> and hopefully fixed using the same test case (I don't have 10,000 text
> files).
>

Unfortunately this is rather hard for me to do, because the data set
contains some documents that might be for internal use only, so it
would be very time-consuming to select the "proper" ones :-(

But maybe it is time to create one good set of documents that people can
freely use for testing the indexers.

Maybe some Wikipedia dumps? Do they have an OAI target? Maybe we could even take dumps of localized Wikipedias?

Cheers,
Mikkel

PS: I of course mean stripping all formatting from the harvested files.



