Re: [Tracker] Indexers comparison



Mikkel Kamstrup Erlandsen wrote:
    Unfortunately this is rather hard for me to do, because the data set
    contained some documents that might be for internal use only, so it
    would be very time consuming to select the "proper" ones :-(

    But maybe it is time to create one good set of documents that people can
    freely use for testing the indexers.


Maybe some Wikipedia dumps? Do they have an OAI target? Maybe we could
even take dumps of localized Wikipedias?

Cheers,
Mikkel

PS: I of course mean to strip all formatting from the harvested files.

Hello,
I've created a small Java application and posted it on my rarely updated
blog. It grabs some text from Wikipedia (MediaWiki), as you wished.
Please test it.

http://blogs.sun.com/migi/entry/wikipedia_for_indexers_testing

I hope it will help with testing.

-- 
best
Michal Pryc



