Re: [Tracker] Desktop Crawler's feature comparison

On Thu, 2008-03-06 at 01:44 +0000, John Smith wrote:
Hi,

 I am trying to make an up-to-date comparison table of features (the
 documentation and web pages are sometimes outdated) for the
 following desktop search tools: Beagle, Tracker, Recoll, Strigi and
 Jindex. The table would then be added to Wikipedia.

 I would like your help in telling me which features are implemented in
 your tool (1) or are planned for the future. These are just a couple of
 Yes or No questions, so it's brief.
 (1) Note that I'm sending this email (using BCC) to the mailing lists
 or developers of all of these tools.

 I think having this information would be good for users and developers
 since there are already several desktop crawlers available.
 It would be nice if your website maintainer added this information
 (maybe in the form of a table) to your Features or FAQ section.
 This list can also be seen as ideas for possible features to be added.

 Thank you for your consideration.

 PS: I'm aware that the data crawler uses different backends for the
 different file types; in that case, please mention the backend when
 appropriate. For example, "PDF indexing capabilities are limited by
 xpdf. It does not recognize words with hyphens."

 01) Regular expressions (e.g.: com*on st?ff [A-F] (this | that))

Yes, via our RDF query. Our future Xesam implementation will do this too.

 02) Boolean operators (+and -not)

Terms are ANDed by default. We have an expression-tree patch that adds the
other boolean operators (I have asked for this patch to be applied to
trunk, so the answer is effectively yes).
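To illustrate what an expression tree adds over the default AND behaviour, here is a minimal Python sketch over a toy in-memory inverted index. The `Term`/`And`/`Or`/`Not` node classes, `evaluate` and `default_query` are hypothetical names chosen for illustration; this is not the actual Tracker patch.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Iterable, Set, Union

# Hypothetical inverted index: lowercase term -> set of document ids.
Index = Dict[str, Set[int]]


@dataclass
class Term:
    word: str


@dataclass
class And:
    left: "Node"
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


@dataclass
class Not:
    child: "Node"


Node = Union[Term, And, Or, Not]


def evaluate(node: Node, index: Index, all_docs: Set[int]) -> Set[int]:
    """Recursively reduce a query tree to the set of matching document ids."""
    if isinstance(node, Term):
        return set(index.get(node.word.lower(), set()))
    if isinstance(node, And):
        return evaluate(node.left, index, all_docs) & evaluate(node.right, index, all_docs)
    if isinstance(node, Or):
        return evaluate(node.left, index, all_docs) | evaluate(node.right, index, all_docs)
    if isinstance(node, Not):
        # NOT is the complement relative to every indexed document.
        return all_docs - evaluate(node.child, index, all_docs)
    raise TypeError(f"unknown node type: {node!r}")


def default_query(words: Iterable[str]) -> Node:
    """Plain multi-word queries become a chain of ANDs (the default behaviour)."""
    return reduce(And, (Term(w) for w in words))


# Toy data for the usage below.
index: Index = {"tracker": {1, 2, 3}, "beagle": {2}, "search": {1, 3, 4}}
all_docs = {1, 2, 3, 4}
```

With this toy data, `evaluate(default_query(["tracker", "search"]), index, all_docs)` gives `{1, 3}`, `And(Term("tracker"), Not(Term("search")))` gives `{2}`, and `Or(Term("beagle"), Term("search"))` gives `{1, 2, 3, 4}`. The plain AND case is just the degenerate tree, which is why the patch can slot in without changing existing queries.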


 03) Searching non-alphanumeric characters, maybe through the use of
 backslash (e.g. := + ? { ] &)

No. We always filter them out, so there is no point searching for them.
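A minimal sketch of why such searches cannot match, assuming a regex-based tokenizer (the `tokenize` helper is hypothetical, not Tracker's code): if the same filter is applied to documents and queries, a query made only of symbols reduces to no terms at all.

```python
import re
from typing import List

# Runs of Unicode letters and digits: \w minus the underscore.
_WORD = re.compile(r"[^\W_]+")


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; punctuation and symbols are discarded."""
    return [tok.lower() for tok in _WORD.findall(text)]
```

For example, `tokenize("x := a + b?")` yields `['x', 'a', 'b']`, and `tokenize(":= + ? { ] &")` yields an empty list, so there is nothing left to look up.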


 04) Exact sentences using double quotes (support for line breaks?
 hyphenation? text in columns?)

Exact and precise phrases will be supported shortly (they will be
case-insensitive but otherwise precise, including non-alphanumerics).
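As a rough illustration of "case-insensitive but otherwise precise", here is a Python sketch of phrase matching over raw text. Treating any run of whitespace (including a line break) as a single space is an assumption added here to address the line-break question; the reply does not say how line breaks are handled, and `phrase_match` is a hypothetical helper.

```python
import re


def _normalize(s: str) -> str:
    # Assumption: any run of whitespace (including line breaks) counts as one space.
    return " ".join(s.split()).casefold()


def phrase_match(phrase: str, text: str) -> bool:
    """Case-insensitive phrase match; non-alphanumerics must match exactly.

    The lookarounds stop a phrase from matching inside a longer word.
    """
    pattern = r"(?<!\w)" + re.escape(_normalize(phrase)) + r"(?!\w)"
    return re.search(pattern, _normalize(text)) is not None
```

So `"C++"` matches `"written in c++ mostly"` but not `"written in c mostly"`, and `"cat"` does not match inside `"concatenate"`.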


 05) TeX, PDF and PS (index sentences correctly even when text is
 organized in columns or uses hyphens; this is common in scientific
 articles in PDF format)

The extractor removes tables from PDFs, so sentences should be indexed
correctly.

 06) Different encodings and languages (ASCII, UTF-8, Japanese, etc.)

Everything is converted to UTF-8. Non-UTF-8 data needs the user's locale
set up appropriately so that it can be converted to UTF-8 successfully.
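A minimal Python sketch of that conversion step, assuming the fallback is simply the locale's preferred encoding (`to_utf8` and its `fallback` parameter are hypothetical, not Tracker's API). It also shows why a misconfigured locale matters: decoding either fails or silently produces the wrong characters.

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, fallback: Optional[str] = None) -> bytes:
    """Return `raw` re-encoded as UTF-8.

    Valid UTF-8 passes through unchanged; anything else is decoded with the
    fallback encoding, which defaults to the user's locale encoding.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        encoding = fallback or locale.getpreferredencoding(False)
        # If the locale is wrong this raises UnicodeDecodeError or, for
        # single-byte encodings such as Latin-1, silently yields mojibake.
        text = raw.decode(encoding)
    return text.encode("utf-8")
```

For example, Latin-1 bytes for `"héllo"` only come out right if the fallback (normally the locale) really is Latin-1.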

 07) Index archive files (tar, bz2, rar, 7z, etc.) recursively

No, but we will probably add this soon.

 08) Index simultaneously with and without stemming (for example,
 flooring, floors, floored would all be transformed to floor)

yes
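One way to read "with and without stemming" is two term indexes kept side by side, so each query can choose. A minimal Python sketch under that reading; `toy_stem` is a deliberately naive suffix stripper standing in for a real stemmer, and `build_index`/`search` are hypothetical names, not Tracker's.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Toy suffix list for illustration only; a real indexer would use a proper
# stemmer (e.g. Porter or Snowball), not this.
_SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: Dict[int, str]) -> Tuple[Dict[str, Set[int]], Dict[str, Set[int]]]:
    """Build two inverted indexes at once: exact words and stemmed words."""
    exact: Dict[str, Set[int]] = defaultdict(set)
    stemmed: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            exact[word].add(doc_id)
            stemmed[toy_stem(word)].add(doc_id)
    return exact, stemmed


def search(word: str, exact, stemmed, use_stemming: bool) -> Set[int]:
    """Query either index; the caller chooses per search whether to stem."""
    word = word.lower()
    if use_stemming:
        return set(stemmed.get(toy_stem(word), set()))
    return set(exact.get(word, set()))
```

The cost of this design is roughly doubling the term index; the benefit is that an exact search for "floors" and a stemmed search for "floor" can both be answered without re-indexing.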

 09) Use of tags to better organize data (allows the user to have collections)

yes

 10) Restrict search to specific directories or tags

yes

 11) Provide thumbnails for images and video (allow specifying number
 of thumbnails for video and time interval between thumbs)

yes

 12) Image and video content search (something like imgseek, perhaps
 better, or perhaps using it as a backend)

Not sure what you mean. Tags and metadata are extracted from them.

 13) Index removable media (making it possible to index and organize data
 on DVDs or external hard drives)

We have a partial patch for this, but it needs more work.

 14) Databases supported

SQLite

 15) Allow different database catalogs (useful for searching
 collections on external devices)

yes

 16) Checksum (allows finding duplicate files)

We have database support for this, but it has not been fully implemented
because it has not been necessary.
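The reply doesn't say which checksum the database schema is meant to hold, so this Python sketch assumes SHA-256 via `hashlib`; `file_checksum` and `find_duplicates` are hypothetical helpers that group files whose contents hash identically.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def file_checksum(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()  # Assumption: SHA-256; the reply names no algorithm.
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths: Iterable[Path]) -> List[List[Path]]:
    """Return groups of paths whose contents have the same checksum."""
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for p in paths:
        by_hash[file_checksum(p)].append(p)
    return [group for group in by_hash.values() if len(group) > 1]
```

Storing the digest per file in the index would let duplicate detection become a simple grouping query instead of re-reading files.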

 17) Other aspects worthy of mention

Metadata store: applications can store user objects and have their
metadata indexed, using Tracker as the primary storage.