Desktop Crawler's feature comparison



Hi,

 I am trying to put together an up-to-date feature comparison table
 (documentation and web pages are sometimes outdated) for the
 following desktop search tools: Beagle, Tracker, Recoll, Strigi and
 Jindex. The table would then be added to Wikipedia.

 I would like to ask for your help: please tell me which of the
 features below are implemented in your tool (1) or planned for the
 future. These are just a few yes-or-no questions, so it should be brief.
 (1) Note that I'm sending this email (using BCC) to the mailing
 list or developers of each tool.

 I think having this information would be good for users and developers
 since there are already several desktop crawlers available.
 It would be nice if your website maintainer added this information
 (maybe in the form of a table) in your Features or FAQ section.
 This list can also be seen as ideas for possible features to be added.

 Thank you for your consideration.

 PS: I'm aware that the crawlers use different backends for
 different file types; in that case, please mention the backend where
 appropriate. For example, "PDF indexing capability is limited by
 xpdf. It does not recognize words with hyphens."

 01) Regular expressions (e.g.: com*on st?ff [A-F] (this | that))
 02) Boolean operators (+and -not)
 03) Searching non-alphanumeric characters, maybe through the use of
 backslash (e.g. := + ? { ] &)
 04) Exact sentences using double quotes (support for line breaks?
 hyphenation? text in columns?)
 05) tex, pdf and ps (index sentences correctly even when text is
 organized in columns or uses hyphens; this is common in scientific
 articles using the pdf format)
 06) Different encodings and languages (ASCII, UTF-8, Japanese, etc.)
 07) Index archive files (tar, bz2, rar, 7z, etc.) recursively
 08) Index simultaneously with and without stemming (for example,
 flooring, floors, floored would all be transformed to floor)
 09) Use of tags to better organize data (allows the user to have collections)
 10) Restrict search to specific directories or tags
 11) Provide thumbnails for images and video (allow specifying number
 of thumbnails for video and time interval between thumbs)
 12) Image and video content search (something like imgseek... maybe
 better or maybe it could use it as backend)
 13) Index removable media (making it possible to index and organize data
 on DVDs or external hard drives)
 14) Databases supported
 15) Allow having different database catalogs (useful for searching
 collections of external devices)
 16) Checksum (allows finding duplicate files)
 17) Other aspects worthy of mention
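
 To make item 16 concrete, here is a minimal sketch of what I mean by
 using checksums to find duplicate files. It is only an illustration and
 is not taken from any of the tools above; the function names
 (file_digest, find_duplicates) and the choice of SHA-256 are my own
 assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by content digest; return groups of 2 or more."""
    groups = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups[file_digest(p)].append(p)
    # Files sharing a digest are treated as duplicates of each other.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

 A tool that already stores a checksum for each indexed file could
 presumably answer the same question from its index, without rereading
 the files.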

