[Tracker] Desktop Crawler's feature comparison



Hi,

I am trying to put together an up-to-date feature comparison table
(the documentation and webpages are sometimes outdated) for the
following desktop search tools: Beagle, Tracker, Recoll, Strigi and
Jindex. The table would then be added to Wikipedia.

I would like to ask for your help in telling me which features are
implemented in your tool (1) or are planned for the future. These are
just a few Yes or No questions, so it should be brief.
(1) Note that I'm sending this email (using BCC) to the mailing lists
or developers of all the corresponding tools.

I think having this information would be good for users and developers
since there are already several desktop crawlers available.
It would be nice if your website maintainer added this information
(maybe in the form of a table) in your Features or FAQ section.
This list can also be seen as ideas for possible features to be added.

Thank you for your consideration.

PS: I'm aware that the data crawler uses different backends for
different file types; in that case, please refer to the backend when
appropriate. For example, "PDF indexing capability is limited by
xpdf. It does not recognize words with hyphens."

01) Regular expressions (e.g.: com*on st?ff [A-F] (this | that))
02) Boolean operators (+and -not)
03) Searching non-alphanumeric characters, maybe through the use of
backslash (e.g. := + ? { ] &)
04) Exact sentences using double quotes (support for line breaks?
hyphenation? text in columns?)
05) tex, pdf and ps (index sentences correctly even when text is
organized in columns or uses hyphens; this is common in scientific
articles using the pdf format)
06) Different encodings and languages (ASCII, UTF-8, Japanese, etc)
07) Index archive files (tar, bz2, rar, 7zip, etc) recursively
08) Index simultaneously with and without stemming (for example,
flooring, floors, floored would all be transformed to floor)
09) Use of tags to better organize data (allows the user to have collections)
10) Restrict search to specific directories or tags
11) Provide thumbnails for images and video (allow specifying number
of thumbnails for video and time interval between thumbs)
12) Image and video content search (something like imgseek... maybe
better or maybe it could use it as backend)
13) Index removable media (making it possible to index and organize
data on DVDs or external hard drives)
14) Databases supported
15) Allow having different database catalogs (useful for searching
collections of external devices)
16) Checksum (allows finding duplicate files)
17) Other aspects worthy of mention
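
To make item 16 concrete, here is a minimal sketch in Python of how
checksums can reveal duplicate files. It is only an illustration and
not how any of the tools above implements it; the function names
file_digest and find_duplicates, and the choice of SHA-256, are
assumptions made for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map each digest to the files under root sharing it (2 or more)."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    # Keep only digests that occur more than once.
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

A crawler that stores such a digest next to each indexed file could
answer "which files are identical?" with a simple lookup instead of
comparing file contents pairwise.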


