Re: [Tracker] Desktop Crawler's feature comparison
- From: Jamie McCracken <jamiemcc blueyonder co uk>
- To: John Smith <the real monkey d luffy gmail com>
- Cc: tracker-list gnome org
- Subject: Re: [Tracker] Desktop Crawler's feature comparison
- Date: Thu, 06 Mar 2008 10:55:55 -0500
On Thu, 2008-03-06 at 01:44 +0000, John Smith wrote:
Hi,
I am trying to make an updated (sometimes the documentation or
webpages are outdated) comparison table in terms of features for the
following desktop search tools: Beagle, Tracker, Recoll, Strigi and
Jindex, which would then be added to Wikipedia.
I would ask your help to tell me what features are implemented in your
tool (1) or are foreseen in the future... these are just a couple of
Yes or No questions, so it's brief.
(1) Note that I'm sending this email (using BCC) to each
tool's mailing list or developers.
I think having this information would be good for users and developers
since there are already several desktop crawlers available.
It would be nice if your website maintainer added this information
(maybe in the form of a table) in your Features or FAQ section.
This list can also be seen as ideas for possible features to be added.
Thank you for your consideration.
PS: I'm aware that the data crawler uses different backends for the
different file types; in that case, please refer to the backend when
appropriate. For example, "PDF indexing capabilities are limited by
xpdf. It does not recognize words with hyphens."
01) Regular expressions (e.g.: com*on st?ff [A-F] (this | that))
via our rdf query - yes
also future xesam implementation will do this too
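For illustration only: the patterns in the question (com*on, st?ff, [A-F]) read like shell-style wildcards. The sketch below shows how such patterns match words using Python's standard fnmatch module. It is not Tracker's rdf query or xesam code, and the word list is a made-up stand-in for indexed terms.

```python
import fnmatch

# Hypothetical word list standing in for indexed terms.
words = ["common", "companion", "stuff", "staff", "stiff", "B", "Z"]


def match_words(pattern, candidates):
    """Return the candidates matching a shell-style wildcard pattern.

    fnmatchcase is used so matching is case-sensitive on every platform.
    """
    return [w for w in candidates if fnmatch.fnmatchcase(w, pattern)]


print(match_words("com*on", words))  # '*' matches any run of characters
print(match_words("st?ff", words))   # '?' matches exactly one character
print(match_words("[A-F]", words))   # a character range
```

Alternation such as (this | that) is not part of this wildcard syntax; it would need a real regular expression engine (for example the re module).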
02) Boolean operators (+and -not)
they are anded by default - we have an expression tree patch to do other
booleans (I have asked for this patch to be applied to trunk so the answer
is effectively yes)
03) Searching non-alphanumeric characters, maybe through the use of
backslash (e.g. := + ? { ] &)
nope - no point searching them (we always filter them out)
04) Exact sentences using double quotes (support for line breaks?
hyphenation? text in columns?)
Exact and precise phrases will be supported shortly (it will be
case-insensitive but otherwise precise including non-alphanumerics)
05) tex, pdf and ps (index sentences correctly even when text is
organized in columns or uses hyphens; this is common in scientific
articles using the pdf format)
the extractor removes tables from pdf so they should be correct
06) Different encoding and languages (ascii, utf8, japanese, etc)
everything is converted to utf-8. non-utf8 needs user locales set up
appropriately so that data can be successfully converted to utf-8
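A minimal sketch of the idea above, assuming a simple "try utf-8 first, then fall back to the locale's encoding" strategy. This is not Tracker's actual conversion code; the function name to_utf8 and its fallback_encoding parameter are made up for the example.

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, fallback_encoding: Optional[str] = None) -> str:
    """Decode raw bytes to text, preferring UTF-8.

    If the bytes are not valid UTF-8, fall back to the given encoding or,
    when none is given, to the encoding of the user's locale. This is why
    the locale has to be set up correctly for non-UTF-8 data.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        encoding = fallback_encoding or locale.getpreferredencoding(False)
        # Undecodable bytes are replaced rather than aborting the indexing run.
        return raw.decode(encoding, errors="replace")


# Latin-1 bytes are not valid UTF-8, so the fallback encoding is used.
print(to_utf8("café".encode("latin-1"), fallback_encoding="latin-1"))
```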
07) Index archive files (tar, bz2, rar, 7zp, etc) recursively
nope but will probably do so soon
08) Index simultaneously with and without stemming (for example,
flooring, floors, floored would all be transformed to floor)
yes
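To make the question concrete, here is a rough sketch of keeping two inverted indexes side by side, one on the exact words and one on their stems. The toy_stem function is a deliberately crude suffix stripper written for this example; it is not a real stemming algorithm and not what Tracker uses.

```python
from collections import defaultdict


def toy_stem(word):
    """Very crude suffix stripper, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_indexes(docs):
    """Build two inverted indexes: one on exact words, one on stemmed words."""
    exact, stemmed = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            exact[word].add(doc_id)
            stemmed[toy_stem(word)].add(doc_id)
    return exact, stemmed


# Hypothetical documents keyed by id.
docs = {1: "flooring samples", 2: "floors floored", 3: "floor plan"}
exact, stemmed = build_indexes(docs)

print(sorted(stemmed["floor"]))  # all documents containing a form of "floor"
print(sorted(exact["floor"]))    # only documents containing the exact word
```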
09) Use of tags to better organize data (allows the user to have collections)
yes
10) Restrict search to specific directories or tags
yes
11) Provide thumbnails for images and video (allow specifying number
of thumbnails for video and time interval between thumbs)
yes
12) Image and video content search (something like imgseek... maybe
better or maybe it could use it as backend)
dunno what you mean? tags and metadata are extracted from them
13) Index removable media (making possible to index and organize data
in dvds or external hard drives)
we have a partial patch for this but it needs more work
14) Databases supported
sqlite
15) Allow having different database catalogs (useful for searching
collections of external devices)
yes
16) Checksum (allows finding duplicate files)
we have db support for this but it has not been fully implemented as it's
not been necessary
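For readers wondering how a checksum helps find duplicates, here is a minimal sketch, assuming SHA-256 over the file contents. It does not reflect Tracker's db schema or any choice of hash on its side; the function names are made up for the example.

```python
import hashlib
from collections import defaultdict


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by checksum and return only groups with more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_checksum(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Calling find_duplicates on a list of file paths returns one list per set of files with identical contents; files with unique contents are left out.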
17) Other aspects worthy of mention
metadata store - can store user objects and index their metadata by
using tracker as the primary storage