Re: [Tracker] It doesn't index PHP files



On Thu, 2012-10-04 at 12:34 +0100, Martyn Russell wrote:
On 04/10/12 12:24, Adam Tauno Williams wrote:
On Thu, 2012-10-04 at 10:54 +0100, Martyn Russell wrote:
On 04/10/12 09:17, Ivan Frade wrote:
   I think Python script contents are indexed because the MIME type is
"text/x-python" and it falls back to the "text/*" extractor. PHP files
have the MIME type "application/x-php" and there is no default option
for that.
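   A quick way to see which MIME type the system assigns to a file is
sketched below. It is only a cross-check under some assumptions:
xdg-mime (from xdg-utils) and file may or may not be installed, the
report_mime helper name is made up for this example, and neither tool
is guaranteed to agree with what Tracker itself detects.

```shell
#!/bin/sh
# report_mime is a hypothetical helper, not part of Tracker.
# It prints the MIME type reported by whichever common tools are installed.
report_mime() {
    f=$1
    if command -v xdg-mime >/dev/null 2>&1; then
        # Asks the desktop environment's MIME lookup.
        printf 'xdg-mime: %s\n' "$(xdg-mime query filetype "$f" 2>/dev/null)"
    fi
    if command -v file >/dev/null 2>&1; then
        # Uses libmagic, so the answer can differ from the one above.
        printf 'file: %s\n' "$(file --brief --mime-type "$f")"
    fi
}

# Usage: sh report_mime.sh /path/to/script.php
if [ "$#" -gt 0 ]; then
    report_mime "$1"
fi
```

   If the type printed is not one that any rule file lists, that would
be consistent with no extractor being picked for the file.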
   This can be solved by adding "application/x-php" to the .rule file of
the text extractor (check
/usr/local/share/tracker/extract-rules/90-text-generic.rule and other
rule files in the same folder).
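   As a rough sketch of that edit, assuming the rule file lists its
types on a single semicolon-separated MimeTypes= line: the snippet works
on a scratch copy rather than the installed file, and the ModulePath
value and the add_mime helper are placeholders invented for the example.

```shell
#!/bin/sh
# Scratch copy standing in for 90-text-generic.rule (contents are illustrative).
rule=$(mktemp)
cat > "$rule" <<'EOF'
[ExtractorRule]
ModulePath=/path/to/libextract-text.so
MimeTypes=text/*
EOF

# add_mime is a hypothetical helper: it appends a MIME type to the
# MimeTypes= line unless that line already mentions it.
add_mime() {
    file=$1
    mime=$2
    if ! grep '^MimeTypes=' "$file" | grep -qF -- "$mime"; then
        sed "s|^MimeTypes=\(.*\)\$|MimeTypes=\1;${mime}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    fi
}

add_mime "$rule" "application/x-php"
cat "$rule"
```

   Editing the installed rule file itself would normally need root.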
   Note that generic text indexing means that the Python code is treated
as plain text, a bunch of words. You could always write a specialized
extractor that takes into account the semantics of the file, for
example ignoring __init__.py files or import statements, or maybe
ignoring the code and indexing only function names... It depends on
what you want. The same applies to PHP.
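   To make the "function names only" idea concrete, here is a naive
shell sketch. It is not a Tracker extractor (a real one would be a C
module); the sample file and the function_names helper are invented for
illustration, and the pattern would also match the word "function"
inside comments or strings.

```shell
#!/bin/sh
# Hypothetical sample standing in for a real PHP source file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?php
require_once 'config.php';
function connect($url) { return $url; }
class Client {
    public function call($method, $params) { return null; }
}
EOF

# Print declared function names only, ignoring everything else in the file.
function_names() {
    grep -oE 'function[[:space:]]+[A-Za-z_][A-Za-z0-9_]*' "$1" |
        sed -E 's/^function[[:space:]]+//'
}

function_names "$sample"
```

   Generic text indexing would instead keep every word in the file,
including "require_once" and "return".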
   Writing an extractor module is not difficult with some rudiments of
programming in C and we can help via mailing list or IRC. Patches are
welcome ;)
I should add, you can use:
    tracker-control -m $MIME
or
    tracker-control --reindex-mime-type=$MIME
if you change the rules file, so that you do not have to reindex all content again.

awilliam linux-nysu:~> cat /usr/share/tracker/extract-rules/90-text-generic.rule
[ExtractorRule]
ModulePath=/usr/lib64/tracker-0.14/extract-modules/libextract-text.so
MimeTypes=text/*;application/php
awilliam linux-nysu:~> tracker-control --reindex-mime-type="application/php"
Reindexing mime types was successful
awilliam linux-nysu:~> grep -c Vaccaro /home/awilliam/Documents/Development/PHP/jsonRPCClient.php
What does this produce for you:
   $ /usr/libexec/tracker-extract -v 3 -f /home/awilliam/Documents/Development/PHP/jsonRPCClient.php
?
It should display some or all of the text extracted AND the extractor used.

awilliam linux-nysu:~> /usr/lib/tracker-extract -v 3 -f /home/awilliam/Documents/Development/PHP/jsonRPCClient.php
Initializing tracker-extract...
Tracker-Message: Setting up monitor for changes to config
file:'/home/awilliam/.config/tracker/tracker-extract.cfg'
Locale 'TRACKER_LOCALE_LANGUAGE' was set to 'en_US.UTF-8'
Locale 'TRACKER_LOCALE_TIME' was set to 'en_US.UTF-8'
Locale 'TRACKER_LOCALE_COLLATE' was set to 'en_US.UTF-8'
Locale 'TRACKER_LOCALE_NUMERIC' was set to 'en_US.UTF-8'
Locale 'TRACKER_LOCALE_MONETARY' was set to 'en_US.UTF-8'
Initializing Storage...
Mount monitors set up for to watch for added, removed and
pre-unmounts...
No mounts found to iterate
Setting priority nice level to 19
Loading extractor rules... (/usr/share/tracker/extract-rules)
  Loaded rule '10-abw.rule'
  Loaded rule '10-epub.rule'
  Loaded rule '10-flac.rule'
  Loaded rule '10-gif.rule'
  Loaded rule '10-html.rule'
  Loaded rule '10-ico.rule'
  Loaded rule '10-jpeg.rule'
  Loaded rule '10-mp3.rule'
  Loaded rule '10-msoffice.rule'
  Loaded rule '10-oasis.rule'
  Loaded rule '10-pdf.rule'
  Loaded rule '10-png.rule'
  Loaded rule '10-ps.rule'
  Loaded rule '10-svg.rule'
  Loaded rule '10-tiff.rule'
  Loaded rule '10-vorbis.rule'
  Loaded rule '10-xmp.rule'
  Loaded rule '11-iso.rule'
  Loaded rule '11-msoffice-xml.rule'
  Loaded rule '15-gstreamer-guess.rule'
  Loaded rule '15-playlist.rule'
  Loaded rule '90-gstreamer-generic.rule'
  Loaded rule '90-text-generic.rule'
Extractor rules loaded
Setting memory limitations: total is 3.9 GB, minimum is 256 MB,
recommended is ~1 GB
  Virtual/Heap set to 2.0 GB (50% of total or MAXLONG)
Guessing mime type as '(null)'
tracker_mimetype_info_get_module: assertion `info != NULL' failed
No modules found to handle metadata extraction

Huh, so... it doesn't recognize the MIME type at all?




