Re: [Tracker] [Announce] Tracker 0.5.1



Mark Florian wrote:
Well done on the release!

I downloaded and built the tarball on Ubuntu edgy, using the included
debian directory without any changes (except 'dch -i'ing to make the
package with the right version number), and then of course installed the
new libtrackerclient0, tracker, tracker-gnome-search-tool, tracker-utils
debs.

Any help getting the debs to use the right version number would be appreciated. I'm at a loss, and that's why I did not ship debs last night.


I removed the old ~/.Tracker directory and set the new tracker indexing,
with the --turbo switch. After that was all done, I ran a few test
searches, and unfortunately, there still seem to be problems. To be
sure, I've since rebooted, re-removed ~/.Tracker and set trackerd to
index everything again, but without --turbo. Same story.

Firstly, I have a (text) file called "Jesus May Ball photos" with
various words inside. I ran a search for eaden (it's a name inside the
file) and this returns no results. I then did a 'cp "Desktop/Jesus May
Ball photos" Desktop/place', and watched ~/.Tracker/tracker.log. The
result was this:

07 Nov 2006, 12:33:02:221 - File /home/markrian/Desktop/place has
finished changing
07 Nov 2006, 12:33:02:227 - saving basic metadata for *new*
file /home/markrian/Desktop/place with mime unknown and service type 8
07 Nov 2006, 12:33:07:717 - Total entities index : 2567
07 Nov 2006, 12:33:07:717 - Please wait while remaining data is flushed
to the inverted word index. This may take some time...
07 Nov 2006, 12:33:07:719 - flushing data (3 words left) - please wait
07 Nov 2006, 12:33:07:720 - flushing data (2 words left) - please wait
07 Nov 2006, 12:33:07:721 - flushing data (1 words left) - please wait
07 Nov 2006, 12:33:07:762 - flushing data (0 words left) - please wait
07 Nov 2006, 12:33:07:762 - All data has been flushed - waiting for new
file events...



Service type 8 is "other files". It is assigned when xdgmime reports an unknown mime type *and* the first 4 KB of the file is not valid UTF-8.

Please check the file in Nautilus (which also uses xdgmime) to see how it identifies it.

Also try opening it in gedit. If it is invalid UTF-8 then gedit won't be able to open it, and Tracker was therefore right to ignore it.

Also rerun with --enable-debug to see more detailed info in the log.
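
As a quick way to test the second condition by hand, here is a minimal Python sketch that checks whether the first 4 KB of a file decodes as UTF-8. It is not Tracker's own code (Tracker is written in C); the function names and the handling of a multi-byte character cut off at the 4 KB boundary are assumptions made for illustration.

```python
import codecs


def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8.

    An incomplete multi-byte sequence at the very end is tolerated,
    because a fixed-size read can cut a character in half.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False buffers a trailing partial sequence instead of failing
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return False
    return True


def head_is_valid_utf8(path: str, size: int = 4096) -> bool:
    """Check only the first `size` bytes of the file at `path`."""
    with open(path, "rb") as handle:
        return looks_like_utf8(handle.read(size))
```

Running head_is_valid_utf8 on "Desktop/Jesus May Ball photos" should give a hint as to whether the file contents are the reason it ended up as service type 8.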


It seems that the mime type is unknown. If I do the same operation but
append a .txt to the file name, the following happens:

07 Nov 2006, 12:53:32:504 - File /home/markrian/Desktop/place.txt has
finished changing
07 Nov 2006, 12:53:32:508 - saving basic metadata for *new*
file /home/markrian/Desktop/place.txt with mime text/plain and service
type 6
07 Nov 2006, 12:53:32:512 - Extracting Metadata for *new*
file /home/markrian/Desktop/place.txt with mime text/plain and service
type 6
07 Nov 2006, 12:53:37:906 - Total entities index : 2576
07 Nov 2006, 12:53:37:906 - Please wait while remaining data is flushed
to the inverted word index. This may take some time...
07 Nov 2006, 12:53:37:912 - flushing data (17 words left) - please wait
[...etc...]
07 Nov 2006, 12:53:37:920 - flushing data (0 words left) - please wait
07 Nov 2006, 12:53:37:920 - All data has been flushed - waiting for new
file events...

And searches for eaden return the result of place.txt, but nothing
else.

That's right, because only recognised text files (or files that can be converted to text) have their contents indexed. If Tracker thinks a file is not valid text then it won't get indexed (as is the case here).


If I search for 'jesus ball' then I get the result "Desktop/Jesus May
Ball photos" as expected.

The second problem is that I have a file called
bills_Maids_Causeway.ods. If I run a search for "bills maids causeway"
no results are returned. If I search for "bills_maids_causeway" then I
get that one result. The same effect can be seen with files
named-with-dashes-like-this.txt.

That's deliberate. We do not treat underscores or hyphens as word breaks, so such a name is effectively one word.

(This is important for searching source code.)

If there are good reasons for also breaking them up then please let me know (does Beagle do this?).
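
To make the trade-off concrete, here is a small Python sketch of the two tokenising behaviours. It is only an illustration under assumed rules (regular expressions, lowercasing), not Tracker's actual parser; the function name and the split_on_joiners parameter are hypothetical.

```python
import re


def tokenize(text: str, split_on_joiners: bool = False) -> list[str]:
    """Split text into lowercase words.

    By default underscores and hyphens stay inside a word (the behaviour
    described above). With split_on_joiners=True they act as word breaks.
    """
    if split_on_joiners:
        pattern = r"[^\W_]+"   # letters and digits only
    else:
        pattern = r"[\w-]+"    # letters, digits, underscore, hyphen
    return [token.lower() for token in re.findall(pattern, text)]


print(tokenize("bills_Maids_Causeway.ods"))
# ['bills_maids_causeway', 'ods']
print(tokenize("bills_Maids_Causeway.ods", split_on_joiners=True))
# ['bills', 'maids', 'causeway', 'ods']
```

With the default rule, none of the query words "bills", "maids" or "causeway" equals the single indexed word "bills_maids_causeway", which is why that search returns nothing.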



The third issue involves a test file I created, called 'whisper'. The
file, which I created in gedit, contains only the line:

I hear the sound of ticking clocks

Tracker's log picked it up, and registered the file with the correct
mimetype, text/plain. When I run a search for "ticking here", the file
whisper appears in the results. Is this expected behaviour? I would have
thought that search implied ticking AND here, not ticking OR here.

"here" is a stopword and is ignored in a search. Check the log file for the exact search terms that were used.




The fourth and final issue involves the other test file I created,
own_way.txt, containing the line:

To the end of the last page.

Searching for 'last' returns no results, and the log contains the
following interesting entries for this:

07 Nov 2006, 13:09:34:256 - Executing search with params Files, last
07 Nov 2006, 13:09:34:257 - tracker_indexer_get_hits: assertion
`(indexer && words && words[0] && (limit > 0))' failed
07 Nov 2006, 13:09:34:257 - search returned no results

"last" is a stopword. See /usr/share/data/languages/stopwords.en for the full list.

(If you have set the language code to anything other than "en" then see the appropriate file.)

We should include code to make sure that assertion does not pop up in those cases.
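
The kind of guard meant here can be sketched in Python as follows. This is an illustration only, not Tracker code: the stopword set is a two-word subset taken from this thread, the index is a plain dictionary, and combining the remaining terms with AND is an assumption.

```python
# Hypothetical subset; the real list lives in the stopwords file.
STOPWORDS = {"here", "last"}


def filter_terms(query: str, stopwords=STOPWORDS) -> list[str]:
    """Lowercase the query, split on whitespace and drop stopwords."""
    return [word for word in query.lower().split() if word not in stopwords]


def search(query: str, index: dict[str, set[str]]) -> list[str]:
    """Return files containing every non-stopword term of the query."""
    terms = filter_terms(query)
    if not terms:
        # Every term was a stopword: return early instead of handing
        # an empty word list to the index lookup.
        return []
    hits = None
    for term in terms:
        files = index.get(term, set())
        hits = files if hits is None else hits & files
    return sorted(hits)


index = {"ticking": {"whisper"}, "clocks": {"whisper"}, "page": {"own_way.txt"}}
print(search("ticking here", index))  # ['whisper']
print(search("last", index))          # []
```

The first call shows why "ticking here" still finds whisper (only "ticking" is actually searched for), and the second shows the empty-query case being handled before it reaches the lookup.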


I hope this information helps. If anyone would like more details/tests,
just say. Again, well done to all involved with this release! Tracker is
*totally* awesome.

Only the first issue really needs investigation. Any info from --enable-debug would be helpful.


--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



