Re: [Tracker] [Announce] Tracker 0.5.1



On Tue, 2006-11-07 at 13:38 +0000, Jamie McCracken wrote:
please check the file in nautilus (which also uses xdgmime) to see how 
it identifies it.

Nautilus identifies the file as mimetype text/plain.

also try opening it in gedit - if it's invalid UTF-8 then gedit won't be
able to open it, and therefore tracker was right to ignore it.

Opening it in gedit poses no problems at all. In fact, I think it was
created in gedit...
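
For anyone who wants to test the file without gedit, the UTF-8 check
suggested above can be approximated with a short script. This is only a
sketch and not Tracker's own code; the function name is_valid_utf8 is
made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_valid_utf8(path: str) -> bool:
    """Read a file in binary mode and validate its whole content."""
    with open(path, "rb") as handle:
        return is_valid_utf8(handle.read())
```

Calling file_is_valid_utf8("/home/markrian/Desktop/other") would show
whether the copy of the problematic file is valid UTF-8, independently
of how Tracker or xdgmime classify it.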

also rerun with --enable-debug to see more detailed info in log

Done. The log says this after creating a copy (called 'other') of the
problematic file:

07 Nov 2006, 22:06:38:735 - File /home/markrian/Desktop/other has
finished changing
07 Nov 2006, 22:06:38:739 - file /home/markrian/Desktop/other is
indexable
07 Nov 2006, 22:06:38:739 - /home/markrian/Desktop/other is not a text
file
07 Nov 2006, 22:06:38:741 - saving basic metadata for *new*
file /home/markrian/Desktop/other with mime unknown and service
type 8
07 Nov 2006, 22:06:38:746 - file /home/markrian/Desktop/other is
indexable
07 Nov 2006, 22:06:38:750 - 0 files are pending with count 0
07 Nov 2006, 22:06:44:139 - Total entities index : 1379
07 Nov 2006, 22:06:44:139 - Please wait while remaining data is flushed
to the inverted word index. This may take some time...
07 Nov 2006, 22:06:44:141 - flushing data (2 words left) - please wait
07 Nov 2006, 22:06:44:142 - flushing data (1 words left) - please wait
07 Nov 2006, 22:06:44:143 - flushing data (0 words left) - please wait
07 Nov 2006, 22:06:44:143 - All data has been flushed - waiting for new
file events...

It seems odd that tracker thinks it's indexable, but isn't a text file,
and has an unknown mimetype, doesn't it?

Mark




It seems that the mime type is unknown. If I do the same operation but
append a .txt to the file name, the following happens:

07 Nov 2006, 12:53:32:504 - File /home/markrian/Desktop/place.txt has
finished changing
07 Nov 2006, 12:53:32:508 - saving basic metadata for *new*
file /home/markrian/Desktop/place.txt with mime text/plain and service
type 6
07 Nov 2006, 12:53:32:512 - Extracting Metadata for *new*
file /home/markrian/Desktop/place.txt with mime text/plain and service
type 6
07 Nov 2006, 12:53:37:906 - Total entities index : 2576
07 Nov 2006, 12:53:37:906 - Please wait while remaining data is flushed
to the inverted word index. This may take some time...
07 Nov 2006, 12:53:37:912 - flushing data (17 words left) - please wait
[...etc...]
07 Nov 2006, 12:53:37:920 - flushing data (0 words left) - please wait
07 Nov 2006, 12:53:37:920 - All data has been flushed - waiting for new
file events...

And searches for eaden return place.txt as a result, but nothing
else.

that's right, because only recognised text files (or files that can be
converted to text) have their contents indexed. If tracker thinks it's
not valid text then it won't get indexed (as is the case here)


If I search for 'jesus ball' then I get the result "Desktop/Jesus May
Ball photos" as expected.

The second problem is that I have a file called
bills_Maids_Causeway.ods. If I run a search for "bills maids causeway"
no results are returned. If I search for "bills_maids_causeway" then I
get that one result. The same effect can be seen with files
named-with-dashes-like-this.txt.

that's deliberate - we do not treat underscores or hyphens as word breaks,
so a name joined by them is effectively one word

(this is important for searching source code)

if there are good reasons for also breaking them up then please let me 
know (does beagle do this?)
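
To make the trade-off concrete, here is a small sketch of the two
tokenising choices. It is not Tracker's parser; the regular expressions
and function names are assumptions chosen only to reproduce the
behaviour described above.

```python
import re

# Assumption: a word is a run of ASCII letters, digits, underscores or
# hyphens. Underscores and hyphens do NOT break words here.
KEEP_JOINERS = re.compile(r"[A-Za-z0-9_-]+")

# Alternative: underscores and hyphens are treated as word breaks.
SPLIT_JOINERS = re.compile(r"[A-Za-z0-9]+")


def tokenize_keep_joiners(text):
    """Tokenise so that foo_bar and foo-bar each stay one word."""
    return [word.lower() for word in KEEP_JOINERS.findall(text)]


def tokenize_split_joiners(text):
    """Tokenise so that foo_bar becomes the two words foo and bar."""
    return [word.lower() for word in SPLIT_JOINERS.findall(text)]
```

With the first function, bills_Maids_Causeway.ods yields the single term
bills_maids_causeway (plus ods), so a search for the three separate
words finds nothing. With the second, the same name yields bills, maids
and causeway, at the cost of splitting identifiers in source code.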



The third issue involves a test file I created, called 'whisper'. The
file, which I created in gedit, contains only the line:

I hear the sound of ticking clocks

Tracker's log picked it up, and registered the file with the correct
mimetype, text/plain. When I run a search for "ticking here", the file
whisper appears in the results. Is this expected behaviour? I would have
thought that search implied ticking AND here, not ticking OR here.

"here" is a stopword and is ignored in a search - check the log file for 
the exact search terms that were used




The fourth and final issue involves the other test file I created,
own_way.txt, containing the line:

To the end of the last page.

Searching for 'last' returns no results, and the log contains the
following interesting entries for this:

07 Nov 2006, 13:09:34:256 - Executing search with params Files, last
07 Nov 2006, 13:09:34:257 - tracker_indexer_get_hits: assertion
`(indexer && words && words[0] && (limit > 0))' failed
07 Nov 2006, 13:09:34:257 - search returned no results

"last" is a stopword. See /usr/share/data/languages/stopwords.en for the
full list.

(if you have set the language code to anything other than "en" then see 
the appropriate file)

We should include code to make sure that assertion does not pop up in 
those cases.
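
A rough sketch of that guard, for illustration only: it is written in
Python rather than Tracker's C, the stopword set holds just the two
words mentioned in this thread, and search_terms, search and the lookup
callback are hypothetical names, not Tracker functions.

```python
# Assumption: only the two stopwords named in this thread; the real
# list lives in the stopwords file for the configured language.
STOPWORDS = {"here", "last"}


def search_terms(query, stopwords=STOPWORDS):
    """Lower-case the query, split on whitespace, drop stopwords."""
    return [word for word in query.lower().split() if word not in stopwords]


def search(query, lookup):
    """Run lookup(terms) unless every term was a stopword.

    Returning early with no hits avoids handing an empty word list to
    the index lookup, which is the situation that tripped the assertion.
    """
    terms = search_terms(query)
    if not terms:
        return []
    return lookup(terms)
```

Under these assumptions "ticking here" is reduced to the single term
ticking, which explains the hit on whisper, while "last" is reduced to
nothing and returns no results without reaching the index at all.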


I hope this information helps. If anyone would like more details/tests,
just say. Again, well done to all involved with this release! Tracker is
*totally* awesome.

only the first issue really needs investigation. Any info from 
--enable-debug would be helpful





