Indexer progress



This version is much improved.

It is much more robust - if external converters are not available,
it just ignores those file types. It also detects when AbiWord 
(which is the only persistent converter process) hangs or crashes, 
and kills and restarts it. 
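
As a rough sketch of the hang detection (not the indexer's actual code), a
watchdog can wait on the child process with a timeout and kill it if it
does not finish. The function name and the timeout value are assumptions
made for illustration. The real AbiWord process is persistent and gets
restarted, while this sketch only shows the detect-and-kill step for a
one-shot child.

```python
import subprocess

def run_with_watchdog(command, timeout_seconds):
    """Run a converter command and kill it if it exceeds the timeout.

    Returns the exit code, or None if the process hung and was killed.
    (Hypothetical helper; the caller decides whether to restart.)
    """
    process = subprocess.Popen(command)
    try:
        return process.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        process.kill()   # the converter is assumed to have hung
        process.wait()   # reap the killed child so it does not linger
        return None
```

A caller that gets None back would start a fresh converter process before
handing it the next file.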

Fewer external converters are needed, as there is now a managed 
JPEG parser; it reads comments, but not yet EXIF data.  

There is support for running the indexer as a daemon process; when 
running in daemon mode it:
1) Detaches from the controlling terminal.
2) Cycles through the list of directories.
3) Acts as a good citizen and sleeps between file touches.
4) Runs an IPC server for external applications to push notifications 
or data into. 
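
Steps 2 and 3 above can be sketched roughly as follows. This is only an
illustration under assumptions: the function names, the `index_file`
callback and the pause length are hypothetical, and detaching from the
terminal and the IPC server are left out entirely.

```python
import os
import time

def crawl_once(directories, index_file, pause_seconds=0.1):
    """Visit every file under each directory once, pausing between files.

    index_file is a hypothetical callback that indexes a single path.
    Returns the number of files visited.
    """
    touched = 0
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                index_file(os.path.join(root, name))
                touched += 1
                time.sleep(pause_seconds)  # yield the disk and CPU to others
    return touched

def run_forever(directories, index_file, pause_seconds=0.1):
    """Cycle through the directory list until the process is stopped."""
    while True:
        crawl_once(directories, index_file, pause_seconds)
        time.sleep(pause_seconds)  # avoid spinning when there is nothing to do
```

The pause between files is what keeps the daemon from saturating the disk;
a longer pause trades indexing freshness for less load.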

There is now proper metadata support. So far only two document types 
exploit it (PDF and HTML), but the metadata is normalised 
in the database. Adding support for other document types is just a 
matter of extending the DocReader classes.
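
To give an idea of what extending a reader could look like, here is a
minimal sketch in Python. Only the DocReader name comes from the indexer;
the `read_metadata` method, the `HtmlDocReader` subclass and the
lower-cased keys are assumptions for illustration, and the real classes
are not written in Python.

```python
from html.parser import HTMLParser

class DocReader:
    """Hypothetical base class: subclasses return a metadata dictionary."""
    def read_metadata(self, content):
        return {}

class _HeadParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name, value = attributes.get("name"), attributes.get("content")
            if name and value:
                self.metadata[name.lower()] = value

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data

class HtmlDocReader(DocReader):
    """Hypothetical reader for one document type."""
    def read_metadata(self, content):
        parser = _HeadParser()
        parser.feed(content)
        parser.close()
        return parser.metadata
```

A reader for another document type would override the same method and
return its own key/value pairs.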

The backend does not yet use the metadata, as I couldn't see what 
clues could be converted into metadata searches, but I hope people 
can suggest what to do here.

The backend now has a very simple stemmer to improve recall on 
single keyword queries. 
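
For illustration, a very simple stemmer of this kind might strip a few
common English suffixes so that a query and an indexed word compare equal.
The suffix list, the minimum stem length and the function names below are
assumptions, not the rules the backend actually uses.

```python
# Hypothetical suffix list; longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Lower-case the word and strip the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def matches(query, indexed_word):
    """A single-keyword query matches when both words share a stem."""
    return stem(query) == stem(indexed_word)
```

With these rules "indexing", "indexed" and "indexes" all reduce to "index",
which is where the recall gain comes from. A stemmer this crude also
mis-stems some words: "files" becomes "fil" while "file" stays "file", so
that pair does not match.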

The build system is still manual makefiles. The documentation has 
not been updated yet. Sorry.

If anybody who has not already been sent the code would like a copy,
please mail me and I will send a tarball.

Julian



