This isn't quite worked out, but I want to throw this out to the group and get some preliminary feedback. Attached is a patch that lets us index system-wide and user-installed man pages and Tomboy notes, and adds some basic Liferea support.

The external services all use the out-of-process mechanism used by the text filter and the embedded metadata extractor. However, there are more operations, and therefore more programs per service.

First, the directory structure: in tracker/src there is now an "external-services" directory. In this directory you will find one directory for each service. The service directories are named after their configuration key in ~/.Tracker/tracker.cfg. This makes it easy to add new services without recompiling trackerd (and will hopefully encourage other developers to provide tracker support with their apps!). For example, you'll find the directory tracker/src/external-services/IndexManPages and an IndexManPages key under the Services group in tracker.cfg.

Each service has five programs:

1) check-deps
   This program is called at the very beginning, if the user activates the service's key. It may check for any other programs the service needs in order to work. For example, I check for xsltproc and w3m for the Liferea indexer. If it returns non-zero, the indexer is disabled.

2) watch-list
   This program returns a list of directories to be added to trackerd's watch list. You must list each directory; trackerd will not automatically recurse into subdirectories. If you need all subdirs, I recommend find:

   # find $basedir -type d

   See IndexManPages/watch-list for an example.

3) service-type
   This program returns the service type of a file being watched by this service.

   argv[1] == the full path to the file being watched
   argv[2] == the mime type of the file

   I provide the file path and mime type in case you need them, but I imagine the result should be constant.

4) filter-text
   This works much like the text filters you find in the tracker/filters directory, except:

   argv[1] == the full path
   argv[2] == the mime type of the file
   !! argv[3] == the path to the filtered text !!

5) extract-metadata
   Again, this behaves like tracker-extract. It takes a file and prints Key=Value;\n pairs, one for each piece of metadata.

   argv[1] == the full path
   argv[2] == the mime type of the file

So, like I said before, I'm including 3 implementations of this:

1) IndexManPages
   The new service type is "Man Pages" and it adds a new "Man" metadata class. The class can tag a man page's title, section, the date it was written, source (app + version), and manual name (e.g., Debian Project for Debian-specific man pages). It also provides a full text indexer. The only thing lacking here is the language the man page was written in. Currently, I reject any non-English directory. It's easy to index them all, but it's just faster for me if trackerd ignores those.

2) IndexTomboy
   This uses the Notes service type, and adds a Title field to the Note metadata class. There's obviously more I could grab from the Tomboy files, I just haven't gotten around to it yet. Full text is supported.

3) IndexLiferea
   This adds a service type called "Web Channels" and a metadata class "RSS". This indexer sucks and I need some help on it. :( Currently, you only get one entry in the database for each feed, so all the text in the feed is associated with the entire feed instead of with an individual item. For example, if I were to search for "tracker" I'd expect a link to a specific post by Jamie; instead I get a link to Planet GNOME. I'm not even sure what I need here. I'd like some way to associate a file with multiple database items. Is this possible?
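
To make the five-program interface described above a bit more concrete, here is a minimal sketch of three of the pieces for a made-up service. It is not taken from the patch: the function names (check_deps, watch_list, emit_pair) are hypothetical, and they are written as shell functions in one file only to keep the example short. In a real service each would presumably be its own executable in the service directory. Printing one directory per line is also an assumption about the format trackerd reads.

```shell
#!/bin/sh
# Hypothetical sketch of three service programs; names and layout are
# illustrative assumptions, not code from the attached patch.

# check-deps: return 0 only if every helper named in the arguments is on
# PATH. A non-zero result is what would disable the indexer.
check_deps() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || return 1
    done
    return 0
}

# watch-list: print the base directory and every directory below it,
# since subdirectories are not picked up automatically.
# Assumption: one directory per line.
watch_list() {
    find "$1" -type d
}

# extract-metadata: print a single Key=Value; pair followed by a newline.
emit_pair() {
    printf '%s=%s;\n' "$1" "$2"
}
```

For example, `check_deps xsltproc w3m` would mirror the dependency check mentioned for the Liferea indexer, `watch_list "$basedir"` would list a directory tree for the watch list, and `emit_pair Title "My note"` would print `Title=My note;`.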

I'm pretty happy with the man pages indexer, and I may look into having Yelp use it some time in the future. But I'm not calling dibs, so anyone else looking for a project to work on is more than welcome. The Tomboy indexer also works as expected. I believe Tomboy is dbus-ified, so if anyone wants to update tracker-search-tool to search Notes as well and fire up Tomboy when you click on a note, that'd be awesome.

The included patch also updates tracker-search and libtracker, so you can search for the "Man Pages" and "Notes" service types.

Attachment:
tracker-external-services.patch
Description: Text Data