Re: [Tracker] [WIP] Application support: man pages, Tomboy, & Liferea
- From: Jamie McCracken <jamiemcc blueyonder co uk>
- To: Edward Duffy <eduffy gmail com>
- Cc: Tracker List <tracker-list gnome org>
- Subject: Re: [Tracker] [WIP] Application support: man pages, Tomboy, & Liferea
- Date: Thu, 07 Dec 2006 21:37:40 +0000
Edward Duffy wrote:
This isn't quite worked out, but I want to throw this out to the group
and get some preliminary feedback. Attached is a patch that allows us
to index system-wide and user installed man pages, Tomboy notes, and
some basic Liferea support. The external services all use the
out-of-process mechanism used by the text filter and embedded metadata
extractor. However, there are more operations, and therefore, more
applications for each service.
it would be best to discuss this first before doing the patch (unless
you are content to modify it quite a bit - which is fine!)
I like this in general but there are a few things:
I want it to work with third party packages so it needs to have easy
installation and deinstallation
First, the directory structure:
in tracker/src there now resides an "external-services" directory.
In this directory you will find one directory for each service. The
service directories are named after their configuration key in
~/.Tracker/tracker.cfg. This makes it easy to add new services
without recompiling trackerd (and hopefully encourages other developers
to provide tracker support with their apps!). For example, you'll find
the directory tracker/src/external-services/IndexManPages and an
IndexManPages key under the Services group in tracker.cfg. Each
service has five programs:
I would prefer just "services" to "external-services"
1) check-deps
This program is called at the very beginning, if the user activates the
service's key. This program may check for any other programs that are
required for this service to work. For example, I check for
xsltproc and w3m for the Liferea indexer. If non-zero is returned,
the indexer is disabled.
I'm not sure this is needed, but I suppose there's no harm in having it.
It should be optional, though.
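
A check-deps program along the lines described above could be sketched as
follows. This is only an illustration: the thread names xsltproc and w3m
as the Liferea requirements, but the function names and the use of Python
are assumptions, not what the patch contains.

```python
import shutil

# Tools the Liferea indexer is said to need in the thread.
LIFEREA_TOOLS = ["xsltproc", "w3m"]

def missing_tools(tools):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

def exit_status(tools):
    """Return 0 if every tool is present, 1 otherwise.

    A non-zero status is what tells trackerd to disable the indexer.
    """
    return 1 if missing_tools(tools) else 0
```

A real check-deps script would finish with sys.exit(exit_status(LIFEREA_TOOLS)).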
2) watch-list
This program returns a list of directories to be added to trackerd's
watch list. You must list each directory; it will not automatically
recurse into subdirectories. If you need all subdirs, I recommend find:
# find $basedir -type d
See IndexManPages/watch-list for an example.
not needed - I prefer to have a service file (like the dbus service
files or .desktop files) which specifies the options needed here.
All we need is a directory like /usr/share/tracker/services to hold the
service files. This makes it easy for separate packages to install and
de-install stuff without any hassle.
At start up trackerd can simply read all these service files (+ also
watch for new ones too!)
3) service-type
This program returns the service type of a file being watched by this
service.
argv[1] == the full path to the file being watched
argv[2] == the mime type of the file
I provide the file path and mime type in case you need them, but I
imagine the result should be constant
not needed - any file in the watch directory above would be passed to
the spawned service-handler (we can include globs in the service file to
filter certain files to pass)
these watched folders will usually all be in hidden folders,
so they won't conflict with the file indexer.
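
The glob filtering mentioned above could look roughly like this. It is a
sketch only: the function name is hypothetical, and it assumes the globs
are matched against the file name rather than the full path, which the
thread does not specify.

```python
from fnmatch import fnmatch
import os.path

def passes_filter(path, patterns):
    """Return True if the file should be handed to the service handler.

    An empty pattern list means "no filter", so every file passes.
    """
    if not patterns:
        return True
    name = os.path.basename(path)
    return any(fnmatch(name, pattern) for pattern in patterns)
```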
4) filter-text
This works very similarly to the text filters you find in the
tracker/filters directory, except
argv[1] == the full path
argv[2] == the mime type of the file !!
argv[3] == the path to the filtered text !!
5) extract-metadata
Again, this behaves like tracker-extract. It takes a file and spits out
Key=Value;\n pairs for each piece of metadata
argv[1] == the full path
argv[2] == the mime type of the file
I was planning on migrating the existing metadata extractors format to
an xml format (our current one is quite hacky!). We also need to handle
multiple values for the same metadata type.
something like:
<extraction>
<metadata name="Audio.Title">Moonlight Sonata</metadata>
<metadata name="Audio.Artist">Beethoven</metadata>
</extraction>
Feel free to modify code to match above.
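
To illustrate how the proposed format copes with multiple values for the
same metadata type, here is a minimal reader for it. The function name is
hypothetical, and the second Audio.Artist value in the sample is made up
purely to show a repeated key.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The second Audio.Artist entry is a made-up value for illustration.
SAMPLE = """<extraction>
<metadata name="Audio.Title">Moonlight Sonata</metadata>
<metadata name="Audio.Artist">Beethoven</metadata>
<metadata name="Audio.Artist">Example Performer</metadata>
</extraction>"""

def parse_extraction(xml_text):
    """Map each metadata name to the list of values given for it."""
    root = ET.fromstring(xml_text)
    values = defaultdict(list)
    for node in root.findall("metadata"):
        values[node.get("name")].append((node.text or "").strip())
    return dict(values)
```

Repeating a <metadata> element is enough to carry several values, so no
extra syntax is needed for that case.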
the filter program and metadata extractor program should be specified in
the service file so there's no need to worry about mimes.
We need a function in tracker-utils that determines if a file is
associated with a particular service by looking at its path and matching
it against any path that's registered as a watch by a service. We need
this for the emails so we may as well reuse it for all services. (it just
needs to call g_str_has_prefix on the path)
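
The lookup could be sketched like this, using Python's str.startswith in
place of g_str_has_prefix. The function name and the (service, directory)
list are assumptions for illustration. One deliberate difference from a
bare prefix test: a path separator is appended to the watched directory,
so that /home/u/.tomboy does not also claim files in /home/u/.tomboy-old.

```python
def service_for_path(path, watches):
    """Return the service whose watched directory contains path, or None.

    watches is a list of (service_name, watched_directory) pairs. When
    several watches match, the longest (most specific) directory wins.
    """
    best_service = None
    best_length = -1
    for service, watch_dir in watches:
        root = watch_dir.rstrip("/")
        if path == root or path.startswith(root + "/"):
            if len(root) > best_length:
                best_service = service
                best_length = len(root)
    return best_service
```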
So, like I said before, I'm including 3 implementations of this:
1) IndexManPages
The new service type is "Man Pages" and it adds a new "Man" metadata
class. The class can tag a man page's title, section, date it was
written, source (app + version), and manual name (eg, Debian Project
for debian specific man pages). It also provides a full text indexer.
Only thing lacking here is the language the man page was written in.
Currently, I reject any non-English directory. It's easy to index
them all, but it's just faster for me if trackerd just ignores those.
ok great. Maybe we can use user's locale to work out which translations
to index?
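
One way the locale idea might work, as a rough sketch. It assumes that
translated pages live in subdirectories of the man root named after the
locale (for example /usr/share/man/de), and that the locale comes from a
LANG-style string; both function names are hypothetical.

```python
def locale_candidates(lang):
    """Turn a LANG-style value into directory names to try.

    Strips the encoding and modifier, then falls back from the
    full locale to the bare language code.
    """
    base = lang.split(".", 1)[0].split("@", 1)[0]
    if base in ("", "C", "POSIX"):
        return []
    candidates = [base]
    if "_" in base:
        candidates.append(base.split("_", 1)[0])
    return candidates

def man_dirs_for_locale(man_root, lang):
    """Return the untranslated root plus any locale-specific directories."""
    return [man_root] + [f"{man_root}/{name}" for name in locale_candidates(lang)]
```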
2) IndexTomboy
This uses the Notes service type, and adds a Title field to the Note
metadata class. There's obviously more I could grab from the tomboy
files, I just haven't gotten around to it yet. Full text is
supported.
see Mikkel's tomboy indexer which he sent on this list last month - it
does all the fields I believe. Perhaps you could use some of his code?
3) IndexLiferea
This adds a service type called "Web Channels" and a metadata class
"RSS". This indexer sucks and I need some help on it. :(
Currently, you only get one entry in the database for each feed. So
all the text in the feed is associated with the entire feed, instead
of an individual item. For example, if I was to search for "tracker"
I'd expect a link to a specific post by Jamie; instead I get a link to
Planet GNOME. I'm not even sure what I need here, I'd like some way
to associate a file with multiple database items. Is this possible?
not sure - will have to think. The xml above for the extractor could be
modified to support multiple sub-entities with their own uri in one go.
<extraction>
<entity uri="/home/jamie/music/moonlight.ogg">
<metadata name="Audio.Title">Moonlight Sonata</metadata>
<metadata name="Audio.Artist">Beethoven</metadata>
</entity>
<entity uri="/home/jamie/music/moonlight.ogg">
<metadata name="Audio.Title">Moonlight Sonata</metadata>
<metadata name="Audio.Artist">Beethoven</metadata>
</entity>
</extraction>
so in the DB, they should be separate objects "RSS" feed and "RSS Item"
You could also build the uri to include the rss file and an offset to
the item that matches and the gui can then decode it and show a viewer
for it
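
A sketch of the uri-with-offset idea follows. The thread does not define
a uri layout, so the file:// scheme, the "#offset=" fragment and both
function names are assumptions made up for this example.

```python
from urllib.parse import quote, unquote

PREFIX = "file://"
MARKER = "#offset="

def build_item_uri(feed_path, offset):
    """Encode the feed file and the byte offset of one item into a uri."""
    return f"{PREFIX}{quote(feed_path)}{MARKER}{offset}"

def split_item_uri(uri):
    """Decode a uri from build_item_uri back into (feed_path, offset)."""
    base, sep, fragment = uri.rpartition(MARKER)
    if not sep or not fragment.isdigit() or not base.startswith(PREFIX):
        raise ValueError(f"not an item uri: {uri!r}")
    return unquote(base[len(PREFIX):]), int(fragment)
```

The gui would call split_item_uri on a search hit, open the feed file and
seek to the offset to show the matching item.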
I'm pretty happy with the man pages indexer; I may look into having
Yelp use it some time in the future. But I'm not calling dibs, so anyone
else looking for a project to work on is more than welcome.
The tomboy indexer works as expected also. I believe Tomboy is
dbus-ified, so if anyone wants to update tracker-search-tool to
search Notes also and fire up Tomboy when you click on a note,
that'd be awesome.
the service file can contain this - either an exec name or a dbus
interface/object name
Sample service file might look like:
[Service]
Type=Notes
WatchDirs=$HOME/.tomboy
WatchRecursive=false
WatchFilter=
[Metadata]
Exec=/usr/bin/tomboy-extractor
[TextFilter]
Exec=
[Display]
Exec=/usr/bin/tomboy
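
For illustration, a file in this shape can be read with an ordinary
key-file parser. The sketch below uses Python's configparser; the
function name, the returned dictionary layout, and the use of ";" to
separate multiple WatchDirs or WatchFilter entries are all assumptions,
since the sample only shows single values.

```python
import configparser
import os

SAMPLE_SERVICE = """[Service]
Type=Notes
WatchDirs=$HOME/.tomboy
WatchRecursive=false
WatchFilter=

[Metadata]
Exec=/usr/bin/tomboy-extractor

[TextFilter]
Exec=

[Display]
Exec=/usr/bin/tomboy
"""

def split_list(value):
    """Split a ';'-separated value, dropping empty entries (assumed syntax)."""
    return [item.strip() for item in value.split(";") if item.strip()]

def load_service(text, env=None):
    """Parse one service file into a plain dictionary."""
    env = os.environ if env is None else env
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(text)
    service = parser["Service"]
    home = env.get("HOME", "")
    return {
        "type": service.get("Type", ""),
        # Only $HOME is expanded here; other variables are left alone.
        "watch_dirs": [d.replace("$HOME", home)
                       for d in split_list(service.get("WatchDirs", ""))],
        "recursive": service.getboolean("WatchRecursive", fallback=False),
        "filters": split_list(service.get("WatchFilter", "")),
        # An empty Exec= means the service does not provide that program.
        "metadata_exec": parser.get("Metadata", "Exec", fallback="") or None,
        "text_filter_exec": parser.get("TextFilter", "Exec", fallback="") or None,
        "display_exec": parser.get("Display", "Exec", fallback="") or None,
    }
```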
Any comments?
--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/