[Tracker] Filter for chm indexing
- From: Raaf <monraaf gmail com>
- To: tracker-list gnome org
- Subject: [Tracker] Filter for chm indexing
- Date: Tue, 12 May 2009 05:28:27 +0200
I was looking for a Tracker filter to index chm files, but I
couldn't find one, so I wrote one myself. It depends on the Python
chm library (http://gnochm.sourceforge.net/pychm.html) for extraction
and on html2text (http://www.mbayer.de/html2text/) to convert the
output to text. Both can be found in most distros. The filter should
be named x-chm_filter and should reside in something like
/usr/lib/tracker/filters/application/
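
The installation step could be sketched roughly as follows. This is only
an illustration: install_filter is a hypothetical helper, the filters
directory differs between distros, and it assumes the filter needs to be
executable for Tracker to run it.

```python
import os
import shutil
import stat


def install_filter(script_path, filters_dir, name="x-chm_filter"):
    """Copy script_path into filters_dir as `name` and mark it executable.

    Hypothetical helper; filters_dir is whatever directory your Tracker
    installation reads filters from.
    """
    if not os.path.isdir(filters_dir):
        os.makedirs(filters_dir)
    target = os.path.join(filters_dir, name)
    shutil.copyfile(script_path, target)
    # Add the execute bits on top of whatever mode the copy ended up with.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

Called as install_filter("chm_filter.py", "/usr/lib/tracker/filters/application"),
it would normally need root privileges for a system-wide directory.
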
I'm inlining the filter here:
-----------------------------------------------------------
#!/usr/bin/env python
# Python 2 script: dump the HTML pages of a chm file through html2text.
import sys, os
from chm import chmlib

def get_entry(chmfd, ui, dummy):
    # Callback for chm_enumerate: pipe each HTML entry to html2text.
    if ui.path.endswith(('html','htm')):
        (size, data) = chmlib.chm_retrieve_object(chmfd, ui, 0L, ui.length)
        if size > 0:
            fd.write(data)
    return chmlib.CHM_ENUMERATOR_CONTINUE

if len(sys.argv) > 1:
    chmfd = chmlib.chm_open(sys.argv[1])
    if chmfd is not None:
        # html2text reads the HTML on stdin and prints plain text on stdout.
        fd = os.popen("html2text -nobs", "w")
        chmlib.chm_enumerate(chmfd, chmlib.CHM_ENUMERATE_ALL, get_entry, None)
        chmlib.chm_close(chmfd)
        fd.close()
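
One possible refinement, not part of the filter as posted: the
endswith(('html','htm')) test matches any path that merely ends in those
letters and is case-sensitive. A stricter check on the file extension
might look like the sketch below. is_html_entry and HTML_EXTENSIONS are
hypothetical names, and this variant deliberately skips other extensions
such as .xhtml, which the original test would accept.

```python
import os

# Extensions treated as HTML pages (an assumption; extend as needed).
HTML_EXTENSIONS = (".html", ".htm")


def is_html_entry(path):
    """Return True if path has an HTML file extension, ignoring case."""
    ext = os.path.splitext(path)[1].lower()
    return ext in HTML_EXTENSIONS
```

In the callback it would replace the endswith() call, for example
"if is_html_entry(ui.path):".
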