[Tracker] Filter for chm indexing



I was looking for a Tracker filter to index chm files, but I couldn't
find one, so I wrote one myself. It depends on the Python chm library
(pychm, http://gnochm.sourceforge.net/pychm.html) for extraction and
on html2text (http://www.mbayer.de/html2text/) to convert the output
to text. Both can be found in most distros. The filter should be named
x-chm_filter and should reside in something like
/usr/lib/tracker/filters/application/
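
The install step described above could be sketched roughly as follows.
The helper name install_filter and both of its parameters are
hypothetical, the filters directory is passed in because its exact
location varies between systems, and marking the file executable is an
assumption based on the script being started through its #! line.

```python
import os
import shutil
import stat

def install_filter(script_path, filters_dir):
    """Copy script_path into filters_dir under the name x-chm_filter."""
    target = os.path.join(filters_dir, "x-chm_filter")
    shutil.copyfile(script_path, target)
    # Assumption: the filter is run directly, so it needs the execute bits.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

Calling install_filter("chm_filter.py", "/usr/lib/tracker/filters/application")
would normally need root, since the target directory is system-wide.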

I'm inlining the filter here:

-----------------------------------------------------------

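#!/usr/bin/env python
# Tracker filter for chm files (Python 2): pipes every HTML page in the
# archive through html2text, which prints the plain text on stdout.

import sys, os
from chm import chmlib

def get_entry(chmfd, ui, dummy):
    # Callback invoked by chm_enumerate for each entry in the archive.
    if ui.path.endswith(('html', 'htm')):
        # Read the whole entry: start at offset 0L, ui.length bytes.
        (size, data) = chmlib.chm_retrieve_object(chmfd, ui, 0L, ui.length)
        if size > 0:
            fd.write(data)  # fd is the module-level pipe opened below
    return chmlib.CHM_ENUMERATOR_CONTINUE

if len(sys.argv) > 1:
    chmfd = chmlib.chm_open(sys.argv[1])
    if chmfd is not None:
        # html2text reads HTML on stdin; -nobs turns off backspace-based
        # bold/underline rendering so the output is plain text.
        fd = os.popen("html2text -nobs", "w")
        chmlib.chm_enumerate(chmfd, chmlib.CHM_ENUMERATE_ALL, get_entry, None)
        chmlib.chm_close(chmfd)
        fd.close()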
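
One detail of get_entry worth trying in isolation is the suffix test.
As written it is case-sensitive and does not require a dot, so an entry
named /TOC.HTM would be skipped and any path that merely ends in "htm"
would be accepted. A minimal sketch of a stricter variant, assuming
archives can contain upper-case extensions (the helper name
is_html_entry is hypothetical and not part of pychm):

```python
def is_html_entry(path):
    """Return True if a chm entry path looks like an HTML page."""
    # Lower-case first so .HTM and .Html are accepted as well.
    return path.lower().endswith((".html", ".htm"))
```

It could stand in for the ui.path.endswith(...) check inside get_entry
if that behaviour is wanted.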


