Beagle content extraction question
- From: Andrew Leung <aleung soe ucsc edu>
- To: dashboard-hackers gnome org
- Cc: Tim Bisson <Tim Bisson netapp com>
- Subject: Beagle content extraction question
- Date: Tue, 8 Jul 2008 17:47:08 -0700
Hi All,
I am using Beagle's beagle-extract-content program to extract keywords
from files on my desktop for some later analysis. I've written a
highly parallelized file system crawler that can crawl the file
system's namespace very fast. I have modified it to run
beagle-extract-content on each file to extract the file's keywords.
The program (written in C) uses popen() to run beagle-extract-content
and reads the program's output from the resulting pipe. It currently
extracts contents successfully.
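
For reference, the popen() pattern described above looks roughly like
the sketch below. It is a minimal illustration, not the crawler's
actual code: the helper name run_and_capture and its buffer handling
are made up for this example, and the command string is whatever the
caller passes in.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: run `cmd` through the shell with popen() and copy
 * up to cap-1 bytes of its standard output into buf (NUL-terminated).
 * Returns the number of bytes stored, or -1 on failure.
 */
long run_and_capture(const char *cmd, char *buf, size_t cap)
{
    if (cmd == NULL || buf == NULL || cap == 0)
        return -1;

    /* popen() forks a shell and returns a stream attached to a pipe. */
    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    size_t used = 0;
    size_t n;
    while (used < cap - 1 &&
           (n = fread(buf + used, 1, cap - 1 - used, fp)) > 0)
        used += n;
    buf[used] = '\0';

    /* Drain anything that did not fit so the child can exit cleanly. */
    char scratch[4096];
    while (fread(scratch, 1, sizeof scratch, fp) > 0)
        ;

    /* pclose() waits for the child; -1 means the wait itself failed. */
    if (pclose(fp) == -1)
        return -1;

    return (long)used;
}
```

The crawler would call something like this once per file, with the
beagle-extract-content command line as cmd. Each call starts a new
shell and a new beagle-extract-content process, so any fixed startup
cost in the child is paid again for every file.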
However, it is slow. I noticed that beagle-extract-content spends
~150ms opening the Filter and only 10ms or so actually crawling the
file and extracting keywords. It seems to be taking the time to
determine which filter to use on the file, even when I use the
--mimetype flag to tell it the type of the file. Is there any way to
speed up this process or to tell it specifically which filter to use?
Alternatively, is there an API for beagle-extract-content so that
I can simply invoke a function from C that doesn't need to spend as
much time determining which filter to use?
Thanks for your help in advance.
Andrew Leung