beagle r3290 - in trunk/beagle: Filters Util beagled glue tools



Author: joeshaw
Date: 2007-01-20 00:18:26 +0000 (Sat, 20 Jan 2007)
New Revision: 3290
ViewCVS link: http://svn.gnome.org/viewcvs/beagle?rev=3290&view=rev

Added:
   trunk/beagle/glue/rlimit-glue.c
   trunk/beagle/tools/DocExtractor.cs
Modified:
   trunk/beagle/Filters/FilterDOC.cs
   trunk/beagle/Filters/FilterExternal.cs
   trunk/beagle/Filters/FilterMPlayerVideo.cs
   trunk/beagle/Filters/FilterPdf.cs
   trunk/beagle/Filters/FilterRPM.cs
   trunk/beagle/Filters/FilterSpreadsheet.cs
   trunk/beagle/Filters/FilterTotem.cs
   trunk/beagle/Util/SafeProcess.cs
   trunk/beagle/Util/SystemPriorities.cs
   trunk/beagle/beagled/beagled-index-helper.in
   trunk/beagle/beagled/beagled.in
   trunk/beagle/beagled/wrapper.in
   trunk/beagle/glue/Makefile.am
   trunk/beagle/tools/
   trunk/beagle/tools/Makefile.am
   trunk/beagle/tools/wrapper.in
Log:
Move MS Word text extraction out of process.  libwv1 is far too
unreliable to run even inside our index helper process.  Crashes can
corrupt the index, and some documents make its memory use explode.

* Added a new beagle-doc-extractor tool, which extracts the text from
a Word document and prints it to stdout.  Ported FilterDOC to use it.

* Added the ability to run child setup functions with SafeProcess, since
this is a nice feature of g_spawn().  These functions run in the child
process after the fork() but before the exec().  This allows us to
do things like call setrlimit(2) in child processes.

* Added glue around setrlimit(2) so we can keep out-of-process helpers
under a certain amount of CPU time and address space, killing them if
they exceed it.

* Added CPU time limits to all the filters which call out-of-process
helpers.  The numbers are best guesses based on various files I have on
my system, and will inevitably need some tweaking.

* Added a memory limit to the Word extractor, since I have a file
which causes libwv1 to eat over a gig of memory in about 3 seconds.

* Various tweaks to scripts so that beagle-doc-extractor works without
running "make install".
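
For readers unfamiliar with the child-setup pattern described above, here is
a minimal C sketch of the idea: fork, apply setrlimit(2) in the child, then
exec the helper.  This is not the contents of rlimit-glue.c or SafeProcess;
the function names limit_resources() and run_limited() are hypothetical, and
the limit values are left to the caller.  Note that exceeding the CPU limit
gets the child terminated by a signal, while exceeding the address space
limit makes allocations fail, which usually ends the helper soon after.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: cap CPU seconds and address space for the
 * calling process.  A value of 0 leaves that limit unchanged. */
static int limit_resources(rlim_t cpu_seconds, rlim_t address_space_bytes)
{
    struct rlimit rl;

    if (cpu_seconds > 0) {
        rl.rlim_cur = rl.rlim_max = cpu_seconds;
        if (setrlimit(RLIMIT_CPU, &rl) != 0)
            return -1;
    }
    if (address_space_bytes > 0) {
        rl.rlim_cur = rl.rlim_max = address_space_bytes;
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            return -1;
    }
    return 0;
}

/* Hypothetical runner: the "child setup" step happens after fork()
 * and before exec(), so the limits apply only to the helper.
 * Returns the raw wait status, or -1 if fork/waitpid failed. */
static int run_limited(char *const argv[], rlim_t cpu_seconds,
                       rlim_t address_space_bytes)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: apply limits, then replace ourselves with the helper. */
        if (limit_resources(cpu_seconds, address_space_bytes) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);   /* only reached if exec failed */
    }

    /* Parent: wait for the helper and hand back its wait status. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A caller would inspect the result with WIFEXITED() and WIFSIGNALED(); a
helper that ran past its CPU limit shows up as terminated by a signal.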






