Re: [Tracker] Running tracker on an Ubuntu server box?

From: Martyn Russell <martyn imendio com>
To: tigerf web de
Cc: tracker-list gnome org
Subject: Re: [Tracker] Running tracker on an Ubuntu server box?
Date: Tue, 21 Apr 2009 09:53:15 +0100

tigerf wrote:

Hello from Germany,


Hi there :)

I'm looking for a way how to use tracker on an Ubuntu 8.10 server
edition (!) to index huge numbers (60.000+) of .doc and .pdf-files.

I should mention that for Tracker 0.6.x, there is a hard limit on the amount of data you can store for full text searching. We currently use QDBM for the index and it has a 2 GB file size limit.

This means that once you get to a sizeable index, it will not index any further. This has recently led to the "Can not index word" error we have been seeing in bug reports.
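As a rough way to watch for this, one could compare the size of the index file against the 2 GB ceiling. This is only a sketch: the function names are made up for illustration, the caller has to supply the path to the index file (I am not naming Tracker's actual location here), the limit is assumed to be 2**31 bytes, and the 90% warning threshold is arbitrary.

```python
import os

# The 2 GB ceiling mentioned above, taken here as 2**31 bytes
# (an assumption about how the limit is counted).
QDBM_LIMIT_BYTES = 2 ** 31


def index_usage(path, limit=QDBM_LIMIT_BYTES):
    """Return the fraction of the size limit used by the file at `path`."""
    return os.path.getsize(path) / limit


def near_limit(path, threshold=0.9, limit=QDBM_LIMIT_BYTES):
    """Return True when the file has reached `threshold` of the limit."""
    return index_usage(path, limit) >= threshold
```

Calling near_limit() on the index file from a cron job would give some warning before indexing stops, instead of finding out from the error.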

In the 0.7 branch, which we are working on in parallel, we are using SQLite instead of QDBM. This should extend the possibilities here, not to mention add partial match searching (i.e. "foo*" finds "foobar"), which is another feature currently missing.
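To illustrate what partial matching means in SQLite terms, here is a minimal sketch of a prefix search. It is not Tracker's schema or query code: the `words` table and the prefix_search() helper are hypothetical, and a plain LIKE query stands in for whatever the 0.7 branch ends up doing.

```python
import sqlite3


def prefix_search(conn, prefix):
    """Return indexed words starting with `prefix`, sorted alphabetically."""
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = (prefix.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT word FROM words WHERE word LIKE ? ESCAPE '\\' ORDER BY word",
        (escaped + "%",),
    )
    return [row[0] for row in rows]


# Hypothetical one-column word table, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [("foo",), ("foobar",), ("bar",), ("food",)])
```

With this data, prefix_search(conn, "foo") returns "foo", "foobar" and "food", which is the "foo*" behaviour described above.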

Unfortunately, I couldn't say whether 60k files would actually reach the limit, because it really does depend on how many words are in those files. My estimate is that you wouldn't be far off the limit with that many files though. Perhaps others who have had this QDBM error can comment on how many files they have, to give a rough estimate here.

Is there somewhere a how-to or is the whole idea simply unrealistic?

I don't think it's unrealistic; Tracker just might not be able to cope with the volumes for now. Of course, the limit I mentioned above is per user. If you are doing this on a multi-user level, things get trickier, but the QDBM limit is less of a problem.

Background:
I'm currently setting up a LAMP + Samba server to replace a windows box,
which offers thousands of documents to its windows clients via network
shares. I'm using Apache & PHP as a frontend, which translates the
user's web-page input into a commandline using "find". The results are
parsed and returned to the users via HTML pages containing links to the
matching files found.

I expect the combination of tracker as a backend and a commandline
querying tool could be much faster and flexible than the current approach.

Tracker has a daemon monitoring changes in the filesystem and updating
its index accordingly, I guess. Is there an easier way to access
tracker's index via PHP? Some kind of SQL interface maybe?

My requirements are basically the same as an ordinary NAS has, one can
imagine my Ubuntu box as a NAS box, which offers only an HTML user
interface.

Sorry for my maybe nooby questions, but my Linux knowledge is rather
limited at the moment and I didn't find a better place to ask.


No need to apologise, thanks for asking!

--
Regards,
Martyn

