Re: [Tracker] Running tracker on an Ubuntu server box?



tigerf wrote:
Hello from Germany,

Hi there :)

I'm looking for a way to use Tracker on an Ubuntu 8.10 server
edition (!) to index huge numbers (60,000+) of .doc and .pdf files.

I should mention that for Tracker 0.6.x there is a hard limit on the amount of data you can store for full-text searching. We currently use QDBM for the index, and it has a 2 GB file size limit.

This means that once the index reaches that size, nothing further will be indexed. This has recently led to the "Can not index word" error we have been seeing in bug reports.

In the 0.7 branch, which we are working on in parallel, we are using SQLite instead of QDBM. This should extend what is possible here, and it should also add partial-match searching (e.g. "foo*" finds "foobar"), which is another feature currently missing.
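
To illustrate what partial-match searching means with an SQLite-backed word list, here is a minimal sketch. It is not Tracker's actual schema or query code: the `words` table and the `build_index` and `search` helpers are made up for this example, and it simply uses SQLite's GLOB operator so that "foo*" matches any word starting with "foo".

```python
import sqlite3


def build_index(words):
    """Create a throwaway in-memory word table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        [(w,) for w in words],
    )
    return conn


def search(conn, pattern):
    """Return all indexed words matching a glob pattern such as 'foo*'."""
    rows = conn.execute(
        "SELECT word FROM words WHERE word GLOB ? ORDER BY word",
        (pattern,),
    )
    return [row[0] for row in rows]


conn = build_index(["foo", "foobar", "food", "bar", "barfoo"])
print(search(conn, "foo*"))  # prefix match
print(search(conn, "foo"))   # exact match only
```

Note that GLOB is case-sensitive; a real full-text index would normally fold case and tokenise the text before storing words.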

Unfortunately, I can't say whether 60k files would actually reach the limit, because it really depends on how many words are in those files. My estimate is that you wouldn't be far off the limit with that many files, though. Perhaps others who have hit this QDBM error can comment on how many files they have, to give a rough estimate.

Is there somewhere a how-to or is the whole idea simply unrealistic?

I don't think it is unrealistic; Tracker just might not be able to cope with those volumes for now. Note that the limit I mentioned above is per user. If you are doing this for multiple users, things get trickier, but the QDBM limit is less of a problem.

Background:
I'm currently setting up a LAMP + Samba server to replace a Windows box,
which offers thousands of documents to its Windows clients via network
shares. I'm using Apache & PHP as a frontend, which translates the
user's web-page input into a command line using "find". The results are
parsed and returned to the users via HTML pages containing links to the
matching files found.

I expect the combination of Tracker as a backend and a command-line
querying tool could be much faster and more flexible than the current approach.
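
For readers following along, the frontend pattern described above (web input turned into a command line, output turned into HTML links) can be sketched as follows. The original setup uses PHP; this sketch uses Python purely to show the shape of it. The function names, the `/srv/docs` root and the `/files` URL prefix are assumptions for illustration, not part of the poster's setup. Swapping "find" for another command-line search tool would only change `build_find_command`.

```python
import html
import subprocess
from urllib.parse import quote


def build_find_command(root, name_fragment):
    """Build an argument list for find; no shell is involved, so the
    user's input cannot inject extra commands."""
    # Escape glob metacharacters so the input is matched literally.
    for ch in "\\*?[":
        name_fragment = name_fragment.replace(ch, "\\" + ch)
    return ["find", root, "-type", "f", "-iname", f"*{name_fragment}*"]


def run_search(root, name_fragment):
    """Run find and return the matching paths, one per output line."""
    result = subprocess.run(
        build_find_command(root, name_fragment),
        capture_output=True,
        text=True,
        check=False,
    )
    return [line for line in result.stdout.splitlines() if line]


def render_links(paths, url_prefix="/files"):
    """Turn file paths into an HTML list of links (hypothetical URL layout)."""
    items = []
    for path in paths:
        href = url_prefix + quote(path)
        items.append(
            f'<li><a href="{html.escape(href)}">{html.escape(path)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # /srv/docs is a placeholder for wherever the shared documents live.
    print(render_links(run_search("/srv/docs", "report")))
```

The search root should come from configuration rather than from the user, since find would treat a root beginning with "-" as an option.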

Tracker has a daemon monitoring changes in the filesystem and updating
its index accordingly, I guess. Is there an easier way to access
Tracker's index via PHP? Some kind of SQL interface, maybe?

My requirements are basically the same as those of an ordinary NAS; one can
imagine my Ubuntu box as a NAS box which offers only an HTML user
interface.

Sorry for my perhaps newbie questions, but my Linux knowledge is rather
limited at the moment and I didn't find a better place to ask.

No need to apologise, thanks for asking!

--
Regards,
Martyn


