Re: [Tracker] Tracker used on a webserver to index uploaded documents

From: "Mikkel Kamstrup Erlandsen" <mikkel kamstrup gmail com>
To: raphael slinckx net
Cc: tracker-list gnome org
Subject: Re: [Tracker] Tracker used on a webserver to index uploaded documents
Date: Wed, 25 Apr 2007 21:13:32 +0200

2007/4/25, Raphaël Slinckx <rslinckx gmail com>:

Hi !

(I'm not subscribed, so if you could please keep me CC-ed)

I was wondering if it makes sense to use tracker on a webserver to index documents (pdf, ppt, doc, images,..) uploaded by users. The web-app could then have a search box and query tracker to return the results in the webpage.
This raises some questions:
* Can we use dbus in conjunction with a web-app? I'm using the TurboGears framework in Python; I guess I can use dbus without the glib mainloop if I don't need to listen to signals

Sync queries are plenty fast if Tracker is warm.

* Do the results get back quickly, since i guess i'll have to make sync-calls ?

Generally yes. I don't know how this would fare in a separate thread; it might be easy since there's no glib involved. I have seen some slow queries when requesting large result sets, but reading the query results in smaller chunks (maybe 20 hits at a time) is pretty fast.
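To make the "smaller chunks" idea concrete, here is a minimal sketch of paging through results 20 at a time from Python. The paging loop only needs a callable taking (query, offset, limit), so it can be tried without Tracker running. The D-Bus names in make_tracker_fetcher (bus name, object path, interface and the Text(live_query_id, service, text, offset, max_hits) signature) are assumptions based on the Tracker search interface of this era and should be checked against the installed version; iter_hits, make_tracker_fetcher and the max_total cap are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator, List

# Hypothetical fetcher signature: (query, offset, limit) -> list of hits.
FetchPage = Callable[[str, int, int], List[str]]


def iter_hits(fetch_page: FetchPage, query: str,
              chunk_size: int = 20, max_total: int = 200) -> Iterator[str]:
    """Yield hits for `query`, asking the backend for `chunk_size` at a time."""
    offset = 0
    while offset < max_total:
        limit = min(chunk_size, max_total - offset)
        page = fetch_page(query, offset, limit)
        for hit in page:
            yield hit
        if len(page) < limit:
            break  # short page: the backend has nothing more to return
        offset += len(page)


def make_tracker_fetcher(service: str = "Files") -> FetchPage:
    """Build a fetcher backed by Tracker over D-Bus (names below are assumptions)."""
    import dbus  # dbus-python; plain blocking calls do not need a glib mainloop

    bus = dbus.SessionBus()
    proxy = bus.get_object("org.freedesktop.Tracker", "/org/freedesktop/tracker")
    search = dbus.Interface(proxy, dbus_interface="org.freedesktop.Tracker.Search")

    def fetch_page(query: str, offset: int, limit: int) -> List[str]:
        # Assumed signature: Text(live_query_id, service, text, offset, max_hits)
        return [str(uri) for uri in search.Text(-1, service, query, offset, limit)]

    return fetch_page
```

In a web handler this would be used roughly as list(iter_hits(make_tracker_fetcher(), "budget report")). Because iter_hits is a generator, a page that only shows the first 20 hits never triggers the later, slower requests.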

* Is this just a crazy idea, and i should instead use the extraction libraries directly, and..

By extraction libraries I take it you mean the ones provided by Tracker. How would you use them directly? If you just mean "grep-like behavior", I think you are in for a bumpy ride...

* What would i gain by using tracker instead of the extraction libraries directly (beside the advantage of having a ready-made solution)

Dunno.

* What about security ? feeding tracker with more or less random queries from the web could be dangerous ?

I think it would be Tracker's responsibility to ensure that you are immune to SQL injection, but I don't know how much work has been put into this.
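Until that is known, the web-app can reduce its exposure by whitelisting what it forwards. The following is only a sketch of that idea, not a substitute for proper escaping inside Tracker: clean_search_text and the 100-character cap are hypothetical choices, and stripping everything but word characters also throws away quotes and operators, so phrase searches are lost.

```python
import re

MAX_QUERY_LEN = 100  # arbitrary cap, chosen only for illustration


def clean_search_text(raw: str, max_len: int = MAX_QUERY_LEN) -> str:
    """Reduce untrusted web input to plain words before it reaches the indexer."""
    # Keep runs of word characters (Unicode letters, digits, underscore);
    # quotes, semicolons, dashes and control characters are all dropped.
    words = re.findall(r"\w+", raw)
    text = " ".join(words)
    # Bound the length so a huge request cannot become a huge query.
    return text[:max_len].rstrip()
```

For example, clean_search_text("budget'; DROP TABLE files;--") returns "budget DROP TABLE files": the words survive as ordinary search terms, but the punctuation needed to break out of a quoted SQL string does not.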

Any other comment is welcome!

A totally different alternative is to use Nutch (http://lucene.apache.org/nutch/). It is written in Java, based on Lucene, and is specifically designed to be a web search engine. They just released 0.9 and have a good reputation. Also, Nutch was founded by the grandfather of free search engines, Doug Cutting, which makes it pretty hard to beat regarding street cred :-)

Cheers,
Mikkel

References:
- [Tracker] Tracker used on a webserver to index uploaded documents
  - From: Raphaël Slinckx
