Re: [Tracker] Tracker used on a webserver to index uploaded documents



2007/4/25, Raphaël Slinckx <rslinckx gmail com>:
Hi !

(I'm not subscribed, so if you could please keep me CC-ed)

I was wondering if it makes sense to use tracker on a webserver to index documents (pdf, ppt, doc, images,..) uploaded by users. The web-app could then have a search box and query tracker to return the results in the webpage.
This raises some questions:
* Can we use D-Bus in conjunction with a web app? I'm using the TurboGears framework in Python; I guess I can use D-Bus without the GLib main loop if I don't need to listen to signals.

Sync queries are plenty fast if Tracker is warm.

* Do the results get back quickly, since i guess i'll have to make sync-calls ?

Generally yes. I don't know how this would fare in a separate thread. It might be easy since there's no GLib involved... I experienced some slow queries when requesting large result sets, but reading the query results in smaller chunks (maybe 20 hits at a time) is pretty fast.
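To illustrate the chunked reading, here is a minimal Python sketch. It assumes the search call can be wrapped in a function taking the search text, an offset and a maximum number of hits; `search` and `fake_search` are hypothetical names made up for this example and are not Tracker's actual API.

```python
from typing import Callable, Iterator, List


def iter_results(search: Callable[[str, int, int], List[str]],
                 text: str, chunk_size: int = 20) -> Iterator[str]:
    """Yield hits one by one, fetching them chunk_size at a time.

    `search(text, offset, max_hits)` is a hypothetical wrapper around
    whatever synchronous call the web app uses to query the indexer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        chunk = search(text, offset, chunk_size)
        for hit in chunk:
            yield hit
        # A short (or empty) chunk means there is nothing left to fetch.
        if len(chunk) < chunk_size:
            return
        offset += len(chunk)


def fake_search(text: str, offset: int, max_hits: int) -> List[str]:
    """Stand-in backend for trying the helper without a running indexer."""
    corpus = ["/uploads/doc%d.pdf" % i for i in range(45)]
    return corpus[offset:offset + max_hits]


if __name__ == "__main__":
    for path in iter_results(fake_search, "report"):
        print(path)
```

Since this is a generator, a results page that only shows the first 20 hits never triggers the second request.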

* Is this just a crazy idea, and i should instead use the extraction libraries directly, and..

By extraction libraries you mean the ones provided by Tracker, I take it. How would you use them directly? If you mean just "grep-like behavior" I think you are in for a bumpy ride...

* What would i gain by using tracker instead of the extraction libraries directly (beside the advantage of having a ready-made solution)

Dunno.

* What about security ? feeding tracker with more or less random queries from the web could be dangerous ?

I think it would be Tracker's responsibility to ensure that you are immune to SQL injection, but I don't know how much work has been put into this.
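Until that is known, the web app could at least bound what it forwards. The sketch below is only an assumption about what such a precaution might look like: `clean_query` and the 200-character limit are made up for this example, and it does not make a query safe against SQL injection by itself. It just drops control characters and caps the length.

```python
import unicodedata

MAX_QUERY_LENGTH = 200  # arbitrary limit chosen for this sketch


def clean_query(raw: str, max_length: int = MAX_QUERY_LENGTH) -> str:
    """Normalise user-supplied search text before passing it on.

    This is a precaution on the web-app side only; escaping inside the
    indexer is still the indexer's job.
    """
    # Replace control and other non-printable characters (Unicode
    # categories starting with "C") with a space.
    printable = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in raw
    )
    # Collapse runs of whitespace, then cap the length.
    collapsed = " ".join(printable.split())
    return collapsed[:max_length].rstrip()


if __name__ == "__main__":
    print(clean_query("  annual\x00report\n2007  "))
```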

Any other comment is welcome!

A totally different alternative is to use Nutch (http://lucene.apache.org/nutch/). It is written in Java, based on Lucene, and is specifically designed to be a web search engine. They just released 0.9 and have a good reputation. Also, Nutch was founded by the grandfather of free search engines, Doug Cutting, which makes it pretty hard to beat regarding street cred :-)


Cheers,
Mikkel

