Re: [Tracker] Tracker used on a webserver to index uploaded documents
- From: "Mikkel Kamstrup Erlandsen" <mikkel kamstrup gmail com>
- To: raphael slinckx net
- Cc: tracker-list gnome org
- Subject: Re: [Tracker] Tracker used on a webserver to index uploaded documents
- Date: Wed, 25 Apr 2007 21:13:32 +0200
2007/4/25, Raphaël Slinckx <rslinckx gmail com>:
Hi !
(I'm not subscribed, so if you could please keep me CC-ed)
I was wondering if it makes sense to use Tracker on a webserver to index documents (pdf, ppt, doc, images, ...) uploaded by users. The web-app could then have a search box and query Tracker to return the results in the webpage.
This raises some questions:
* Can we use D-Bus in conjunction with a web-app? I'm using the TurboGears framework in Python; I guess I can use D-Bus without a GLib main loop if I don't need to listen to signals.
Sync queries are plenty fast if Tracker is warm.
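On the main-loop question: with dbus-python, a plain method call blocks until the reply arrives, so a request handler can make it without running a GLib main loop as long as it does not need to receive signals. A minimal sketch, assuming the service name, object path, interface and `Text()` signature shown below; these are recalled from Tracker 0.5/0.6 and are not confirmed in this thread, so check them against the introspection data of your installed Tracker. `make_search` is a hypothetical helper.

```python
# ASSUMPTION: names and the Text() signature below are believed to match
# Tracker 0.5/0.6; verify them with D-Bus introspection before relying on them.
TRACKER_SERVICE = "org.freedesktop.Tracker"
TRACKER_PATH = "/org/freedesktop/Tracker"
SEARCH_IFACE = "org.freedesktop.Tracker.Search"


def make_search(bus, service_type="Files"):
    """Return a blocking search(text, offset, limit) -> list of URI strings.

    `bus` is e.g. dbus.SessionBus(). Only get_object() and an ordinary
    method call are used, so no main loop is required (no signals needed).
    """
    proxy = bus.get_object(TRACKER_SERVICE, TRACKER_PATH)

    def search(text, offset=0, limit=20):
        # Assumed signature: Text(live_query_id, service, search_text, offset, max_hits).
        # -1 is assumed to mean "not a live query".
        result = proxy.Text(-1, service_type, text, offset, limit,
                            dbus_interface=SEARCH_IFACE)
        # Convert dbus.String values to plain Python strings for the templates.
        return [str(uri) for uri in result]

    return search
```

In the web-app the wrapper would be created once, e.g. `search = make_search(dbus.SessionBus())`, and then called per request. A daemonised web server may not have a session bus of its own, so it may need to be pointed at the bus Tracker runs on (typically via `DBUS_SESSION_BUS_ADDRESS`).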
* Do the results come back quickly? I guess I'll have to make synchronous calls.
Generally yes. I don't know how this would fare in a separate thread; it might be easy, since there's no GLib involved... I experienced some slow queries when requesting large result sets, but reading the query results in smaller chunks (say, 20 at a time) is pretty fast.
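Reading results in small chunks can be sketched as a generator that keeps asking for the next `offset`/`limit` window until a short chunk comes back. This is a minimal sketch: `search(text, offset, limit)` stands for any hypothetical callable returning a list (for example a thin wrapper around a Tracker D-Bus call), and the chunk size of 20 simply follows the figure above.

```python
def iter_results(search, text, chunk_size=20):
    """Yield every result for `text`, fetching chunk_size results per call.

    `search(text, offset, limit)` is a hypothetical callable returning a list.
    """
    offset = 0
    while True:
        chunk = search(text, offset, chunk_size)
        yield from chunk
        # A short (or empty) chunk means there is nothing more to fetch.
        # If the total is an exact multiple of chunk_size, this costs one
        # extra call that returns an empty list.
        if len(chunk) < chunk_size:
            return
        offset += chunk_size


def page(search, text, page_number, page_size=20):
    """Fetch only the results shown on one web page (page_number is 0-based)."""
    return search(text, page_number * page_size, page_size)
```

For a search page, `page()` is usually enough, since each HTTP request only needs one window of results; `iter_results()` is for when the app really needs the whole set.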
* Is this just a crazy idea, and should I instead use the extraction libraries directly, and...
By extraction libraries, I take it you mean the ones provided by Tracker. How would you use them directly? If you mean plain grep-like behavior, I think you are in for a bumpy ride...
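To illustrate why grep-like behavior gets bumpy: a scan has to go through the extracted text of every document on every query, while an indexer does the extraction and indexing once and then answers queries by lookup. A toy sketch in Python, where plain strings stand in for text pulled out of uploaded files; this is a generic inverted index, not Tracker's actual index format.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for real text analysis."""
    return re.findall(r"\w+", text.lower())


def grep_search(docs, word):
    """Grep-like: re-scan (and in practice re-extract) every document per query."""
    word = word.lower()
    return sorted(name for name, text in docs.items() if word in tokenize(text))


def build_index(docs):
    """Index once: map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def index_search(index, word):
    """Answer a query with a single dictionary lookup."""
    return sorted(index.get(word.lower(), ()))
```

Both return the same answers; the difference is that `grep_search` costs time proportional to all the text on every request, while `index_search` pays that cost once in `build_index`.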
* What would I gain by using Tracker instead of the extraction libraries directly (besides the advantage of having a ready-made solution)?
Dunno.
* What about security? Could feeding Tracker more or less arbitrary queries from the web be dangerous?
I think it would be Tracker's responsibility to ensure that you are immune to SQL injection, but I don't know how much work has been put into this.
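What "immune to SQL injection" means in practice: the backend must bind user input as a parameter instead of pasting it into the SQL text. The web-app itself only passes the query as a typed D-Bus string argument, so this binding happens on the indexer's side. A minimal illustration using Python's built-in `sqlite3` with an in-memory table standing in for a search backend; this is not Tracker's code, and the table and function names are hypothetical.

```python
import sqlite3


def setup():
    """Create a tiny in-memory table as a stand-in for a search backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (uri TEXT, title TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)",
                     [("file:///a.pdf", "invoice"), ("file:///b.doc", "secret")])
    return conn


def unsafe_search(conn, term):
    # BAD: user input is pasted into the SQL text, so quotes in it rewrite the query.
    sql = "SELECT uri FROM docs WHERE title = '%s'" % term
    return [row[0] for row in conn.execute(sql)]


def safe_search(conn, term):
    # GOOD: the term is bound as a parameter and is only ever treated as data.
    sql = "SELECT uri FROM docs WHERE title = ?"
    return [row[0] for row in conn.execute(sql, (term,))]
```

With the input `x' OR '1'='1`, `unsafe_search` returns every row, including the "secret" one, while `safe_search` returns nothing because no title is literally that string.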
Any other comment is welcome!
A totally different alternative is to use Nutch (http://lucene.apache.org/nutch/). It is written in Java, is based on Lucene, and is specifically designed to be a web search engine. They just released 0.9 and have a good reputation. Also, Nutch was founded by Doug Cutting, the grandfather of free search engines, which makes it pretty hard to beat for street cred :-)
Cheers,
Mikkel