Re: [Fwd: Re: GNOME and advanced search indexes viability]





Good point, and if you're certain that is what you need, then it certainly
rules out medusa.  As you're interested in writing a service, writing
a dual-licensed service that uses the GPLed medusa as a
datasource/backend is feasible.
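
For illustration, a minimal Python sketch of that arrangement, assuming
msearch keeps the command-line syntax quoted later in this message:

    import subprocess

    # Rough sketch: the service keeps its own license and only talks to
    # the GPLed medusa backend across a process boundary.  The msearch
    # invocation mirrors the one quoted further down in this thread.
    def medusa_search(index, words):
        uri = ("gnome-search:[file:///]content includes_all_of "
               + " ".join(words))
        result = subprocess.run(["msearch", "-i", index, "-u", uri],
                                capture_output=True, text=True, check=True)
        return result.stdout.splitlines()

Whether a process boundary is enough separation for dual licensing is a
question for a lawyer, but technically it keeps the service's code
independent of medusa's internals.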

That is also true, and it solves the problem. Perhaps I can use medusa and avoid writing my own index. I also investigated www.xapian.org. It seems much better than medusa, but it's kinda lower-level.

* it's written in C, which makes development slow and makes it hard to get people around here to work on it

A matter of perspective.

Definitely. But the potential slaves I could get are more into PHP and Visual Basic, which makes C hard for them. Python is perhaps my strongest alternative, because it's scripted and produces more work per line than C.

* the implementation is per-user instead of per-system. That means several medusa indexers and several indexes, instead of one master index.

There is nothing stopping anyone from setting up a master index.  I do,
as a matter of fact, using symbolic links.  I've put some thought into a more
practical implementation of user and system indexes and sharing between them. I don't think there will be a real solution to this, from medusa or even
from your own system, until a content search tool has a user base large enough for the
community to really get what it offers.
I'm planning to integrate our work into the GNOME and KDE search tools - integration is key. I expect integration to be extremely easy, because our search service would communicate with the search tools using an XML vocabulary.
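
For illustration, one possible shape for that vocabulary -- the element
names here are invented, not anything anyone has agreed to (Python,
standard library only):

    import xml.etree.ElementTree as ET

    # Hypothetical request: which index to search, which words must all
    # appear, and how many hits the tool wants back.
    query_xml = """<search-query>
      <index>Home</index>
      <content match="all">chovey medusa</content>
      <max-results>20</max-results>
    </search-query>"""

    # Hypothetical response from the service.
    results_xml = """<search-results took-ms="399">
      <hit uri="file:///home/user/notes.txt" score="87"/>
      <hit uri="file:///home/user/mail/inbox" score="54"/>
    </search-results>"""

    # A search tool would only need a generic XML parser to consume it.
    for hit in ET.fromstring(results_xml):
        print(hit.get("uri"), hit.get("score"))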

We don't want a hundred PCs each indexing the NFS server. We want the search service to delegate queries to the NFS servers, to avoid network load and wasted disk space.
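
A sketch of that routing idea -- the mount points and hosts below are
invented for illustration:

    # Map each NFS mount point to the host that should index it.  The
    # index lives once, on the file server; clients only forward queries.
    REMOTE_INDEXES = {
        "/net/projects": "fileserver1.example.edu",
        "/net/home":     "fileserver2.example.edu",
    }

    def index_owner(path):
        """Return the host whose search service owns `path`,
        or None if the path should be searched locally."""
        for mount, host in REMOTE_INDEXES.items():
            if path.startswith(mount):
                return host
        return None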

Truth.  Either mining the old service code from the attic or
creating a new service that you can dual-license would solve the problem.  I'd
be in favor of building a new service.  I agree with your approach--a
service is needed.

Trust me on building it. I'm rallying people ("potential slaves" hehe) here at my college to join me on this project.

* as there is no documentation, we don't know whether Medusa can index gigabytes of files, extract their data and metadata, and provide less-than-10-second query response. Our initial prospect for the database, PostgreSQL, can indeed provide less-than-10-second responses, provided the proper indexes are applied to the proper tables.
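
For concreteness, the kind of schema and index that claim assumes -- the
table and column names here are invented (SQL carried in Python strings):

    # Hypothetical posting-list schema.  Without the index on
    # postings.word_id, PostgreSQL answers word lookups with
    # sequential scans and the 10-second budget is gone.
    DDL = """
    CREATE TABLE words    (word_id serial PRIMARY KEY, word text UNIQUE);
    CREATE TABLE postings (word_id integer REFERENCES words,
                           file_id integer NOT NULL);
    CREATE INDEX postings_word_idx ON postings (word_id);
    """

    # includes_all_of semantics: a file matches only if it contains
    # every query word.  The %s placeholders are driver-style
    # parameters: the list of words, then the number of words.
    QUERY = """
    SELECT p.file_id
      FROM postings p JOIN words w USING (word_id)
     WHERE w.word = ANY (%s)
     GROUP BY p.file_id
    HAVING count(DISTINCT w.word) = %s;
    """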

I believe it can. My index is made from 7 gigs (I have a 1.6GHz
Celeron).  This is the output of a query:

msearch -i Home -u 'gnome-search:[file:///]content includes_all_of
chovey medusa'
Took 0 seconds, and 3 milliseconds
Begin location for word chovey is 255360
End location for word chovey is 255902
Begin location for word medusa is 2055001
End location for word medusa is 2055363
Took 0 seconds, and 399 milliseconds

I'm sure PostgreSQL can exceed that.  Medusa uses db1; taking it to the
21st century (db4) might make it faster.

Fast. I'm wondering how Xapian would fare. It's interesting that Xapian has probabilistic indexing/searching: Xapian returns a relevancy score for each document.
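
For example, with Xapian's Python bindings (the index path here is
hypothetical):

    import xapian

    db = xapian.Database("/home/user/.xapian-index")   # hypothetical path
    enquire = xapian.Enquire(db)
    qp = xapian.QueryParser()
    qp.set_stemmer(xapian.Stem("english"))
    enquire.set_query(qp.parse_query("chovey medusa"))

    # Each match carries a probabilistic relevance score (percent, 0-100),
    # unlike medusa's boolean includes_all_of matching.
    for match in enquire.get_mset(0, 10):
        print(match.rank, match.percent, match.document.get_data())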

But if you could help me work through these issues, we would be glad (after all, we'd be saving work) to do this. Trust me, what we want to do is much bigger than just medusa. We want to bring enterprise-class full-text indexing and search to Linux, *and* open-source it. We will also be looking into data mining, to provide document maps and the like. All of this once the basic technology is ready.

Indexers are easy to write.
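
True for the in-memory core, at least. A toy sketch:

    import os, re
    from collections import defaultdict

    def build_index(root):
        """Toy inverted index: word -> set of files containing it."""
        index = defaultdict(set)
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="ignore") as f:
                        for word in re.findall(r"[a-z0-9]+",
                                               f.read().lower()):
                            index[word].add(path)
                except OSError:
                    continue
        return index

    def includes_all_of(index, *words):
        # Same semantics as medusa's includes_all_of operator.
        sets = [index.get(w, set()) for w in words]
        return set.intersection(*sets) if sets else set()

The hard parts medusa already solved are the on-disk format and
incremental reindexing, which is exactly where the db1-versus-db4
question above matters.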

If you're really committed to your vision, then go for it.  Building an
enterprise-level application takes a lot of time and labor, possibly more
than you're estimating.  Medusa was funded by Eazel and had full-time
developers working on it for more than a year; it's not a small
undertaking.  Medusa is very powerful, poorly understood for what it is,
and has been neglected even though it is very near usable.
I agree that free software needs a good content search service, and I
think you can get to your stated goal faster by championing a new vision
for medusa.  Just building a service that calls a medusa backend via
gnomevfs will get you something to measure where to go next.  If medusa
falls short, replace it. If it works, then enhance your service and
extend medusa.

Would you? This would be great. I'd advise you to take a look at Xapian and see how their ideas can be integrated. Oh, and Xapian is GPL =).

I haven't been neglecting medusa.  I was sidetracked by a false
indexing problem for 3 months (in the end the hard drive died).  Now
that I have a new laptop, I'm focusing on connecting nautilus to
medusa.  The way nautilus connects could be the same way your service
would connect.  Medusa works right now, but few people know it.  There
won't be a big demand for content search until enough users experience it and
understand that it complements object and navigation data storage (the
desktop).
What say you if we agreed on an XML vocabulary for search tools to communicate with search services, to avoid duplication of work?

The same goes for the enterprise world--I am employed by my company
(Time Life Inc., sadly) to know all things about e-commerce, and anything
it might relate to.  I use medusa to get fast answers.  Getting the
other employees to search (or get the courage to read the help file) is
difficult because they have little experience with it, and even less
success.  When the users get experience, they will want a content search
service, and they will want a good one.

Yes. Knowledge mining is neglected, and most people don't think they need a KM tool until they start using one. That alone makes Medusa and similar tools a hard sell. =)



