Nautilus/Medusa search index enhancements



Medusa search/index improvement proposal
Hey there:

I had a couple of ideas that kept me restless, so I'm writing this very late in 
the evening (or very early in the morning, choose the one you prefer; my bet is 
number 2).

Goals
- to leverage existing code for local area networks
- to index removable devices and all kinds of volumes, creating an offline 
  searchable catalog
- to improve the overall user experience

Medusa is both a file indexing daemon and a file search service. So far so 
good. But Medusa can be extremely useful (a "slocate" on steroids) for many 
other things beyond the "personal computer" use case.

Medusa can be leveraged to support two things I'd like to see:

- Removable media
- Storage/local area networks

Right now, Medusa has (AFAIK) no proper support for removable volumes. It can 
index local volumes and removable volumes all right, but not in the way I 
envision. Coupled with Nautilus, this could mean that searching for a file 
containing "user friendly" gives me results in (e.g.) a multicolumn view, 
where double-clicking a file prompts me to insert the "Backups #2" volume 
into my Hitachi 48X plus CD-ROM drive, after which Nautilus could easily 
resume regular operations and show me the file's contents.

To add removable device support, three infrastructural areas need to be changed:

- database changes
- system interaction changes
- user interface changes

Database
The full path name of a file should no longer be stored. Instead, Medusa should 
store the volume's unique ID (GUID, whatever) and the volume name, along with 
the base path (say a file is /usr/lib/gkrellm/plugins/ and /usr is a 
filesystem; the base path here would be /lib/gkrellm/plugins).

I don't know if this is the way Medusa handles it right now, but, well, it 
seems good. This gives us room to include CD-ROMs and floppies in the mix. If 
you look carefully enough, perhaps storing the mount points or device files 
isn't even needed: you store the type of media (hard disk, CD-ROM, etcetera; 
information readily available via standard system interfaces) along with the 
volume information, so when a file is requested you can reconstruct the path 
independently of wherever it was first indexed (so you can pop your CD into 
the Hitachi CD-ROM while you are listening to music in the Plextor where you 
usually index your volumes).
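
To make the idea concrete, here is a rough sketch in Python (for illustration 
only). The IndexEntry type, its field names and the volume IDs are made up by 
me; I am not claiming this is how Medusa's database looks.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass(frozen=True)
class IndexEntry:
    volume_id: str    # unique ID of the volume (hypothetical format)
    volume_name: str  # human-readable label, e.g. "Backups #2"
    media_type: str   # "hard disk", "CD-ROM", ...
    base_path: str    # path relative to the root of the volume


def make_entry(abs_path: str, mount_point: str, volume_id: str,
               volume_name: str, media_type: str) -> IndexEntry:
    """Strip the mount point so only the volume-relative path is stored."""
    rel = PurePosixPath(abs_path).relative_to(mount_point)
    return IndexEntry(volume_id, volume_name, media_type, "/" + str(rel))


def resolve(entry: IndexEntry, mounted: Dict[str, str]) -> Optional[str]:
    """Rebuild a usable path from wherever the volume is mounted right now.

    `mounted` maps volume IDs to current mount points. None means the
    volume is not available and the user has to be asked for it.
    """
    mount_point = mounted.get(entry.volume_id)
    if mount_point is None:
        return None
    return str(PurePosixPath(mount_point) / entry.base_path.lstrip("/"))
```

Because the entry never records the mount point, the same record resolves 
correctly no matter which drive the volume ends up in.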

System interaction
- Medusa should be signaled by autorun whenever a drive is inserted, or by 
  'dynamic' when a volume is hotplugged; the whole idea is that Medusa should 
  detect mounts and act accordingly.
- Medusa should start indexing files if it's a new volume, or begin monitoring 
  the files there for changes.
- Medusa should perhaps detect unmounts and immediately close all files being 
  indexed, to avoid the "busy: cannot umount" problem when unmounting drives. 
  And it should be compatible with supermount.
- It should also monitor files for changes, so as to avoid rescanning the 
  entire hard disk the braindead way slocate does nowadays (I think Medusa 
  already acts smart in this particular issue). And it should be 'nice' to 
  system resources (not be a hog when indexing).
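
As a fallback illustration of "detect mounts and act accordingly", here is a 
small Python sketch that compares two snapshots of the mount table. It assumes 
the Linux /proc/mounts line format (device, mount point, filesystem type, 
options, ...); the function names are mine. A real daemon would preferably be 
notified by autorun or 'dynamic' rather than compare snapshots.

```python
from typing import Dict, Set, Tuple


def parse_mounts(text: str) -> Dict[str, str]:
    """Map mount point -> device from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device, mount_point = fields[0], fields[1]
            mounts[mount_point] = device
    return mounts


def diff_mounts(before: Dict[str, str],
                after: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (newly mounted, newly unmounted) mount points."""
    return set(after) - set(before), set(before) - set(after)
```

Newly mounted points would trigger indexing or monitoring; newly unmounted 
ones would tell the indexer to drop any files it still holds open there.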

User interface integration
- Nautilus icons for drives should have a right-click menu item that says 
  "Index this volume now" to signal Medusa that it should begin indexing the 
  removable volume.
- Search results would include files on removable volumes (or at least there 
  would be an option to include them!), and Medusa would index words in all 
  files which have text.
- Activating an entry in the search dialog should prompt me for the volume if 
  it's missing, and mount it, evidently only if this is possible at all.
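
The "prompt me for the volume" step could look roughly like this in Python. 
Everything here is hypothetical: `ask_for_volume` merely stands in for 
whatever dialog Nautilus would show, and the parameter names are mine.

```python
from typing import Callable, Dict, Optional


def activate(volume_id: str, volume_name: str, base_path: str,
             mounted: Dict[str, str],
             ask_for_volume: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return a path to open, prompting for the volume if it is absent.

    `ask_for_volume` receives the volume name and returns the mount point
    once the user has inserted the volume, or None if the user cancels.
    """
    mount_point = mounted.get(volume_id)
    if mount_point is None:
        mount_point = ask_for_volume(volume_name)
    if mount_point is None:
        return None  # volume unavailable: nothing to open
    return mount_point.rstrip("/") + "/" + base_path.lstrip("/")
```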

As you can see, with the proper elbow grease at the Nautilus level and the 
proper steel framing at the system level, we now have a very powerful 
cataloguing system. As file systems evolve and delve into the metadata thing, 
the cataloguing system will get richer all by itself, with little future work. 
And I could look for all music files sung by ATB across all my CDs, which 
would by far surpass anything the Microsofties offer nowadays. All of this 
using only distro-provided Linux software, and with no user effort (except, 
perhaps, for floppies, since all other types of removable media are handled 
by either autorun or dynamic).

Network scenarios
Great, huh? Now imagine this gets extended to support NFS networks. Medusa 
could be accessible via the network: a Medusa search service on my local 
machine could, instead of indexing network mounts, delegate the search to the 
Medusa search service on the machine that exports the network mount, much the 
way SGI FAM does (*).

Medusa should also respect the /etc/exports conventions.
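
Here is a simplified Python sketch of what "respecting /etc/exports" could 
mean for the search service: only answer for paths that sit under an exported 
directory. The function names are mine, and the parser deliberately ignores 
quoting, line continuations and the per-client options in the file.

```python
from pathlib import PurePosixPath
from typing import List


def exported_dirs(exports_text: str) -> List[str]:
    """Collect the exported directories from /etc/exports-style text."""
    dirs = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line:
            dirs.append(line.split()[0])  # first field is the directory
    return dirs


def is_exported(path: str, dirs: List[str]) -> bool:
    """True if `path` is an exported directory or lies beneath one."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(d) or PurePosixPath(d) in p.parents
               for d in dirs)
```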

This way, we leverage the NFS network's facilities with zero extra 
configuration, while still providing an extremely low-resource network search 
facility. This could mean that a newbie corporate hire could look for every 
document containing the word "policy", with nearly zero network overhead, on 
every corporate file server, and have the network show him ONLY the files he 
can see (by traditional UNIX security semantics, which both Medusa and NFS 
respect, and slocate as well). And so our new hire can get to work quickly and 
know all company policies instead of getting a two-hundred-page book or 
complicated instructions on how to "Connect to a network drive and access 
folder XYZ".

(*) FAM delegates file monitoring to remote FAMs. When FAM cannot connect to a 
remote NFS FAM server, it falls back to standard dnotify. This behavior can be 
mimicked in Medusa-searchd. To prevent failed file accesses, removable media 
search results wouldn't be returned to the client search service.

To work properly, this would evidently need autoconfiguration. It won't 
succeed if the Medusa daemon needs to be configured on client or server 
machines. Medusa has to work drop-in, out of the box, with the current network 
configuration.

- Medusa-searchd on the NFS server should accept remote connections if the NFS 
  server is up.
- Medusa-searchd on the NFS server should respect the /etc/exports conventions 
  and use the existing configuration files.
- Medusa-indexd on the client should never index NFS mounted volumes.
- Medusa-searchd on the client should always attempt first to connect to the 
  NFS server and, failing that, use standard search methods or not show any 
  results from the server at all. Many NFS networks would be brought to their 
  knees by traffic from every client hammering the entire exported share to 
  index it.
- Medusa-indexd/-searchd code should be audited for possible vulnerabilities 
  involving feeding purposefully corrupted files or search queries.
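
The client-side rules above could be sketched like this in Python. Note that 
`local_search` and `remote_search` are placeholders I invented for the local 
index lookup and for a client talking to the server's search daemon; no such 
API exists in Medusa today as far as I know. In this sketch an unreachable 
server simply contributes no results.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def nfs_server(device: str, fstype: str) -> Optional[str]:
    """Return the server host for an NFS mount, else None."""
    if fstype.startswith("nfs") and ":" in device:
        return device.split(":", 1)[0]
    return None


def search(query: str,
           mounts: Iterable[Tuple[str, str, str]],
           local_search: Callable[[str, str], List[str]],
           remote_search: Callable[[str, str, str], List[str]]) -> List[str]:
    """Search local mounts directly and delegate NFS mounts to their server.

    `mounts` holds (device, mount_point, fstype) triples. If the
    hypothetical `remote_search` raises OSError, that share is skipped.
    """
    results: List[str] = []
    for device, mount_point, fstype in mounts:
        host = nfs_server(device, fstype)
        if host is None:
            results += local_search(query, mount_point)
            continue
        export = device.split(":", 1)[1]
        try:
            results += remote_search(host, export, query)
        except OSError:
            pass  # server unreachable: show nothing rather than crawl NFS
    return results
```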

To boot, this could even be reused in a web project as a search service for 
intranets (killing the need for htdig, which doesn't really go beyond HTML 
files).

I have the feeling that a couple of changes in Medusa would make it far more 
useful than it currently is. I bet that if this gets worked on, even the KDE 
people would get around to using it. Remember how ugly and slow the search box 
in KDE is; it's even slower than Windows' file search tool. KDE already takes 
advantage of SGI FAM. Medusa could be the search service everyone has been 
waiting for.

Good luck!

Rudd-O