Re: Best way of using Beagle to index data CDs



> > You are more or less on the right track. As Kevin pointed out, one way
> > is to leverage static indexes. Due to the way static indexes work, it
...
> > medium. If you merge several indexes, there would be two kinds of
> > problems:
> > 1) Files that are not in the filesystem would not be reported (happens
> > for any static index)
> > 2) If there are files in different removable media but with same
> > absolute path, then only one of them will be returned. And there might
> > be more weirdness.

> Problem 2 can be solved by making sure that one uses --remap correctly
> to make each prefix unique, for example, "/media/<disc id>/".

--remap doesn't work. I even think it was removed from svn trunk.

> Problem 1 is a complete bummer.  That makes beagle more or less
> unusable to this end.  How do we solve this?
>
> It seems that you've basically solved both of these problems in
> BuildRemovableIndex.cs by introducing a new URI protocol (removable)
> for solving problem 1 and using media_name for solving problem 2.
>
> However, BuildRemovableIndex.cs hasn't been completed.  It doesn't
> seem to be missing that much, though.
>
> How would one tell Beagle to report any removable:///* URI?
> I guess I'm not familiar enough with the structure of Beagle to know
> where to begin resolving these issues.

None of these issues requires much knowledge of beagle internals, so I
will try to describe what's there and what's missing. Stop me if
anything is unclear.

By nature, URIs should be unique. So the URI should be changed to
include the media_name as well, e.g.
removable://media_name/relative/path/to/file

The "removable:" scheme is just a way to capture the path and media
name in a different kind of URL. I don't think it's standard, so beagle
clients should interpret it as a removable media URL in which the host
is the name of the media and the path is the path of the file relative
to where the media is mounted.
Note: beagle-search and other clients out there don't yet know about
removable media and would probably ignore such results. They need to
be patched too.
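
To make the URL layout concrete, here is a minimal sketch in Python
(not Beagle's C#) of building and parsing such a URL. The helper names
are hypothetical, and percent-encoding the media name is my assumption
so that names with spaces survive; nothing here is taken from the
Beagle sources.

```python
from urllib.parse import quote, unquote, urlsplit


def make_removable_uri(media_name, relative_path):
    """Build removable://media_name/relative/path (hypothetical helper)."""
    # Assumption: percent-encode both parts so odd characters stay unambiguous.
    host = quote(media_name, safe="")
    path = quote(relative_path.lstrip("/"))
    return f"removable://{host}/{path}"


def parse_removable_uri(uri):
    """Split a removable:// URL into (media_name, relative_path)."""
    parts = urlsplit(uri)
    if parts.scheme != "removable":
        raise ValueError(f"not a removable media URL: {uri}")
    # The host slot carries the media name, the path is relative to the mount.
    return unquote(parts.netloc), unquote(parts.path.lstrip("/"))
```

A client that does not know the scheme can still tell these results
apart from file:/// ones by checking the scheme alone.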

BuildRemovableIndex.cs is just a smart wrapper around BuildIndex.cs
which makes the changes described above.

The static backends are handled by StaticQueryable.cs;
http://svn.gnome.org/viewvc/beagle?view=revision&revision=3108
contained a modified StaticQueryable.cs which knew about removable
media. When started, the backend would load the possible mappings from
a config file and store them in a mapping_table. Every result
passes through the backend just before it is returned. At that time,
the modified StaticQueryable would take a removable media URL, extract
the media_name and the relative path, use the mapping_table to get the
mounted path for that media_name, append the relative path to the
mounted path and return the correct file:/// URL. If the media
is not mounted, it would just report the original removable:// URL and
set a flag saying the media was not found. The client can then interact
with the user as it sees fit. E.g. the client can either drop all the
unmounted URLs, or display them all and, when a user clicks on an
unmounted URL, ask the user to mount that medium and then open the file.
This client interaction still needs to be added to beagle-search.
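
The remapping step can be sketched roughly as follows. This is Python
for illustration only, not the code from that revision: resolve_hit is
a hypothetical name, the mapping_table is assumed to be a plain
dictionary from media_name to mount point, and the "media not found"
flag is modelled as a boolean returned next to the URL.

```python
import posixpath
from urllib.parse import quote, unquote, urlsplit


def resolve_hit(uri, mapping_table):
    """Return (uri_to_report, media_found) for one result URL."""
    parts = urlsplit(uri)
    if parts.scheme != "removable":
        # Ordinary results pass through untouched.
        return uri, True
    media_name = unquote(parts.netloc)
    relative_path = unquote(parts.path.lstrip("/"))
    mount_point = mapping_table.get(media_name)
    if mount_point is None:
        # Media not mounted: keep the removable:// URL and flag it.
        return uri, False
    # Append the relative path to the mounted path and report a file URL.
    full_path = posixpath.join(mount_point, relative_path)
    return "file://" + quote(full_path), True
```

A client could then drop every result whose flag is False, or show it
and prompt for the medium when the user clicks on it.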

Another major part that needs to be completed is deciding where/how to
store the media_name info for the medium. I was thinking
beagle-removable-index would work like this:

$ beagle-removable-index --build --medium medium_name [--config
/path/to/new/config] --target /path/to/index/ ...

This would create an index at the path pointed to by --target (as it
happens now). It would also store a removableconfig.xml file with the
name of the medium (and other possible configuration values) at
/path/to/new/config. If --config is absent, the location will default
to /path/to/index/removableconfig.xml

$ beagle-removable-index --mount [--config /path/to/config] --target
/path/to/index

This will inform the running beagled that a removable index at
/path/to/index has been added. If --config is present, the name and
other information are read from there; otherwise they are read from
/path/to/index/removableconfig.xml.
The running beagled will inform StaticQueryable about the new medium
being inserted, which will in turn store the medium_name and
/path/to/index in its mapping table.

$ beagle-removable-index --unmount ... similarly
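
The --config fallback described for --build and --mount is small
enough to sketch. Again this is illustrative Python, and
resolve_config_path is a hypothetical name; only the file name
removableconfig.xml and the default location come from the proposal
above.

```python
import posixpath

DEFAULT_CONFIG_NAME = "removableconfig.xml"


def resolve_config_path(target, config=None):
    """Pick the config file: explicit --config wins, else look inside --target."""
    if config:
        return config
    return posixpath.join(target, DEFAULT_CONFIG_NAME)
```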

It doesn't _have_ to work this way. This is just what I thought would
make everybody happy.

The last major piece which wasn't done (I think; I don't remember
completely) is the real-time loading of new indexes. When
StaticQueryable is informed about a new index and a mapping, and
the index at /path/to/index is not already loaded, it should load the
new static index. This should not be too difficult, just call into
QueryDriver.cs (see LoadRemovableMediaQueryables). If the index at
/path/to/index is already loaded, then just update the
(medium_name,path) mapping. Lucene allows beagle to silently update
the index in the background, so $b-r-i --build followed by $b-r-i
--mount where some existing index is being modified will be handled
transparently.
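
The load-or-update logic could look something like this sketch. It is
Python rather than C#, the class and method names are made up for
illustration, and the load_index callback merely stands in for
whatever call into QueryDriver.cs ends up doing the real work.

```python
class StaticIndexRegistry:
    """Hypothetical stand-in for the bookkeeping inside StaticQueryable."""

    def __init__(self, load_index):
        # load_index: callback that actually loads a static index (assumed).
        self._load_index = load_index
        self.loaded_indexes = set()
        self.mapping_table = {}

    def mount(self, medium_name, index_path):
        """Handle a --mount notification; return True if the index was loaded now."""
        newly_loaded = index_path not in self.loaded_indexes
        if newly_loaded:
            # First time this index is seen: load it.
            self._load_index(index_path)
            self.loaded_indexes.add(index_path)
        # In both cases record or refresh the (medium_name, path) mapping.
        self.mapping_table[medium_name] = index_path
        return newly_loaded
```

Mounting the same index path twice only refreshes the mapping, which
matches the "already loaded" case above.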

If you want, feel free to work on the above issues. Questions and
patches are always welcome :).

- dBera

-- 
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user

