Re: [Tracker] Review request : Bridge manager subsystem



On 16/08/09 14:42, Martyn Russell wrote:
On 16/08/09 09:05, Ivan Frade wrote:
Hi Adrien!

Hi all,
Hi all, first of all thanks for the feedback

as part of my GSoC project, I implemented a system to allow Tracker to
index online resources. It's basically split into two parts:
1. The bridges are small programs which connect to a remote web service
and import data into Tracker via its SPARQL interface. They are
standalone processes and all expose a common DBus interface. They are
started by DBus activation.

Great! BTW, we call those processes "miners".

Yep we do :)
OK, I'll sed s/bridge/miner then :)

2. A bridge manager, in charge of calling the bridges to ask them to
pull the data. Any program can talk to the bridges; here I implemented
a bridge manager as a Tracker subsystem.

Uhm, the original idea was that each bridge/miner had its own
configuration and was an independent entity, with no control at all from
Tracker itself.

Martyn was working on a generic configuration super-class, so you don't
need to write the configuration management from scratch for each
miner.

Yea, we already have a preliminary DBus API for this. Perhaps we
could discuss this further with you Adrien to make sure the API is
sufficient. Right now the file system miner is the only one we have
to worry about so our API might be a bit vague.
As of today I only need a simple key-value system, as TrackerConfig
does... I also need info like network status, but I don't know if it
would be part of that API too...

Basically, what it does is
list the available bridges, and call the synchronization method
(Pull()) at a given interval. The default interval is 300 seconds, and
is currently shared by all the bridges. It also exposes a DBus
interface which allows setting the Pull() interval and forcing a call
to Pull() on one or all of the bridges.
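
A minimal sketch of the manager behaviour described above, assuming the bridges can be modelled as plain callables standing in for their DBus Pull() method. The class name, method names and the injectable clock are illustrative assumptions, not code from the tracker-bridges branch; only the 300 second shared default comes from the mail.

```python
import time
from typing import Callable, Dict, List, Optional

DEFAULT_INTERVAL = 300  # seconds; the shared default mentioned in the mail


class BridgeManager:
    """Toy stand-in for the bridge manager (no DBus involved).

    Each bridge is a callable that plays the role of its Pull() method.
    """

    def __init__(self,
                 bridges: Dict[str, Callable[[], None]],
                 interval: float = DEFAULT_INTERVAL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.bridges = bridges
        self.interval = interval
        self.clock = clock
        self.next_pull = clock() + interval

    def set_interval(self, seconds: float) -> None:
        """Change the shared interval and restart the countdown."""
        if seconds <= 0:
            raise ValueError("interval must be positive")
        self.interval = seconds
        self.next_pull = self.clock() + seconds

    def force_pull(self, name: Optional[str] = None) -> List[str]:
        """Pull one bridge by name, or all of them when name is None."""
        names = [name] if name is not None else sorted(self.bridges)
        for bridge_name in names:
            self.bridges[bridge_name]()
        return names

    def tick(self) -> List[str]:
        """Call regularly from a main loop; pulls once the interval elapsed."""
        if self.clock() < self.next_pull:
            return []
        self.next_pull = self.clock() + self.interval
        return self.force_pull()
```

Because a single interval is shared, every bridge is pulled on the same tick; giving each bridge its own timer is what moving the polling into the bridges themselves would buy.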

I prefer completely autonomous bridges/miners to a "poll" solution;
besides, the poll code would have to live in tracker-store and it doesn't
fit there. tracker-store is just a SPARQL DB with a DBus interface; no
extra logic (DB and SPARQL handling and backup stuff, that's all in
the store).

I agree with Ivan here. Miners should know how and when to pull
data, not be polled from the store. Also the store shouldn't have to
manage miners at all. This "sync" method is currently done by
starting all miners on startup using a desktop file and letting them
run the whole time (instead of exiting after inactivity like we used
to).
(instead of exiting after inactivity like we used to) -> ok, that's
what I was planning to do, to minimize memory usage on small devices
(hence the dbus activation)... But if it does not fit in the Tracker
model, it can be changed easily (it's actually what currently happens
:-) ). About autonomous polling, that'll allow me to define different
polling intervals (see the "Known issues" section), so it's actually a
good thing.

Details :
The object basically lists the available bridges (they all have a
.desktop file in /usr/share/tracker/bridges), and keeps the list in
memory.
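
As a rough illustration of the directory scan described above, the sketch below reads every .desktop file in the bridges directory and keeps its [Desktop Entry] keys in memory. The function name is hypothetical, and which keys a bridge's .desktop file actually carries is not stated in the mail, so the code simply keeps whatever it finds.

```python
import configparser
from pathlib import Path
from typing import Dict

BRIDGE_DIR = Path("/usr/share/tracker/bridges")  # directory named in the mail


def list_bridges(directory: Path = BRIDGE_DIR) -> Dict[str, Dict[str, str]]:
    """Map each bridge (file name without .desktop) to its Desktop Entry keys."""
    bridges: Dict[str, Dict[str, str]] = {}
    if not directory.is_dir():
        return bridges
    for path in sorted(directory.glob("*.desktop")):
        # interpolation=None: '%' in values (e.g. Exec lines) is taken literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep the case of keys such as "Name"
        try:
            parser.read(path, encoding="utf-8")
        except configparser.Error:
            continue  # skip malformed files instead of aborting the scan
        if parser.has_section("Desktop Entry"):
            bridges[path.stem] = dict(parser["Desktop Entry"])
    return bridges
```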

This part is really interesting, because we need a tracker-applet that
receives information from all those bridges. The idea I discussed with
Martyn was something like the network manager applet, but instead of
listing networks, listing miners and their status.

Yea, the way we planned to do this was by using the DBus API to get
a list of names with the Tracker prefix for miners and expect them
all to have the same base class API (pause/continue/get_status/etc).
yep, that's more or less what I do now, except that I scan a directory
instead of calling a DBus method to get the list. What program would
be in charge of keeping the list of bridges and exposing it via DBus?

When you click on the applet you would see a list like:

Filesystem    OK
RSS           Updating
Flickr        Paused

Maybe a pause/play button to pause/run the miners. To do this we need
all miners present in DBus, so maybe we can use your code to implement
this.
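
To make the applet idea concrete, here is a small sketch of a shared miner base class and the status list it would feed. Everything in it is an assumption for illustration: the class and method names are made up (resume() stands in for "continue", which is a reserved word in Python), the three states are just the ones shown in the example list, and a real miner would expose these calls over DBus rather than in-process.

```python
from enum import Enum
from typing import Iterable, List


class Status(Enum):
    OK = "OK"
    UPDATING = "Updating"
    PAUSED = "Paused"


class Miner:
    """Hypothetical base class with the pause/continue/get_status calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._status = Status.OK

    def pause(self) -> None:
        self._status = Status.PAUSED

    def resume(self) -> None:
        # Only a paused miner changes state when resumed.
        if self._status is Status.PAUSED:
            self._status = Status.OK

    def get_status(self) -> Status:
        return self._status


def render_status(miners: Iterable[Miner]) -> str:
    """Build the text list an applet could show, one miner per line."""
    rows: List[Miner] = list(miners)
    width = max((len(m.name) for m in rows), default=0)
    return "\n".join(
        f"{m.name.ljust(width)}  {m.get_status().value}" for m in rows
    )
```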

Yea, as Ivan says, this is what we want to do. This is quite
important for users to see how things are progressing and to be able
to control those processes.

Where to get the code :
git://git.mymadcat.com/tracker , branch tracker-bridges
Then, if you want to get some bridges to play with, first install
git://git.mymadcat.com/vapi
git://git.mymadcat.com/libtrackerbridge
git://git.mymadcat.com/bridge-manager
git://git.mymadcat.com/bridge-facebook
git://git.mymadcat.com/bridge-flickr
git://git.mymadcat.com/bridge-twitter
git://git.mymadcat.com/bridge-gdata

  WOW, I want to try this!

Nice, I will take a look this week some time.
Note : As I *still* don't have internet at home, contacting me by mail
will surely be more efficient than IRC for questions

Unknown issues :
I'm sure there are many... Please report them to me !

  A thing that is common for almost all miners is the
"connection_status" functions to know whether we are online, or
whether the connection is good to retrieve massive data. It would be
great to do it with an interface and multiple implementations and in a
library. How about creating a libtracker-miner library? (Something
similar to the old tracker-module library that Carlos created in 0.6?)

Ivan, we currently have libtracker-miner in a separate branch which
we are working on. It will have that DBus API shared amongst all
miners and also the file system crawling basics for those miners
which need to crawl some directories for their data.

We could include some more stuff in there which makes sense for > 1
miner (i.e. shared APIs).
I have a libtracker-bridge which does this, as well as exporting
constants like Tracker's DBus name etc... Sounds very similar :) Maybe
I should merge with libtracker-miner ?
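
The "one interface, multiple implementations" idea for connection status could look roughly like the sketch below. It is purely illustrative: none of these names exist in libtracker-miner or libtracker-bridge as far as this thread shows, and a real implementation would query the system's network state instead of using fixed values.

```python
from abc import ABC, abstractmethod


class ConnectionStatus(ABC):
    """Hypothetical shared interface for miners that need the network."""

    @abstractmethod
    def is_online(self) -> bool:
        """True when any network connection is available."""

    @abstractmethod
    def allows_bulk_transfer(self) -> bool:
        """True when the connection is good enough for massive data."""


class OfflineStatus(ConnectionStatus):
    """Implementation for a machine with no connection at all."""

    def is_online(self) -> bool:
        return False

    def allows_bulk_transfer(self) -> bool:
        return False


class FixedStatus(ConnectionStatus):
    """Implementation with hard-coded answers, handy for testing miners."""

    def __init__(self, online: bool, fast: bool) -> None:
        self._online = online
        self._fast = fast

    def is_online(self) -> bool:
        return self._online

    def allows_bulk_transfer(self) -> bool:
        return self._online and self._fast


def should_pull(status: ConnectionStatus, bulk: bool) -> bool:
    """Decide whether a miner should pull now; bulk marks a large import."""
    if not status.is_online():
        return False
    return status.allows_bulk_transfer() or not bulk
```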

Cheers

  Cheers, and congratulations for the good work!

Yes, I second that. Really good to see work done here.

Thanks again for taking the time to give feedback :)




