Re: firehose polling system
- From: Colin Walters <walters redhat com>
- To: Owen Taylor <otaylor redhat com>
- Cc: online-desktop-list gnome org
- Subject: Re: firehose polling system
- Date: Thu, 24 Jan 2008 11:59:22 -0500
On Wed, 2008-01-23 at 18:27 -0500, Owen Taylor wrote:
> On Wed, 2008-01-23 at 16:39 -0500, Colin Walters wrote:
> > Hi,
> >
> > Over the last few days I've been trying to design and code a new polling
> > system for Mugshot. The main goal is to reduce the database traffic
> > we currently generate when checking for updates. This in turn should
> > help the Online Desktop server stay up longer.
> >
> > It's called Firehose. The current design page is here:
> > http://developer.mugshot.org/wiki/Firehose
> >
> > I'll commit some code soon to the Mugshot SVN.
>
> Some questions:
>
> - How are "tasksets" sent from the master to the slave?
Basically the master has a list of active slaves, and it just sends the
tasksets to them with a plain HTTP POST. My current thinking is to use
basic POST-like APIs for any communication that doesn't need the kind of
reliability SQS gives us.
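
To make the idea concrete, here is a rough sketch of how a master might
package a taskset and POST it to one slave. Nothing here is from the
actual Firehose code: the JSON body layout, the function names, and the
slave URL are all assumptions for illustration.

```python
import json
import urllib.request


def build_taskset_request(slave_url, tasks):
    """Build (but do not send) an HTTP POST carrying a taskset.

    `tasks` is a list of (family, id) pairs. The JSON layout is an
    assumption; the design page does not specify a wire format.
    """
    body = json.dumps({"tasks": [{"family": f, "id": i} for f, i in tasks]})
    return urllib.request.Request(
        slave_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_taskset(slave_url, tasks, timeout=10):
    """Best-effort delivery: True on a 2xx response, False on any failure."""
    request = build_taskset_request(slave_url, tasks)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses. Since this channel
        # is not meant to be reliable, the caller just sees a failure.
        return False
```

The send is deliberately fire-and-forget: a failed POST is reported to
the caller, who can simply hand the taskset to another slave.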
> - How does a slave get details about information used to poll a task?
> (like the URL to poll or whatever) The wiki page only describes a
> task as a family/id pair.
The slave has a mapping from each task family to the class that
implements it.
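
A minimal sketch of what such a mapping could look like. The class name
`FeedPoller`, the "feed" family, and the `describe` method are
hypothetical and only there to show the lookup.

```python
class FeedPoller:
    """Hypothetical handler for tasks in a 'feed' family."""

    def __init__(self, task_id):
        self.task_id = task_id

    def describe(self):
        # A real handler would know how to turn the id into something to
        # poll; this one only reports which task it was built for.
        return "feed:" + self.task_id


# family name -> implementing class (contents are assumptions)
FAMILY_HANDLERS = {"feed": FeedPoller}


def handler_for(family, task_id):
    """Instantiate the handler class registered for a task's family."""
    try:
        handler_class = FAMILY_HANDLERS[family]
    except KeyError:
        raise ValueError("unknown task family: %r" % (family,))
    return handler_class(task_id)
```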
> - How is data needed for polling like private keys distributed to the
> slave?
Any private keys would have to be included in the configuration, as
they are with the server now.
> - Is there any affinity for tasks? Do we always execute the same task
> on the same slave, or is each run of each task assigned to a slave
> independently?
They're independent.
> - Do we have any way of sending If-Modified-Since headers when applicable?
> If we are running this on EC2, we'll be paying per gigabyte of
> downloaded data.
That is a good point. I need to update the spec to say that the result
of a poll is (SHA1, timestamp). We can use the timestamp to send the
If-Modified-Since header.
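
Roughly, and only as a sketch: the standard HTTP request header for this
is If-Modified-Since, and the stored timestamp can be formatted into it.
The function names below are made up, and treating the timestamp as
"seconds since the epoch when the content last changed" is my
assumption, not something the spec says yet.

```python
import hashlib
from email.utils import formatdate


def conditional_headers(prev_timestamp):
    """Headers for the next poll, given the timestamp of the last result."""
    if prev_timestamp is None:
        return {}  # first poll: nothing to compare against
    # HTTP dates are always expressed in GMT.
    return {"If-Modified-Since": formatdate(prev_timestamp, usegmt=True)}


def poll_result(body, prev_result, now):
    """Return (changed, (sha1, timestamp)) for a freshly fetched body.

    `prev_result` is the previous (sha1, timestamp) pair or None. If the
    hash is unchanged, the old pair is kept so the timestamp still marks
    the last real change.
    """
    digest = hashlib.sha1(body).hexdigest()
    if prev_result is not None and prev_result[0] == digest:
        return False, prev_result
    return True, (digest, now)
```

A server that honors the header answers 304 Not Modified with no body,
which is where the bandwidth saving would come from; the SHA1 still
catches servers that ignore it.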
> - Is the list of tasks persistently saved on the master, or does the
> server send tasks again on restart?
It's stored persistently. Currently it's a SQLite database.
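
For illustration, a task list keyed on the family/id pair could be kept
in SQLite along these lines. The table name, column names, and helper
functions are assumptions, not the real schema.

```python
import sqlite3


def open_task_store(path):
    """Open (and create if needed) a task database at `path`."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " family TEXT NOT NULL,"
        " id TEXT NOT NULL,"
        " PRIMARY KEY (family, id))"
    )
    conn.commit()
    return conn


def add_task(conn, family, task_id):
    """Record a task; adding the same (family, id) twice is a no-op."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tasks (family, id) VALUES (?, ?)",
            (family, task_id),
        )


def load_tasks(conn):
    """Return every stored task as a set of (family, id) tuples."""
    return set(conn.execute("SELECT family, id FROM tasks"))
```

On restart the master would call something like `load_tasks` instead of
waiting for the server to resend everything.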
> - If we wanted to "sync up" the set of tasks that the master is
> executing with the set that we should be running (I could imagine
> them getting out of sync for various reasons if we keep the tasks
> around persistently), how do we do it?
Another good point. For now I was thinking of writing a script to
create all the tasks, but you could imagine having a specific
"dump/load" system where Mugshot would store a snapshot of task IDs in
S3 and send a message telling the firehose master to reload from that
list.
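
The reload step would boil down to a set difference between what the
master currently holds and what the snapshot says it should hold. A
sketch, with a hypothetical function name and with the fetch of the
snapshot left out entirely:

```python
def plan_sync(current_tasks, snapshot_tasks):
    """Work out what a reload would have to change.

    Both arguments are iterables of (family, id) pairs. Returns
    (to_add, to_remove) as sorted lists so the result is deterministic.
    """
    current = set(current_tasks)
    snapshot = set(snapshot_tasks)
    to_add = sorted(snapshot - current)      # in the snapshot, not on the master
    to_remove = sorted(current - snapshot)   # on the master, no longer wanted
    return to_add, to_remove
```

For example, `plan_sync([("feed", "1"), ("feed", "2")], [("feed", "2"),
("feed", "3")])` reports that ("feed", "3") should be added and
("feed", "1") dropped.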
> - Does the master implement "poll tasks faster after changes"?
Not yet, though we should do that.