Re: Some analysis on live.gnome.org performance



On Thu, 2009-12-10 at 11:17 -0800, Jeff Schroeder wrote:
> On Thu, Dec 10, 2009 at 10:47 AM, Owen Taylor <otaylor redhat com> wrote:
> > On Thu, 2009-12-10 at 10:30 -0800, Jeff Schroeder wrote:
> >
> >> > Possible fixes:
> >> >
> >> >  - Block /TitleIndex and /WordIndex entirely - they aren't useful pages
> >> >  - Block the Blue Coat fetches by User Agent (this, however, apparently
> >> >   doesn't get all the prefetches; sometimes it uses the user agent
> >> >   of the requesting client.)
> >> >  - Use apache's mod_cache facilities to cache /TitleIndex, /WordIndex
> >> >  - Patch Moin to omit this section of the pages
> >> >
> >> > Don't have a lot of opinion which one of these or combination of these
> >> > is best - the last one makes some sense to me.
> >> >
> >> > - Owen
> >>
> >> Sorry Owen I forgot to reply all the first time.
> >>
> >> The last one makes a lot of sense however it will require updating the
> >> patch as we upgrade moinmoin. What are the downsides of just blocking
> >> both of those URLs with a shiny GNOME 403 page? Besides it being
> >> nifty to see those pages, is there any value add in keeping them?
> >
> > Downsides I could see:
> >
> >  - These pages are linked to from http://live.gnome.org/HelpForBeginners
> >   and might have some small utility
> >
> >  - Just blocking the /TitleIndex and /WordIndex won't keep Blue Coat
> >   from predictively scraping other URLs in that section.
> >
> >   From a rough grep, 10% of the page hits on live.gnome.org are
> >   for action=raw or action=print.
> >
> >   (Since there is no UI for getting to action=raw or action=print
> >   I can find, we could also possibly just block those as well.)
> 
> Would it be possible to somehow programmatically generate these
> specific pages from a cronjob and have moin serve a static page? If
> BlueCoat is lying about its user agent, there isn't much of a way to
> stop it and not kill legitimate users every so often. The problem is
> that those 2 pages, helpful or not, are killing the user experience
> for everyone else.

I think that would best be done with mod_cache, as I mentioned above -
it's not very hard to set up Apache to cache some URLs for 6 hours or a
day, or whatever.
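
Something roughly like this would probably do it - untested sketch only,
assuming mod_cache and the disk cache provider are loaded, and the
6-hour lifetime is just an example:

  <IfModule mod_cache.c>
      # Cache only the expensive index pages; everything else stays dynamic.
      CacheEnable disk /TitleIndex
      CacheEnable disk /WordIndex
      # Serve the cached copy for up to 6 hours before refetching from Moin.
      CacheDefaultExpire 21600
      # Moin pages don't always send Last-Modified, so cache them anyway.
      CacheIgnoreNoLastMod On
  </IfModule>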

But really, if no option is obviously easy, then we should just
block those two URLs until something better comes along. Maybe do
that even if we do want to do something better eventually ... my
analysis could be way off base, so testing it with 5 minutes of work
is probably a good idea.
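
For the record, the "just block them" option is only a few lines of
Apache config too - something along these lines, where the 403 page
path is made up and would need to point at whatever GNOME error page
we actually set up:

  <LocationMatch "^/(TitleIndex|WordIndex)">
      # Refuse the request outright; Moin never sees it.
      Order allow,deny
      Deny from all
      # Hypothetical path - point this at the real branded 403 page.
      ErrorDocument 403 /errors/403.html
  </LocationMatch>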

- Owen



