Re: Some analysis on live.gnome.org performance
- From: Jeff Schroeder <jeffschroed gmail com>
- To: Owen Taylor <otaylor redhat com>
- Cc: gnome-infrastructure gnome org, Olav Vitters <olav bkor dhs org>
- Subject: Re: Some analysis on live.gnome.org performance
- Date: Thu, 10 Dec 2009 10:30:10 -0800
On Thu, Dec 10, 2009 at 9:50 AM, Owen Taylor <otaylor redhat com> wrote:
> Got curious about performance on live.gnome.org; the observed macro
> pattern for system performance was:
>
> - Load is very spiky, sometimes low, sometimes quite high
> - When it's high, there are httpd processes running at high
> CPU utilization or spending most of their time in syscalls.
> - Bottleneck seems to be CPU rather than disk - disk utilization
> is quite low and is principally writes from httpd logging.
>
> Stracing the high-cpu and high-disk-wait httpd processes indicated that
> they were doing "strange things" - e.g., stat'ing through the page
> hierarchy looking for attachments for every page in the Wiki, so I
> wanted to know what requests they were processing.
>
> To try and figure this out, I temporarily modified the httpd
> configuration to include the processing time for each page in the log
> files and grep'ed out long-running page requests for half an hour of
> usage.
>
> There were 73 requests that took more than 10 seconds to process.
>
> 16 requests for /TitleIndex, min=20s, max=190s
>
> 18 requests for /WordIndex, min=35s, max=234s
>
> 25 requests for attachments, min=11s, max=38s
> Most, though not all, of these were for large images,
> 500k or more
>
> 6 misc. POSTs (newaccount, login, edit, AttachFile), min=13s, max=66s
>
> 4 requests for wiki pages: min=16s, max=120s
>
> 2 requests for Category pages: min=12s, max=40s
>
> 1 request for AdvancedSearch: 11s
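A minimal sketch of how a summary like the one quoted above could be produced from the access log. The log layout is an assumption, not something stated in the mail: it supposes an Apache "combined"-style line with the processing time in seconds appended as the last field. The names `LINE_RE`, `slow_requests` and `summarize` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) line shape: Apache "combined" format with the
# processing time in seconds appended as the final field.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" (?P<seconds>\d+(?:\.\d+)?)$'
)


def slow_requests(lines, threshold=10.0):
    """Yield (path, seconds, agent, ip) for requests slower than threshold."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        seconds = float(match.group("seconds"))
        if seconds > threshold:
            # Drop the query string so /WordIndex?action=print counts
            # together with /WordIndex.
            path = match.group("url").split("?", 1)[0]
            yield path, seconds, match.group("agent"), match.group("ip")


def summarize(lines, threshold=10.0):
    """Return {path: (count, min_seconds, max_seconds)} for slow requests."""
    times = defaultdict(list)
    for path, seconds, _agent, _ip in slow_requests(lines, threshold):
        times[path].append(seconds)
    return {path: (len(t), min(t), max(t)) for path, t in times.items()}
```

Feeding it an open log file, e.g. `summarize(open(path_to_log))`, gives a per-path count with minimum and maximum times; the 10 second threshold mirrors the cut-off used in the quoted analysis.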
>
> I think it's fair to assume that the long times for attachments and, in
> some cases, for random pages are due to network issues - clients getting
> data slowly and tying up an httpd process. So what really stands out
> here are the /TitleIndex and /WordIndex requests - why are we getting
> all these requests for expensive pages that aren't obviously linked
> to?
>
> So, let's look at the first four requests for /WordIndex:
>
> IP: 195.27.20.2
> Time: 10/Dec/2009:16:01:25 +0000
> Request: GET /WordIndex?action=print HTTP/1.0
> Bytes: 1865069
> Referrer: "-"
> User agent: "Mozilla/4.0 (compatible;)"
> Time: 168.302989
>
> IP: 195.27.20.2
> Time: 10/Dec/2009:16:01:25 +0000
> Request: GET /WordIndex HTTP/1.0
> Bytes: 1867640
> Referrer: "http://live.gnome.org/Tomboy/PluginList"
> User agent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
> SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322;
> .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; MS-RTC LM 8;
> .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
> Time: 179.532250
>
> IP: 93.174.145.75
> Time: 10/Dec/2009:16:03:39 +0000
> Request: GET /WordIndex HTTP/1.1
> Bytes: 1867640
> User agent: "Mozilla/4.0 (compatible;)"
> Time: 52.201152
>
> IP: 192.196.142.21
> Time: 10/Dec/2009:16:03:36 +0000
> Request: GET /WordIndex HTTP/1.1
> Bytes: 1867640
> User agent: "Mozilla/4.0 (compatible;)"
> Time: 62.462147
>
> So, the thing that stands out here is the consistent User Agent for
> three out of the four, and the fact that the remaining request, although
> it has a different agent, comes from the same IP at the same time as the
> first.
>
> If you do a web search, you'll find that this user agent is attributed
> to "Blue Coat" proxy server products, which apparently do
> speculative prefetching based on page contents.
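The pattern described above (one shared user agent, plus a differently labelled request from the same IP) can be checked mechanically. This is only a sketch: it takes already-parsed `(ip, agent)` pairs, and the function names are hypothetical. The agent string is the one from the quoted log entries.

```python
from collections import Counter

# User agent string taken from the quoted log entries.
SUSPECT_AGENT = "Mozilla/4.0 (compatible;)"


def agent_counts(requests):
    """Count requests per user agent; requests is a list of (ip, agent)."""
    return Counter(agent for _ip, agent in requests)


def requests_from_suspect_ips(requests, agent=SUSPECT_AGENT):
    """Return every request from an IP that used the suspect agent at least once.

    This also catches requests that carry a different (e.g. browser) agent
    but come from the same address as a suspect request.
    """
    suspect_ips = {ip for ip, seen_agent in requests if seen_agent == agent}
    return [(ip, seen_agent) for ip, seen_agent in requests if ip in suspect_ips]
```

Matching on IP as well as agent matters here because, as noted further down in the quoted mail, the prefetches do not always carry the proxy's own user agent.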
>
> What page contents are they prefetching on? If you look at the source
> of one of the wiki pages (e.g., /GnomeShell), you see:
>
> <link rel="Start" href="/Home">
> <link rel="Alternate" title="Wiki Markup" href="/GnomeShell?action=raw">
> <link rel="Alternate" media="print" title="Print View" href="/GnomeShell?action=print">
> <link rel="Search" href="/FindPage">
> <link rel="Index" href="/TitleIndex">
> <link rel="Glossary" href="/WordIndex">
> <link rel="Help" href="/HelpOnFormatting">
>
> There are in fact *no* obvious links to /TitleIndex and /WordIndex or to the
> printable versions of pages anywhere in the page, and I'm not aware of any current
> browsers that present the content of these links in the user interface. So to
> summarize:
>
> Our performance on live.gnome.org is being killed by speculative
> prefetching on URLs that are added because they seemed like a good
> idea but have no actual purpose on the page.
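To see which URLs a page advertises through `<link rel=...>` elements (the only place /TitleIndex and /WordIndex show up, per the quoted source), something like the following sketch could be run against a saved copy of a page. It uses only the standard library's `html.parser`; the class and function names are hypothetical, and nothing here claims to reproduce how any particular proxy actually picks its prefetch targets.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect (rel, href) pairs from <link> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel")
        href = attributes.get("href")
        if rel and href:
            self.links.append((rel.lower(), href))


def link_rel_targets(html):
    """Return the list of (rel, href) pairs found in the given HTML text."""
    parser = LinkRelCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Ordinary `<a href=...>` links are deliberately ignored, so the result is exactly the set of targets that are invisible to someone reading the rendered page.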
>
> Possible fixes:
>
> - Block /TitleIndex and /WordIndex entirely - they aren't useful pages
> - Block the Blue Coat fetches by User Agent (this, however, apparently
> doesn't get all the prefetches, sometimes it uses the user agent
> of the requesting client.)
> - Use apache's mod_cache facilities to cache /TitleIndex, /WordIndex
> - Patch Moin to omit this section of the pages
>
> I don't have a strong opinion on which one of these, or which
> combination, is best - the last one makes some sense to me.
>
> - Owen
Sorry Owen, I forgot to reply-all the first time.
The last one makes a lot of sense; however, it will require updating the
patch as we upgrade MoinMoin. What are the downsides of just blocking
both of those URLs with a shiny GNOME 403 page? Besides it being
nifty to see those pages, is there any value in keeping them?
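To make the blocking idea concrete, here is a rough sketch written as a generic WSGI wrapper. That the wiki could be fronted this way is purely an assumption for illustration; on the real server the same rule would more likely live in the httpd configuration. `BlockExpensivePages` and `BLOCKED_PATHS` are hypothetical names, and the plain-text body stands in for a proper 403 page.

```python
# Paths named in the thread as expensive and not obviously linked.
BLOCKED_PATHS = {"/TitleIndex", "/WordIndex"}


class BlockExpensivePages:
    """WSGI middleware that answers 403 for a fixed set of paths."""

    def __init__(self, app, blocked_paths=BLOCKED_PATHS):
        self.app = app
        self.blocked_paths = frozenset(blocked_paths)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # PATH_INFO excludes the query string, so ?action=print is covered too.
        if path.rstrip("/") in self.blocked_paths:
            body = b"403 Forbidden: this index page is disabled.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else goes to the wrapped application unchanged.
        return self.app(environ, start_response)
```

Blocking by path rather than by user agent sidesteps the problem that some prefetches arrive with the requesting client's agent string.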
--
Jeff Schroeder
Don't drink and derive, alcohol and analysis don't mix.
http://www.digitalprognosis.com