Re: Some analysis on performance

On Thu, Dec 10, 2009 at 9:50 AM, Owen Taylor <otaylor redhat com> wrote:
> Got curious about performance on; the observed macro
> pattern for system performance was:
>  - Load is very spiky, sometimes low, sometimes quite high
>  - When it's high, there are httpd processes running at high
>   CPU utilization or spending most of their time in syscalls.
>  - Bottleneck seems to be CPU rather than disk - disk utilization
>   is quite low and is principally writes from httpd logging.
> Stracing the high-cpu and high-disk-wait httpd processes indicated that
> they were doing "strange things" - e.g., stat'ing through the page
> hierarchy looking for attachments for every page in the Wiki, so I
> wanted to know what requests they were processing.
> To try and figure this out, I temporarily modified the httpd
> configuration to include the processing time for each page in the log
> files and grep'ed out long-running page requests for half an hour of
> usage.
> There were 73 requests that took more than 10 seconds to process.
>  16 requests for /TitleIndex, min=20s, max=190s
>  18 requests for /WordIndex, min=35s, max=234s
>  25 requests for attachments, min=11s, max=38s
>  Most, though not all, of these were for large images,
>  500k or more
>  6 misc. POSTs (newaccount, login, edit, AttachFile), min=13s, max=66s
>  4 requests for wiki pages: min=16s, max=120s
>  2 requests for Category pages: min=12s, max=40s
>  1 request for AdvancedSearch: 11s
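
A rough sketch of how a breakdown like the one above could be produced from the access log. The line layout is an assumption (Apache "combined" format with the processing time in seconds appended as a last field); the thread does not show the actual format, and the names LINE_RE, slow_requests and summarize are made up for this example.

```python
import re
from collections import defaultdict

# Assumed line layout: Apache "combined" format plus one trailing field
# holding the processing time in seconds. The format actually used on the
# server is not shown in the thread, so this pattern is a guess.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<seconds>\d+(?:\.\d+)?)\s*$'
)


def slow_requests(lines, threshold=10.0):
    """Yield (path, seconds, agent) for requests slower than threshold."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        seconds = float(m.group("seconds"))
        if seconds > threshold:
            # Drop the query string so /WordIndex?action=print is counted
            # together with /WordIndex.
            path = m.group("url").split("?", 1)[0]
            yield path, seconds, m.group("agent")


def summarize(lines, threshold=10.0):
    """Return {path: (count, min_seconds, max_seconds)} for slow requests."""
    times = defaultdict(list)
    for path, seconds, _agent in slow_requests(lines, threshold):
        times[path].append(seconds)
    return {p: (len(t), min(t), max(t)) for p, t in times.items()}
```

Calling summarize(open(path_to_log)) on half an hour of log lines would give the count, min and max per path; grouping attachments or POSTs into categories would need extra rules on top of this.
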
> I think it's fair to assume that the long times for attachments and in
> some cases for random pages are due to network issues - clients getting
> data slowly and tying up an httpd process. So, what really stands out
> here are the /TitleIndex and /WordIndex requests - why are we
> getting all these requests for these expensive pages that aren't
> obviously linked to?
> So, let's look at the first few requests for /WordIndex:
>  IP:
>  Time: 10/Dec/2009:16:01:25 +0000
>  Request: GET /WordIndex?action=print HTTP/1.0
>  Bytes: 1865069
>  Referrer: "-"
>  User agent: "Mozilla/4.0 (compatible;)"
>  Time: 168.302989
>  IP:
>  Time: 10/Dec/2009:16:01:25 +0000
>  Request: GET /WordIndex HTTP/1.0
>  Bytes: 1867640
>  Referrer: "";
>  User agent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
>      SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322;
>     .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; MS-RTC LM 8;
>     .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
>  Time: 179.532250
>  IP:
>  Time: 10/Dec/2009:16:03:39 +0000
>  Request: GET /WordIndex HTTP/1.1
>  Bytes: 1867640
>  User agent: "Mozilla/4.0 (compatible;)"
>  Time: 52.201152
>  IP:
>  Time: 10/Dec/2009:16:03:36 +0000
>  Request: GET /WordIndex HTTP/1.1
>  Bytes: 1867640
>  User agent: "Mozilla/4.0 (compatible;)"
>  Time: 62.462147
> So, the thing that stands out here is the consistent User Agent for
> three out of the four, and the fact that the remaining request, although
> it has a different agent, comes from the same IP at the same time as the
> first. If you do a web search, you'll find that this user agent is
> attributed to "Blue Coat" proxy server products, which apparently do
> speculative prefetching based on page contents.
> What page contents are they prefetching on? If you look at the source
> of one of the wiki pages, you see (e.g., for /GnomeShell):
> <link rel="Start" href="/Home">
> <link rel="Alternate" title="Wiki Markup" href="/GnomeShell?action=raw">
> <link rel="Alternate" media="print" title="Print View" href="/GnomeShell?action=print">
> <link rel="Search" href="/FindPage">
> <link rel="Index" href="/TitleIndex">
> <link rel="Glossary" href="/WordIndex">
> <link rel="Help" href="/HelpOnFormatting">
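
To make it concrete what a prefetcher can pick up from that block, here is a small sketch that pulls the rel/href pairs out of a page with the standard library HTML parser. It only illustrates which URLs are advertised in the page source; how Blue Coat actually chooses what to prefetch is not known from this thread, and LinkRelCollector and link_hints are names invented for the example.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect (rel, href) pairs from <link> elements in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr = dict(attrs)
        if attr.get("rel") and attr.get("href"):
            self.links.append((attr["rel"], attr["href"]))


def link_hints(html):
    """Return every (rel, href) pair advertised via <link> in html."""
    parser = LinkRelCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Running link_hints() over the source of a wiki page lists /TitleIndex and /WordIndex among the results, even though nothing visible on the page links to them.
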
> There are in fact *no* obvious links to /TitleIndex and /WordIndex or to the
> printable versions of pages anywhere in the page, and I'm not aware of any current
> browsers that present these links in the user interface. So to
> summarize:
>  Our performance on is being killed by speculative
>  prefetching on URLs that are added because they seemed like a good
>  idea but have no actual purpose on the page.
> Possible fixes:
>  - Block /TitleIndex and /WordIndex entirely - they aren't useful pages
>  - Block the Blue Coat fetches by User Agent (this, however, apparently
>   doesn't get all the prefetches; sometimes it uses the user agent
>   of the requesting client.)
>  - Use Apache's mod_cache facilities to cache /TitleIndex, /WordIndex
>  - Patch Moin to omit this section of the pages
> I don't have a strong opinion on which one of these, or which
> combination of these, is best - the last one makes some sense to me.
> - Owen

Sorry Owen, I forgot to reply-all the first time.

The last one makes a lot of sense; however, it will require updating the
patch as we upgrade MoinMoin. What are the downsides of just blocking
both of those URLs with a shiny GNOME 403 page? Besides it being
nifty to see those pages, is there any value in keeping them?
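
For illustration, blocking the two URLs could look roughly like the sketch below. It assumes the wiki is served as a WSGI application, which this thread does not confirm; an equivalent rule in the web server configuration would do the same job without touching Python. block_index_pages and BLOCKED_PATHS are hypothetical names, and the plain-text body stands in for whatever 403 page we would really serve.

```python
BLOCKED_PATHS = frozenset({"/TitleIndex", "/WordIndex"})


def block_index_pages(app, blocked=BLOCKED_PATHS):
    """Wrap a WSGI app so that requests for the blocked paths get a 403."""

    def middleware(environ, start_response):
        # PATH_INFO excludes the query string, so /WordIndex?action=print
        # is matched as well.
        path = environ.get("PATH_INFO", "")
        if path.rstrip("/") in blocked:
            body = b"403 Forbidden: this index page is disabled.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else goes to the wrapped application untouched.
        return app(environ, start_response)

    return middleware
```

Since the check is on the path rather than the User Agent, it also catches prefetches that reuse the requesting client's agent string. It does not touch the ?action=print fetches of ordinary pages.
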

Jeff Schroeder

Don't drink and derive, alcohol and analysis don't mix.
