Some analysis on live.gnome.org performance



Got curious about performance on live.gnome.org; the observed macro
pattern for system performance was:

 - Load is very spiky, sometimes low, sometimes quite high
 - When it's high, there are httpd processes running at high
   CPU utilization or spending most of their time in syscalls.
 - Bottleneck seems to be CPU rather than disk - disk utilization
   is quite low and is principally writes from httpd logging.

Stracing the high-cpu and high-disk-wait httpd processes indicated that
they were doing "strange things" - e.g., stat'ing through the page
hierarchy looking for attachments for every page in the Wiki, so I
wanted to know what requests they were processing.

To try and figure this out, I temporarily modified the httpd
configuration to include the processing time for each page in the log
files and grep'ed out long-running page requests for half an hour of
usage.
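
A rough sketch of that filtering step in Python, in case anyone wants to
repeat it. The exact LogFormat I used isn't shown here, so the line layout
below is an assumption: Apache "combined" format with the processing time in
seconds appended as the last field. The sample lines are illustrative only
(the status codes and the second line are made up).

```python
import re

# Assumed layout (hypothetical): "combined" log format plus a trailing
# processing time in seconds, e.g. 168.302989.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" (?P<seconds>\d+(?:\.\d+)?)$'
)


def slow_requests(lines, threshold=10.0):
    """Yield (seconds, ip, request, agent) for requests slower than threshold."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        seconds = float(match.group("seconds"))
        if seconds > threshold:
            yield (seconds, match.group("ip"),
                   match.group("request"), match.group("agent"))


# Illustrative input; status codes and the second line are invented.
sample = [
    '195.27.20.2 - - [10/Dec/2009:16:01:25 +0000] '
    '"GET /WordIndex?action=print HTTP/1.0" 200 1865069 "-" '
    '"Mozilla/4.0 (compatible;)" 168.302989',
    '10.0.0.1 - - [10/Dec/2009:16:02:00 +0000] '
    '"GET /Home HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0" 0.412000',
]
slow = list(slow_requests(sample))
```

In practice you would pass an open log file instead of the sample list.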

There were 73 requests that took more than 10 seconds to process.

 16 requests for /TitleIndex, min=20s, max=190s
 
 18 requests for /WordIndex, min=35s, max=234s

 25 requests for attachments, min=11s, max=38s
  Most, though not all, of these were for large images,
  500k or more

 6 misc. POSTs (newaccount, login, edit, AttachFile), min=13s, max=66s

 4 requests for wiki pages: min=16s, max=120s

 2 requests for Category pages: min=12s, max=40s

 1 request for AdvancedSearch: 11s
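
The per-page counts and min/max figures above can be produced with
something like the following. This is only a sketch: the helper names are
made up, the query string is dropped so that /WordIndex?action=print is
counted with /WordIndex, and the input is just the four /WordIndex times
quoted later in this message, not the full half hour of data.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def request_path(request_line):
    """Return the path of a request line such as 'GET /X?y=1 HTTP/1.0'."""
    target = request_line.split()[1]
    return urlsplit(target).path


def summarize(entries):
    """Map each path to (count, min seconds, max seconds)."""
    grouped = defaultdict(list)
    for request_line, seconds in entries:
        grouped[request_path(request_line)].append(seconds)
    return {path: (len(times), min(times), max(times))
            for path, times in grouped.items()}


# Only the four /WordIndex requests quoted in this message.
entries = [
    ("GET /WordIndex?action=print HTTP/1.0", 168.302989),
    ("GET /WordIndex HTTP/1.0", 179.532250),
    ("GET /WordIndex HTTP/1.1", 52.201152),
    ("GET /WordIndex HTTP/1.1", 62.462147),
]
summary = summarize(entries)
```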

I think it's fair to assume that the long times for attachments, and in
some cases for random pages, are due to network issues - clients receiving
data slowly and tying up an httpd process. So the thing that really
stands out here is the /TitleIndex and /WordIndex requests - why are we
getting all these requests for expensive pages that aren't obviously
linked to from anywhere?

So, let's look at the first four requests for /WordIndex:

 IP: 195.27.20.2
 Time: 10/Dec/2009:16:01:25 +0000
 Request: GET /WordIndex?action=print HTTP/1.0
 Bytes: 1865069
 Referrer: "-" 
 User agent: "Mozilla/4.0 (compatible;)"
 Time: 168.302989

 IP: 195.27.20.2 
 Time: 10/Dec/2009:16:01:25 +0000
 Request: GET /WordIndex HTTP/1.0
 Bytes: 1867640 
 Referrer: "http://live.gnome.org/Tomboy/PluginList"
 User agent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
      SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; 
     .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; MS-RTC LM 8; 
     .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)" 
 Time: 179.532250

 IP: 93.174.145.75
 Time: 10/Dec/2009:16:03:39 +0000
 Request: GET /WordIndex HTTP/1.1
 Bytes: 1867640
 User agent: "Mozilla/4.0 (compatible;)" 
 Time: 52.201152

 IP: 192.196.142.21 
 Time: 10/Dec/2009:16:03:36 +0000
 Request: GET /WordIndex HTTP/1.1
 Bytes: 1867640 
 User agent: "Mozilla/4.0 (compatible;)"
 Time: 62.462147

So, the thing that stands out here is the consistent User Agent for
three out of the four, and the fact that the remaining request (the
second one listed), while it has a different agent, comes from the same
IP at the same time as the first.

If you do a web search, you'll find that this user agent is attributed
to "Blue Coat" proxy server products, which apparently do speculative
prefetching based on page contents.

What page contents are they prefetching on? If you look at the source
of one of the wiki pages, you see (e.g., for /GnomeShell):

<link rel="Start" href="/Home">
<link rel="Alternate" title="Wiki Markup" href="/GnomeShell?action=raw">
<link rel="Alternate" media="print" title="Print View" href="/GnomeShell?action=print">
<link rel="Search" href="/FindPage">
<link rel="Index" href="/TitleIndex">
<link rel="Glossary" href="/WordIndex">
<link rel="Help" href="/HelpOnFormatting">
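
To see what a client that follows these elements would pick up, here is a
small sketch using Python's standard html.parser on the snippet above. The
LinkCollector class is just an illustration, and whether the proxy really
walks the <link> elements this way is my guess, not something I've
confirmed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (rel, href) pairs from <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attributes = dict(attrs)
            if "href" in attributes:
                self.links.append((attributes.get("rel"), attributes["href"]))


page_head = """
<link rel="Start" href="/Home">
<link rel="Alternate" title="Wiki Markup" href="/GnomeShell?action=raw">
<link rel="Alternate" media="print" title="Print View" href="/GnomeShell?action=print">
<link rel="Search" href="/FindPage">
<link rel="Index" href="/TitleIndex">
<link rel="Glossary" href="/WordIndex">
<link rel="Help" href="/HelpOnFormatting">
"""

collector = LinkCollector()
collector.feed(page_head)

# The targets singled out in this message: the two index pages and
# the printable views.
flagged = [href for rel, href in collector.links
           if href in ("/TitleIndex", "/WordIndex") or "action=print" in href]
```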

There are in fact *no* obvious links to /TitleIndex and /WordIndex or to the
printable versions of pages anywhere in the page, and I'm not aware of any current
browsers that present these links in the user interface. So to
summarize:

 Our performance on live.gnome.org is being killed by speculative
 prefetching on URLs that are added because they seemed like a good
 idea but have no actual purpose on the page.

Possible fixes:

 - Block /TitleIndex and /WordIndex entirely - they aren't useful pages
 - Block the Blue Coat fetches by User Agent (this, however, apparently
   doesn't catch all the prefetches; sometimes the proxy uses the user
   agent of the requesting client)
 - Use Apache's mod_cache facilities to cache /TitleIndex and /WordIndex
 - Patch Moin to omit this section of the pages
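
To make the difference between the first two options concrete, here is a
sketch of the decision logic in Python. It is not actual Apache or Moin
configuration; should_block and its parameters are hypothetical names, and
the agent string is the one from the log entries above.

```python
from urllib.parse import urlsplit

BLOCKED_PATHS = {"/TitleIndex", "/WordIndex"}
PREFETCH_AGENT = "Mozilla/4.0 (compatible;)"  # agent seen in the log entries


def should_block(request_target, user_agent,
                 block_paths=True, block_agent=True):
    """Return True if the request matches either blocking rule."""
    path = urlsplit(request_target).path  # ignore any query string
    if block_paths and path in BLOCKED_PATHS:
        return True
    if block_agent and user_agent == PREFETCH_AGENT:
        return True
    return False


msie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

# Blocking by path catches the request whatever agent it arrives with.
by_path = should_block("/WordIndex", msie)

# Blocking by agent alone misses a prefetch that reuses the client's agent.
by_agent_only = should_block("/WordIndex", msie, block_paths=False)
```

With agent-only blocking, the second request quoted above (the one with the
MSIE agent) would still get through, which is the caveat noted in the list.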

I don't have a strong opinion on which of these, or which combination,
is best - the last one makes some sense to me.

- Owen



