This program makes the same mistake as gtk-perf: it sends the drawing requests, but doesn't wait for the X server to actually perform them. So you are essentially profiling Cairo and Xlib. You aren't seeing things like "the X server takes 50% of the CPU time while the benchmark is running".
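To make the difference concrete, here is a toy model of the effect, not real Xlib code. Everything in it is an assumption for illustration: a worker thread (`fake_server`) stands in for the X server, a `queue.Queue` stands in for the request buffer, and the 2 ms per-request cost is made up. In a real Xlib client, the blocking step would be a call such as `XSync`, which returns only after the server has processed all outstanding requests.

```python
import queue
import threading
import time


def fake_server(requests):
    """Hypothetical stand-in for the X server: each request costs some work."""
    while True:
        item = requests.get()
        if item is None:          # sentinel: shut the "server" down
            requests.task_done()
            break
        time.sleep(0.002)         # pretend to render (made-up cost)
        requests.task_done()


def run_benchmark(n_requests, wait_for_server):
    """Time n_requests drawing requests, optionally waiting for the server."""
    requests = queue.Queue()
    server = threading.Thread(target=fake_server, args=(requests,))
    server.start()

    start = time.perf_counter()
    for i in range(n_requests):
        requests.put(i)           # "send" a drawing request; returns at once
    if wait_for_server:
        requests.join()           # block until every request was processed
    elapsed = time.perf_counter() - start

    requests.put(None)            # stop the server outside the timed region
    server.join()
    return elapsed


if __name__ == "__main__":
    print("client only:     %.4f s" % run_benchmark(50, False))
    print("including server: %.4f s" % run_benchmark(50, True))
```

Without the wait, the timed region covers only the cost of queueing the requests, which is the client-side number. With the wait, it also covers the time the server needs to carry them out.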
I would call excluding the X server a feature. Sometimes desirable, sometimes not. There are a couple of situations where you want to do just that:
1. Remote X connections, even if we're just talking about a local network.
2. Multi-CPU machines.
In either case, X is truly running in parallel with the program.
M.