But since the problem is in cairo, I think it would be better to fix it there. Calling memcpy() for each pixel (2 or 3 bytes) is overkill.
If the size is constant, gcc should optimize that call away, no?
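
To make the question concrete, here is a minimal sketch of the two cases. The function names are made up for illustration and are not taken from cairo. With a literal size, gcc at -O2 will typically expand memcpy() inline as a few loads and stores; when the size is only known at run time, it generally has to keep the call (or emit a generic copy loop), which is where the per-pixel overhead would come from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the size is a compile-time constant (3 bytes
 * per pixel), so the compiler is free to replace the memcpy() call
 * with plain loads and stores. */
void copy_pixel_rgb24(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, 3);
}

/* Hypothetical helper: bpp is only known at run time, so the compiler
 * usually cannot expand the copy and a real call may remain. */
void copy_pixel_bpp(uint8_t *dst, const uint8_t *src, size_t bpp)
{
    memcpy(dst, src, bpp);
}

/* Copies a row one pixel at a time using the run-time-size variant;
 * this is the pattern that would pay for one call per pixel. */
void copy_row(uint8_t *dst, const uint8_t *src, size_t width, size_t bpp)
{
    for (size_t i = 0; i < width; i++)
        copy_pixel_bpp(dst + i * bpp, src + i * bpp, bpp);
}
```

Compiling this with `gcc -O2 -S` and looking for `call memcpy` in the output should show which case applies. The result depends on the gcc version, target and flags (for example, -fno-builtin disables the expansion), so it is worth checking against how the code in question is actually built.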