Re: gtk3-demo dies with EAGAIN when running under Weston



On Fri, Mar 20, 2015 at 10:53:42AM +0200, Pekka Paalanen wrote:
On Thu, 19 Mar 2015 23:45:00 -0400
Lyude <thatslyude gmail com> wrote:

On Fri, 2015-03-20 at 11:37 +0800, Jonas Ådahl wrote:

Try to apply this patch http://patchwork.freedesktop.org/patch/44994/ .


Jonas

I just tried the patch and it fixed the issue. Thanks a ton for the
quick reply to my e-mail and the patch :). Now I can finally get back to
work.

Now that seems odd.

AFAIU, basically you get EAGAIN when the kernel doesn't have the space
to buffer your sent data. This means that either the receiver (the
compositor) is not reading the socket, or the client is flooding the
socket.
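
The EAGAIN condition described above can be reproduced without Wayland.
This is a minimal sketch, assuming a non-blocking Unix stream socket (the
kind of socket a Wayland connection uses) whose peer never reads. The
helper name fill_until_error() is made up for illustration and is not
part of libwayland or GTK+.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write to a non-blocking socket whose peer never
 * reads, until the kernel refuses more data. Returns the number of bytes
 * accepted (or -1 if setup failed) and stores the failing errno in
 * *err_out. */
static long fill_until_error(int *err_out)
{
    int sv[2];
    char chunk[4096] = {0};
    long total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        *err_out = errno;
        return -1;
    }

    /* Make the sending end non-blocking. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        *err_out = errno;
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* sv[1] plays the receiver that is not reading its socket. */
    for (;;) {
        ssize_t n = write(sv[0], chunk, sizeof chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            *err_out = errno; /* expected: EAGAIN once the buffer is full */
            break;
        }
        total += n;
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

How many bytes are accepted before the failure depends on the kernel's
socket buffer size, so the sketch only reports it and does not assume a
value.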

I would suggest there may be two bugs in GTK+ here:

1. Flooding the socket to begin with. I really don't understand why
   "input: Make setting the same pointer cursor state again a no-op"
   would be fixing this issue. Does GTK+ send tons of requests for
   every pointer enter/leave or something?

When weston got a wl_pointer.set_cursor with the same cursor and the
same hot spot, it unmapped the surface, sent wl_surface.leave on the
cursor surface, then mapped it again in the exact same position, sending
a wl_surface.enter to that same surface. When GTK+ receives an
enter/leave on a cursor surface it calculates the buffer scale it should
use and lazily just sets the same cursor again because it doesn't
keep track of that itself. In other words, we'd get a feedback loop that
eventually fills the buffer. One could argue that GTK+ shouldn't be as
lazy, but I don't think setting a cursor surface with an identical state
should make the cursor leave the output and then enter immediately
either.
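
The loop breaks as soon as either side treats "same cursor, same hot
spot" as a no-op. A rough sketch of that comparison follows. It is not
the actual weston patch or GTK+ code; struct cursor_state and
cursor_state_changed() are hypothetical names, and the surface is
represented by an opaque pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of the last cursor state that was applied. */
struct cursor_state {
    const void *surface;  /* opaque handle standing in for the cursor surface */
    int hotspot_x;
    int hotspot_y;
    bool valid;           /* false until the first state has been stored */
};

/* Returns true if the requested state differs from the stored one, and
 * stores the new state. Returns false for an identical request, in which
 * case the caller should do nothing: no unmap/map, no leave/enter. */
static bool cursor_state_changed(struct cursor_state *last,
                                 const void *surface, int hx, int hy)
{
    if (last->valid &&
        last->surface == surface &&
        last->hotspot_x == hx &&
        last->hotspot_y == hy)
        return false;

    last->surface = surface;
    last->hotspot_x = hx;
    last->hotspot_y = hy;
    last->valid = true;
    return true;
}
```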


2. Not dealing with EAGAIN. See the documentation for
   wl_display_flush[1]. When hitting EAGAIN there, the event loop
   should poll for writable and wait before issuing more Wayland
   requests. Does GTK+ do this already?

AFAICS GTK+ doesn't check for EAGAIN, but in this particular situation
we'd just get an eternal busy loop instead of an abort.
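
For reference, the "poll for writable" pattern from point 2 looks roughly
like the sketch below. To keep it self-contained it writes to a plain
file descriptor. With libwayland the write would be wl_display_flush()
and the descriptor would come from wl_display_get_fd(). The function
name flush_with_poll() and the timeout parameter are assumptions for
illustration only.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to fd, waiting for the fd to
 * become writable whenever the kernel reports EAGAIN. Returns 0 on
 * success, -1 on error. If the fd does not become writable within
 * timeout_ms, returns -1 with errno set to EAGAIN. */
static int flush_with_poll(int fd, const char *buf, size_t len,
                           int timeout_ms)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0) {
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1; /* a real error, not a full buffer */

        /* Buffer full: wait for writability instead of retrying at once. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int ready;
        do {
            ready = poll(&pfd, 1, timeout_ms);
        } while (ready < 0 && errno == EINTR);

        if (ready < 0)
            return -1;
        if (ready == 0) {
            errno = EAGAIN; /* receiver still has not caught up */
            return -1;
        }
    }
    return 0;
}
```

A toolkit would normally not block in poll() like this. It would add
POLLOUT to the descriptor's event mask in its main loop and hold back
further requests until the descriptor is writable. The blocking form
only keeps the sketch short.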

Jonas


Now, 2. applies when the error comes back from wl_display_flush(). The other
case is when calling a request function attempts to buffer the message
but wl_closure_send() fails because wl_connection_flush() fails. These
are libwayland-client internal functions. When wl_connection_flush()
fails, instead of failing everything, I think we should just allocate
more space until implicit flush succeeds or we can return EAGAIN from
wl_display_flush().

Can you check if the failure comes from wl_display_flush() or the
internal failure of the implicit flush? A backtrace would tell.

In any case, I think it might be good to look into removing the abort()
from wl_proxy_marshal_array_constructor(), unless someone makes a case
for the app being very broken if it sends that much data without spinning
the event loop.

However, growing the send buffer without limit is not a good idea: the
bigger it gets, the further behind the receiver (the compositor,
actually) is, and at some point that starts to indicate an app bug.
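
To make the idea concrete, here is a rough sketch of a send buffer that
grows on demand but refuses to grow past a hard cap. It is not how
libwayland-client is implemented. struct send_buf, send_buf_append(),
the 4096-byte starting size and the choice of EAGAIN as the "over the
cap" error are all assumptions for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable send buffer with an upper bound. */
struct send_buf {
    char *data;
    size_t len;      /* bytes currently queued */
    size_t cap;      /* bytes currently allocated */
    size_t max_cap;  /* hard limit; beyond this the sender is too far ahead */
};

/* Queue n bytes. Returns 0 on success. Returns -1 with errno == EAGAIN if
 * the message would push the buffer past max_cap, or -1 if allocation
 * fails. */
static int send_buf_append(struct send_buf *b, const void *msg, size_t n)
{
    if (n > b->max_cap || b->len > b->max_cap - n) {
        errno = EAGAIN;
        return -1;
    }

    size_t need = b->len + n;
    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 4096;
        /* Double until large enough, without ever exceeding max_cap. */
        while (new_cap < need)
            new_cap = (new_cap > b->max_cap / 2) ? b->max_cap : new_cap * 2;
        if (new_cap > b->max_cap)
            new_cap = b->max_cap;

        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }

    memcpy(b->data + b->len, msg, n);
    b->len += n;
    return 0;
}
```

Where to put the cap is the open question: too low and a legitimate
burst of requests fails, too high and a runaway client is only detected
late.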


Thanks,
pq

[1]
http://wayland.freedesktop.org/docs/html/apb.html#Client-classwl__display_1a8463b6e5f4cf9a2a3ad2d543aedcf429

