Re: Adding synchronization to the WM spec

From: James Jones <jajones nvidia com>
To: Owen Taylor <otaylor redhat com>
Cc: "wm-spec-list gnome org" <wm-spec-list gnome org>
Subject: Re: Adding synchronization to the WM spec
Date: Mon, 7 Nov 2011 10:18:00 -0800

On 11/7/11 9:46 AM, Owen Taylor wrote:

On Tue, 2011-11-01 at 22:43 -0700, James Jones wrote:

I'm trying to make time to read through your proposals/code in more
detail, but my record in the "making time for things" area is pretty
abysmal, so some brief initial comments on the un-implemented fence-sync
portion of the spec below:


Thanks for the response here - it's very useful even at a high level.

[...]

   * I'm not really sure how the fence synchronization is supposed
     to work for the case of a direct rendering GL client. Is the combination
     of glXSwapBuffers() and XSyncTriggerFence() sufficient?


Good point.  This is messy specification-wise, but in practice, I think
this is indeed sufficient.  Our implementation always sends damage
events for GL rendering only after the rendering that generates the
damage has been completed on the GPU, and from what I understand, open
source implementations order their accesses to the surfaces implicitly
and/or do the swap operation in the GLX server anyway, so the
XSyncTriggerFence() isn't strictly necessary, but it isn't particularly
harmful if you want to leave it to keep the design clean.  The composite
manager will wait for both the fence sync and the damage event I assume,
so even though the fence trigger doesn't execute in the right GPU
thread, it will work out right.


Hmm, does the wait for the GPU occur at the time that buffers are
swapped, or are damage events emitted at some later point? If the damage
events are done asynchronously later, then the window manager would see:

  Counter Update to Odd Value
  Counter Update to Even Value
  Damage event

I'm not completely sure I follow what you mean by asynchronous, but I
think that yes, they are asynchronously generated. To be clear, an
application doing this today:


XSyncChangeCounter(<+1>);
glXSwapBuffers(); /* damage 1 */
XSyncChangeCounter(<+1>);
glXSwapBuffers(); /* damage 2 */

Could indeed generate this sequence of events if the scheduling worked
out just right I think:


Counter Update to Odd Value
Counter Update to Even Value
Damage Event 1
Damage Event 2

Things should still display correctly, but the way I've written things
the compositor is supposed to fall back to doing its own
XSyncTriggerFence() in that case, which would spoil the point of the
exercise from an efficiency point of view. And frame completion client
messages wouldn't work right.
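
The ordering problem above can be illustrated with a small self-contained
model. This is only a sketch, not code from either proposal: it assumes
the convention that an odd counter value means "frame in progress" and an
even value means "frame complete", and the names ModelEvent and
model_damage_outside_frame are hypothetical. It counts damage events that
reach the compositor while no frame is open, which is taken here to be
the case where the fallback path would be needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, in the order the compositor observes them. */
typedef enum { EV_COUNTER_UPDATE, EV_DAMAGE } ModelEventKind;

typedef struct {
    ModelEventKind kind;
    unsigned counter_value;   /* only meaningful for EV_COUNTER_UPDATE */
} ModelEvent;

/*
 * Count damage events that arrive while the last seen counter value is
 * even. Assumption of this sketch: odd = frame in progress, even = frame
 * complete, and the counter starts at an even value.
 */
size_t model_damage_outside_frame(const ModelEvent *events, size_t n)
{
    assert(events != NULL || n == 0);

    unsigned counter = 0;     /* assumed initial state: no frame open */
    size_t outside = 0;

    for (size_t i = 0; i < n; i++) {
        if (events[i].kind == EV_COUNTER_UPDATE)
            counter = events[i].counter_value;
        else if (counter % 2 == 0)
            outside++;        /* damage seen with no frame open */
    }
    return outside;
}
```

Feeding it the sequence listed above (odd update, even update, damage 1,
damage 2) reports both damage events as falling outside a frame, whereas
a damage event delivered between the odd and even updates is not counted.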

I'll have to look at your proposals in more detail, but except for some
small overhead for the unnecessary XSyncTriggerFence(), I don't see how
this is any different from a normal application. Assuming the GLX
client application plays by the same rules as a non-GLX application and
does this on swaps:


XSyncChangeCounter();
glXSwapBuffers();
XSyncTriggerFence();

It should look just like a non-GLX application. The damage event may
arrive after the fence trigger, but isn't that true for a normal X
application as well? I believe damage events were queued up until the
block handler in X.

At some point I would like to add a second GL extension that allows
triggering the X Sync Fence objects from GL so this could be done
properly, but it's low priority given the above.  I omitted it from the
initial spec because it's much harder to implement efficiently.


I guess the question is the migration path - how do you go from a world
where a GL app can do anything it wants to damage the front buffer, to a
world where apps can optionally pass fence information to the compositor
and get useful performance improvements from that.

Right, and I've not fully worked that out. One idea I had been toying
with was adding an X fence sync argument to the existing glXSwapBuffers()
commands, and having those versions bypass our current delayed damage
event generation logic (the stuff I mentioned above that delays the
damage event generation until the GL rendering completes). That sounds
like it would fit in well with your proposals. What do you think?
Clients using this swap could set some other property on their windows
or something if needed, to differentiate themselves from "legacy" clients.

One problem I see with your spec:  When are the fence sync objects
reset?  In the current form, it will only run properly for "L"
iterations.  I've found this non-trivial to solve efficiently in my code
(but very doable).  The solution you'll need is slightly different,
since the consumer and producer of fence sync triggers are separate
applications.


You are certainly correct this is something that needs to be discussed
in the specification.

If I read the sync spec correctly, one easy answer is that the client
just does:

  XSyncAwaitFence()
  XSyncResetFence()
  XSyncTriggerFence()
  <update counter>

right in a row when it finishes a frame. When the window manager sees
the updated counter via an AlarmNotify event, the ResetFence is
guaranteed to have been handled by the X server, since ResetFence is
done "immediately". The client can potentially omit the AwaitFence if
it's already gotten a _NET_WM_SYNC_DRAWN for the previous usage of the
fence back from the window manager. I don't have enough of an idea about
the implementation of fences to know whether that's a worthwhile
optimization or not - what is the expected cost of waiting for an
already triggered fence?
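
To make the reset question concrete, here is a minimal self-contained
model of a client cycling through a fixed queue of fences. It is a sketch
under stated assumptions, not Xlib code: ModelFence, ModelClient,
RING_SIZE and the model_* functions are hypothetical names, the fence is
reduced to a single "triggered" bit, and triggering an already triggered
fence is treated as having no effect. The comments mark where the real
XSyncResetFence() and XSyncTriggerFence() requests would sit; the
XSyncAwaitFence() step is left out because the model has no compositor
to wait on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 3   /* hypothetical "L": number of fences in the queue */

/* Toy stand-in for an X Sync fence: only the triggered bit is tracked. */
typedef struct {
    bool triggered;
} ModelFence;

typedef struct {
    ModelFence fences[RING_SIZE];
    size_t next;        /* index of the fence used for the next frame */
    unsigned counter;   /* stand-in for the frame counter */
} ModelClient;

void model_client_init(ModelClient *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < RING_SIZE; i++)
        c->fences[i].triggered = false;
    c->next = 0;
    c->counter = 0;
}

/*
 * Finish one frame. Returns true if the fence actually went from
 * untriggered to triggered, i.e. a waiter on it would observe a real
 * ordering point for this frame.
 */
bool model_finish_frame(ModelClient *c, bool reset_before_trigger)
{
    assert(c != NULL);
    ModelFence *f = &c->fences[c->next];

    if (reset_before_trigger)
        f->triggered = false;          /* where XSyncResetFence() goes */

    bool was_untriggered = !f->triggered;
    f->triggered = true;               /* where XSyncTriggerFence() goes */
    c->counter++;                      /* <update counter> */

    c->next = (c->next + 1) % RING_SIZE;
    return was_untriggered;
}

/* Run a number of frames and count how many got a fresh trigger. */
unsigned model_count_fresh_frames(unsigned frames, bool reset_before_trigger)
{
    ModelClient c;
    unsigned fresh = 0;

    model_client_init(&c);
    for (unsigned i = 0; i < frames; i++)
        if (model_finish_frame(&c, reset_before_trigger))
            fresh++;
    return fresh;
}
```

Without the reset, only the first RING_SIZE frames get a fence that is
not already triggered, which matches the "only L iterations" concern;
with the reset in place every frame does. The model says nothing about
when it is safe to reset, which is the harder question discussed next.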

This gets hard to think about. My concern with the above is ensuring
the rendering submitted by the composite manager that was queued behind
a wait for that fence has completed before the reset operation in your
code above. If I'm reading things correctly, a composite manager would
do something like:

/* Wait for software notification that a frame is complete, and get a
 * fence sync that indicates the HW has finished rendering said frame */
waitEventsAndCounters(&glFenceSyncOut, &windowObjOut);

/* Wait for the fence sync on the GPU to ensure the frame is complete
 * there before doing our compositing */
glWaitSync(glFenceSyncOut, 0, GL_TIMEOUT_IGNORED);

/* Composite the new frame with OpenGL */
paintGL();

/* Send _NET_WM_SYNC_DRAWN */
notifyFrameHandled(windowObjOut);

The concern is that the SW-based notification done in the final step
above reaches the client app and the app acts on it before the GPU-based
sync wait completes. Since there's a queue of fences in use, this means
it would actually need to have looped through the whole queue, so the
race condition is pretty unlikely, but very possible on fast GPUs and
busy GPUs.

This is again where a way to write X fences from GL would come in handy.
The composite manager could trigger a second fence after its wait
completes, which the client could also wait for with XSyncAwaitFence()
before resetting the fence.

Alternatively, the composite manager could simply glClientWaitSync()
before sending the notification. This could be done in a separate
thread/context if there's concern about holding up the CPU thread of the
composite manager.


Thanks,
-James

- Owen



Follow-Ups:
- Re: Adding synchronization to the WM spec
  - From: Owen Taylor

References:
- Re: Adding synchronization to the WM spec
  - From: James Jones
- Re: Adding synchronization to the WM spec
  - From: Owen Taylor
