Re: Adding synchronization to the WM spec



On 11/7/11 9:46 AM, Owen Taylor wrote:
On Tue, 2011-11-01 at 22:43 -0700, James Jones wrote:
I'm trying to make time to read through your proposals/code in more
detail, but my record in the "making time for things" area is pretty
abysmal, so some brief initial comments on the un-implemented fence-sync
portion of the spec below:

Thanks for the response here - it's very useful even at a high level.

[...]

   * I'm not really sure how the fence synchronization is supposed
     to work for the case of a direct rendering GL client. Is the combination
     of glXSwapBuffers() and XSyncTriggerFence() sufficient?

Good point.  This is messy specification-wise, but in practice, I think
this is indeed sufficient.  Our implementation always sends damage
events for GL rendering only after the rendering that generates the
damage has been completed on the GPU.  From what I understand, open
source implementations order their accesses to the surfaces implicitly
and/or do the swap operation in the GLX server anyway.  So the
XSyncTriggerFence() isn't strictly necessary, but it isn't particularly
harmful if you want to keep it for a clean design.  I assume the
composite manager will wait for both the fence sync and the damage
event, so even though the fence trigger doesn't execute in the right
GPU thread, things will work out.

Hmm, does the wait for the GPU occur at the time that buffers are
swapped, or are damage events emitted at some later point? If the damage
events are emitted asynchronously, later than the swap, then the window
manager would see:

  Counter Update to Odd Value
  Counter Update to Even Value
  Damage event

I'm not completely sure I follow what you mean by asynchronous, but I think the answer is yes: they are generated asynchronously. To be clear, an application doing this today:

XSyncChangeCounter(<+1>);
glXSwapBuffers(); /* damage 1 */
XSyncChangeCounter(<+1>);
glXSwapBuffers(); /* damage 2 */

Could indeed generate this sequence of events, I think, if the scheduling worked out just right:

Counter Update to Odd Value
Counter Update to Even Value
Damage Event 1
Damage Event 2

Things should still display correctly, but the way I've written things
the compositor is supposed to fall back to doing its own
XSyncTriggerFence() in that case, which would spoil the point of the
exercise from an efficiency point of view. And frame completion client
messages wouldn't work right.
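The fallback rule described here can be sketched as a small classifier over the events a compositor sees for one window. This is a toy model under stated assumptions: the event names and `classify()` are hypothetical, and it assumes a frame counts as properly fenced only if a Damage event arrived between the odd and the even counter update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds seen by the compositor for one window. */
enum wm_event { COUNTER_ODD, COUNTER_EVEN, DAMAGE };

struct frame_stats {
    unsigned with_client_fence; /* Damage arrived before the even update */
    unsigned fallbacks;         /* compositor had to trigger its own fence */
};

/* Toy model of the fallback rule: when the counter goes even, the frame is
 * treated as fenced by the client only if Damage was already seen since the
 * odd update; otherwise the compositor falls back to XSyncTriggerFence(). */
static struct frame_stats classify(const enum wm_event *evs, size_t n)
{
    struct frame_stats st = {0, 0};
    int damage_since_odd = 0;

    assert(evs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (evs[i]) {
        case COUNTER_ODD:
            damage_since_odd = 0;   /* a new frame has started */
            break;
        case DAMAGE:
            damage_since_odd = 1;
            break;
        case COUNTER_EVEN:
            if (damage_since_odd)
                st.with_client_fence++;
            else
                st.fallbacks++;
            break;
        }
    }
    return st;
}
```

Fed the ordering from the previous message (odd update, even update, then both Damage events), this model reports one fallback and no client-fenced frames, which is the efficiency loss described above.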

I'll have to look at your proposals in more detail, but except for some small overhead for the unnecessary XSyncTriggerFence(), I don't see how this is any different from a normal application. Assuming the GLX client application plays by the same rules as a non-GLX application and does this on swaps:

XSyncChangeCounter();
glXSwapBuffers();
XSyncTriggerFence();

It should look just like a non-GLX application. The damage event may arrive after the fence trigger, but isn't that true for a normal X application as well? I believe damage events are queued up until the X server's block handler runs.

At some point I would like to add a second GL extension that allows
triggering the X Sync Fence objects from GL so this could be done
properly, but it's low priority given the above.  I omitted it from the
initial spec because it's much harder to implement efficiently.

I guess the question is the migration path - how do you go from a world
where a GL app can do anything it wants to damage the front buffer, to a
world where apps can optionally pass fence information to the compositor
and get useful performance improvements from that.

Right, and I've not fully worked that out. One idea I had been toying with was adding variants of the existing glXSwapBuffers() commands that take an X fence sync argument, and having those variants bypass our current delayed damage event generation logic (the logic I mentioned above that delays damage event generation until the GL rendering completes). That sounds like it would fit in well with your proposals. What do you think? Clients using this swap could set some other property on their windows, or something similar, to differentiate themselves from "legacy" clients.

One problem I see with your spec:  When are the fence sync objects
reset?  In the current form, it will only run properly for "L"
iterations.  I've found this non-trivial to solve efficiently in my code
(but very doable).  The solution you'll need is slightly different,
since the consumer and producer of fence sync triggers are separate
applications.
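To make the "L iterations" problem concrete, here is a toy model of a client cycling through a ring of `ring_len` fences. It assumes a fence stays triggered until it is reset, so triggering it again gives the compositor nothing new to wait on. `useful_syncs()` and `MAX_FENCES` are hypothetical names; the function counts frames whose trigger actually moves a fence from untriggered to triggered.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FENCES 16

/* Toy model of a client cycling through a ring of ring_len X Sync fences.
 * Assumption: a fence stays triggered until reset, so re-triggering it
 * without a reset gives the compositor nothing new to wait on. */
static unsigned useful_syncs(unsigned ring_len, unsigned frames,
                             bool reset_before_reuse)
{
    bool triggered[MAX_FENCES] = {false};
    unsigned useful = 0;

    assert(ring_len > 0 && ring_len <= MAX_FENCES);
    for (unsigned f = 0; f < frames; f++) {
        bool *fence = &triggered[f % ring_len];
        if (reset_before_reuse)
            *fence = false;          /* models XSyncResetFence() */
        if (!*fence)
            useful++;                /* the trigger is a real transition */
        *fence = true;               /* models XSyncTriggerFence() */
    }
    return useful;
}
```

With 3 fences and 10 frames, only the first 3 frames get a meaningful trigger unless each fence is reset before reuse.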

You are certainly correct this is something that needs to be discussed
in the specification.

If I read the sync spec correctly, one easy answer is that the client
just does:

  XSyncAwaitFence()
  XSyncResetFence()
  XSyncTriggerFence()
  <update counter>

right in a row when it finishes a frame. When the window manager sees
the updated counter via an AlarmNotify event, the ResetFence is
guaranteed to have been handled by the X server, since ResetFence is
done "immediately". The client can potentially omit the AwaitFence if
it has already gotten a _NET_WM_SYNC_DRAWN for the previous use of the
fence back from the window manager. I don't know enough about the
implementation of fences to know whether that's a worthwhile
optimization or not: what is the expected cost of waiting for an
already-triggered fence?
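The per-frame sequence proposed above, including the optional skip of the AwaitFence, can be written down as an ordered list of requests. This is a hypothetical sketch: `enum client_op` and `end_of_frame_ops()` only record which X requests would be issued, and skipping the wait rests on the assumption stated above that a received _NET_WM_SYNC_DRAWN for the previous use of the fence makes it redundant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the X requests a client issues at end of frame. */
enum client_op {
    OP_AWAIT_FENCE,    /* XSyncAwaitFence() */
    OP_RESET_FENCE,    /* XSyncResetFence(): handled immediately by the server */
    OP_TRIGGER_FENCE,  /* XSyncTriggerFence() */
    OP_UPDATE_COUNTER  /* counter update that marks the frame as finished */
};

/* Writes the requests for one finished frame into out[] and returns the
 * count. The await is skipped when _NET_WM_SYNC_DRAWN was already received
 * for the previous use of this fence (an assumption, see above). */
static size_t end_of_frame_ops(bool drawn_received, enum client_op out[4])
{
    size_t n = 0;

    if (!drawn_received)
        out[n++] = OP_AWAIT_FENCE;
    out[n++] = OP_RESET_FENCE;
    out[n++] = OP_TRIGGER_FENCE;
    out[n++] = OP_UPDATE_COUNTER;
    assert(n <= 4);
    return n;
}
```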

This gets hard to think about. My concern with the above is ensuring the rendering submitted by the composite manager that was queued behind a wait for that fence has completed before the reset operation in your code above. If I'm reading things correctly, a composite manager would do something like:

/* Wait for software notification that a frame is complete, and get a fence sync that indicates the HW has finished rendering said frame */
waitEventsAndCounters(&glFenceSyncOut, &windowObjOut);

/* Wait for the fence sync on the GPU to ensure the frame is complete there before doing our compositing */
glWaitSync(glFenceSyncOut, 0, GL_TIMEOUT_IGNORED);

/* Composite the new frame with OpenGL */
paintGL();

/* Send _NET_WM_SYNC_DRAWN */
notifyFrameHandled(windowObjOut);

The concern is that the SW-based notification sent in the final step above reaches the client app, and the app acts on it, before the GPU-based sync wait completes. Since there's a queue of fences in use, the client would actually need to have looped through the whole queue for this to happen, so the race condition is pretty unlikely, but very possible on fast or busy GPUs.
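A back-of-the-envelope model of that race, under simplifying assumptions: the client finishes a frame every `frame_period_ms`, cycles through `ring_len` fences, and resets fence k when it comes around to it again; the compositor sends _NET_WM_SYNC_DRAWN as soon as it has queued its glWaitSync(), which the GPU only executes after `gpu_backlog_ms` of earlier work. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timing model (hypothetical names, not a real API).
 * The compositor queues glWaitSync() on fence k and notifies right away;
 * the GPU only reaches that wait after gpu_backlog_ms. The client resets
 * fence k again after cycling through the whole ring of ring_len fences. */
static bool reset_races_gpu_wait(unsigned ring_len, double frame_period_ms,
                                 double gpu_backlog_ms)
{
    assert(ring_len > 0 && frame_period_ms > 0.0 && gpu_backlog_ms >= 0.0);
    double reuse_after_ms = ring_len * frame_period_ms;
    return reuse_after_ms < gpu_backlog_ms;
}
```

With three fences, a client at about 60 frames per second (16.7 ms per frame) needs more than about 50 ms of GPU backlog to lose the race, while at 2 ms per frame a 10 ms backlog is already enough.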

This is again where a way to write X fences from GL would come in handy. The composite manager could trigger a second fence after its wait completes, which the client could also wait for with XSyncAwaitFence() before resetting the fence.

Alternatively, the composite manager could simply glClientWaitSync() before sending the notification. This could be done in a separate thread/context if there's concern about holding up the CPU thread of the composite manager.
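The three options in this thread (notify immediately, have the compositor trigger a second X fence from GL, or glClientWaitSync() before notifying) can be compared in a toy timing model. Names are hypothetical and message latency is ignored; the model only asks whether the earliest moment the client could reset its fence comes after the compositor's queued glWaitSync() has executed on the GPU. The second-fence branch models the proposed GL extension mentioned above, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical compositor policies discussed above. */
enum notify_policy {
    NOTIFY_IMMEDIATELY, /* send _NET_WM_SYNC_DRAWN right after queuing GL work */
    SECOND_X_FENCE,     /* also trigger a compositor-owned X fence from GL */
    CLIENT_WAIT_FIRST   /* glClientWaitSync() before sending the notification */
};

/* Earliest time (ms) at which the client can reset its fence, given that the
 * compositor queues glWaitSync() at t_submit and the GPU executes it at
 * t_submit + gpu_backlog. Message latency is ignored in this toy model. */
static double earliest_client_reset(enum notify_policy p, double t_submit,
                                    double gpu_backlog)
{
    double gpu_wait_done = t_submit + gpu_backlog;

    switch (p) {
    case NOTIFY_IMMEDIATELY:
        return t_submit;        /* client may reset as soon as notified */
    case SECOND_X_FENCE:        /* notified early, but XSyncAwaitFence() on the
                                   compositor's fence blocks until then */
    case CLIENT_WAIT_FIRST:     /* the notification itself is delayed */
        return gpu_wait_done;
    }
    assert(!"unknown policy");
    return gpu_wait_done;
}

static bool reset_is_safe(enum notify_policy p, double t_submit,
                          double gpu_backlog)
{
    return earliest_client_reset(p, t_submit, gpu_backlog)
           >= t_submit + gpu_backlog;
}
```

Under these assumptions only the immediate notification is unsafe whenever the GPU has any backlog; the other two options differ in where the waiting happens (client side in the X server, or compositor side on the CPU).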

Thanks,
-James

- Owen




