Re: Adding synchronization to the WM spec



On Mon, 2011-11-07 at 10:18 -0800, James Jones wrote:
[...]
> >>>    * I'm not really sure how the fence synchronization is supposed
> >>>      to work for the case of a direct rendering GL client. Is the combination
> >>>      of glXSwapBuffers() and XSyncTriggerFence() sufficient?
> >>
> >> Good point.  This is messy specification-wise, but in practice, I think
> >> this is indeed sufficient.  Our implementation always sends damage
> >> events for GL rendering only after the rendering that generates the
> >> damage has been completed on the GPU, and from what I understand, open
> >> source implementations order their accesses to the surfaces implicitly
> >> and/or do the swap operation in the GLX server anyway, so the
> >> XSyncTriggerFence() isn't strictly necessary, but it isn't particularly
> >> harmful if you want to leave it to keep the design clean.  The composite
> >> manager will wait for both the fence sync and the damage event I assume,
> >> so even though the fence trigger doesn't execute in the right GPU
> >> thread, it will work out right.
> >
> > Hmm, does the wait for the GPU occur at the time that buffers are
> > swapped, or are damage events emitted at some later point? If the damage
> > events are done asynchronously later, then the window manager would see:
> >
> >   Counter Update to Odd Value
> >   Counter Update to Even Value
> >   Damage event
> 
> I'm not completely sure I follow what you mean by asynchronous, but I 
> think that yes, they are asynchronously generated.  To be clear, an 
> application doing this today:
> 
> XSyncChangeCounter(<+1>);
> glXSwapBuffers(); /* damage 1 */
> XSyncChangeCounter(<+1>);
> glXSwapBuffers(); /* damage 2 */
> 
> Could indeed generate this sequence of events if the scheduling worked 
> out just right I think:
> 
> Counter Update to Odd Value
> Counter Update to Even Value
> Damage Event 1
> Damage Event 2

Yes, that's what I mean by asynchronous. The question is whether the
client can know how damage events interleave with its stream of X
requests. Can you do glXWaitX() or glFinish() and know that the damage
events have gone out?
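
That is, can the client rely on something like this (a sketch of what I
have in mind; 'dpy' and 'window' are just illustrative):

  <GL rendering for the frame>
  glXSwapBuffers(dpy, window);    /* generates damage */
  glFinish();                     /* or glXWaitX()?   */
  <further X requests, e.g. the frame counter update>

with the damage events interleaved before the subsequent requests, as
if the client had sent them itself?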

> > Things should still display correctly, but the way I've written things
> > the compositor is supposed to fall back to doing its own
> > XSyncTriggerFence() in that case, which would spoil the point of the
> > exercise from an efficiency point of view. And frame completion client
> > messages wouldn't work right.
> 
> I'll have to look at your proposals in more detail, but except for some 
> small overhead for the unnecessary XSyncTriggerFence(), I don't see how 
> this is any different from a normal application.  Assuming the GLX 
> client application plays by the same rules as a non-GLX application and 
> does this on swaps:
> 
> XSyncChangeCounter();
> glXSwapBuffers();
> XSyncTriggerFence();
> 
> It should look just like a non-GLX application.  The damage event may 
> arrive after the fence trigger, but isn't that true for a normal X 
> application as well?  I believe damage events were queued up until the 
> block handler in X.

No, that's not the case - damage events are created at the point in the
rendering stream where the damage occurs, and are interleaved with other
events in a serial fashion.

Actually sending out damage events waits until the block handler or
until the client queue is flushed - but that's no different than normal
buffering of outgoing events; there is no reordering involved.

I don't really see how to make the frame coordination work if damage
events occur at some undefined later time - there's no way to
distinguish damage events that are part of a frame from damage events
that occur for unrelated reasons.
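
In other words, what the compositor needs is roughly this ordering in
the client's request stream (a sketch; the odd/even counter convention
is the one from the proposal, the rest is just illustrative):

  XSyncChangeCounter(dpy, counter, <+1>);   /* now odd: frame begins    */
  <rendering that damages the window>       /* damage for this frame    */
  XSyncChangeCounter(dpy, counter, <+1>);   /* now even: frame complete */

Damage interleaved between the two counter updates can be attributed to
the frame; damage generated at some undefined later point can't be.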

> >> At some point I would like to add a second GL extension that allows
> >> triggering the X Sync Fence objects from GL so this could be done
> >> properly, but it's low priority given the above.  I omitted it from the
> >> initial spec because it's much harder to implement efficiently.
> >
> > I guess the question is the migration path - how do you go from a world
> > where a GL app can do anything it wants to damage the front buffer, to a
> > world where apps can optionally pass fence information to the compositor
> > and get useful performance improvements from that.
> 
> Right, and I've not fully worked that out.  One idea I had been toying 
> with was adding a X fence sync argument to the existing glXSwapBuffers() 
> commands, and having those versions bypass our current delayed damage 
> event generation logic (The stuff I mentioned above that delays the 
> damage event generation until the GL rendering completes).  That sounds 
> like it would fit in well with your proposals.  What do you think? 
> Clients using this swap could set some other property on their windows 
> or something if needed, to differentiate themselves from "legacy" clients.

Looks like it works to me. If damage events are generated at the time
you do glXSwapBuffers() - where by "at the time", I mean with the same
ordering as if the client did XSendEvent() - then the compositor will
just need to wait on the fence to be sure to pick up the rendering.

The only additional thing here is that if you are using GL for
"toolkit" type stuff like Clutter, you really want to be able to do
sub-window updates. There are two ways I know of to do that:

 * Copy partial areas from the back buffer to the front buffer. This
   would require a matching fenced version of something along the lines
   of glXCopySubBuffer() or glBlitFramebuffer() (see the sketch after
   this list).

 * Be able to know what areas of the frame buffer are preserved on
   buffer swap. GLX_OML_swap_method isn't useful for the open source
   drivers since there is no static answer and thus the result has
   to be SWAP_UNDEFINED.
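
To make the first approach concrete, with the MESA variant of the copy
call a toolkit today would do something like this (a sketch; the
problem is precisely that the trailing fence trigger isn't ordered with
the GPU copy, which is why a matching fenced version of the copy would
be needed):

  XSyncChangeCounter(dpy, counter, <+1>);          /* now even: frame done */
  glXCopySubBufferMESA(dpy, window, x, y, w, h);   /* partial update       */
  XSyncTriggerFence(dpy, fence);                   /* not GPU-ordered      */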

Setting a property isn't required - because this just makes GLX clients
behave like normal X clients. The requirement is instead that
direct-rendering clients using the current NVIDIA drivers *without* this
extension must not use _NET_WM_SYNC_DRAWN and related protocols.

Or, if glFinish() provides that guarantee, then they have to call
glFinish() before they update the frame counter.

> >> One problem I see with your spec:  When are the fence sync objects
> >> reset?  In the current form, it will only run properly for "L"
> >> iterations.  I've found this non-trivial to solve efficiently in my code
> >> (but very doable).  The solution you'll need is slightly different,
> >> since the consumer and producer of fence sync triggers are separate
> >> applications.
> >
> > You are certainly correct this is something that needs to be discussed
> > in the specification.
> >
> > If I read the sync spec correctly, one easy answer is that the client
> > just does:
> >
> >   XSyncAwaitFence()
> >   XSyncResetFence()
> >   XSyncTriggerFence()
> >   <update counter>
> >
> > right in a row when it finishes a frame. When the window manager sees
> > the updated counter via an AlarmNotify event, the ResetFence is
> > guaranteed to have been handled by the X server, since ResetFence is
> > done "immediately". The client can potentially omit the AwaitFence if
> > it's already gotten a _NET_WM_SYNC_DRAWN for the previous usage of the
> > fence back from the window manager. I don't have enough of an idea about
> > the implementation of fences to know whether that's a worthwhile
> > optimization or not - what is the expected cost of waiting for an
> > already-triggered fence?
> 
> This gets hard to think about.  My concern with the above is ensuring 
> the rendering submitted by the composite manager that was queued behind 
> a wait for that fence has completed before the reset operation in your 
> code above.  If I'm reading things correctly, a composite manager would 
> do something like:
> 
> /* Wait for software notification that a frame is complete, and get a 
> fence sync that indicates the HW has finished rendering said frame */
> waitEventsAndCounters(&glFenceSyncOut, &windowObjOut);
> 
> /* Wait for the fence sync on the GPU to ensure the frame is complete 
> there before doing our compositing */
> glWaitSync(glFenceSyncOut, 0, GL_TIMEOUT_IGNORED);
> 
> /* Composite the new frame with OpenGL */
> paintGL();
> 
> /* Send _NET_WM_SYNC_DRAWN */
> notifyFrameHandled(windowObjOut);
> 
> The concern is that the SW-based notification done in the final step 
> above reaches the client app and the app acts on it before the GPU-based 
> sync wait completes.  Since there's a queue of fences in use, this means 
> it would actually need to have looped through the whole queue, so the 
> race condition is pretty unlikely, but very possible on fast GPUs and 
> busy GPUs.

Hmm, you are right that waiting for _NET_WM_SYNC_DRAWN doesn't imply
that the fence is in the triggered state - but other than that
optimization not working, I don't think there is a big problem here:

The point of the XSyncAwaitFence() isn't to throttle rendering - it's to
make sure that we don't BadMatch on a fence that isn't yet in the
triggered state.

If we don't have throttling, then yes, we could loop around and reuse a
fence before the compositor has finished waiting for it - whether we
call XSyncAwaitFence() or not - but that's "mostly harmless": because
the application immediately triggers the fence again after resetting
it, the compositor doesn't lock up or display stale content - it just
ends up waiting a little longer than it was supposed to and picks up a
newer frame.
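
Spelled out, the per-frame sequence I have in mind on the client side,
with a small ring of fences, is something like this (a sketch; the ring
size and the names are just illustrative):

  /* prepare the fence we're about to reuse; the server-side wait means
     we can't BadMatch by resetting an untriggered fence */
  XSyncAwaitFence(dpy, &fence[i], 1);
  XSyncResetFence(dpy, fence[i]);

  /* finish the frame */
  XSyncTriggerFence(dpy, fence[i]);
  XSyncChangeCounter(dpy, counter, <+1>);

  i = (i + 1) % n_fences;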

But the normal case is throttled. If we have a queue of 2 fences, then
the race condition occurs if:

 Application starts and finishes emitting drawing commands for frame
 N + 2

Before:

 GPU starts processing the compositor's drawing commands for frame N

Which can never happen if we have working throttling mechanisms - the
compositor shouldn't be emitting frames faster than the GPU can draw
them, and the application shouldn't be drawing more frames than the
compositor can draw.

- Owen



