[mutter] clutter: Compute max render time heuristically



commit 592fbee065f3c90c105559bcd80a05da3c78434e
Author: Ivan Molodetskikh <yalterz gmail com>
Date:   Fri Nov 27 20:58:55 2020 +0300

    clutter: Compute max render time heuristically
    
    Max render time shows how early the frame clock needs to be dispatched
    to make it to the predicted next presentation time. Before this commit
    it was set to refresh interval minus 2 ms. This means Mutter would
    always start compositing 14.7 ms before a display refresh on a 60 Hz
    screen or 4.9 ms before a display refresh on a 144 Hz screen. However,
    Mutter frequently does not need as much time to finish compositing and
    submit the buffer to KMS:
    
          max render time
          /------------\
    ---|---------------|---------------|---> presentations
          D----S          D--S
    
          D - frame clock dispatch
          S - buffer submission
    
    This commit aims to automatically compute a shorter max render time to
    make Mutter start compositing as late as possible (but still making it
    in time for the presentation):
    
             max render time
                 /-----\
    ---|---------------|---------------|---> presentations
                 D----S          D--S
    
    Why is this better? First of all, Mutter gets application contents to
    draw at the time when compositing starts. If a new application buffer
    arrives after the compositing has started, but before the next
    presentation, it won't make it on screen:
    
    ---|---------------|---------------|---> presentations
          D----S          D--S
            A-------------X----------->
    
                       ^ doesn't make it for this presentation
    
            A - application buffer commit
            X - application buffer sampled by Mutter
    
    Here the application committed just a few ms too late and didn't make it
    on screen until the next presentation. If compositing starts later in the
    frame cycle, applications can commit buffers closer to the presentation.
    These buffers will be more up-to-date thereby reducing input latency.
    
    ---|---------------|---------------|---> presentations
                 D----S          D--S
            A----X---->
    
                       ^ made it!
    
    Moreover, applications are recommended to render their frames on frame
    callbacks, which Mutter sends right after compositing is done. Since
    this commit delays the compositing, it also reduces the latency for
    applications drawing on frame callbacks. Compare:
    
    ---|---------------|---------------|---> presentations
          D----S          D--S
               F--A-------X----------->
                  \____________________/
                         latency
    
    ---|---------------|---------------|---> presentations
                 D----S          D--S
                      F--A-------X---->
                         \_____________/
                          less latency
    
               F - frame callback received, application starts rendering
    
    So how do we actually estimate max render time? We want it to be as low
    as possible, but still large enough so as not to miss any frames by
    accident:
    
             max render time
                 /-----\
    ---|---------------|---------------|---> presentations
                 D------S------------->
                       oops, took a little too long
    
    For a successful presentation, the frame needs to be submitted to KMS
    and the GPU work must be completed before the vblank. This deadline can
    be computed by subtracting the vblank duration (calculated from display
    mode) from the predicted next presentation time.
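
    A minimal sketch of the deadline arithmetic above (hypothetical helper
    name; not the actual mutter code):

```c
#include <assert.h>
#include <stdint.h>

/* The frame must be submitted to KMS and its GPU work finished before the
 * vblank, so the effective deadline is the predicted presentation time
 * minus the vblank duration (all times in µs). */
static int64_t
presentation_deadline_us (int64_t next_presentation_time_us,
                          int64_t vblank_duration_us)
{
  return next_presentation_time_us - vblank_duration_us;
}
```

    For example, with a predicted presentation 16667 µs away (one 60 Hz
    refresh interval) and a 500 µs vblank, the deadline lands 500 µs before
    the predicted presentation time.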
    
    We don't know how long compositing will take, and we also don't know how
    long the GPU work will take, since clients can submit buffers with
    unfinished GPU work. So we measure and estimate these values.
    
    The frame clock dispatch can be split into two phases:
    1. From the start of the dispatch to all GPU commands being submitted
       (but not finished), that is, until the call to eglSwapBuffers().
    2. From eglSwapBuffers() to submitting the buffer to KMS and to GPU
       work completing. These happen in parallel, and we want the latest of
       the two to be done before the vblank.
    
    We measure these three durations and store them for the last 16 frames.
    The estimate for each duration is the maximum of these last 16 durations.
    Usually even taking just the last frame's durations as the estimates
    works well enough, but I found that screen capturing with OBS Studio
    increases duration variability enough to cause frequent missed frames
    with that method. Taking the maximum of the last 16 frames smooths out
    this variability.
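
    The 16-entry history described above can be sketched as a small ring
    buffer (a standalone illustration; the real code uses the EstimateQueue
    struct in clutter-frame-clock.c):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_LENGTH 16

/* Fixed-size ring buffer of the last 16 measured durations; the estimate
 * is the maximum of the stored values, which smooths out per-frame
 * variability at the cost of reacting slowly when durations shrink. */
typedef struct
{
  int64_t values[QUEUE_LENGTH];
  int next_index;
} DurationQueue;

static void
duration_queue_add (DurationQueue *queue,
                    int64_t        value_us)
{
  queue->values[queue->next_index] = value_us;
  queue->next_index = (queue->next_index + 1) % QUEUE_LENGTH;
}

static int64_t
duration_queue_max (const DurationQueue *queue)
{
  int64_t max_us = 0;
  int i;

  for (i = 0; i < QUEUE_LENGTH; i++)
    max_us = queue->values[i] > max_us ? queue->values[i] : max_us;

  return max_us;
}
```

    A single outlier frame thus keeps the estimate elevated for the next 16
    frames, which is what protects against the OBS Studio case mentioned
    above.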
    
    The durations are naturally quite variable and the estimates aren't
    perfect. To take this into account, an additional constant 2 ms is added
    to the max render time.
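
    Putting the pieces together, the composition of the heuristic can be
    sketched as (illustrative helper; the real implementation is
    clutter_frame_clock_compute_max_render_time_us() in the diff below):

```c
#include <assert.h>
#include <stdint.h>

#define MS2US(ms) ((int64_t) (ms) * 1000)

/* max render time = dispatch-to-swap estimate
 *                 + max (swap-to-GPU-done, swap-to-KMS-flip), since these
 *                   two happen in parallel and both must beat the vblank
 *                 + vblank duration
 *                 + a constant 2 ms to absorb estimate variations. */
static int64_t
compose_max_render_time_us (int64_t dispatch_to_swap_us,
                            int64_t swap_to_rendering_done_us,
                            int64_t swap_to_flip_us,
                            int64_t vblank_duration_us)
{
  int64_t parallel_us;

  parallel_us = swap_to_rendering_done_us > swap_to_flip_us
                  ? swap_to_rendering_done_us
                  : swap_to_flip_us;

  return dispatch_to_swap_us + parallel_us +
         vblank_duration_us + MS2US (2);
}
```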
    
    How does it perform in practice? On my desktop with 144 Hz monitors I
    get a max render time of 4–5 ms instead of the default 4.9 ms (I had
    1 ms manually configured in sway) and on my laptop with a 60 Hz screen I
    get a max render time of 4.8–5.5 ms instead of the default 14.7 ms (I
    had 5–6 ms manually configured in sway). Weston [1] went with a 7 ms
    default.
    
    The main downside is that if there's a sudden heavy batch of work in the
    compositing, which would've made it in the default 14.7 ms but doesn't
    make it in the reduced 6 ms, there is a delayed frame which would
    otherwise not be there. Arguably, this happens rarely enough to be a good
    trade-off
    for reduced latency. One possible solution is a "next frame is expected
    to be heavy" function which manually increases max render time for the
    next frame. This would avoid this single dropped frame at the start of
    complex animations.
    
    [1]: https://www.collabora.com/about-us/blog/2015/02/12/weston-repaint-scheduling/
    
    Part-of: <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1762>

 clutter/clutter/clutter-frame-clock.c | 70 +++++++++++++++++++++++++++++++++--
 1 file changed, 67 insertions(+), 3 deletions(-)
---
diff --git a/clutter/clutter/clutter-frame-clock.c b/clutter/clutter/clutter-frame-clock.c
index 57040e0752..8289c0af8c 100644
--- a/clutter/clutter/clutter-frame-clock.c
+++ b/clutter/clutter/clutter-frame-clock.c
@@ -44,8 +44,15 @@ typedef struct _EstimateQueue
   int next_index;
 } EstimateQueue;
 
-/* Wait 2ms after vblank before starting to draw next frame */
-#define SYNC_DELAY_US ms2us (2)
+/* When heuristic render time is off,
+ * wait 2ms after vblank before starting to draw next frame.
+ */
+#define SYNC_DELAY_FALLBACK_US ms2us (2)
+
+/* A constant added to heuristic max render time to account for variations
+ * in the estimates.
+ */
+#define MAX_RENDER_TIME_CONSTANT_US ms2us (2)
 
 typedef struct _ClutterFrameListener
 {
@@ -100,6 +107,8 @@ struct _ClutterFrameClock
   EstimateQueue swap_to_rendering_done_us;
   /* Last few durations between buffer swap and KMS submission. */
   EstimateQueue swap_to_flip_us;
+  /* If we got new measurements last frame. */
+  gboolean got_measurements_last_frame;
 
   gboolean pending_reschedule;
   gboolean pending_reschedule_now;
@@ -217,6 +226,8 @@ clutter_frame_clock_notify_presented (ClutterFrameClock *frame_clock,
 {
   frame_clock->last_presentation_time_us = frame_info->presentation_time;
 
+  frame_clock->got_measurements_last_frame = FALSE;
+
   if (frame_info->cpu_time_before_buffer_swap_us != 0 &&
       frame_info->gpu_rendering_duration_ns != 0)
     {
@@ -243,6 +254,8 @@ clutter_frame_clock_notify_presented (ClutterFrameClock *frame_clock,
                                 swap_to_rendering_done_us);
       estimate_queue_add_value (&frame_clock->swap_to_flip_us,
                                 swap_to_flip_us);
+
+      frame_clock->got_measurements_last_frame = TRUE;
     }
 
   if (frame_info->refresh_rate > 1)
@@ -281,6 +294,56 @@ clutter_frame_clock_notify_ready (ClutterFrameClock *frame_clock)
     }
 }
 
+static int64_t
+clutter_frame_clock_compute_max_render_time_us (ClutterFrameClock *frame_clock)
+{
+  int64_t refresh_interval_us;
+  int64_t max_dispatch_to_swap_us = 0;
+  int64_t max_swap_to_rendering_done_us = 0;
+  int64_t max_swap_to_flip_us = 0;
+  int64_t max_render_time_us;
+  int i;
+
+  refresh_interval_us =
+    (int64_t) (0.5 + G_USEC_PER_SEC / frame_clock->refresh_rate);
+
+  if (!frame_clock->got_measurements_last_frame)
+    return refresh_interval_us - SYNC_DELAY_FALLBACK_US;
+
+  for (i = 0; i < ESTIMATE_QUEUE_LENGTH; ++i)
+    {
+      max_dispatch_to_swap_us =
+        MAX (max_dispatch_to_swap_us,
+             frame_clock->dispatch_to_swap_us.values[i]);
+      max_swap_to_rendering_done_us =
+        MAX (max_swap_to_rendering_done_us,
+             frame_clock->swap_to_rendering_done_us.values[i]);
+      max_swap_to_flip_us =
+        MAX (max_swap_to_flip_us,
+             frame_clock->swap_to_flip_us.values[i]);
+    }
+
+  /* Max render time shows how early the frame clock needs to be dispatched
+   * to make it to the predicted next presentation time. It is composed of:
+   * - An estimate of duration from dispatch start to buffer swap.
+   * - Maximum between estimates of duration from buffer swap to GPU rendering
+   *   finish and duration from buffer swap to buffer submission to KMS. This
+   *   is because both of these things need to happen before the vblank, and
+   *   they are done in parallel.
+   * - Duration of the vblank.
+   * - A constant to account for variations in the above estimates.
+   */
+  max_render_time_us =
+    max_dispatch_to_swap_us +
+    MAX (max_swap_to_rendering_done_us, max_swap_to_flip_us) +
+    frame_clock->vblank_duration_us +
+    MAX_RENDER_TIME_CONSTANT_US;
+
+  max_render_time_us = CLAMP (max_render_time_us, 0, refresh_interval_us);
+
+  return max_render_time_us;
+}
+
 static void
 calculate_next_update_time_us (ClutterFrameClock *frame_clock,
                                int64_t           *out_next_update_time_us,
@@ -314,7 +377,8 @@ calculate_next_update_time_us (ClutterFrameClock *frame_clock,
     }
 
   min_render_time_allowed_us = refresh_interval_us / 2;
-  max_render_time_allowed_us = refresh_interval_us - SYNC_DELAY_US;
+  max_render_time_allowed_us =
+    clutter_frame_clock_compute_max_render_time_us (frame_clock);
 
   if (min_render_time_allowed_us > max_render_time_allowed_us)
     min_render_time_allowed_us = max_render_time_allowed_us;

