[glib/fix-gnulib-msvc-isnan: 16/37] gthread: Use C11-style memory consistency to speed up g_once()



commit 44ed91b3ad761725e3ae9eef159af0c288929262
Author: Philip Withnall <withnall endlessm com>
Date:   Thu Feb 13 16:31:24 2020 +0000

    gthread: Use C11-style memory consistency to speed up g_once()
    
    The g_once() function exists to call a callback function exactly once,
    and to block multiple contending threads on its completion, then to
    return its return value to all of them (so they all see the same value).
    
    The full implementation of g_once() (in g_once_impl()) uses a mutex and
    condition variable to achieve this, and is needed in the contended case,
    where multiple threads need to be blocked on completion of the callback.
    
    However, most of the times that g_once() is called, the callback will
    already have been called, and it just needs to establish that it has
    been called and to return the stored return value.
    
    Previously, a fast path was used if we knew that memory barriers were
    not needed on the current architecture to safely access two dependent
    global variables in the presence of multi-threaded access. This is true
    of all sequentially consistent architectures.
    
    Checking whether we could use this fast path (if
    `G_ATOMIC_OP_MEMORY_BARRIER_NEEDED` was *not* defined) was a bit of a
    pain, though, as it required GLib to know the memory consistency model
    of every architecture. This kind of knowledge is traditionally a
    compiler’s domain.
    
    So, simplify the fast path by using the compiler-provided atomic
    intrinsics, and acquire-release memory consistency semantics, if they
    are available. If they’re not available, fall back to always locking as
    before.
    
    We definitely need to use `__ATOMIC_ACQUIRE` in the macro implementation
    of g_once(). We don’t actually need to make the `__ATOMIC_RELEASE`
    changes in `gthread.c` though, since locking and unlocking a mutex
    is guaranteed to insert a full compiler and hardware memory barrier
    (enforcing sequential consistency). So the `__ATOMIC_RELEASE` changes
    are only in there to make it obvious what stores are logically meant to
    match up with the `__ATOMIC_ACQUIRE` loads in `gthread.h`.
    
    Notably, only the second store (and the first load) has to be atomic.
    That is, when storing `once->retval` and `once->status`, the first
    store is normal and the second is atomic. This is because the writes have a
    happens-before relationship, and all (atomic or non-atomic) writes
    which happen-before an atomic store/release are visible in the thread
    doing an atomic load/acquire on the same atomic variable, once that load
    is complete.
    
    References:
     * https://preshing.com/20120913/acquire-and-release-semantics/
     * https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/_005f_005fatomic-Builtins.html
     * https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync
     * https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering
    
    Signed-off-by: Philip Withnall <withnall endlessm com>
    
    Fixes: #1323

 glib/gthread.c | 14 +++++++++++++-
 glib/gthread.h | 19 ++++++++++++++-----
 2 files changed, 27 insertions(+), 6 deletions(-)
---
diff --git a/glib/gthread.c b/glib/gthread.c
index cab0e8cbb..34f9c21b2 100644
--- a/glib/gthread.c
+++ b/glib/gthread.c
@@ -630,13 +630,25 @@ g_once_impl (GOnce       *once,
 
   if (once->status != G_ONCE_STATUS_READY)
     {
+      gpointer retval;
+
       once->status = G_ONCE_STATUS_PROGRESS;
       g_mutex_unlock (&g_once_mutex);
 
-      once->retval = func (arg);
+      retval = func (arg);
 
       g_mutex_lock (&g_once_mutex);
+/* We prefer the new C11-style atomic extension of GCC if available. If not,
+ * fall back to always locking. */
+#if defined(G_ATOMIC_LOCK_FREE) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) && defined(__ATOMIC_SEQ_CST)
+      /* Only the second store needs to be atomic, as the two writes are related
+       * by a happens-before relationship here. */
+      once->retval = retval;
+      __atomic_store_n (&once->status, G_ONCE_STATUS_READY, __ATOMIC_RELEASE);
+#else
+      once->retval = retval;
       once->status = G_ONCE_STATUS_READY;
+#endif
       g_cond_broadcast (&g_once_cond);
     }
 
diff --git a/glib/gthread.h b/glib/gthread.h
index 96f536916..a30815eb8 100644
--- a/glib/gthread.h
+++ b/glib/gthread.h
@@ -234,14 +234,23 @@ GLIB_AVAILABLE_IN_ALL
 void            g_once_init_leave               (volatile void  *location,
                                                  gsize           result);
 
-#ifdef G_ATOMIC_OP_MEMORY_BARRIER_NEEDED
-# define g_once(once, func, arg) g_once_impl ((once), (func), (arg))
-#else /* !G_ATOMIC_OP_MEMORY_BARRIER_NEEDED*/
+/* Use C11-style atomic extensions to check the fast path for status=ready. If
+ * they are not available, fall back to using a mutex and condition variable in
+ * g_once_impl().
+ *
+ * On the C11-style codepath, only the load of once->status needs to be atomic,
+ * as the writes to it and once->retval in g_once_impl() are related by a
+ * happens-before relation. Release-acquire semantics are defined such that any
+ * atomic/non-atomic write which happens-before a store/release is guaranteed to
+ * be seen by the load/acquire of the same atomic variable. */
+#if defined(G_ATOMIC_LOCK_FREE) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) && defined(__ATOMIC_SEQ_CST)
 # define g_once(once, func, arg) \
-  (((once)->status == G_ONCE_STATUS_READY) ? \
+  ((__atomic_load_n (&(once)->status, __ATOMIC_ACQUIRE) == G_ONCE_STATUS_READY) ? \
    (once)->retval : \
    g_once_impl ((once), (func), (arg)))
-#endif /* G_ATOMIC_OP_MEMORY_BARRIER_NEEDED */
+#else
+# define g_once(once, func, arg) g_once_impl ((once), (func), (arg))
+#endif
 
 #ifdef __GNUC__
 # define g_once_init_enter(location) \

