Re: Review of the new ringbuffer



On Fri, 26 Jan 2007, Stefan Westerfeld wrote:

On Thu, Jan 25, 2007 at 10:23:44AM +0100, Tim Janik wrote:
On Thu, 25 Jan 2007, Stefan Westerfeld wrote:

|         for (int i = 0; i < space; i++)
|           m_buffer[wm + i] = data[i];

It's better to use std::copy, because for some data types (primitive
types) memmove will be used (template specialization).

the way the code is written above, GCC can auto-vectorize it;
is that true for std::copy as well?

I think the only way to find out what is faster is a benchmark, and even
then, the benchmark may give different results, depending on
[...]

What you see is that on my system, std::copy was faster than a
hand-written loop in every case. Vectorization does make a difference,
but the vectorized loop is still slower than std::copy in every case.
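
for reference, a minimal sketch of this kind of micro-benchmark (this is
not the exact setup behind the numbers above; N, RUNS and the
gettimeofday() based timing are placeholder choices):

/* micro-benchmark sketch: hand written copy loop vs. std::copy on floats */
#include <algorithm>
#include <stdio.h>
#include <sys/time.h>

static double
now_seconds ()
{
  timeval tv;
  gettimeofday (&tv, 0);
  return tv.tv_sec + tv.tv_usec * 1e-6;
}

int
main ()
{
  const int N = 4096, RUNS = 100000;
  static float src[N], dst[N];
  float sum = 0;

  double t = now_seconds ();
  for (int r = 0; r < RUNS; r++)
    {
      src[r % N] = r;                   /* keep the copy from being hoisted */
      for (int i = 0; i < N; i++)       /* hand written loop, vectorizable */
        dst[i] = src[i];
      sum += dst[r % N];
    }
  printf ("loop:      %f sec\n", now_seconds () - t);

  t = now_seconds ();
  for (int r = 0; r < RUNS; r++)
    {
      src[r % N] = r;
      std::copy (src, src + N, dst);    /* ends up as memmove for float */
      sum += dst[r % N];
    }
  printf ("std::copy: %f sec (sum=%f)\n", now_seconds () - t, sum);
  return 0;
}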

interesting. the other question that came to my mind about std::copy
is how it correctly chooses between memcpy and copy constructors.
stl_algobase.h tells us that memmove is used if the copy argument
types are scalars and equal. cpp_type_traits.h implements __is_scalar
by matching bool, char, wchar_t, short, int, long, long long, float,
double and pointer types; otherwise an assignment loop similar to the
one we already have is used.
that should eliminate the need for vectorization in most cases.
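
for illustration, the write path from the snippet above could then look
roughly like this (the member variables follow the quoted code, the
function wrapped around them is made up, and the clamping of space to
the writer's free space is not shown here):

#include <algorithm>

/* sketch of the write path using std::copy instead of the manual loop */
template<class T> void
write_block (T *m_buffer, unsigned int m_size, unsigned int &wm,
             const T *data, unsigned int length)
{
  while (length)
    {
      unsigned int space = std::min (length, m_size - wm); /* up to the wrap point */
      std::copy (data, data + space, m_buffer + wm);       /* memmove for scalar T */
      wm = (wm + space) % m_size;
      data += space;
      length -= space;
    }
}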

|         wm = (wm + space) % m_size;
|         data += space;
|         length -= space;
|       }

Systems with memory barriers need a write memory barrier here. In the
jack driver code this looks like:

write/read memory barriers are implemented as part of Atomic::uint_get
and Atomic::uint_set. also, no volatile variables have been updated
here, so there's no need for a barrier at this point.

Are you sure? Suppose Atomic::uint_set looked like this:

i thought you meant the updating of wm here, which is a local variable,
so the suggestion didn't quite make sense to me. reading up in:
http://techweb.rfa.org/pipermail/portaudio/2006-May/005651.html

i get your point though. i've added descriptive functions for the
barriers to birnet now; once this is pushed to birnet.git and merged
into beast, using them should make the jack driver more readable as well:

diff --git a/birnet/birnetthread.hh b/birnet/birnetthread.hh
index 48b9cee..061cefd 100644
--- a/birnet/birnetthread.hh
+++ b/birnet/birnetthread.hh
@@ -62,6 +62,8 @@ public:
 };

 namespace Atomic {
+inline void    read_barrier  (void)                                { int dummy; ThreadTable.atomic_int_get (&dummy); }
+inline void    write_barrier (void)                                { int dummy; ThreadTable.atomic_int_set (&dummy, 0); }
 /* atomic integers */
 inline void    int_set       (volatile int  *iptr, int value)      { ThreadTable.atomic_int_set (iptr, value); }
 inline int     int_get       (volatile int  *iptr)                 { return ThreadTable.atomic_int_get (iptr); }

and yes, this is a sufficient barrier implementation: the functions
boil down to g_atomic_int_get/g_atomic_int_set, which are special-cased
to act as a memory barrier on systems that need one. glib doesn't
distinguish between read and write barriers, but making that
distinction in the API should help readability in the actual code.
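
to illustrate the intended use, a sketch only (the RingBuffer layout and
member names below are made up; just read_barrier/write_barrier
correspond to the diff above): the writer issues a write barrier between
filling the buffer and publishing the new write mark, the reader issues
a read barrier between loading the write mark and reading the frames it
covers.

#include <algorithm>

namespace Atomic {      /* stand-ins so the sketch is self contained; the real
                           functions are the ones added in the diff above */
inline void read_barrier  (void) { __sync_synchronize (); }
inline void write_barrier (void) { __sync_synchronize (); }
}

struct RingBuffer {     /* made-up layout, single writer / single reader */
  enum { SIZE = 1024 };
  float                 m_buffer[SIZE];
  volatile unsigned int m_write_mark;   /* updated by the writer only */
  volatile unsigned int m_read_mark;    /* updated by the reader only */

  void
  write (const float *data, unsigned int n)   /* n <= free space, no wrap handling */
  {
    unsigned int wm = m_write_mark;
    std::copy (data, data + n, m_buffer + wm);
    Atomic::write_barrier ();   /* buffer stores become visible before the index */
    m_write_mark = (wm + n) % SIZE;
  }

  unsigned int
  read (float *out, unsigned int n)           /* no wrap handling either */
  {
    unsigned int wm = m_write_mark;
    Atomic::read_barrier ();    /* frame loads must not be reordered before this index load */
    unsigned int rm = m_read_mark;
    unsigned int avail = (wm - rm + SIZE) % SIZE;
    n = std::min (n, avail);
    std::copy (m_buffer + rm, m_buffer + rm + n, out);
    m_read_mark = (rm + n) % SIZE;
    return n;
  }
};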

thanks again, i'll update the code accordingly.

  Cu... Stefan

---
ciaoTJ


