strlen(), and then, a case for glib? (was: Re: glib strlen ?)



On Tue, Mar 25, 2003 at 04:36:34PM -0000, martyn 2 russell bt com wrote:
Also, for strings that are more than a few Kb long, functions like
strlen() are *extremely* defficient not only in terms of run time,
but because they may trash numerous memory cache lines for no
reason at all.
if they _MAY_ trash _NUMEROUS_ memory cache lines for _NO REASON_ at all...
eh?

Just in case anybody read through all of this and still wants to get
extra value from this discussion, I'll see if I can go into more
detail. I don't think of myself as an expert, although I have done a
lot of the research.

Most real implementations of strlen() use hand-crafted assembly that
is implemented 'best' for the target CPU. It is possible to instruct
some CPU's to read data directly from memory without trampling cache
lines. I have never seen an implementation that does this, however, it
is possible. For the Intel P4, with a L2 cache size of 256Kbytes or
so, most implementations of strlen() would entirely trample the
256Kbytes for a string of size >256Kbytes.

Sometimes trampling is good, sometimes it isn't If the string fits in
cache along with the other data, then loading it into cache as part of
the strlen() may be useful, as it may be likely that the string will
be accessed shortly thereafter. There is a lot of 'may' when it comes
to generic routines that are used for many different purposes. In any
case, keeping a header on the string that indicates the length of the
string is not that expensive since most implementations of malloc()
only return sections with a size evenly divisible by 8, and that have
an 8 bytes header kept anyways. So even 'char *s = strdup("a")' likely
has 's' reserving 16+ bytes of memory. Using 4 of these to remember that
the string is of length 1 isn't that expensive.

Now, I believe GString is a little sub-optimal in that it keeps an
additional 4 bytes to refer to the allocated size of the string, and
another 4 bytes to indirectly refer to the string, however, this
overhead is to some degree eliminated by the fact that GString objects
are allocated using GMemChunk, and not malloc(). Also, somebody could
improve GString a few days from now, and anybody currently using
GString would immediately benefit likely without having to change
their own code (although they might need to recompile).

When it comes right down to it, the real argument is that
strlen(char*) is O(n) and often requires the CPU to stall, since
strlen() is usually implemented using recursive branching (like most
loops with indeterminate length that step forwards only a byte each
time), and the CPU can never predict far enough ahead, whereas
(GString*)p->len is O(1) and requires only 1 or 2 total instructions
to complete. strlen() is a lot more difficult to inline, and requires
more registers (the Intel IA-32 being particularly register
starved). GString can be used to store '\0' characters. If GString
needs work, it needs users to demand features from it, so that people
know what features to implement.

The gtk-devel people need to be motivated. If only a few people use
GString, nobody will bother to enhance it. I'm of the opinion that if
one has chosen the route of loading a big library like GLIB, GTK, or
GNOME, than one might as well make maximum use of the library. Using
GLIB for only one or two things is probably an invalid way of thinking
as one is being hit by the cost of GLIB (for example, even one use of
GString will pull in all the GMemChunk, and probably GList, etc.)
without really seeing any of the benefits (improved portability, ...).

I assume that most people here are GLIB/GTK+ advocates... We as a
community have created the libraries. Let us not be embarassed or
hesitant to use our creation, to prove that it is actually valuable,
and not just a fancy whistle. For those of us who have done less
implementation contribution, and more user contribution, this is the
single most effective way of promoting GLIB/GTK+ to our peers and
other prospective users.

mark

-- 
mark mielke cc/markm ncf ca/markm nortelnetworks com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]