Re: Gnumeric parsing efficiency
- From: John Machin <sjmachin lexicon net>
- To: Kenneth Dakin <kennethdakin yahoo co uk>
- Cc: gnumeric list <gnumeric-list gnome org>
- Subject: Re: Gnumeric parsing efficiency
- Date: Wed, 18 Nov 2009 08:33:23 +1100
On 16/11/2009 11:13 PM, Kenneth Dakin wrote:
As a dedicated program efficiency enthusiast, I take exception to
examples of code that could be improved significantly with almost no
effort. I am 'of the school' that believes wasting CPU cycles is not at
all clever when they can be used for some other processing in the same
program or to simply reduce the need for more or faster processors.
For quite some time now, I have noticed that the Switch statement is
used extensively in Gnumeric code (and presumably in other Gnome
software). This is fine - up to a point.
The prevailing view of high level language programmers
I don't see the relevance of the view of HLL programmers on C code :-)
is that the
switch statement, as well as being a concise method of multiway
branching, is very efficient and usually optimized by the C compiler.
This however is only true for 'small' ranges of values (eg 0-9 etc with
few gaps).
Based on observation by whom of what versions of which C compilers, when?
By looking at various examples of distributed Gnumeric source code, I
noticed that it frequently 'parses' for combinations of characters such
as " *,/,+,-, (,)" etc. using Switch.
This would be in compiling formulas, I presume. What proportion of
formula compilation time do you imagine is taken up by switch
statements? How many Gnumeric users write in complaining about the time
it takes?
I believe that the C compiler will generally simply produce a long
sequence of compare and branch instructions for these cases (optimized
or not). I am not familiar with the Gnumeric C compiler and maybe it
takes these as special cases to always generate a 'branch table'
regardless of the apparent gaps - or maybe not!
What is "the" Gnumeric C compiler? In most cases the distribution is in
source form, or in a binary for a particular platform. The compiler in
any one case is likely to be gcc ...
As a programmer with around 40 years experience, mainly in
test/debugging and performance tools, I can assure you that it is
entirely possible (and extremely easy) to mechanically apply a simple
but effective global source translation to all the Gnumeric code to
optimize the switch statements to operate in constant time -
irrespective of the number of characters tested for in the switch
statement. This can be accomplished in around 5 machine instructions (or
about the same or less in C statements, I estimate).
Does this technique have a name and a URL or two? Is it patented or
otherwise proprietary?
Because of the mechanical nature of these changes and the sheer
simplicity of the transformation, no new bugs can be introduced by
manual coding errors, the resultant code will be, on average (assuming
random spread of input values), up to 41 times faster and the binary
size also reduced significantly.
And this technique has not been implemented in compilers?
In practice, a smaller improvement than this will be nevertheless
achievable but likely still very worthwhile.
To test this suggestion beforehand is extremely easy.
1. Simply code up a stand-alone switch statement with say 10-20
different input values and execute it say 10,000,000 times while timing
the test.
2. Run the translation (in this case manually change the switch
statement with a few lines of my supplied code!) and
3. test again.
Simple !
Please let me know if :-
a) your gnumeric compiler always produces an efficient branch table in
these circumstances (and therefore you are not interested!) or
b) you are interested in my suggestion and you will email a sample
switch statement by return for me to make the simple transformation and
send back for you to try out yourselves.
This is starting to sound like an advertisement for a get-rich-quick scam.
c) you are not interested in improving efficiency, especially from
outsiders!
There's a book you might like to read: "How to win friends and influence
people".
d) you think it may impact existing code reliability (I can convince you
of the opposite)
e) you are too busy, however much better the code will become
(f) estimated low bangs-per-buck ratio
(P.S. I have had limited experience with gnumeric when, about 24 months
ago) I discovered a very significant and undiscovered basic bug (a
cut/paste error) while using Editgrid and afterwards got it fixed pretty
quickly by the Editgrid/gnumeric team).
ERROR -- Unbalanced parentheses in above sentence.
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]