Re: Gnumeric parsing efficiency



On 16/11/2009 11:13 PM, Kenneth Dakin wrote:
As a dedicated program efficiency enthusiast, I take exception to examples of code that could be improved significantly with almost no effort. I am 'of the school' that believes wasting CPU cycles is not at all clever when they can be used for some other processing in the same program or to simply reduce the need for more or faster processors.

For quite some time now, I have noticed that the Switch statement is used extensively in Gnumeric code (and presumably in other Gnome software). This is fine - up to a point.

The prevailing view of high level language programmers

I don't see the relevance of the view of HLL programmers on C code :-)

is that the switch statement, as well as being a concise method of multiway branching, is very efficient and usually optimized by the C compiler. This however is only true for 'small' ranges of values (eg 0-9 etc with few gaps).

Based on observation by whom of what versions of which C compilers, when?

By looking at various examples of distributed Gnumeric source code, I noticed that it frequently 'parses' for combinations of characters such as " *,/,+,-, (,)" etc. using Switch.

This would be in compiling formulas, I presume. What proportion of formula compilation time do you imagine is taken up by switch statements? How many Gnumeric users write in complaining about the time it takes?

I believe that the C compiler will generally simply produce a long sequence of compare and branch instructions for these cases (optimized or not). I am not familiar with the Gnumeric C compiler and maybe it takes these as special cases to always generate a 'branch table' regardless of the apparent gaps - or maybe not!

What is "the" Gnumeric C compiler? In most cases the distribution is in source form, or in a binary for a particular platform. The compiler in any one case is likely to be gcc ...

As a programmer with around 40 years experience, mainly in test/debugging and performance tools, I can assure you that it is entirely possible (and extremely easy) to mechanically apply a simple but effective global source translation to all the Gnumeric code to optimize the switch statements to operate in constant time - irrespective of the number of characters tested for in the switch statement. This can be accomplished in around 5 machine instructions (or about the same or less in C statements, I estimate).

Does this technique have a name and a URL or two? Is it patented or otherwise proprietary?

Because of the mechanical nature of these changes and the sheer simplicity of the transformation, no new bugs can be introduced by manual coding errors, the resultant code will be, on average (assuming random spread of input values), up to 41 times faster and the binary size also reduced significantly.

And this technique has not been implemented in compilers?

In practice, a smaller improvement than this will be nevertheless achievable but likely still very worthwhile.

To test this suggestion beforehand is extremely easy.

1. Simply code up a stand-alone switch statement with say 10-20 different input values and execute it say 10,000,000 times while timing the test. 2. Run the translation (in this case manually change the switch statement with a few lines of my supplied code!) and
3. test again.

Simple !

Please let me know if :-

a) your gnumeric compiler always produces an efficient branch table in these circumstances (and therefore you are not interested!) or b) you are interested in my suggestion and you will email a sample switch statement by return for me to make the simple transformation and send back for you to try out yourselves.

This is starting to sound like an advertisement for a get-rich-quick scam.

c) you are not interested in improving efficiency, especially from outsiders!

There's a book you might like to read: "How to win friends and influence people".

d) you think it may impact existing code reliability (I can convince you of the opposite)
e) you are too busy, however much better the code will become

(f) estimated low bangs-per-buck ratio

(P.S. I have had limited experience with gnumeric when, about 24 months ago) I discovered a very significant and undiscovered basic bug (a cut/paste error) while using Editgrid and afterwards got it fixed pretty quickly by the Editgrid/gnumeric team).

ERROR -- Unbalanced parentheses in above sentence.




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]