Gnumeric parsing efficiency

From: Kenneth Dakin <kennethdakin yahoo co uk>
To: Jody Goldberg <gnumeric-list gnome org>
Subject: Gnumeric parsing efficiency
Date: Mon, 16 Nov 2009 12:13:03 +0000 (GMT)

As a dedicated program efficiency enthusiast, I take exception to examples of code that could be improved significantly with almost no effort. I am 'of the school' that believes wasting CPU cycles is not at all clever when they can be used for some other processing in the same program or to simply reduce the need for more or faster processors.

For quite some time now, I have noticed that the Switch statement is used extensively in Gnumeric code (and presumably in other Gnome software). This is fine - up to a point.

The prevailing view of high level language programmers is that the switch statement, as well as being a concise method of multiway branching, is very efficient and usually optimized by the C compiler. This however is only true for 'small' ranges of values (eg 0-9 etc with few gaps).
By looking at various examples of distributed Gnumeric source code, I noticed that it frequently 'parses' for combinations of characters such as " *,/,+,-, (,)" etc. using Switch.

I believe that the C compiler will generally simply produce a long sequence of compare and branch instructions for these cases (optimized or not). I am not familiar with the Gnumeric C compiler and maybe it takes these as special cases to always generate a 'branch table' regardless of the apparent gaps - or maybe not!

As a programmer with around 40 years experience, mainly in test/debugging and performance tools, I can assure you that it is entirely possible (and extremely easy) to mechanically apply a simple but effective global source translation to all the Gnumeric code to optimize the switch statements to operate in constant time - irrespective of the number of characters tested for in the switch statement. This can be accomplished in around 5 machine instructions (or about the same or less in C statements, I estimate).

Because of the mechanical nature of these changes and the sheer simplicity of the transformation, no new bugs can be introduced by manual coding errors, the resultant code will be, on average (assuming random spread of input values), up to 41 times faster and the binary size also reduced significantly.

In practice, a smaller improvement than this will be nevertheless achievable but likely still very worthwhile.

To test this suggestion beforehand is extremely easy.

1. Simply code up a stand-alone switch statement with say 10-20 different input values and execute it say 10,000,000 times while timing the test.
2. Run the translation (in this case manually change the switch statement with a few lines of my supplied code!) and
3. test again.

Simple !

Please let me know if :-

a) your gnumeric compiler always produces an efficient branch table in these circumstances (and therefore you are not interested!) or
b) you are interested in my suggestion and you will email a sample switch statement by return for me to make the simple transformation and send back for you to try out yourselves.
c) you are not interested in improving efficiency, especially from outsiders!
d) you think it may impact existing code reliability (I can convince you of the opposite)
e) you are too busy, however much better the code will become

(P.S. I have had limited experience with gnumeric when, about 24 months ago) I discovered a very significant and undiscovered basic bug (a cut/paste error) while using Editgrid and afterwards got it fixed pretty quickly by the Editgrid/gnumeric team).

Best Regards
Ken Dakin

Follow-Ups:
- Re: Gnumeric parsing efficiency
  - From: Morten Welinder
- Re: Gnumeric parsing efficiency
  - From: John Machin

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]