Re: Excel Test Files

On 8/02/2007 10:43 PM, Martin Kotulla (SoftMaker) wrote:
John Machin wrote:

Please let me introduce myself: As well as being a user of Gnumeric, I'm also the author/maintainer of xlrd, a Python package for programmatically extracting data from XLS files.

In addition to what Morten wrote: those files appear to have been written by Softmaker, not by Excel. They are also a test of how tolerant XLS readers are when faced with XLS files that *don't* quite match what Excel would write. [...]

Thank you for these specific reports. I have forwarded them to our developers. Let's see what they can do to make PlanMaker files more amenable to your routines.

Hi Martin,

Thanks for your concern and your kind offer, but I have already made xlrd more tolerant of the strangenesses in your files :-)

Of course, as you are most certainly aware, there is a wide range of file format variations that Excel has written over the years.
Our files are within that spectrum,

Ah yes, but the idea is supposed to be that you pick on a version and write a file that corresponds with that version; using the BIFF2 record code and the BIFF8 layout for ARRAY records is stretching "within that spectrum" just a little ;-) So is writing 51 or 32 colours in a PALETTE record when Excel writes 16 or 56. What should a reader do about the missing 5 or 24 colour indexes: hope they are not used in the file? map them to the corresponding RGB values in the Excel default palette?
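To illustrate the second option raised above (mapping the missing colour indexes to the default palette), here is a minimal Python sketch. It is not xlrd's actual code: the function name `fill_palette`, its parameters, and the idea of passing the default palette in as a list are all assumptions made for illustration, and the real Excel default RGB values are deliberately not reproduced here.

```python
def fill_palette(written, default, expected_sizes=(16, 56)):
    """Overlay the colours found in a PALETTE record on a default palette.

    written        -- list of (r, g, b) tuples read from the file's PALETTE record
    default        -- list of (r, g, b) tuples for the default palette
                      (hypothetical input; supply the real table yourself)
    expected_sizes -- the colour counts Excel itself writes (16 or 56, per the
                      discussion above)

    Returns (palette, unusual), where `unusual` is True when the record's
    colour count is not one Excel would have written.
    """
    if len(written) > len(default):
        raise ValueError("PALETTE record holds more colours than the default palette")

    unusual = len(written) not in expected_sizes

    # Colours present in the file win; the missing tail comes from the defaults.
    palette = list(written) + list(default[len(written):])
    return palette, unusual
```

With a 51-colour record and a 56-entry default palette, the last 5 indexes fall back to the defaults and the caller is told the count was unusual, so it can decide whether to log a message.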

BTW, try opening the array_pm06.xls with Excel 2003 and saving it as some other name. When I did that, Excel didn't write out a PALETTE record, indicating that there were no used (colour index, RGB) combinations that weren't in the default palette -- IOW, the PALETTE record is redundant. Also, compare the contents of the PALETTE record with the BIFF8 default palette -- I could be wrong, but it appeared to me that most of the entries were just the standard palette entries offset by two; this looks much more accidental than intentional i.e. I suspect a bug.

and Excel and have no problem opening them.

They have been at it for a longer time with a much greater volume. I don't imagine that their version 0.1 was so tolerant. I suspect it's just like my experience: some liberally-written file has caused a crash or an assertion to fail, they've inspected the file and decided whether they can ignore the non-conformance, must refuse to open the file, or can perhaps open it with some kind of warning.

So, use them as yet another test case for how flexible your code must be.

I am, within reason. The antique record code is now accepted without a murmur, a truncated PALETTE record generates a NOTE message, and the file-structure inconsistency generates a WARNING message.
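The three-level policy just described (accept silently, NOTE, WARNING) could be sketched roughly as follows in Python. This is an illustration only, not how xlrd implements it: the anomaly names, the `POLICY` table, the `report` function, and the message format are all hypothetical.

```python
import sys

# Hypothetical anomaly names mapped to the severity described above.
# None means "accepted without a murmur".
POLICY = {
    "antique_record_code": None,
    "truncated_palette": "NOTE",
    "file_structure_inconsistency": "WARNING",
}


def report(anomaly, detail, log=sys.stderr):
    """Log a known non-conformance at its configured level.

    Returns the message that was written, or None if the anomaly is
    accepted silently. Anything not listed in POLICY is treated as
    fatal, on the assumption that an unknown deviation has not yet
    been inspected and judged safe to ignore.
    """
    if anomaly not in POLICY:
        raise ValueError("unhandled non-conformance: " + anomaly)

    level = POLICY[anomaly]
    if level is None:
        return None

    message = level + ": " + detail
    print(message, file=log)
    return message
```

Keeping the policy in one table makes it easy to promote or demote a particular deviation as more real-world files turn up.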

Be liberal in what you accept... :-)

Indeed, and this is necessary only because some folk act as though in blissful ignorance of the second half of that quotation: ... "and conservative in what you send" :-)

