Re: Excel Test Files

On 8/02/2007 3:32 AM, Morten Welinder wrote:
>> There is a small amount of MS Excel test files at Obviously, they are
>> heavily filtered in favor of Softmaker Office.
>
> We are aware of these.  Note, that last we looked at them, the only reason that
> Softmaker did well was that they cheated: Softmaker used the precomputed
> values in the files and was unable to compute those itself.


Please let me introduce myself: as well as being a user of Gnumeric, I'm also the author/maintainer of xlrd, a Python package for programmatically extracting data from XLS files.

In addition to what Morten wrote: those files appear to have been written by Softmaker, not by Excel. They are also a test of how tolerant XLS readers are when faced with XLS files that *don't* quite match what Excel would write.

1. Four of the files contain PALETTE records; one has 51 colours and the other three have 32 colours each. Excel always writes these with a full complement of 56 colours, even if the user has changed the definition of only one colour index. I'm left hoping that the high-end colour indexes are either not used in those files, or that they are meant to map to the same RGB values as Excel's defaults. I'd prefer to know, not to hope.

2. The array_pm06.xls file uses 0x0021 (last seen in Excel 2.x!) as the record type for its ARRAY records instead of the expected 0x0221. Fortunately the contents of the records follow the modern layout.

3. Two of the files (chart.xls and surface...) have an inconsistency at the OLE2 compound document level: the internal directory says that the length of what I (following the documentation) call the "Short-Stream Container Stream" is 16384 bytes, but the actual contents are 11264 and 9728 bytes respectively. This discrepancy doesn't cause xlrd any problem extracting the Workbook stream, but it doesn't inspire confidence that other streams contained in the S-SCS could be extracted without trouble.
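
For anyone who wants to run the same kind of checks on their own specimens, here is a minimal sketch of the three tests described above. It is not xlrd's actual code; the function names are hypothetical. It assumes the Workbook stream has already been extracted as a bytes object, that PALETTE records carry the id 0x0092 with a 2-byte colour count at the start, that the sector allocation table is available as a list of signed integers with -2 marking the end of a chain, and that sectors are 512 bytes.

```python
import struct

PALETTE = 0x0092          # PALETTE record id (assumed, per the published BIFF docs)
ARRAY_OLD = 0x0021        # ARRAY record id last seen in Excel 2.x
EXCEL_PALETTE_SIZE = 56   # number of colours Excel itself always writes
ENDOFCHAIN = -2           # end-of-chain marker in the sector allocation table


def iter_records(stream):
    """Yield (record_id, payload) pairs from a raw BIFF stream."""
    pos = 0
    while pos + 4 <= len(stream):
        # Each record starts with a little-endian 2-byte id and 2-byte length.
        rec_id, length = struct.unpack_from("<HH", stream, pos)
        pos += 4
        yield rec_id, stream[pos:pos + length]
        pos += length


def find_oddities(stream):
    """Return notes about records that differ from what Excel would write."""
    notes = []
    for rec_id, payload in iter_records(stream):
        if rec_id == PALETTE and len(payload) >= 2:
            (ncolours,) = struct.unpack_from("<H", payload, 0)
            if ncolours != EXCEL_PALETTE_SIZE:
                notes.append("PALETTE has %d colours, expected %d"
                             % (ncolours, EXCEL_PALETTE_SIZE))
        elif rec_id == ARRAY_OLD:
            notes.append("ARRAY record uses old type 0x0021")
    return notes


def chain_size(sat, first_sector, sector_size=512):
    """Bytes reachable by following one sector chain through the table."""
    seen = set()
    sector = first_sector
    while sector != ENDOFCHAIN:
        # Guard against loops and out-of-range sector ids in damaged files.
        if sector in seen or not 0 <= sector < len(sat):
            raise ValueError("corrupt sector chain")
        seen.add(sector)
        sector = sat[sector]
    return len(seen) * sector_size
```

Comparing the result of chain_size() for the Short-Stream Container Stream with the length declared in the directory would flag the discrepancy in point 3: a chain of 22 sectors of 512 bytes gives 11264 bytes, well short of a declared 16384.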

Thanks to Peter for the pointer to the test files; I'm adding them to the other specimens in my pathology museum :-)

