Re: How to build a program that can read the content of a very simple MS Excel file?



On 17/06/2007 2:27 AM, Huge Mountain wrote:
Thank you all!

I still have another question:
I'm trying to understand the binary format of MS Excel files using this document, excelfileformat.pdf, from http://sc.openoffice.org/excelfileformat.pdf

As previously advised, you also need to read compdocfileformat.pdf -- see below.

In part 2.2.2, page 11, we have:
The following table lists names of possible streams.

Stream name                         Contents
Book                                BIFF5/BIFF7 workbook stream (➜5.1.3)
Workbook                            BIFF8 workbook stream (➜5.1.3)
<05H>SummaryInformation             Document settings
<05H>DocumentSummaryInformation     Document settings
Ctls                                Formatting of form controls
User Names                          User names in shared workbooks (➜10)
Revision Log                        Change tracking log stream (➜10)

I only care about BIFF8/8X. Are these all the streams that can exist in a BIFF8 file, with no others used?

There can be streams containing macros etc.

(In fact my Excel file, which I mentioned before, has only 3 streams: Workbook, <05H>SummaryInformation, <05H>DocumentSummaryInformation.) Is all the content of an Excel file (the content of its cells) stored in the Workbook stream?

All cell content is in the Workbook stream, except of course when an external reference is made (to cells in another file).

Is the length of the Workbook stream unlimited

A worksheet can contain up to 256 columns and 65536 rows, and I am unaware of any limit on the number of worksheets. My pathology museum includes a 120 MB program-created file and a 40 MB manually-created file. A practical limit would be imposed by the amount of memory required to process the file.

and is the usual value, 4096 bytes, just the minimum length of the Workbook stream?

There is no minimum size other than as dictated by the required contents. 4096 is the usual value for the minimum size of a *standard* stream. Streams smaller than that are either (a) included in the Short-Stream Container Stream ("SSCS") or (b) zero-padded to 4096 bytes and written as a standard stream. A reader must compare the stream's size (from its directory entry) with the minimum size of a standard stream and act accordingly. I have just created a small XLS file using Gnumeric 1.7.6 (it has 'x' in cell A1 of the sole worksheet); the size of the Workbook stream is 2057 bytes and this is included in the SSCS (as are the two .*SummaryInformation streams).
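
The size comparison described above can be sketched in Python. This is only an illustration, not code from any existing reader: the function names are made up here, and the header layout is assumed from the compound document format description (8-byte file signature at offset 0, minimum size of a standard stream as a little-endian 4-byte integer at offset 56). Check those offsets against compdocfileformat.pdf before relying on them.

```python
import struct

# Signature found in the first 8 bytes of a compound document file.
CDF_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def min_standard_stream_size(header):
    """Return the minimum size of a standard stream from a 512-byte header.

    Assumption: the value is a little-endian unsigned 32-bit integer
    at byte offset 56 of the header.
    """
    if len(header) < 512 or header[:8] != CDF_MAGIC:
        raise ValueError("not a compound document header")
    (size,) = struct.unpack_from("<I", header, 56)
    return size


def stream_container(stream_size, min_standard_size=4096):
    """Say where a stream of the given size (from its directory entry) lives.

    Streams smaller than the minimum are kept in the short-stream
    container stream; all others occupy standard sectors.
    """
    if stream_size < min_standard_size:
        return "short-stream container"
    return "standard sectors"
```

With the usual minimum of 4096, the 2057-byte Workbook stream mentioned above falls in the short-stream container, so a reader that only follows standard sector chains would not find it.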

Please help me understand this more clearly!

I am very curious about the direction that you are taking -- you appear to want to write an XLS reader yourself, based on an extremely cursory reading of some of the available documentation and a sample of one tiny XLS file. You have been pointed at existing working implementations in C, Python, Perl and Java, but appear not to be interested in those, not even in reading their source code. Must you re-invent the wheel? What language are you going to use? If C++, consider (1) using the Gnumeric C source or (2) digging in the source of OpenOffice.org's Calc. You'd no doubt find implementations in PHP, Delphi, etc. if you google hard enough.

Using existing implementations can be trivially easy; here I demonstrate digging that 'x' out of the tiny Gnumeric-created XLS file using Python interactively:

| >>> import xlrd
| >>> book = xlrd.open_workbook('c:/excel_misc/1cell_gnu.xls')
| >>> sheet = book.sheet_by_index(0)
| >>> sheet.name
| u'Sheet1'
The u means Unicode.
| >>> sheet.ncols
| 1
| >>> sheet.nrows
| 1
| >>> sheet.row_values(0)
| [u'x']
| >>> sheet.row_values(0)[0]
| u'x'


HTH,
John


