Re: How to build a program that can read the content of a very simple MS Excel file?
- From: John Machin <sjmachin lexicon net>
- To: Huge Mountain <educatenter gmail com>
- Cc: dev sc openoffice org, Jody Goldberg <jody gnome org>, dr openoffice org, gnumeric-list gnome org, acoliver apache org
- Subject: Re: How to build a program that can read the content of a very simple MS Excel file?
- Date: Sun, 17 Jun 2007 10:23:32 +1000
On 17/06/2007 2:27 AM, Huge Mountain wrote:
Thank you all!
I still have another question:
I'm trying to understand the binary format of MS Excel files using this
document, excelfileformat.pdf, from
http://sc.openoffice.org/excelfileformat.pdf
As previously advised, you also need to read compdocfileformat.pdf --
see below.
In part 2.2.2, page 11, we have:
The following table lists names of possible streams.
Stream name                       Contents
Book                              BIFF5/BIFF7 workbook stream (➜5.1.3)
Workbook                          BIFF8 workbook stream (➜5.1.3)
<05H>SummaryInformation           Document settings
<05H>DocumentSummaryInformation   Document settings
Ctls                              Formatting of form controls
User Names                        User names in shared workbooks (➜10)
Revision Log                      Change tracking log stream (➜10)
I only care about BIFF8/8X. Are these all the streams that exist in BIFF8,
with no other streams used?
There can be streams containing macros etc.
(In fact my Excel file, which I mentioned before,
has only 3 streams: Workbook, <05H>SummaryInformation, and
<05H>DocumentSummaryInformation.) Is all the content of an Excel file
(the content of cells) stored in the Workbook stream?
All cell content is in the Workbook stream, except of course when an
external reference is made (to cells in another file).
Is the length of the Workbook
stream unlimited
A worksheet can contain up to 256 columns and 65536 rows, and I am
unaware of any limit on the number of worksheets. My pathology museum
includes a 120 MB program-created file and a 40 MB manually-created
file. A practical limit would be imposed by the amount of memory
required to process the file.
and is the usual value, 4096 bytes, just the minimum length of the
Workbook stream?
There is no minimum size other than as dictated by the required
contents. 4096 is the usual value for the minimum size of a *standard*
stream. Streams smaller than that are either (a) included in the
Short-Stream Container Stream ("SSCS") or (b) zero-padded to 4096 bytes
and written as a standard stream. A reader must compare the stream's
size (from its directory entry) with the minimum size of a standard
stream and act accordingly. I have just created a small XLS file using
Gnumeric 1.7.6 (it has 'x' in cell A1 of the sole worksheet); the size
of the Workbook stream is 2057 bytes and this is included in the SSCS
(as are the two .*SummaryInformation streams).
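The size comparison described above can be sketched in Python as follows. This is only an illustration, not how any particular reader does it: it assumes the header layout given in compdocfileformat.pdf (8-byte file identifier at offset 0, minimum size of a standard stream as a 4-byte integer at offset 56) and little-endian byte order, and the function names are made up for this example. Finding the stream's size in its directory entry is not shown.

```python
import struct

# File identifier found at the start of every compound document file.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def min_standard_stream_size(header):
    """Return the minimum standard-stream size stored in the file header.

    Assumes little-endian byte order and the offset (56) documented in
    compdocfileformat.pdf; the function name is hypothetical.
    """
    if len(header) < 60 or header[:8] != OLE2_SIGNATURE:
        raise ValueError("not a compound document header")
    (min_size,) = struct.unpack_from("<I", header, 56)
    return min_size


def stream_location(stream_size, min_size):
    """Say where a stream of the given size is stored."""
    # Streams smaller than the threshold live in the short-stream container.
    if stream_size < min_size:
        return "short-stream container"
    return "standard stream"


if __name__ == "__main__":
    # Synthetic 512-byte header with the usual 4096-byte threshold.
    header = bytearray(512)
    header[:8] = OLE2_SIGNATURE
    struct.pack_into("<I", header, 56, 4096)
    threshold = min_standard_stream_size(bytes(header))
    print(stream_location(2057, threshold))  # the tiny Gnumeric file's Workbook stream
    print(stream_location(4096, threshold))
```

With the 2057-byte Workbook stream from the Gnumeric file and the usual 4096-byte threshold, the first call reports the short-stream container; a stream of exactly 4096 bytes is a standard stream.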
Please help me understand this more clearly!
I am very curious about the direction that you are taking -- you appear
to want to write an XLS reader yourself, based on an extremely cursory
reading of some of the available documentation and a sample of one tiny
XLS file. You have been pointed at existing working implementations in
C, Python, Perl and Java, but appear not to be interested in those, not
even reading their source code. Must you re-invent the wheel? What
language are you going to use? If C++, consider (1) using the Gnumeric C
source or (2) digging into the source of OpenOffice.org's Calc. You'd no
doubt find implementations in PHP, Delphi, etc. if you google hard enough.
Using existing implementations can be trivially easy; here I demonstrate
digging that 'x' out of the tiny Gnumeric-created XLS file using
Python interactively:
| >>> import xlrd
| >>> book = xlrd.open_workbook('c:/excel_misc/1cell_gnu.xls')
| >>> sheet = book.sheet_by_index(0)
| >>> sheet.name
| u'Sheet1'
The u means Unicode.
| >>> sheet.ncols
| 1
| >>> sheet.nrows
| 1
| >>> sheet.row_values(0)
| [u'x']
| >>> sheet.row_values(0)[0]
| u'x'
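The same calls can be put into a small script that walks every row of every sheet. This is a rough sketch rather than anything official: it uses only xlrd's open_workbook, sheets(), nrows, row_values and name, the helper name dump_cells is made up for this example, and the file path is taken from the command line.

```python
def dump_cells(book):
    """Yield (sheet name, row index, row values) for every row of every sheet.

    'book' is expected to behave like the object returned by
    xlrd.open_workbook; dump_cells itself is a hypothetical helper.
    """
    for sheet in book.sheets():
        for rowx in range(sheet.nrows):
            yield sheet.name, rowx, sheet.row_values(rowx)


if __name__ == "__main__":
    import sys

    import xlrd  # third-party package, must be installed separately

    # Usage: python dump_cells.py some_file.xls
    book = xlrd.open_workbook(sys.argv[1])
    for name, rowx, values in dump_cells(book):
        print(name, rowx, values)
```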
HTH,
John