The Horrible Property Set Format (HPSF) API reads metadata such as document
summary information from OLE 2 compound document files.
This package processes streams in the Horrible Property Set Format (HPSF) in POI
filesystems. Microsoft Office documents are stored as POI filesystems and usually
contain metadata such as author, title, and last editing date. These items
are called properties and are stored in property set streams alongside the
document itself. These streams are commonly named \005SummaryInformation and
\005DocumentSummaryInformation. However, a POI filesystem may
contain further property sets with other names or types.
To extract the properties from a POI filesystem, the contents of a property set
stream must be parsed into a Poi.Net.HPSF.PropertySet instance. Its subclasses
Poi.Net.HPSF.SummaryInformation and Poi.Net.HPSF.DocumentSummaryInformation
deal with the well-known property set streams \005SummaryInformation and
\005DocumentSummaryInformation. (However, the streams' names are
irrelevant. What counts is the format ID of the property set's first section;
see below.)
The factory method Poi.Net.HPSF.PropertySetFactory#create
creates a Poi.Net.HPSF.PropertySet instance. This method always returns the
most specific property set: if it identifies the stream data as a Summary
Information or a Document Summary Information, it returns an instance of the
corresponding class; otherwise it returns the general Poi.Net.HPSF.PropertySet.
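The selection rule described above can be sketched in isolation. This is a minimal illustration, not the library's code: the class `FormatIdSketch` and its `classify` method are hypothetical names introduced here, and the method returns a plain class name rather than an instance. The two constants are the well-known format IDs of the Summary Information and Document Summary Information property sets.

```java
import java.util.UUID;

final class FormatIdSketch {
    // Well-known format IDs (FMTIDs) of the two standard property sets.
    static final UUID SUMMARY_INFORMATION =
            UUID.fromString("f29f85e0-4ff9-1068-ab91-08002b27b3d9");
    static final UUID DOCUMENT_SUMMARY_INFORMATION =
            UUID.fromString("d5cdd502-2e9c-101b-9397-08002b2cf9ae");

    private FormatIdSketch() {
    }

    /**
     * Hypothetical helper: names the most specific class for the format ID
     * of a property set's first section. The stream name plays no role.
     */
    static String classify(UUID firstSectionFormatId) {
        if (SUMMARY_INFORMATION.equals(firstSectionFormatId)) {
            return "SummaryInformation";
        }
        if (DOCUMENT_SUMMARY_INFORMATION.equals(firstSectionFormatId)) {
            return "DocumentSummaryInformation";
        }
        // Any other format ID falls back to the general class.
        return "PropertySet";
    }
}
```

Because the decision depends only on the format ID, a stream with an unusual name would still be classified correctly under this sketch.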
A Poi.Net.HPSF.PropertySet contains a list of Poi.Net.HPSF.Section objects,
which can be retrieved with Poi.Net.HPSF.PropertySet#getSections. Each
Poi.Net.HPSF.Section contains a Poi.Net.HPSF.Property array, which can be
retrieved with Poi.Net.HPSF.Section#getProperties. Since the vast majority of
Poi.Net.HPSF.PropertySet instances contain only a single Poi.Net.HPSF.Section,
the convenience method Poi.Net.HPSF.PropertySet#getProperties returns the
properties of that single section. It throws a
Poi.Net.HPSF.NoSingleSectionException if the Poi.Net.HPSF.PropertySet does not
contain exactly one Poi.Net.HPSF.Section.
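The containment hierarchy and the single-section shortcut can be illustrated with a simplified stand-alone model. The classes below are stand-ins written for this sketch and only borrow the names used in the text; the real classes parse stream data and carry more state. In particular, making the exception unchecked is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class SingleSectionSketch {
    /** Stand-in for the exception named in the text (assumed unchecked here). */
    static final class NoSingleSectionException extends RuntimeException {
        NoSingleSectionException(String message) {
            super(message);
        }
    }

    /** Simplified property: an ID and an already-decoded value. */
    static final class Property {
        final long id;
        final Object value;

        Property(long id, Object value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Simplified section: just a list of properties. */
    static final class Section {
        final List<Property> properties;

        Section(Property... properties) {
            this.properties = Arrays.asList(properties);
        }
    }

    /** Simplified property set: a list of sections plus the shortcut. */
    static final class PropertySet {
        final List<Section> sections;

        PropertySet(Section... sections) {
            this.sections = Arrays.asList(sections);
        }

        /** Returns the properties of the only section, or fails otherwise. */
        List<Property> getProperties() {
            if (sections.size() != 1) {
                throw new NoSingleSectionException(
                        "expected exactly one section, found " + sections.size());
            }
            return sections.get(0).properties;
        }
    }

    private SingleSectionSketch() {
    }
}
```

Callers that may encounter property sets with several sections would iterate over the sections instead of relying on the shortcut.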
Each Poi.Net.HPSF.Property has an ID, a type, and a value, which can be
retrieved with Poi.Net.HPSF.Property#getID, Poi.Net.HPSF.Property#getType, and
Poi.Net.HPSF.Property#GetValue, respectively. The value's class
depends on the property's type. The current implementation
does not yet support all property types and restricts the values' classes
to java.lang.String, java.lang.Integer and java.util.Date. A value of an as yet
unsupported type is returned as a byte array containing the value's original
bytes from the property set stream.
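How a value's class can follow from the property's type is sketched below. This is an illustration under stated assumptions, not the library's decoder: `ValueDecodingSketch` and `decode` are hypothetical names, `raw` is assumed to hold only the value bytes that follow the type field, and strings are assumed to be ISO-8859-1 because codepages are not handled. The variant type numbers (3, 30, 64) and the FILETIME epoch offset are the standard ones.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Date;

final class ValueDecodingSketch {
    static final int VT_I4 = 3;        // 32-bit signed integer
    static final int VT_LPSTR = 30;    // length-prefixed 8-bit string
    static final int VT_FILETIME = 64; // 100 ns ticks since 1601-01-01 UTC

    // Milliseconds between 1601-01-01 and 1970-01-01.
    static final long EPOCH_DIFF_MILLIS = 11644473600000L;

    private ValueDecodingSketch() {
    }

    /** Maps a type and its raw value bytes to a String, Integer, Date or byte[]. */
    static Object decode(int type, byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        switch (type) {
            case VT_I4:
                return Integer.valueOf(buf.getInt());
            case VT_LPSTR: {
                // The length prefix counts bytes, including the terminating NUL.
                int length = Math.max(0, Math.min(buf.getInt(), raw.length - 4));
                String s = new String(raw, 4, length, StandardCharsets.ISO_8859_1);
                int nul = s.indexOf('\0');
                return nul >= 0 ? s.substring(0, nul) : s;
            }
            case VT_FILETIME: {
                long ticks = buf.getLong();
                // Convert 100 ns ticks to milliseconds, then shift the epoch.
                return new Date(ticks / 10000L - EPOCH_DIFF_MILLIS);
            }
            default:
                // Unsupported type: hand back a copy of the original bytes.
                return raw.clone();
        }
    }
}
```

Returning the raw bytes for unsupported types keeps the data available to callers that know how to interpret it themselves.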
To retrieve the value of a specific Poi.Net.HPSF.Property,
use Poi.Net.HPSF.Section#getProperty or
Poi.Net.HPSF.Section#getPropertyIntValue.
The Poi.Net.HPSF.SummaryInformation and
Poi.Net.HPSF.DocumentSummaryInformation classes provide convenience
methods for retrieving well-known properties. For example, an application
that wants to retrieve a document's title string simply calls
Poi.Net.HPSF.SummaryInformation#getTitle instead of first finding out
the title's property ID and then using this ID to get the property's value.
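The difference between a lookup by ID and a convenience method can be sketched with a small stand-alone model. The classes below are simplified stand-ins for this illustration, not the library's implementation. The title's property ID (2) is the standard one for the Summary Information property set; returning 0 from `getPropertyIntValue` for a missing or non-numeric value is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class TitleLookupSketch {
    // Property ID of the title in the Summary Information property set.
    static final long PID_TITLE = 2;

    /** Simplified section: maps property IDs to already-decoded values. */
    static final class Section {
        private final Map<Long, Object> values = new HashMap<>();

        void put(long id, Object value) {
            values.put(id, value);
        }

        /** Generic lookup: the caller must know the property ID. */
        Object getProperty(long id) {
            return values.get(id);
        }

        /** Integer lookup; yields 0 if absent or not numeric (assumption). */
        int getPropertyIntValue(long id) {
            Object v = values.get(id);
            return (v instanceof Number) ? ((Number) v).intValue() : 0;
        }
    }

    /** Simplified wrapper that hides the property ID behind a named method. */
    static final class SummaryInformation {
        private final Section section;

        SummaryInformation(Section section) {
            this.section = section;
        }

        String getTitle() {
            Object v = section.getProperty(PID_TITLE);
            return (v instanceof String) ? (String) v : null;
        }
    }

    private TitleLookupSketch() {
    }
}
```

With this model, `new SummaryInformation(section).getTitle()` and `section.getProperty(2)` read the same value; the convenience method only spares the caller the ID.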
To Do
The following is still left to be implemented:
- Property dictionaries
- Writing property sets
- Codepage support
- Property type Unicode string
- Further property types
@author Rainer Klute (klute@rainer-klute.de)
@version $Id: package.html,v 1.3.8.1 2004/02/28 12:55:57 glens Exp $
@since 2002-02-09