Re: Beagle roadmap.



On Tue, 2004-10-05 at 10:43 -0400, Christopher James Lahey wrote:
> For what it's worth, I've got half of a PPT parser working.  This
> includes some of the code that will be needed for the rest of the MS
> Office stuff.  I'll commit what I have as soon as I get beagle to
> compile.  Then I'll be working on the shared MS Office code.  If
> someone wants to work on the Word specific or Excel specific parts,
> let me know.  You'll subclass BeagleFilterOle and implement Pull, but
> not PullMetadata or Open.  That will all be in BeagleFilterOle.
> 

BeagleFilterOle would be fine, as I am currently working on Word
specific stuff (parsing the FIB).  

My current code is based on gsf-sharp.  I extract the basic streams like
"WordDocument", "DocumentSummaryInformation", "SummaryInformation",
"1Table", "Data", "ObjectPool" etc. and parse the "WordDocument" stream
according to the FIB structure to extract the contents of the document.
It works fine; however, I will still have to implement the PAPX, CHPX,
STSH etc. structures that describe the formatting of paragraphs,
characters and style sheets.
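
To illustrate the FIB step, here is a minimal sketch in Python (not the
gsf-sharp code itself) that reads the first few FIB fields from the raw
bytes of the "WordDocument" stream.  The field order and the flag bits
follow my reading of the published Word binary format; the function name
parse_fib_base and the returned dictionary keys are made up for this
example.

```python
import struct

WORD_MAGIC = 0xA5EC  # wIdent value expected at the start of the FIB


def parse_fib_base(stream: bytes) -> dict:
    """Read the leading FIB fields from the raw "WordDocument" stream."""
    if len(stream) < 12:
        raise ValueError("stream too short to hold a FIB header")
    # Six little-endian 16-bit words: wIdent, nFib, unused, lid, pnNext, flags
    w_ident, n_fib, _unused, lid, _pn_next, flags = struct.unpack_from(
        "<6H", stream, 0
    )
    if w_ident != WORD_MAGIC:
        raise ValueError("not a Word binary document")
    return {
        "nFib": n_fib,
        "lid": lid,
        # Bit 9 of the flags word selects which table stream to read.
        "table_stream": "1Table" if flags & 0x0200 else "0Table",
        # Bit 8 marks an encrypted or obfuscated document.
        "encrypted": bool(flags & 0x0100),
    }
```

The table stream flag matters because the formatting structures live in
"0Table" or "1Table", depending on that bit.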

So, if I understand properly, the BeagleFilterOle will take care of:
	
	1)  parsing "DocumentSummaryInformation" <== DoPullProperties()
	2)  parsing "SummaryInformation"	 <== DoPullProperties()
	3)  fetching "WordDocument"
	4)  fetching "1Table" 
	5)  fetching "Data"
	6)  fetching "WorkBook" or "Book"
	7)  fetching PPT specific stuff (actually I was looking at doc and xls only ;) )

And the specific filters only have to implement their format-specific
parsers, right?
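
The split as I understand it can be sketched like this.  It is written
in Python only to keep it short, and every name in it (FilterOle,
FilterDoc, open, pull_metadata, pull) is a hypothetical stand-in for the
real BeagleFilterOle API, which has not been committed yet.  The OLE
container is faked as a plain mapping of stream name to bytes.

```python
class FilterOle:
    """Hypothetical stand-in for the shared BeagleFilterOle base class."""

    # Streams whose handling is shared by every MS Office format.
    PROPERTY_STREAMS = ("SummaryInformation", "DocumentSummaryInformation")

    def __init__(self):
        self.streams = {}

    def open(self, container):
        # 'container' is assumed to be a mapping of stream name -> bytes;
        # real code would read the OLE2 file through gsf-sharp instead.
        self.streams = dict(container)

    def pull_metadata(self):
        # Shared step: report which property streams are present.
        return [name for name in self.PROPERTY_STREAMS if name in self.streams]

    def pull(self):
        raise NotImplementedError("format-specific filters implement pull()")


class FilterDoc(FilterOle):
    """Word filter: only the format-specific part lives here."""

    def pull(self):
        # Hand back the raw stream that the Word parser would consume.
        return self.streams.get("WordDocument", b"")
```

With this shape, an Excel filter would be another small subclass that
looks up "Workbook" or "Book" in its own pull().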

I have a couple of questions on BeagleFilterOle:

1)  Is it based on gsf-sharp?
2)  Are there any plans to support non-OLE encapsulation, like "escher
de-encapsulation"? (I really don't know what that encapsulation is, but
when I was browsing through the OO.o filters code, I found some comments
mentioning it.)
3)  When will BeagleFilterOle be ready?

Meanwhile, I will just concentrate on writing the Word/Excel specific
filters.

Thanks,

V. Varadhan.





