Re: Beagle roadmap.



On Tue, 2004-10-05 at 23:47, Veerapuram Varadhan wrote:
> On Tue, 2004-10-05 at 10:43 -0400, Christopher James Lahey wrote:
> > For what it's worth, I've got half of a PPT parser working.  This
> > includes some of the code that will be needed for the rest of the MS
> > Office stuff.  I'll commit what I have as soon as I get beagle to
> > compile.  Then I'll be working on the shared MS Office code.  If
> > someone wants to work on the Word-specific or Excel-specific parts,
> > let me know.  You'll subclass BeagleFilterOle and implement Pull, but
> > not PullMetadata or Open.  That will all be in BeagleFilterOle.
> > 
> 
> BeagleFilterOle would be fine, as I am currently working on the
> Word-specific stuff (parsing the FIB).
> 
> My current code is based on gsf-sharp.  I extract the basic streams like
> "WordDocument", "DocumentSummaryInformation", "SummaryInformation",
> "1Table", "Data", "ObjectPool", etc. and parse the "WordDocument" stream
> according to the FIB structure to extract the contents of the document.
> It works fine; however, I still have to implement the PAPX, CHPX, STSH,
> etc., which handle paragraph, character and style sheet formatting.

This is perfect.
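
Since the FIB decides which table stream to open, here is a minimal
sketch of reading its first fields, in Python purely for illustration
(the actual filter would be C# on top of gsf-sharp).  It assumes the
"WordDocument" stream is already in a byte buffer, and the offsets
follow my reading of the Word 97 FIB layout: wIdent 0xA5EC at offset 0,
nFib at offset 2, and a flags word at 0x0A whose 0x0200 bit
(fWhichTblStm) selects "1Table" versus "0Table".  The helper name
read_fib_base is hypothetical.

```python
import struct

# Word 97+ FIB magic number (wIdent), little-endian at offset 0.
WORD97_IDENT = 0xA5EC
# Bits in the 16-bit flags word at offset 0x0A of the FIB.
F_ENCRYPTED = 0x0100      # fEncrypted
F_WHICH_TBL_STM = 0x0200  # fWhichTblStm: set -> "1Table", clear -> "0Table"


def read_fib_base(word_document: bytes) -> dict:
    """Hypothetical helper: parse a few FIB fields from the WordDocument stream."""
    if len(word_document) < 32:
        raise ValueError("stream too short to hold a FIB")
    w_ident, n_fib = struct.unpack_from("<HH", word_document, 0)
    if w_ident != WORD97_IDENT:
        raise ValueError(f"not a Word 97+ document (wIdent=0x{w_ident:04X})")
    (flags,) = struct.unpack_from("<H", word_document, 0x0A)
    return {
        "nFib": n_fib,
        "encrypted": bool(flags & F_ENCRYPTED),
        # The piece table and other tables live in whichever stream this names.
        "table_stream": "1Table" if flags & F_WHICH_TBL_STM else "0Table",
    }
```

Checking the bit rather than always opening "1Table" matters because
documents saved with fWhichTblStm clear keep their tables in "0Table".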

> So, if I understand properly, the BeagleFilterOle will take care of:
> 	
> 	1)  parsing "DocumentSummaryInformation" <== DoPullProperties()
> 	2)  parsing "SummaryInformation"	 <== DoPullProperties()

Yep, yep.
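
Both of those streams use the OLE property set format.  A minimal
sketch of reading string properties from one, in Python for
illustration only, assuming the stream is already in a byte buffer: it
checks the 0xFFFE byte-order mark, follows the offset of the first
property set (stored at byte 44 of the header), and decodes only
VT_LPSTR (type 0x001E) values.  The function name is hypothetical, and
the cp1252 default is an assumption; a complete parser would read the
code page from property 1 instead.

```python
import struct
from typing import Dict

VT_LPSTR = 0x001E  # CodePageString: 4-byte size (incl. NUL), then bytes
PIDSI_TITLE, PIDSI_SUBJECT, PIDSI_AUTHOR = 0x02, 0x03, 0x04


def read_string_properties(stream: bytes, encoding: str = "cp1252") -> Dict[int, str]:
    """Hypothetical helper: {property id: text} for VT_LPSTR values in set 0."""
    (byte_order,) = struct.unpack_from("<H", stream, 0)
    if byte_order != 0xFFFE:
        raise ValueError("not an OLE property set stream")
    (num_sets,) = struct.unpack_from("<I", stream, 24)
    if num_sets < 1:
        return {}
    # Header: FMTID of the first set at 28..44, then its offset at 44.
    (section,) = struct.unpack_from("<I", stream, 44)
    _size, count = struct.unpack_from("<II", stream, section)
    props: Dict[int, str] = {}
    for i in range(count):
        # Each entry is (property id, offset relative to the section start).
        pid, rel = struct.unpack_from("<II", stream, section + 8 + 8 * i)
        pos = section + rel
        (vt,) = struct.unpack_from("<H", stream, pos)  # type, then 2 bytes padding
        if vt == VT_LPSTR:
            (n,) = struct.unpack_from("<I", stream, pos + 4)
            raw = stream[pos + 8 : pos + 8 + n]
            # Assumed encoding; the real code page is property 1 (PID_CODEPAGE).
            props[pid] = raw.split(b"\0", 1)[0].decode(encoding, errors="replace")
    return props
```

For SummaryInformation the result is keyed by property ID, so
props.get(PIDSI_TITLE) gives the title when one is present.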

> 	3)  fetching "WordDocument"
> 	4)  fetching "1Table" 
> 	5)  fetching "Data"
> 	6)  fetching "WorkBook" or "Book"

Well, what I had in mind is that the parent class will hold a GsfInFile
(I think that's what it's called), and you can fetch whichever
substreams you need using GetChildByName or whatever it's called.

> 	7)  fetching PPT-specific stuff (actually, I was only looking at doc
> and xls ;) )

This'll be in BeagleFilterPPT which will be a subclass of
BeagleFilterOle.

> So the specific filters only have to implement their own parsers,
> right?

Yep.
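
The division of labor above can be sketched as a template-method class
hierarchy.  This is Python purely for illustration (the real classes
are C# on gsf-sharp), and every name in it is hypothetical:
OleFilterBase stands in for BeagleFilterOle, a dict of stream name to
bytes stands in for the GsfInFile, and child_by_name stands in for
whatever the gsf-sharp lookup is really called.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

# OLE property set stream names start with the control character 0x05.
SUMMARY_STREAMS = ("\x05SummaryInformation", "\x05DocumentSummaryInformation")


class OleFilterBase(ABC):
    """Hypothetical stand-in for the shared base class (Open + properties)."""

    def __init__(self) -> None:
        self._streams: Dict[str, bytes] = {}
        self.text: List[str] = []

    def open(self, streams: Dict[str, bytes]) -> None:
        # A plain dict of stream name -> bytes stands in for the OLE container.
        self._streams = streams

    def child_by_name(self, name: str) -> Optional[bytes]:
        # Subclasses fetch whichever substreams they need through this lookup.
        return self._streams.get(name)

    def pull_properties(self) -> Dict[str, bytes]:
        # Shared by every subclass: return the raw summary streams that a
        # property set parser would then decode.
        return {n: self._streams[n] for n in SUMMARY_STREAMS if n in self._streams}

    @abstractmethod
    def pull(self) -> None:
        """Format-specific text extraction: the only method a subclass writes."""


class WordFilterSketch(OleFilterBase):
    """Hypothetical Word subclass: implements pull() and nothing else."""

    def pull(self) -> None:
        doc = self.child_by_name("WordDocument")
        if doc is None:
            raise ValueError("missing WordDocument stream")
        # A real filter would walk the FIB and piece table here; the sketch
        # only records that the stream was found.
        self.text.append(f"WordDocument: {len(doc)} bytes")
```

Because pull() is abstract, a subclass that forgets to implement it
fails at construction time rather than silently indexing nothing.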

> I have a couple of questions on BeagleFilterOle:
> 
> 1)  Is it based on gsf-sharp?

Yes.

> 2)  Are there any plans to support non-OLE encapsulation, like "escher
> de-encapsulation"? (I really don't know what that encapsulation is,
> but when I was browsing through the OO.o filters code, I found some
> comments mentioning it.)

I've never heard of that, so no plans.  If anyone finds documents that
contain MS-document-like structures that aren't in OLE, we can deal
with it then.

> 3)  When will BeagleFilterOle be ready?

I should be able to have Open implemented today or tomorrow.
DoPullProperties won't work yet, but everything needed to implement the
subclasses will work.

> Meanwhile, I will just concentrate on writing the Word/Excel-specific
> filters.

Perfect.

Thanks,
    Chris



