Re: word processor document format: what parts?



On Sun, 20 Sep 1998, Mark Galassi wrote:

> Patrick, I would love it if you could share your design ideas.
> 

Okay, but I haven't gotten very far on it.  (My head's still spinning).

The thing that is getting very clear to me is that all the action, all the
problems, all the difficulties of this program lie in one area:  the edit
component.  Once the edit component is done, the rest of the programming
is (relatively) trivial: I don't think anyone is worried about how we're
going to write a spell checker. 

I'm convinced that the Right Way To Do This (TM) is by creating an edit
component that is intrinsically aware of XML tags and XSL styles and
exports a DOM API. It seems to me that data needs to be stored in the
component not in a long string (which is what most free software WPs seem
to do), but in some kind of tree structure, probably mirroring the DOM
API.

The tree structure serves us in a couple of ways.  First, we can easily
and consistently insert tags within tags ad infinitum.  Secondly, when
inserting additional data or a new sub-element into an element, we can
easily determine which XSL rule applies to us.  Consider the following xml
document:
	<bogusdoc>
	<bold>
		This is bolded text
	</bold>
	<italic>
		This is italic text
		<bold>
			This is bold/italic text
		</bold>
	</italic>
	</bogusdoc>
Obviously, the above is a trivial example.  However, the point remains:
depending on the context within which the bold tag is given, we need to do
two different things (i.e. bold vs. bold/italic).  With a tree-structured
memory layout, we just go up the tree to get our full XML context.  With a
long-buffer-based approach, this is much more difficult.
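
A minimal sketch of the "go up the tree" idea, assuming Python's standard
xml.dom.minidom as a stand-in for the edit component's tree (the helper
names context_of and text_contexts are made up for illustration, and the
whitespace in the sample document is trimmed):

```python
from xml.dom import minidom

# The sample document from above, with indentation removed.
DOC = """<bogusdoc>
<bold>This is bolded text</bold>
<italic>This is italic text<bold>This is bold/italic text</bold></italic>
</bogusdoc>"""


def context_of(node):
    """Return the names of the enclosing elements, outermost first."""
    names = []
    parent = node.parentNode
    # Walk upward until we leave the element tree (the Document node).
    while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        names.append(parent.tagName)
        parent = parent.parentNode
    return list(reversed(names))


def text_contexts(node, found=None):
    """Collect (text, context) pairs for every non-blank text node."""
    if found is None:
        found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip():
                found.append((child.data.strip(), context_of(child)))
        elif child.nodeType == child.ELEMENT_NODE:
            text_contexts(child, found)
    return found


contexts = text_contexts(minidom.parseString(DOC))
for text, ctx in contexts:
    print("/".join(ctx), "->", text)
```

The last text node reports bogusdoc/italic/bold as its context, which is
exactly the information needed to pick bold/italic rather than plain bold.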

The trick (and the part that got my head spinning) is going to be
structuring the tree such that:
	a) context can quickly be determined (this is easier)
	b) tree traversals can happen quickly and efficiently.
(b) is what will make or break us.  Simply put: every time you insert
something in a document, you're going to have to traverse whatever part of
the tree lies on screen -- you might be able to optimize this down to the
line in some contexts.

To make matters worse, we will also need to have some kind of tree
containing the XSL layout.  I'm thinking this could be cached by using an
in-memory attribute of the XML elements (in-memory = would not be written
to disk), but we're still looking at a significant amount of work to
determine what to do each time an element is inserted.
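
One way the in-memory caching could look, as a rough sketch only.  The
Node class, the RULES table (a toy stand-in for real XSL rules) and the
method names are all hypothetical; the point is just that the cached
style lives on the node and never reaches the serialized output:

```python
# Toy stand-in for XSL rules: element name -> style properties it adds.
RULES = {
    "bold": frozenset({"bold"}),
    "italic": frozenset({"italic"}),
}


class Node:
    """Hypothetical tree node with an in-memory style cache."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self._style = None  # in-memory only; to_xml() never writes it
        if parent is not None:
            parent.children.append(self)

    def style(self):
        """Resolve the style once, then reuse the cached result."""
        if self._style is None:
            inherited = self.parent.style() if self.parent else frozenset()
            self._style = inherited | RULES.get(self.tag, frozenset())
        return self._style

    def invalidate(self):
        """Drop the cache for this subtree, e.g. after a rule change."""
        self._style = None
        for child in self.children:
            child.invalidate()

    def to_xml(self):
        """Serialize the structure only; the cache is left out."""
        inner = "".join(child.to_xml() for child in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


doc = Node("bogusdoc")
italic = Node("italic", doc)
bold = Node("bold", italic)
print(sorted(bold.style()))
print(doc.to_xml())
```

The cache saves repeated walks up the tree, but any change to the rules
or to an ancestor still means invalidating the affected subtree, so the
per-insert cost does not disappear.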

The point of the DOM API goes thusly: with a (possibly extended) DOM API,
it should be easy to control this edit component externally from a script,
or another program, or whatever.  I'm thinking that this component could
also be of great use, for example, to balsa.  With the API properly
exported, this would be trivial.  The advantage of using DOM is that a lot
of people already know it (it's used in web page stuff a fair amount
already).
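
To illustrate what driving the component from a script might look like,
here is a small sketch that uses Python's xml.dom.minidom in place of the
(not yet existing) edit component.  Only standard DOM calls are used; that
the component would expose exactly these calls is an assumption:

```python
from xml.dom import minidom

# Pretend this document object was handed to us by the edit component.
doc = minidom.parseString(
    "<bogusdoc><italic>This is italic text</italic></bogusdoc>"
)

# An external script adds a bold run inside the italic element
# using nothing but standard DOM methods.
italic = doc.getElementsByTagName("italic")[0]
bold = doc.createElement("bold")
bold.appendChild(doc.createTextNode("This is bold/italic text"))
italic.appendChild(bold)

result = doc.documentElement.toxml()
print(result)
```

Anything that already speaks DOM (a script, a mail client, another
program) could make the same kind of edit without knowing how the
component stores its data internally.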

It should be noted that this could really meet the concerns of the 'Word'
vs. 'FrameMaker' folks: once the edit component is designed, it could
easily be used to create either or both.  (I maintain that the Word
program could just be a mode of the Frame program).

Another design thought: it should be possible to carry the idea of GUI
components with the DTD we're using.  So, for example, it makes no sense
to have toolbars pointing towards tables when the DTD doesn't support
tables.  Possibly some kind of 'gui hints' structure that the DTD could
carry around...
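
A very rough sketch of how such 'gui hints' might be used.  Everything
here is hypothetical: the GUI_HINTS table, the widget names, and the
crude regular expression that pulls element names out of a DTD (a real
implementation would use a proper DTD parser):

```python
import re

# A tiny DTD that declares bold and italic, but no tables.
DTD = """
<!ELEMENT bogusdoc (bold | italic)*>
<!ELEMENT bold (#PCDATA | italic)*>
<!ELEMENT italic (#PCDATA | bold)*>
"""

# Hypothetical hints: which GUI widget serves which element.
GUI_HINTS = {
    "bold": "Bold button",
    "italic": "Italic button",
    "table": "Table toolbar",
}


def declared_elements(dtd_text):
    """Return the element names declared in the DTD text (crude scan)."""
    return set(re.findall(r"<!ELEMENT\s+([\w.-]+)", dtd_text))


def active_widgets(dtd_text, hints):
    """Keep only the widgets whose element the DTD actually declares."""
    declared = declared_elements(dtd_text)
    return sorted(w for element, w in hints.items() if element in declared)


widgets = active_widgets(DTD, GUI_HINTS)
print(widgets)
```

With this DTD the table toolbar is simply never offered, since no table
element is declared.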

So many more issues...  For example, there are all the mechanics involved
in building tables and such.  

Okay -- with any luck, I now have everyone else's heads spinning.  Like I
said, this is all very much at the idea stage.  I certainly don't have
anything resembling a proper design.

Patrick


----------------------------------------------------------------------
If we're to have any luck stanching the vain drain, we just have to 
let nerds be nerds...  Owen Edwards, Forbes Magazine
----------------------------------------------------------------------


