Re: word processor document format: what parts?

From: Christopher Curtis <ccurtis ee fit edu>
To: gnome-list gnome org
Subject: Re: word processor document format: what parts?
Date: Sun, 20 Sep 1998 15:02:04 -0400 (EDT)

On Sun, 20 Sep 1998, J. Patrick Narkinsky wrote:

> On Sun, 20 Sep 1998, Mark Galassi wrote:
> > Patrick, I would love it if you could share your design ideas.
> 
> I'm convinced that the Right Way To Do This (TM) is by creating an edit
> component that is intrinsically aware of XML tags and XSL styles and
> exports a DOM API. It seems to me that data needs to be stored in the
> component (not in a long string, which is what most free software WPs seem
> to do), but in some kind of tree structure, probably mirroring the DOM
> API.

I've given this only a little thought, and my idea is quite different from
yours.  I'll post it only to offer another viewpoint.

> easily determine which XSL rule applies to us.  Consider the following xml
> document:
> 	<bogusdoc>
> 	<bold>
> 		This is bolded text
> 	</bold>
> 	<italic>
> 		This is italic text
> 		<bold>
> 			This is bold/italic text
> 		</bold>
> 	</italic>
> 	</bogusdoc>

I think that this is not the Right Way(tm).  Consider instead:

<bogusdoc>
<style name="bold">
 <font face="arial" points="12" weight="bold"/>
</style>
<style name="italic">
 <font face="arial" points="12" slant="oblique"/>
</style>
<style name="bold-italic">
 <font face="arial" points="12" weight="bold" slant="oblique"/>
</style>

<bold>
This is bold text.
</bold>
<italic>
This is italic text.
</italic>
<bold-italic>
This is bold-italic text.
</bold-italic>
</bogusdoc>

This way we don't really have to concern ourselves with contexts, and the
entire document can be structured as follows:

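typedef struct style_t
{	char *style_string;	// or whatever: font metric, ...
	struct style_t *next;	// next entry in the document's style list
} style;			// let the software deal w/inheritance

typedef struct passage_t
{	char *text;		// a run of text that shares one style
	style *style;		// the style applied to the whole run
	struct passage_t *next;	// following passage in document order
	struct passage_t *prev;	// preceding passage (NULL for the first)
} passage;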

With a memory layout such as:

0x0001
+----------------------------+
| text = "This is bold text" |
| style = *(bold)            |
| next = 0x0002              |
| prev = NULL                |
+----------------------------+
0x0002
+------------------------------+
| text = "This is italic text" |
| style = *(italic)            |
| next = 0x0003                |
| prev = 0x0001                |
+------------------------------+

etc.  Applying a different style to a selection simply involves changing
the 'style' pointer (after splitting passages at the edges of the
selection, if it starts or ends in the middle of one).  Other things that
might be useful in this struct include the rendered width.  If you want to
include such things as kerning, you may have to break each selection up
into individual lines (of text on a sheet) as well.  This structure also
means that if you change a style, you only have to rerender the selection,
not the entire document.

> The trick (and the part that got my head spinning) is going to be
> structuring the tree such that:
> 	a) context can quickly be determined (this is easier)
> 	b) Tree traversals can happen quickly and efficiently.
> B is what will make or break us.  Simply put: every time you insert
> something in a document, you're going to have to traverse whatever part of
> the tree lies on screen -- you might be able to optimize this down to the
> line in some contexts.  

I think this solves problem 'B', because you only have to rerender the
current selection.  Even with a large number of styles this should be
quick.  I think a bigger problem will simply be finding the current
context, though knowing the style of the current passage could make those
lookups easier and faster as well.  The biggest problem with this approach
is that the text is stored linearly, so reaching a given position means
walking the list from the start.  It may be good to have some sort of
lookup array, perhaps based on pages, to find where you are more quickly
in large documents.

> The point of the DOM API goes thusly: with a (possibly extended) DOM API,

This I don't know and won't comment on...

> So many more issues...  For example, there are all the mechanics involved
> in building tables and such.  

Tables, cross-references (this is a style, no?), TOC, TOF, figures, index,
footnotes, master pages, anchored graphics, flows, text wrapping around
graphics, non-contiguous text (flows), ... ;-)

Christopher

Follow-Ups:
- Re: word processor document format: what parts?
  - From: J. Patrick Narkinsky

References:
- Re: word processor document format: what parts?
  - From: J. Patrick Narkinsky
