Re: Publishing HTML



On Fri, 22 Jun 2001, Colm Smyth wrote:

> I was chatting with Laszlo Kovacs about scrollkeeper and he mentioned
> that some people had a few good reasons for publishing the GNOME
> documentation as SGML, to be converted on the fly to HTML by Nautilus.

Others have provided some good replies to this, including the flexibility
of the display stylesheets, which Telsa mentioned and which hadn't occurred
to me.  I thought I would comment on another advantage of the way we are
doing things...

Generally, we have SGML (soon to be XML) documents from which we extract
various sorts of metadata (index, table of contents, search database,
etc.) and we convert the SGML/XML document into a display format (HTML):

              metadata
           /
  XML doc
           \
             display formatted doc

We currently do both the metadata extraction and document conversion after
the package is installed: the metadata extraction happens at post-install
time and the document conversion occurs at run time.

An alternative would be to do all of this at build time and then just
install the metadata and the display formatted document.  This is the
approach used by JavaHelp, and it has the benefit of avoiding install-time
scripts and any run-time performance cost.

We decided not to do this because we are an open source project.  The
significance of this is that we (1) like our software to be as flexible as
possible, (2) have frequent and asynchronous software releases, and (3)
prefer writing code and developing things as we go over developing a very
detailed specification before we start coding.

Flexibility: By allowing the conversion from XML to HTML to happen at run
time on the client, the client machine has the flexibility to control what
software is used in the conversion, what stylesheet is used, etc.  We will
be exploiting this not only to allow for customized
stylesheets/appearance, but also to allow our XML->HTML conversion to
insert special anchors which we can link against.  I suppose we may
consider improved compatibility with non-GNOME documentation (KDE, LDP,
etc.) in here as well.  Generally, the user, sys admin, or distribution
can customize the help system without customizing every package in the
distribution or every package the user might potentially use.  I'm sure
that with a little thought we can come up with a laundry list of ways
this flexibility can be exploited.
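
To make the override idea concrete, here is a minimal sketch in Python.
It is not how Nautilus or gnome-db2html3 actually picks a stylesheet; the
function name and all of the paths are made up for illustration.  The
only point is that the choice is made on the client at run time, so a
user or sys admin can drop in a stylesheet without touching any package:

```python
import os

# Hypothetical search order: the user's own stylesheet wins, then a
# site-wide one from the sys admin, then the distribution default.
# None of these paths are real GNOME locations.
DEFAULT_CANDIDATES = [
    os.path.expanduser("~/.help/docbook.xsl"),
    "/etc/help/docbook.xsl",
    "/usr/share/help/docbook.xsl",
]


def pick_stylesheet(candidates=DEFAULT_CANDIDATES):
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

If the conversion happened at build time, this lookup would already have
been done on the packager's machine and the client would have no say.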

Asynchronous releases: Free software has the motto "release early, release
often".  We need to make the dependency between application packages
(which include the docs) and the help system as weak as possible.  This is
difficult to do if the application packages ship with pre-processed
documents since each application package will only work with certain
versions of the help system.  In this way the help system would be like a
library which is used by all the applications on your computer which have
docs (i.e. a lot of packages!).  Things are okay if the help system is
mature, stable, well-specified, and shared by all the applications on the
system.  If the help system is not well specified and mature, there will
be compatibility problems. Also, if some documents are shipped without
this particular help system in mind, then the documents will not show up
in the help browser.  By having the metadata handled only on a
post-install basis, it is easy to have subsequent releases with
significant internal changes or even to completely replace the help system
provided we remain compatible with a small exported API.  (e.g.
ScrollKeeper can generate a new database from scratch at any time if the
old database is corrupted or accidentally deleted, or if a new version
of ScrollKeeper has an incompatible database format.)
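
As a rough illustration of why regenerating from scratch is cheap when
the packages ship only source documents, here is a small Python sketch.
It is not ScrollKeeper's real database format or API; rebuild_index and
the idea of keying on the first <title> element are assumptions made for
the example.  It walks a directory of installed XML docs and rebuilds a
path-to-title index, skipping anything that does not parse:

```python
import os
import xml.etree.ElementTree as ET


def rebuild_index(doc_dir):
    """Scan doc_dir for *.xml files and return {path: title}.

    Hypothetical stand-in for a metadata database: the index is derived
    entirely from the installed documents, so it can be thrown away and
    regenerated at any time.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(doc_dir):
        for name in sorted(filenames):
            if not name.endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                # A malformed document is left out of the index
                # rather than aborting the whole rebuild.
                continue
            # Use the first <title> element found, if there is one.
            title = next(root.iter("title"), None)
            if title is not None and title.text:
                index[path] = title.text.strip()
            else:
                index[path] = ""
    return index
```

Since nothing in the index was shipped by the application packages, a new
version of the help system is free to change what it extracts and how it
stores it.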

Specification: To do everything at build time requires that a carefully
thought out and detailed specification be provided before the system is
used, since we need to remain backwards-compatible with application
packages which include all the document metadata which the help system
will have access to.  An evolving or poorly written specification would
cause serious compatibility problems or functionality limitations ("doh,
if only we'd thought of that last month!").  Writing a good spec is an
option, but it is dangerous in that it may not be as good as we think it
is and it pushes out release dates into the distant future while we try to
dream up and write down the perfect specification without the benefit of
real use to guide the specification.

So, hopefully I've shown some of the reasons why doing things at
post-install time works better for an open source project which wants to
work on a release early, release often basis.  The cost of this is
primarily performance.  By doing things at install time (for ScrollKeeper)
and by caching the results of things done at run-time (gnome-db2html3,
although not implemented in Nautilus yet), we greatly reduce any
performance problems that users may observe while running the software.
The other disadvantage is that we have to deal with the security issues
associated with caching files (for gnome-db2html3) which are generated by
multiple users.  I think 'man' has already dealt with that problem and
that we can borrow its approach to get something fairly secure[*].
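
For the caching itself, here is a minimal Python sketch.  To be clear,
this is not the shared-cache scheme 'man' uses and not what
gnome-db2html3 does; it sidesteps the multi-user problem by assuming a
per-user cache directory, which is a simpler and different design.  The
names cache_key and cached_convert, and the convert callback, are
hypothetical.  The key is a hash of the document and the stylesheet, so
changing either one never returns stale HTML:

```python
import hashlib
import os


def cache_key(doc_bytes, stylesheet_bytes):
    """Hash the document and the stylesheet together into one key."""
    h = hashlib.sha256()
    # Prefix the document length so that (doc, stylesheet) pairs with
    # the same concatenation cannot collide.
    h.update(len(doc_bytes).to_bytes(8, "big"))
    h.update(doc_bytes)
    h.update(stylesheet_bytes)
    return h.hexdigest()


def cached_convert(doc_bytes, stylesheet_bytes, cache_dir, convert):
    """Return converted HTML, calling convert() only on a cache miss.

    cache_dir is assumed to be private to the current user; it is
    created with mode 0700 if missing (an existing directory's
    permissions are not changed).
    """
    os.makedirs(cache_dir, mode=0o700, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(doc_bytes, stylesheet_bytes) + ".html")
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        pass
    html = convert(doc_bytes, stylesheet_bytes)
    # Write to a temporary name and rename, so a reader never sees a
    # half-written cache file.
    tmp = "%s.%d.tmp" % (path, os.getpid())
    with open(tmp, "wb") as f:
        f.write(html)
    os.replace(tmp, path)
    return html
```

A per-user cache gives up sharing converted pages between users, which is
exactly the part that makes the 'man' approach attractive and also the
part that needs the security care.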

Dan

[*] I'm not sure how secure this approach would be if we allow the system
to handle any XML "doc" the user throws at it...  You probably shouldn't
be running Nautilus as root anyway ;)  Perhaps others could comment on
whether they think this is a decent way to handle cache files, noting that
we will be feeding the help browser untrusted docs which might be owned by
a user or even on the web.




