Re: Publishing HTML

From: Colm Smyth <Colm Smyth sun com>
To: Colm Smyth sun com, d-mueth uchicago edu
Cc: gnome-doc-list gnome org
Subject: Re: Publishing HTML
Date: Tue, 26 Jun 2001 10:37:24 +0100 (BST)

Hi Dan,

Thanks for your thoughtful and informative reply. I agree that there
are some advantages to distributing XML rather than render-ready HTML.

Two things occur to me upon reading your mail:

- (paranoia on) a multi-user writeable cache for HTML files creates more issues
  than one for man pages; HTML isn't as harmless a document format as you might
  like because it is a host for executable content (JavaScript, plug-ins, Java,
  ...) and for remote execution (form input methods like CGI, servlets, ...);
  it would be possible to edit an HTML page to bind different actions to buttons
  or hyperlinks

- a help server could solve the problem of security with a multi-user
  cache; making it an HTTP server (that serves HTML) could allow a
  regular browser to view help; it would also allow the regular kinds of
  HTTP cache (browser or HTTP proxy); lastly, it allows the regular kinds
  of bookmarking for favourite help items
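
To make the help-server idea a bit more concrete, here is a minimal
sketch in Python using only the standard library. It is not a proposal
for how GNOME should implement this; the function name make_help_server
and the idea of a single directory of already-converted HTML are my own
assumptions for illustration. The stock handler only answers GET and
HEAD, so clients can read the pages but cannot write to the cache, and
binding to 127.0.0.1 keeps the server local to the machine.

```python
import functools
import http.server
import sys


def make_help_server(doc_root, port=0):
    """Return an HTTP server exposing doc_root read-only on localhost.

    doc_root is a hypothetical directory of converted HTML help pages.
    port=0 asks the OS for any free port.
    """
    # SimpleHTTPRequestHandler implements only GET and HEAD, so the
    # served files cannot be modified through the server.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(doc_root)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # Usage: python helpserver.py /path/to/html/help
    server = make_help_server(sys.argv[1], port=8000)
    print("Serving help on http://127.0.0.1:%d/" % server.server_address[1])
    server.serve_forever()
```

Any regular browser pointed at the printed URL could then view, cache
and bookmark help pages. Only the server process would need write
access to the directory.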

Just some thoughts!

All the best,
Colm.

>X-Authentication-Warning: enlightenment.uchicago.edu: dmueth owned process doing -bs
>From: Dan Mueth <d-mueth uchicago edu>
>X-Sender: <dmueth enlightenment uchicago edu>
>To: Colm Smyth <Colm Smyth sun com>
>cc: <gnome-doc-list gnome org>
>Subject: Re: Publishing HTML
>MIME-Version: 1.0
>
>
>On Fri, 22 Jun 2001, Colm Smyth wrote:
>
>> I was chatting with Laszlo Kovacs about scrollkeeper and he mentioned
>> that some people had a few good reasons for publishing the GNOME
>> documentation as SGML, to be converted on the fly to HTML by Nautilus.
>
>Others have provided some good replies to this, including the flexibility
>of the display stylesheets which Telsa mentioned and hadn't occurred to me.
>I thought I would comment on another advantage to the way we are doing
>things...
>
>Generally, we have SGML (soon to be XML) documents from which we extract
>various sorts of metadata (index, table of contents, searching database,
>etc.) and we convert the SGML/XML document into a display format (HTML):
>
>              metadata
>           /
>  XML doc
>           \
>             display formatted doc
>
>We currently do both the metadata extraction and document conversion after
>the package is installed: the metadata extraction happens at post-install
>time and the document conversion occurs at run time.
>
>An alternate solution would be to do all this at build time and then just
>install the metadata and display formatted document.  This is the
>solution used by JavaHelp and has the benefit of not having install-time
>scripts or affecting performance.
>
>We decided not to do this because we are an open source project.  The
>significance of this is that we (1) like our software to be as flexible as
>possible, (2) have frequent and asynchronous software releases, and (3)
>prefer writing code and developing things as we go over developing a very
>detailed specification before we start coding.
>
>Flexibility: By allowing the conversion from XML to HTML to happen at run
>time on the client, the client machine has the flexibility to control what
>software is used in the conversion, what stylesheet is used, etc.  We will
>be exploiting this not only to allow for customized
>stylesheets/appearance, but also to allow our XML->HTML conversion to
>insert special anchors which we can link against.  I suppose we may
>consider improved compatibility with non-GNOME documentation (KDE, LDP,
>etc.) in here as well.  Generally, the user/sys admin/distribution can
>customize the help system without customizing every package in the
>distribution or that the user might potentially use. I'm sure with a
>little thought, we can come up with a little laundry list of ways this
>flexibility can be exploited.
>
>Asynchronous releases: Free software has the motto "release early, release
>often".  We need to make the dependency between application packages
>(which include the docs) and the help system as weak as possible.  This is
>difficult to do if the application packages ship with pre-processed
>documents since each application package will only work with certain
>versions of the help system.  In this way the help system would be like a
>library which is used by all the applications on your computer which have
>docs (i.e. a lot of packages!).  Things are okay if the help system is
>mature, stable, well-specified, and shared by all the applications on the
>system.  If the help system is not well specified and mature, there will
>be compatibility problems. Also, if some documents are shipped without
>this particular help system in mind, then the documents will not show up
>in the help browser.  By having the metadata handled only on a
>post-install basis, it is easy to have subsequent releases with
>significant internal changes or even to completely replace the help system
>provided we remain compatible with a small set of exported APIs.  (e.g.
>ScrollKeeper can generate a new database from scratch at any time if the
>old database becomes corrupt or accidentally deleted or if a new version
>of ScrollKeeper has an incompatible database format.)
>
>Specification: To do everything at build time requires that a carefully
>thought out and detailed specification be provided before the system is
>used, since we need to remain backwards-compatible with application
>packages which include all the document metadata which the help system
>will have access to.  An evolving or poorly written specification would
>cause serious compatibility problems or functionality limitations ("doh,
>if only we thought of that last month!").  Writing a good spec is an
>option, but it is dangerous in that it may not be as good as we think it
>is and it pushes out release dates into the distant future while we try to
>dream up and write down the perfect specification without the benefit of
>real use to guide the specification.
>
>So, hopefully I've shown some of the reasons why doing things at
>post-install time works better for an open source project which wants to
>work on a release early, release often basis.  The cost of this is
>primarily performance.  By doing things at install time (for ScrollKeeper)
>and by caching the results of things done at run-time (gnome-db2html3,
>although not implemented in Nautilus yet), we greatly reduce any
>performance problems that users may observe while running the software.
>The other disadvantage is that we have to deal with the security issues
>associated with caching files (for gnome-db2html3) which are generated by
>multiple users.  I think that problem has been dealt with by 'man' and
>that we can just borrow their solution for a fairly secure solution[*].
>
>Dan
>
>[*] I'm not sure how secure this approach would be if we allow the system
>to handle any XML "doc" the user throws at it...  You probably shouldn't
>be running Nautilus as root anyway ;)  Perhaps others could comment on
>whether they think this is a decent way to handle cache files, noting that
>we will be feeding the help browser untrusted docs which might be owned by
>a user or even on the web.
>
Follow-Ups:
- Re: Publishing HTML
  - From: Trevor Curtis
- Re: Publishing HTML
  - From: Dan Mueth