Re: Duck season! Rabbit season! Mallard season!



On Wed, 2007-02-21 at 22:33 -0500, Eric S. Raymond wrote:
> (Please forgive the cheesy Warner Brothers reference in the subject.)
> 
> INTRODUCTION
> 
> This is a critique of the Mallard proposal found at an absurdly long
> URL for which one might as well google.  I'm uttering it because I
> have been thinking hard about a different, GNOME-independent
> architecture for handling documentation that would solve a
> slightly different set of problems.

I'm not sure which URL you're referring to.  To bring everybody
up to speed, there are only two pages about Mallard that I know
about: my original call to arms, and the one on live.gnome.org.

http://www.gnome.org/~shaunm/quack/mallard.xml
http://live.gnome.org/ProjectMallard

Eric, I'm assuming you've read both because you reference stuff
from each in your email.  I need to provide disclaimers about
both of these pages.

My call to arms was written nearly a year and a half ago.  As
such, it doesn't completely reflect what's currently going
through my head.

The wiki page was written largely by people other than me who,
bless them all, are trying desperately to parse my madness and
keep the world at large informed.  Some of the issues in there,
such as a suitable documentation license, are not really part
of Mallard.  Rather, they are other issues the GDP is facing
which have sort of gotten lumped under one code name.

> My ideas (about which more later) aren't flatly incompatible with
> Mallard, but they do suggest that some of its ambitions could use a
> bit of trimming and refocusing.  What I'd like to do, ultimately,
> is keep the good ideas from Mallard and mate them to some I'm not
> going to specify just yet.

And before we begin, do note that I've been doing this gig
for quite some time now.  I've burned through at least three
proposals for a shared help system (i.e. how do we install
documentation?) with no headway, I've seen at least three
potential visual editors bang in and fizzle out, and I have
done my own DocBook XSLT implementation (for numerous reasons
I can go into if anybody really cares).

So, you know, I'm starting to get a real sense of what does
NOT work.  Let's hope I can translate that into what does
work.

> ISSUES WE CAN...ER...DUCK
> 
> I'm going to start by trying to factor out a couple of issues.  
> 
> One: choice of documentation license.  This is a political brouhaha we
> don't have to get into before writing and deploying better software.

As mentioned above, the issue of a license is rather orthogonal
to everything else.  It is, however, an issue that the GDP is
very interested in solving.  But that's an internal struggle.
I'm not about to dictate the license of documentation written
in the Mallard markup language.

> Two: a new documentation editor. While I applaud the goal, I think this
> simply is not worth solving as a one-off with the markup structure
> wired in.  What we need is a schema-driven XML editor; the schema might be
> DocBook, or it might be SMNL (Shaun's Mystical New Language), but baking
> either schema into the editor code would be, IMHO, nuts.  If for no other
> reason than that it would hugely raise the costs of modifying and
> extending the language once we actually understand the problem domain.

I can't tell you how many times I've called for a generic
schema-driven editor.  Conglomerate did some neat things
in this realm, but it didn't have quite the near-WYSIWYG
quality our writers would like.  Dodji (the maintainer of
libcroco and all-around good guy) was doing some really
interesting work here, but it hasn't gotten where we need
it.

Writing a general-purpose schema-driven visual editor that
can display a document as something resembling a document
is a laudable goal.  I want one.  But it's not getting done.

If we can create a solution that just solves our immediate
needs instead, that's good enough for me.

> Three: Content issues -- the fresh look at the tone and style, etc.
> If the new tools seriously constrain what we do with "style", or
> vice-versa, something is badly wrong.  Better to treat these as
> orthogonal issues and not get mugged later by our own assumptions.

DocBook seriously constrains what you do with style.  And
it's a good thing.  Writers need to worry about writing
good content.  I may be misunderstanding you here, as you
seem to agree elsewhere that semantic markup is good.

> EASY STUFF
> 
> This leaves the following bullet items (quoted) from before we get 
> into Shaun's detailed notes:
> 
> * A markup language
> 
> * A standard place and way to keep docs
> 
> * The docs themselves: topic-based rather than monolithic documents. 
>   And cross-linked up the wazoo of course.
> 
> Let me start with the last first.  The single novel idea in this
> proposal I agree with most strongly is Shaun's vision of topic-based
> documentation, "cross-linked up the wazoo".  However (and this is an
> important caveat) this has value mainly as an organizing principle for
> *new* documentation.  There are other, major problems -- like making
> legacy documentation (100,000 man pages) accessible and searchable --
> that it doesn't touch.  (My own ideas are more focused on that.)

I think there are a lot of interesting things that could
go into a help system.  Categorization of documentation,
for instance, is a huge problem that's hard to solve.

But it's worth pointing out that Yelp, today, does make
your legacy documentation accessible and searchable.  All
your man and info pages are there.  What's interesting to
us isn't so much finding that stuff.  We need to figure
out where to put new stuff in a way that Gnome and KDE
and any other interested party can agree on.  This has
proven to be surprisingly difficult.

> A standard place and way to keep docs.  Agreed, this is a significant
> issue.  Also agreed that ScrollKeeper needs to be taken out behind the
> barn and shot.  I haven't looked at Spoon in detail yet, but I'm quite
> willing to believe that it could be the basis of something better.  In
> fact a working son-of-Scrollkeeper would also be a huge boost towards
> my somewhat different goals, so I'd love to see Spoon succeed.
> 
> A MARKUP LANGUAGE
> 
> A markup language.  This is where I start to differ.  I understand
> the attraction of designing such things -- I'm prone to it myself --
> but I also know that unless one is *very* careful such projects tend to
> wander deep into the weeds and get lost in over-elaboration.

The way I see it, there are three options: XHTML, DocBook, and
something new and different.  I'll discuss why I'm done with
DocBook later, when you bring up refentry.  For now, I'll deal
with XHTML.

> So my first real challenge to the Mallard plan is this:  Why couldn't
> SMNL be a well-defined subset of XHTML+CSS?  Each node would be a page.
> If you need more inlines than XHTML gives you, declare a CSS class with 
> a standardized interpretation.  You'd get (non-CALS) tables for free.
> 
> This approach would have huge advantages in terms of the amount of code
> (browsers, editors, validators, translators) that nobody would have to
> write.  Transclusion could be handled by fairly lightweight interpreters
> for PIs, something like the way DHTML worked.
> 
> I see no requirement in "A Markup Language" that would block this approach.

I have two qualms with using XHTML for this purpose.  First,
imposing a set of standard CSS classes would be hard.  Not
impossible by any means.  But hard.  If writers are writing
by hand, it's more work.  If they're using a graphical editor,
that editor needs to be trained to do our documents.

Could we train an editor to our flavor of XHTML?  I'm sure we
could, in theory.  I'm also sure we could write a really good
graphical editor for whatever markup we're using.  But since
we haven't yet, I'd like to make sure hand editing isn't too
painful.

Well-designed source formats are easy to write.  There's no
single best-designed source format.  It depends on the domain.

My second issue is much more serious.  Look at DocBook's xref
element.  The defined behavior of that element is to insert
text automatically into the document.  There are other cases
of this in DocBook, and Mallard introduced some new ones.

We would be severely abusing XHTML by adding any implicit
automatic text handling.  The only way it's not abuse is if
we create some new elements in another namespace and call
our documents XHTML+foo.  But then we might as well just do
the whole markup language and save our writers the headache
of mixed-markup documents.
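
To make the xref point concrete, here is a rough sketch in
Python of what "inserting text automatically" means.  The
sample document, the resolve_xrefs name, and the choice to
use the target's title as the link text are all assumptions
for illustration; this is not how any particular DocBook or
Mallard processor is implemented.

```python
import xml.etree.ElementTree as ET

# A tiny DocBook-like document, made up for this example.
DOC = """\
<article>
  <section id="install"><title>Installing Fonts</title>
    <para>Copy the files.</para>
  </section>
  <section id="usage"><title>Using Fonts</title>
    <para>First, see <xref linkend="install"/>.</para>
  </section>
</article>
"""

def resolve_xrefs(root):
    """Fill each empty xref with text generated from its target's title."""
    # Map every id in the document to the title of the element carrying it.
    titles = {el.get("id"): el.findtext("title")
              for el in root.iter() if el.get("id")}
    for xref in root.iter("xref"):
        # The writer supplied no link text; the processor makes it up.
        xref.text = titles.get(xref.get("linkend")) or "???"
    return root

root = resolve_xrefs(ET.fromstring(DOC))
para = root.find("./section[@id='usage']/para")
rendered = "".join(para.itertext())
print(rendered)
```

The writer never typed "Installing Fonts" in that paragraph.
Plain XHTML has no element whose defined behavior is to
generate its own content like this, which is why doing it
there would mean a new namespace.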

> RETHINKING HELP
> 
> Again, I find myself in strong agreement with almost everything in 
> <http://www.gnome.org/~shaunm/quack/mallard.xml>.  Shaun's dissection 
> of the state of most on-line help ("Reading the interface back to 
> me is not helpful") is particularly trenchant.

Dave Malcolm once wrote a script using Dogtail (a testing
framework built off of our accessibility framework).  It
would look at a window and automatically create DocBook
that read the interface back.  The results were eerily
similar to much of our documentation. 

> And again, I begin to differ only on the language-design issues.  I think
> this proposal is rushing into playing with novel XML way too soon.  
> Taking the first one of his examples, I don't see what
> 
> <topic id="fe">
>   <info>
>     <!-- Metadata here -->
>   </info>
>   <title>Fe</title>
>   <!-- Block-level content here -->
>   <!-- Sectioning content here -->
> </topic>
> 
> would gain us over this bog-standard XHTML:
> 
> <html id="fe">
>   <head>
>     <!-- Metadata here -->
>   </head>
>   <body>
>     <title>Fe</title>
>     <!-- Block-level content here -->
>     <!-- Sectioning content here -->
>   </body>
> </html>

From that little bit of markup, sure, you don't get much
from Mallard.  The very next sample on that page, though,
shows links embedded into the info element.  That page
doesn't do a good job of showing or explaining what we
do with those links, but I assure you it's more than
what you get with some link elements in an HTML head.

I think I haven't done a very good job of keeping people
informed of what I'm doing.  As a result, people aren't
really grokking what I'm trying to do, or why XHTML or
DocBook won't cut it.

> I also think that use of XML-DocBook is dismissed too quickly.
> While it's certainly the case that the whole of DocBook is too big for
> this application, I'd like to see a serious swing taken at
> *subsetting* it for this purpose before any new schema gets defined.
> 
> Further down, I bridle a bit at the criticism that XML-DocBook is 
> "too semantic".  That's not a bug, it's a feature that we're going to
> want back rather badly some day when our search algorithms start 
> relying on a richer ontology than HTML's.
> 
> I actually have a path forward to suggest rather than just criticizing.
> 
> REFENTRY IS YOUR FRIEND
> 
> In the bullet point "Structural markup doesn't fit our needs.", it seems
> to me that Shaun is missing what could be done with the RefEntry
> document type.
> 
> I'm intimately familiar with it, because I wrote a program called 
> doclifter that lifts manual page sources to RefEntry documents
> with over a 90% success rate (actually over 99% with trivial fixes
> for broken man markup).  
> 
> One possibility is that the Mallard markup should be a subset of DocBook
> that uses a subset of RefEntry as the main topic-node type.  Again this
> would be a huge benefit in terms of code nobody has to write.

My early, early ideas on this subject actually did involve
subsetting DocBook.  But then to get everything we want,
we'd need to subset and extend.  And we'd still be left
with things we don't like.

DocBook, even a subset thereof, is hard to write.  No
amount of subsetting is going to make mediaobject less
obtuse.  And if it looks like DocBook, people will think
it is DocBook.  So they'll try to use all their favorite
elements.  If we don't include them, we screw with the
heads of people who know DocBook.  If we do include them,
we make life just as difficult for our writers.  There's
just too much stuff there.

DocBook is also incredibly hard to process correctly.
One problem is that it's intentionally under-specified,
allowing implementations to make decisions.  When you
know your target processor (say, when you're writing
a book for a specific publisher), that's fine.  But
writing a document that can be deployed in different
desktop environments and such is another thing.

Take, for example, xref.  The exact text of the xref
element is not specified.  Combine this with supporting
over 60 languages with dozens of rules for different
grammatical roles, and you've got a recipe for trouble.
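
A small Python sketch of why this gets ugly.  The template
table, the role names, and the xref_text function are
hypothetical; nothing here comes from the DocBook spec or
from any real stylesheet.  It only shows that the generated
text depends on both the language and the grammatical role
of the link in its sentence.

```python
# Hypothetical per-language, per-role templates for generated xref text.
XREF_TEMPLATES = {
    ("en", "nominative"): 'the section "{title}"',
    ("en", "accusative"): 'the section "{title}"',
    ("de", "nominative"): 'der Abschnitt "{title}"',
    ("de", "accusative"): 'den Abschnitt "{title}"',
}

def xref_text(title, lang, role):
    """Return generated link text, or fail loudly if no template exists."""
    try:
        template = XREF_TEMPLATES[(lang, role)]
    except KeyError:
        raise LookupError(
            f"no xref template for {lang!r} in role {role!r}") from None
    return template.format(title=title)

print(xref_text("Fonts", "en", "accusative"))
print(xref_text("Fonts", "de", "accusative"))
```

English gets away with one string for both roles.  German
does not, and every processor that picks its own templates
will produce different text for the same source document.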

We could specify and solve each of these issues, but
then we'd be left with something that kind of looks
like DocBook, but isn't really DocBook anymore.  And
I truly believe that would do more harm than good.

--
Shaun




