Re: Status of the yelp man-page parser



On Thu, 2007-02-15 at 18:34 -0700, Brent Smith wrote:
> Eric S. Raymond wrote:
> > I'm doing some work on the groff manual pages, trying to make sure
> > they render correctly through various programs such as man2html,
> > the KDE help browser, manServer, and yelp.
> > 
> > Towards that end, I recently had a look at the yelp source code at
> > <http://cvs.gnome.org/viewcvs/yelp/src/yelp-man-parser.c?view=markup>.
> > My goal was to enumerate the sets of troff requests and escapes that
> > it interprets.
> > 
> > I left that page more puzzled than I arrived.  It doesn't look to me 
> > like that code is functional -- in particular, it looks like .if
> > conditionals and escapes in macro arguments aren't handled.  And yet,
> > yelp seems to be doing a pretty good job of displaying man pages that
> > I know require these features.
> > 
> > Have I missed something here, or is yelp actually doing something
> > like calling groff behind the scenes? 
> 
> How's this for a late response!
> 
> I was probably the last one to touch that code, and I must say that it 
> is all rather hackish and ignores basically all conditional 
> commands.  Surprisingly it formats the majority of man pages fairly 
> well, but really it should be re-written the "right" way (I'm not quite 
> sure what the right way is).  Ages ago there was a dependency on some 
> man2html program, but for reasons that are beyond me, it was re-written 
> to remove that dependency (I think I remember hearing maintainability as 
> being the major problem).

There were basically three reasons for doing this:

1) Maintainability.  I spent an entire weekend once
   trying to make some trivial formatting changes.

2) Output prettification.  One of the things I've
   tried to do is make our man and info pages have
   a similar visual style to everything else.  This
   was really hard with man2html.

3) No execs.  When I took over Yelp, it called out
   to external programs for everything (including
   DocBook), read the output from those programs,
   and reparsed for special comment markers for
   chunking.  I may have overcompensated.

> The parser converts the groff formats to an intermediate XML-based 
> format, and then uses yelp's XSLT "pager" system to convert the 
> intermediate XML format to HTML for display.

So the idea was that we would parse to, effectively,
an expression tree, and then format that.  And since
our expression tree was XML, hey, use XSLT to format,
since it's good at that.

The man2html we used before went directly from man
to HTML without anything in the middle.  This meant
that formatting changes could involve substantial
coding efforts.
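
To make the two-stage idea concrete, here is a rough
sketch in Python.  It is not Yelp's code (that is C,
and the formatting stage there is XSLT).  The element
names (Man, Title, Section, P, B), the handful of
macros handled, and the plain function standing in
for the stylesheet are all assumptions made up for
illustration.  Like the real parser, it just skips
requests it doesn't know, conditionals included.

```python
import xml.etree.ElementTree as ET
from html import escape


def parse_man(source):
    """Stage 1: parse a tiny subset of man macros into an XML tree."""
    root = ET.Element("Man")
    section = None  # current <Section>, if any
    para = None     # current <P>, created lazily

    def open_para():
        parent = section if section is not None else root
        return ET.SubElement(parent, "P")

    def append_text(elem, text):
        # Text goes after the last child if there is one, else in elem.text.
        if len(elem):
            elem[-1].tail = (elem[-1].tail or "") + text
        else:
            elem.text = (elem.text or "") + text

    for line in source.splitlines():
        if line.startswith(".TH"):
            parts = line.split()
            title = ET.SubElement(root, "Title")
            title.text = parts[1] if len(parts) > 1 else ""
        elif line.startswith(".SH"):
            section = ET.SubElement(root, "Section", name=line[3:].strip())
            para = None
        elif line.startswith(".PP"):
            para = None  # next text line starts a new paragraph
        elif line.startswith(".B "):
            if para is None:
                para = open_para()
            bold = ET.SubElement(para, "B")
            bold.text = line[3:].strip()
            bold.tail = " "
        elif line.startswith("."):
            continue  # unknown requests (including .if) are ignored
        elif line.strip():
            if para is None:
                para = open_para()
            append_text(para, line.strip() + " ")
    return root


def to_html(elem):
    """Stage 2: format the tree; all presentation decisions live here."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child) + escape(child.tail or "")
    if elem.tag == "Man":
        return "<body>" + inner + "</body>"
    if elem.tag == "Title":
        return "<h1>" + inner + "</h1>"
    if elem.tag == "Section":
        return "<h2>" + escape(elem.get("name", "")) + "</h2>" + inner
    if elem.tag == "P":
        return "<p>" + inner.strip() + "</p>"
    if elem.tag == "B":
        return "<b>" + inner + "</b>"
    return inner


SAMPLE = """.TH DEMO 1
.SH NAME
demo is a sample
.SH SYNOPSIS
.B demo
[file]
"""

if __name__ == "__main__":
    print(to_html(parse_man(SAMPLE)))
```

The point is that restyling section headings, say,
only touches to_html; the parser never changes.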

The best way to do a conversion, in my opinion,
would be if we could find a way to get groff to
give us a canonicalized expression tree.  Let
groff do the parsing and conditional resolving,
and then our work is easier.  But I don't know
how to do that.  I did look.

Now, alternatively, a program that reliably turns
man into DocBook would save us the parse headache,
but still allow us to control the formatting.  In
fact, it would be that much easier to make man look
like everything else.

We have to be very careful about speed though.

--
Shaun




