Re: proposed modules

Bill Haneman <Bill Haneman Sun COM> writes:

> "Easier" than what?  Adding it to the old xpdf?  Or adding it to
> closed-source acroread?
> Getting the text out in the form of strings, which is already
> apparently feasible, is a first step, but more is needed, such as
> caret navigation, text layout info, and the ability to get the
> 'accessibility' info from a PDF file's markup (accessibility features
> were added to the PDF format around version 1.4).  There does seem to
> be a feature-documentation problem with PDF as was inferred earlier.

'Easier' than with the old code base.  poppler has code to handle text
flow and extents, unlike xpdf.  This will let us write caret support and
stream the text to ATs.  I don't think that the caret will be hard to
add to the code base.  And simply reading the visible area without the
caret is even easier.

As I said, this is now a pretty straightforward project -- it just needs
someone to tackle it.

The 'accessibility' features of PDF docs themselves are trivially
extractable too, but extracting them is only useful once the rest of
the framework is in place.  For example, the alt tag is pretty useless
by itself.


