Using the low-level API from Python / Please advise for WeasyPrint
- From: Simon Sapin <simon sapin exyr org>
- To: gtk-i18n-list gnome org
- Subject: Using the low-level API from Python / Please advise for WeasyPrint
- Date: Fri, 05 Oct 2012 16:18:22 +0200
Hi,
I’m a developer of WeasyPrint, an open-source layout and pagination
engine for HTML/CSS written in Python: http://weasyprint.org
We render to cairo, with Pango and PangoCairo for text. (cairo can then
export to various file formats, in particular PDF.)
*Warning*: long message ahead. Short version: how can we use the
low-level Pango API from Python? Or should we even skip Pango and only
use Harfbuzz, as suggested in http://behdad.org/text/ ?
In full detail:
We use Pango not only to actually render the text, but also in the
earlier layout step to find line breaks and get various sizing
information about the text (advance width, leading, baseline, …)
This is currently done with the high-level Pango API (Layout, LayoutLine,
FontDescription, …) because it is all that is available from Python,
either through PyGTK or through PyGObject3-introspection.
However we cannot give whole paragraphs at a time to Pango since we
need a lot of inline-level control: one line could contain multiple
elements with images, vertical alignment, relative positioning, …
Therefore we have a horrible hack where uninterrupted chunks of text
(stopping at the next HTML tag) are passed to a PangoLayout with the
available width until the end of the current line. Then, only the first
LayoutLine is kept and the rest thrown away. The process is repeated for
every line of the remaining text.
If you’re interested, the relevant code is in text.py and
layout/inlines.py :
https://github.com/Kozea/WeasyPrint/tree/master/weasyprint
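To make the shape of the hack concrete, here is a minimal sketch in plain
Python. It is not the code from text.py: break_all_lines is a hypothetical
stand-in for a full PangoLayout pass (simple greedy word wrapping), and
measure=len assumes every character is one unit wide. The names
first_width and full_width are also made up for this illustration.

```python
def break_all_lines(text, width, measure=len):
    """Hypothetical stand-in for a full layout pass: greedy word wrapping."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def layout_line_by_line(text, first_width, full_width, measure=len):
    """Lay out the whole remaining text, keep only the first line, repeat."""
    result = []
    remaining = " ".join(text.split())  # normalize whitespace
    width = first_width  # width left until the end of the current line
    while remaining:
        lines = break_all_lines(remaining, width, measure)
        first = lines[0]  # every other line is thrown away
        result.append(first)
        remaining = remaining[len(first):].lstrip()
        width = full_width  # later lines get the full width
    return result


lines = layout_line_by_line("the quick brown fox jumps", 9, 11)
```

Each iteration lays out all of the remaining text only to keep one line,
so the total work grows roughly quadratically with the number of lines.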
In addition to being obviously inefficient, this design has some
limitations:
* The 'font-family' CSS property is just passed to FontDescription, so
there is no conforming font matching algorithm[1] or @font-face[2]
* No way to add hyphenation (or did I miss something?)
* No control over line breaks. For example, when breaking at a space
character, PangoLayout leaves the space at the end of the line and
requires width for it, but does not report this width in the
LayoutLine. If the available width is just enough for two
words but not for two words plus a space, the break will be after the
first word. This causes the "shrink-to-fit" algorithm to give
incorrect results, as well as some CSS tests to fail.
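The trailing-space problem in the last item can be modelled with a small
plain-Python sketch. This is only a model of the behaviour described
above, not Pango code: words_on_first_line and its
trailing_space_needs_room flag are hypothetical names, and measure=len
assumes one unit of width per character.

```python
def words_on_first_line(words, available, measure=len,
                        trailing_space_needs_room=True):
    """Return how many words fit on the first line under this model."""
    used = 0
    count = 0
    for i, word in enumerate(words):
        width = used + measure(word)
        if count:
            width += measure(" ")  # space separating it from the previous word
        required = width
        is_last = i == len(words) - 1
        if trailing_space_needs_room and not is_last:
            # Assumed model: the space left at the end of the line must
            # also fit, even though its width is not reported afterwards.
            required += measure(" ")
        if count and required > available:
            break
        used = width
        count += 1
    return count


words = ["hello", "world", "again"]
with_space = words_on_first_line(words, 11)
without_space = words_on_first_line(
    words, 11, trailing_space_needs_room=False)
```

With an available width of 11, "hello world" fits exactly, yet the first
call breaks after "hello" because the trailing space needs a twelfth
unit. This is presumably how shrink-to-fit goes wrong: a reported width
of 11, fed back as the available width, gives a different line break.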
[1] http://www.w3.org/TR/CSS21/fonts.html#algorithm
[2] Downloading fonts from the web:
http://www.w3.org/TR/css3-fonts/#font-resources
As the team working on WeasyPrint is very small, we’ve cut some corners
and made design choices for ease of development rather than (for
example) run-time speed. Using Pango like this has worked well enough
(many thanks to all of you who worked on it!) but we’ll want to change
that at some point in the future.
I think that the way forward is to switch to the low-level API. Could
some introspection data be added to make it available from Python? Or
should we write C and skip PyGObject? What about HarfBuzz, how is it
relevant for this use case?
Thanks in advance for your advice.
Regards,
--
Simon Sapin