Using the low-level API from Python / Please advise for WeasyPrint

From: Simon Sapin <simon sapin exyr org>
To: gtk-i18n-list gnome org
Subject: Using the low-level API from Python / Please advise for WeasyPrint
Date: Fri, 05 Oct 2012 16:18:22 +0200

Hi,

I’m a developer of WeasyPrint, an open-source layout and pagination engine for HTML/CSS written in Python: http://weasyprint.org

We render to cairo, with Pango and PangoCairo for text. (cairo can then export to various file formats, in particular PDF.)

*Warning*: long message ahead. Short version: how can we use the low-level Pango API from Python? Or should we even skip Pango and only use HarfBuzz, as suggested in http://behdad.org/text/ ?



In full detail:

We use Pango not only to actually render the text, but also in the earlier layout step to find line breaks and get various sizing information about the text (advance width, leading, baseline, …)

This is currently done with the high-level Pango API (Layout, LayoutLine, FontDescription, …) because it is all that is available from Python, either through PyGTK or through PyGObject3-introspection.

However, we cannot give whole paragraphs at a time to Pango since we need a lot of inline-level control: one line could contain multiple elements with images, vertical alignment, relative positioning, …

Therefore we have a horrible hack where uninterrupted chunks of text (stopping at the next HTML tag) are passed to a PangoLayout with the width still available until the end of the current line. Then, only the first LayoutLine is kept and the rest is thrown away. The process is repeated for every line of the remaining text.
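
To make the loop concrete, here is a minimal Python sketch of the "keep only the first line" approach. It does not call Pango: first_line() is a hypothetical stand-in for "build a layout, keep the first LayoutLine", and measure() is an assumed width function (here simply the character count). The real code in text.py is more involved.

```python
def first_line(text, available_width, measure):
    """Hypothetical stand-in for laying out `text` and keeping line one.

    Breaks only at single spaces. The first word is always placed,
    even if it is wider than the available width.
    Returns (first_line_text, remaining_text).
    """
    words = text.split(" ")
    line = words[0]
    index = 1
    while index < len(words):
        candidate = line + " " + words[index]
        if measure(candidate) > available_width:
            break
        line = candidate
        index += 1
    return line, " ".join(words[index:])


def split_lines(text, available_width, measure=len):
    """Lay out the whole text again for each line, keeping only line one."""
    lines = []
    while text:
        line, text = first_line(text, available_width, measure)
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in split_lines("aaa bb cc dddd", available_width=6):
        print(line)
```

Each pass hands all of the remaining text to the layout step, which is where the inefficiency comes from: most of that work is discarded once the first line has been extracted.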

If you’re interested, the relevant code is in text.py and layout/inlines.py :

https://github.com/Kozea/WeasyPrint/tree/master/weasyprint

In addition to being obviously inefficient, this design has some limitations:


* The 'font-family' CSS property is just passed to FontDescription, so
  there is no conforming font matching algorithm[1] or @font-face[2]
* No way to add hyphenation (or did I miss something?)
* No control over line breaks. For example, when breaking at a space
  character, PangoLayout leaves the space at the end of the line and
  requires width for it, but does not report this width in the
  LayoutLine. If the available width is just enough for two
  words but not for two words plus a space, the break will be after the
  first word. This causes the "shrink-to-fit" algorithm to give
  incorrect results, as well as some CSS tests to fail.
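
The trailing-space issue in the last point can be illustrated with a small model. This is only a sketch of the behaviour described above, not Pango's actual line-breaking algorithm; the function name, the fixed char_width and the count_trailing_space flag are all assumptions made for illustration.

```python
def words_on_first_line(words, available_width, char_width=1,
                        count_trailing_space=True):
    """Return how many words fit on the first line (illustrative model).

    With count_trailing_space=True, a break after a word also needs room
    for the space that follows it, mimicking the behaviour described in
    the list above. The last word of the text has no trailing space.
    """
    used = 0
    count = 0
    for index, word in enumerate(words):
        is_last = index == len(words) - 1
        needed = len(word) * char_width
        if count > 0:
            needed += char_width  # space separating this word from the previous one
        trailing = char_width if (count_trailing_space and not is_last) else 0
        if used + needed + trailing > available_width:
            break
        used += needed
        count += 1
    return count


if __name__ == "__main__":
    words = ["ab", "cd", "ef"]
    # "ab cd" is exactly 5 units wide, "ab cd " would be 6.
    print(words_on_first_line(words, 5, count_trailing_space=True))
    print(words_on_first_line(words, 5, count_trailing_space=False))
```

In this model, a width of 5 holds "ab cd" exactly, yet the first call breaks after "ab" because the space after "cd" is also charged to the line. A shrink-to-fit computation that measures "ab cd" as 5 units and then lays out into 5 units would hit the same mismatch.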

[1] http://www.w3.org/TR/CSS21/fonts.html#algorithm
[2] Downloading fonts from the web:
http://www.w3.org/TR/css3-fonts/#font-resources

As the team working on WeasyPrint is very small, we’ve cut some corners and made design choices for ease of development rather than (for example) run-time speed. Using Pango like this has worked well enough (many thanks to all of you who worked on it!) but we’ll want to change that at some point in the future.

I think that the way forward is to switch to the low-level API. Could some introspection data be added to make it available from Python? Or should we write C and skip PyGObject? What about HarfBuzz: how is it relevant for this use case?



Thanks in advance for your advice.

Regards,
--
Simon Sapin
