Re: Unicode typography in translations

2016-11-16 13:43 GMT+01:00 Rafael Fontenelle <rafaelff gnome org>:

2016-11-15 10:21 GMT-02:00 Ask Hjorth Larsen <asklarsen gmail com>:

Hi Rafael

2016-11-15 2:26 GMT+01:00 Rafael Fontenelle <rafaelff gnome org>:

It would be nice to have a script with a regexp that could compare msgstr
and msgid in a PO file, and report strings that are not in compliance with
GNOME's HIG typography.  I don't have such scripting skills, but if someone
has them, please consider doing it.

Rafael Fontenelle

It is easy to recognize when the English string contains something
and the translated string does not (e.g. to find a Unicode ellipsis
that was translated to an ASCII ellipsis).  But if the English string
uses ASCII, it is not always easy, for example to recognize exactly
when an en-dash could or should be used instead of an ASCII hyphen.

It is probably a good assumption that any sequence of exactly three
dots should be a Unicode ellipsis, no matter the context, but that's
the only trivial case.
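
The easy case described above (a character present in msgid but missing
from msgstr) could be sketched roughly as follows.  This is only an
illustration: the function name missing_typography and the list of
characters are assumptions made for the example, not something taken from
the HIG or from an existing tool, and the msgid/msgstr pairs are expected
to come from whatever PO parser is at hand.

```python
# Hypothetical helper; the character list is an assumption for illustration.
TYPOGRAPHIC_CHARS = {
    "\u2026": "ellipsis",
    "\u2013": "en dash",
    "\u2014": "em dash",
}


def missing_typography(msgid, msgstr):
    """Return names of characters present in msgid but absent from msgstr."""
    if not msgstr:
        # Untranslated entries have nothing to check.
        return []
    return [
        name
        for char, name in TYPOGRAPHIC_CHARS.items()
        if char in msgid and char not in msgstr
    ]
```

It only covers the direction where the English string already uses the
Unicode character; the ASCII-only case still needs a human decision.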

Best regards

I agree that it is not hard to recognize them while translating a PO file. I
would just like to have a solution that allows a conformity check that
can be run at any time, since such Unicode characters could be missed by the
translator (myself included).

Rafael Fontenelle

A simple conformity check:

  gtgrep -cn --msgstr '\.\.\.'   filename.po

I had a bit of a battle to get the regex escapes right in bash, but
this should weed out most false positives:

  gtgrep -cn --msgstr '(?<!\.)\.\.\.(?!\.)'   filename.po

However, if the translation uses "...." or ".." where it should really
be using an ellipsis, the first form is better anyway, or maybe
(?<!\.)\.{2,4}(?!\.).  Probably it's best just to use the simple one.

(gtgrep comes from pyg3t)
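
To see how the three patterns differ, here is a small sketch using
Python's re module.  It assumes gtgrep accepts the same regex syntax
(pyg3t is written in Python, so that seems likely, but it is not verified
here); the names SIMPLE, EXACT, LOOSE and matches are made up for the
example.

```python
import re

# The three patterns discussed above, compiled with Python's re module.
SIMPLE = re.compile(r"\.\.\.")              # three dots anywhere
EXACT = re.compile(r"(?<!\.)\.\.\.(?!\.)")  # exactly three dots
LOOSE = re.compile(r"(?<!\.)\.{2,4}(?!\.)") # runs of two to four dots


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None
```

With these, matches(EXACT, "Wait....") is False while
matches(SIMPLE, "Wait....") is True, which is why the simple pattern
catches more of the sloppy cases.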

Something similar could be done for the other characters, but of
course quotation marks vary a lot depending on language, so the
checks would not be fully portable across languages.

Best regards
