Re: Unicode typography in translations

2016-11-16 11:10 GMT-02:00 Ask Hjorth Larsen <asklarsen gmail com>:

> A simple conformity check:
>
>   gtgrep -cn --msgstr '\.\.\.'   filename.po
>
> I had a bit of a battle to get the regex escapes right in bash, but
> this should weed out most false positives:
>
>   gtgrep -cn --msgstr '(?<!\.)\.\.\.(?!\.)'   filename.po
>
> However, if the translation uses "...." or ".." when it should really
> be using an ellipsis, the first form is better anyway, or maybe
> (?<!\.)\.{2,4}(?!\.).  Probably it's best just to use the simple one.
>
> (gtgrep comes from pyg3t)
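A quick way to see how the three patterns above differ is to run them over a few sample strings. This is a minimal sketch that does not use gtgrep at all: it assumes GNU grep built with PCRE support (grep -P) for the lookbehind and lookahead syntax, and the sample strings and the function names count_simple, count_exact and count_range are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Assumes GNU grep with PCRE support (grep -P); the lookarounds (?<!...)
# and (?!...) are not available in POSIX basic/extended regexes.
# Single quotes keep the shell from touching the backslashes.

# Count lines containing any run of three dots (this also hits "....").
count_simple() { grep -cP '\.\.\.'; }

# Count lines containing exactly three dots, not part of a longer run.
count_exact() { grep -cP '(?<!\.)\.\.\.(?!\.)'; }

# Count lines containing a run of two to four dots, not part of a longer run.
count_range() { grep -cP '(?<!\.)\.{2,4}(?!\.)'; }

# Hypothetical msgstr samples; "…" is U+2026 and should never match.
samples='Loading...
Loading....
Loading..
Loading…
Done.'

printf '%s\n' "$samples" | count_simple   # prints 2
printf '%s\n' "$samples" | count_exact    # prints 1
printf '%s\n' "$samples" | count_range    # prints 3
```

Only the first and third patterns flag "....", and only the third flags ".."; none of them matches a real horizontal ellipsis.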

Thanks for the reminder about gtgrep. I believe gtgrep fits my needs, as it outputs the msgids and msgstrs that match the pattern, which makes it easier to search for these characters, especially when using the -cn flags together.

It is also possible to count the matches of a given pattern and compare the counts between msgids and msgstrs, by combining -C with the --msgid or --msgstr flag. If the counts differ, something may be wrong in the translation, depending on the language.
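The same comparison can also be done per entry rather than per file. The sketch below is a tool-independent illustration in awk, not gtgrep's -C behaviour: compare_counts is a hypothetical helper (not part of pyg3t or gettext), and it assumes a well-formed PO file. Plural entries and untranslated entries are skipped, and escape sequences inside strings are not decoded.

```shell
#!/bin/sh
# compare_counts NEEDLE FILE.po  (hypothetical helper, not part of pyg3t)
# For each translated entry, count literal occurrences of NEEDLE in the
# msgid and in the msgstr, and print the entries where the counts differ.
# Assumptions: plural entries (msgid_plural/msgstr[N]) and untranslated
# entries are skipped; escape sequences are left as-is.
compare_counts() {
    awk -v needle="$1" -v f="$2" '
        # Number of non-overlapping occurrences of needle in s.
        function count(s,    n, i) {
            if (needle == "") return 0
            n = 0
            while ((i = index(s, needle)) > 0) {
                n++
                s = substr(s, i + length(needle))
            }
            return n
        }
        # Drop the keyword and the surrounding quotes from a PO string line.
        function unquote(s) {
            sub(/^[^"]*"/, "", s)
            sub(/"[ \t]*$/, "", s)
            return s
        }
        # Report the entry collected so far (if it differs), then reset.
        function report(    a, b) {
            if (id != "" && str != "" && !plural) {
                a = count(id); b = count(str)
                if (a != b)
                    printf "%s:%d: msgid has %d, msgstr has %d\n", f, line, a, b
            }
            id = ""; str = ""; plural = 0; field = ""
        }
        /^msgid_plural / { plural = 1; field = ""; next }
        /^msgid /        { report(); line = NR; field = "id"; id = unquote($0); next }
        /^msgstr /       { field = "str"; str = unquote($0); next }
        /^"/ {
            # Continuation line: append to whichever string is open.
            if (field == "id") id = id unquote($0)
            else if (field == "str") str = str unquote($0)
            next
        }
        { field = "" }
        END { report() }
    ' "$2"
}
```

For example, compare_counts '…' filename.po lists entries whose msgid contains an ellipsis but whose msgstr does not, such as a translation that falls back to three dots. As said above, a mismatch is only a hint, since some languages legitimately punctuate differently.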


> Something similar could be done for the other characters, but of
> course quotation marks vary a lot depending on language, so it would
> not all be completely portable to all languages.

Indeed. However, one doesn't necessarily need the --msgstr flag to search for the pattern; instead, the --msgid flag can be used to search for the horizontal ellipsis (…), and then one can look through the output for three dots (which is a piece of cake in a monospace font).

Best regards,
Rafael Fontenelle
