New compose and format API



Hi everyone,

I went ahead and committed a preliminary implementation of the message
compose API that was proposed a while ago.  This new feature makes it
much easier to properly localize gtkmm applications.  The feature status
is tracked on bugzilla, too:

  http://bugzilla.gnome.org/show_bug.cgi?id=399216

What this is all about
======================

The classic approach to formatting a user-visible message string,
without resorting to snprintf() and the like, goes something like this:

  std::ostringstream output;
  output.imbue(std::locale(""));
  output << percentage << _("% done");
  label->set_text(Glib::locale_to_utf8(output.str()));

This is a ridiculous amount of code to generate a mundane "50% done"
label, and it still misses a couple of corner cases.  More importantly,
the stream interface makes proper localization impossible.  The message
should be passed to gettext() in one piece in order to provide the
required context, and to allow the translator to rearrange the sentence
structure as required by the target language.
One way to do that is to use a template string with placeholders for
separate arguments, like printf() in the C library:

  char* message = g_strdup_printf(_("%d%% done"), percentage);
  gtk_label_set_text(GTK_LABEL(label), message);
  g_free(message);

This approach lacks the type safety and exception safety of the STL
stream version.  Nonetheless, it's a shame that using C++ and glibmm
requires more code than plain C and GLib.

This is where the new message compose and format API comes into play.
Using the proposed API, the example from above looks like this:

  using Glib::ustring;

  label->set_text(ustring::compose(_("%1%% done"),
                                   ustring::format(percentage)));

A few more interesting use cases:

  ustring s;
  const double a = 3456.78;
  const double b = 7890.12;

  s = ustring::compose("%1 is lower than %2.",
                       ustring::format(a),
                       ustring::format(b));

  s = ustring::compose("%2 is greater than %1.",
                       ustring::format(a),
                       ustring::format(b));

  s = ustring::compose("%1 € are %3 %% of %2 €.",
                       ustring::format(a),
                       ustring::format(b),
                       ustring::format(std::fixed,
                                       std::setprecision(1),
                                       a / b * 100.0));

In a German locale the three composed strings are:

  3.456,78 is lower than 7.890,12.
  7.890,12 is greater than 3.456,78.
  3.456,78 € are 43,8 % of 7.890,12 €.

The complete example program demonstrating these use cases can be found
in the examples/compose directory in the glibmm SVN repository.

Some explanation of the details is in order.

      * Unlike with printf(), the compose and format functionality is
        provided separately.  This is the main difference from Ole
        Laursen's compose mini-library.  I'll get to that later.

      * Placeholders in the template string are in qt-format: a percent
        symbol followed by a single digit denoting the index of the
        argument to substitute, i.e. "%1", "%2", ..., "%9".  Two percent
        symbols "%%" result in a single "%" in the output.  Thus the
        maximum number of arguments is nine, which is probably not a
        real limit in practice.  Placeholders can occur in any order in
        the template string, which allows the translator to reorder
        substitutions freely.  Note that qt-format is recognized and
        fully supported by gettext.

      * The arguments to ustring::compose() are all of type
        Glib::ustring.  To format a number into a string
        ustring::format() must be used.  The arguments to format() are
        written sequentially to an output string stream and can thus be
        of any streamable type, including I/O manipulators.

      * Wide-character streams are used internally to enable
        fully-fledged internationalization support.  Using wchar_t
        streams avoids restricting the formatting results to either
        ASCII or at best the narrow locale codeset.  For instance, the
        thousands separator can be a code point outside the ASCII range
        in some languages.  The use of wchar_t streams also allows
        skipping iconv() on modern Linux and Windows systems.
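
To make the split between the two steps concrete, here is a minimal
sketch in plain standard C++.  It is not the glibmm implementation:
sketch_format() and sketch_compose() are hypothetical names, the
arguments are passed as a vector of std::wstring instead of
Glib::ustring, and the classic "C" locale is assumed so that the output
is reproducible.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for ustring::format(): write all arguments,
// I/O manipulators included, to one wide output string stream.
template <class... Args>
std::wstring sketch_format(const Args&... args)
{
  std::wostringstream output;
  // Assumption: the classic "C" locale keeps the result reproducible.
  // A localized version would imbue std::locale("") instead.
  output.imbue(std::locale::classic());
  using expand = int[];
  (void) expand{0, ((void) (output << args), 0)...};
  return output.str();
}

// Hypothetical stand-in for ustring::compose(): replace "%1".."%9"
// with the matching argument and "%%" with a single "%".
std::wstring sketch_compose(const std::wstring& format,
                            const std::vector<std::wstring>& args)
{
  assert(args.size() <= 9);  // single-digit placeholders only

  std::wstring result;
  for (std::wstring::size_type i = 0; i < format.size(); ++i)
  {
    const wchar_t c = format[i];
    if (c == L'%' && i + 1 < format.size())
    {
      const wchar_t next = format[i + 1];
      if (next == L'%')
      {
        result += L'%';
        ++i;
        continue;
      }
      if (next >= L'1' && next <= L'9')
      {
        const std::wstring::size_type index = next - L'1';
        // Sketch-only choice: a placeholder without an argument
        // expands to nothing.
        if (index < args.size())
          result += args[index];
        ++i;
        continue;
      }
    }
    result += c;
  }
  return result;
}
```

Because the manipulators are consumed inside sketch_format(), every
element of the argument vector corresponds to exactly one placeholder
index, no matter how the translator reorders the template.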

Alternative API
===============

With Ole Laursen's compose API, the placeholder substitution and string
formatting functionality are available through a single function.  This
design has the advantage of brevity:

  ustring s = ustring::compose("%1 € are %3 %% of %2 €.",
                               a, b,
                               std::fixed, std::setprecision(1),
                               a / b * 100.0);

This cuts down on the nesting of parentheses, which is definitely a good
thing.  However, I have some misgivings about this solution.  The main
problem is that there is no longer a clear correspondence of placeholder
index to argument position, since I/O manipulators have to be skipped.
Unfortunately there's no portable way to detect whether an object is an
I/O manipulator.  Thus a heuristic is used -- if passing the argument
through the I/O stream yields an empty string, it is assumed to be a
manipulator instead of a real argument.
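
The heuristic can be sketched roughly as follows.  This is only my
reading of the idea, not code taken from the compose mini-library:
looks_like_manipulator() is a hypothetical name, and a narrow
std::ostringstream is assumed for simplicity.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: guess whether `arg` is an I/O manipulator by
// streaming it into a scratch stream and checking for empty output.
template <class T>
bool looks_like_manipulator(const T& arg)
{
  std::ostringstream scratch;
  scratch << arg;
  return scratch.str().empty();
}

// Documents what the guess returns for a few typical arguments.
inline void check_manipulator_guess()
{
  assert(looks_like_manipulator(std::fixed));            // manipulator
  assert(looks_like_manipulator(std::setprecision(1)));  // manipulator
  assert(!looks_like_manipulator(3456.78));              // real argument
  assert(looks_like_manipulator(std::string()));         // misclassified
}
```

The last check is the interesting one: an empty std::string produces no
output either, so it cannot be told apart from a manipulator.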

Obviously the heuristic breaks down if an argument is really meant to be
an empty string.  While this would rarely be an issue for user-visible
messages, it feels like an arbitrary and likely unexpected restriction.
And possible use cases do exist, like the following example which is
only possible with separate compose and format steps:

  const double a = 3456.78;
  const double b = 7890.12;
  const int    i = int(a / (a + b) * 40.0);

  std::cout << ustring::compose("a : b = [%1|%2]",
                                ustring::format(std::setfill(L'a'),
                                                std::setw(i),
                                                ""),
                                ustring::format(std::setfill(L'b'),
                                                std::setw(40 - i),
                                                ""));

The output is a fancy ASCII art diagram:

  a : b = [aaaaaaaaaaaa|bbbbbbbbbbbbbbbbbbbbbbbbbbbb]

This is of course a somewhat silly example, but it shows that empty
arguments aren't entirely unreasonable.  Furthermore, string arguments
might end up empty as a result of unanticipated run-time behavior.
With the combined API, that leads to the following:

  ustring s = "abc";
  s = ustring::compose("Length of \"%1\" is %2", s, s.length());

==> Length of "abc" is 3

  ustring s = "";
  s = ustring::compose("Length of \"%1\" is %2", s, s.length());

==> Length of "0" is 

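Oops.  The empty string is mistaken for a manipulator and skipped, so
the length ends up in the first placeholder and the second one is left
empty.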

This particular problem could be avoided by specializing for string
types but it quickly becomes awkward.  The empty string might be a
perfectly valid result of streaming an object of user-defined type.
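
A rough sketch of what such a specialization could look like, and
where it stops helping.  All names here are hypothetical:
guess_is_manipulator() is not part of any existing library, and
Nickname is just an illustrative user-defined type.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type whose streamed form may be empty.
struct Nickname
{
  std::string value;
};

inline std::ostream& operator<<(std::ostream& os, const Nickname& n)
{
  return os << n.value;
}

// Generic case: "no output" is taken to mean "manipulator".
template <class T>
bool guess_is_manipulator(const T& arg)
{
  std::ostringstream scratch;
  scratch << arg;
  return scratch.str().empty();
}

// Special case: a std::string is always a real argument, empty or not.
inline bool guess_is_manipulator(const std::string&)
{
  return false;
}

// Documents where the special case helps and where it does not.
inline void check_specialized_guess()
{
  assert(!guess_is_manipulator(std::string()));    // fixed by overload
  assert(guess_is_manipulator(std::setw(4)));      // still detected
  assert(guess_is_manipulator(Nickname{""}));      // still misclassified
}
```

Every string-like type needs its own overload, and a type like
Nickname falls through to the generic guess anyway.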

So it comes down to a trade-off.  The combined API has brevity and ease
of use going for it.  On the other hand the separated API is more robust
and can be implemented more cleanly.

What do you think?

--Daniel




