Re: Low memory hacks



Danilo Šegan wrote:
Hi Simos,

Yesterday at 15:02, Simos Xenitellis wrote:

I'd like to see some real numbers on the memory usage instead of
numbers being thrown around.
In Ubuntu 7.10, the MO files for en_GB are
$ du -h /usr/share/locale/en_GB/LC_MESSAGES /usr/share/locale-langpack/en_GB/LC_MESSAGES/
2.3M    /usr/share/locale/en_GB/LC_MESSAGES
17M     /usr/share/locale-langpack/en_GB/LC_MESSAGES/
$_
In Ubuntu 8.04 (alpha 6), the MO files for en_GB are
$ du -h /usr/share/locale/en_GB/LC_MESSAGES /usr/share/locale-langpack/en_GB/LC_MESSAGES/
84K    /usr/share/locale/en_GB/LC_MESSAGES
2.2M     /usr/share/locale-langpack/en_GB/LC_MESSAGES/
$_
What I am missing here is when and how Ubuntu added this
functionality. It would benefit other distros as well. Did Debian
introduce this feature? Danilo, any links?

I am not handling Ubuntu packaging stuff, so it'd be worth checking with
the Ubuntu guys instead.  Martin Pitt is probably the right person to ask
about it, but looking at the language pack source package should give a
clue as well.

However, I'd note that en_GB is not really the right locale to do
the metrics on.
Hi Danilo,
Why would en_GB not be the right locale to do metrics on?
From the 2.3M + 17M MO files in Ubuntu 7.10, a typical GNOME session
loads up a subset of the MO files,

# lsof | grep \.mo\$ | awk '{print $7,$9}' | sort -n | uniq

At this moment, my 7.10 is a bit messed up (I have en_GB.UTF-8 but most
apps have en_US?!?). The figures for 8.04 with el_GR should be
comparable to what you get now with 7.10 and en_GB:

They wouldn't be. A majority of el_GR probably uses two-byte UTF-8
sequences, while en_GB would use a majority of single byte UTF-8
sequences (i.e. ASCII).
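
The size difference is easy to check from a shell. The following is a minimal sketch, assuming the script is saved as UTF-8; the helper name byte_len and the sample strings are made up for illustration and are not taken from any MO file.

```shell
#!/bin/sh
# byte_len STRING: print the number of bytes (not characters) in STRING.
# Hypothetical helper, for illustration only.
byte_len() {
    # wc -c counts bytes; $(( )) strips the padding that some wc
    # implementations put in front of the number.
    echo $(( $(printf '%s' "$1" | wc -c) ))
}

byte_len 'abc'    # three ASCII letters: one byte each
byte_len 'ΦΩΣ'    # three Greek letters: two bytes each in UTF-8
```

Real translations mix in ASCII spaces, punctuation and format specifiers, so the actual ratio between el and en_GB catalogs is below 2:1.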
Good point. I provided in the other email figures from the same locale.
Halving the figures from "el" should give a very rough estimate.
# lsof | grep \.mo\$ | awk '{print $7,$9}' | sort -n | uniq | awk '{printf "%d+",$1}' > /tmp/bc_sums

Using "bc" with /tmp/bc_sums gives the figure
3.6M (3624412) for a standard session. This figure is a bit
conservative, because en_GB probably did more work than el.
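
The same total can be had without the intermediate file. Note that the printf "%d+" output ends in a trailing "+", so it needs a final 0 appended before bc will accept it. The sketch below sums in awk instead; the function name sum_mo_sizes is hypothetical, and the lsof field positions are simply the ones used in the commands above, which may differ between lsof versions.

```shell
#!/bin/sh
# sum_mo_sizes: read "SIZE NAME" lines on stdin, drop duplicate lines,
# and print the total of the SIZE column in bytes.
# Hypothetical helper, for illustration only.
sum_mo_sizes() {
    sort -u | awk '{ total += $1 } END { printf "%d\n", total }'
}

# Possible use on a live session (requires lsof):
#   lsof | grep '\.mo$' | awk '{print $7,$9}' | sum_mo_sizes
```

As with the original pipeline, this adds up file sizes of the mapped MO files, which is an upper bound on what is actually resident in memory.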

With Ubuntu 8.04 (alpha6) and en_GB, the figure for the MO files is
less than 600K (585375).
Bastien, could you provide the proper figure for your system?

That is a saving of at least 3M in memory.

As Bastien explained, mmap() doesn't read the entire file into memory,
but only reads it as needed.

The stripping of "unneeded" messages is good, and should happen at the
package generation level (not in GNOME, or when creating tarballs).

Technically, I've opposed introducing this in intltool because of
one incompatible difference:

  current gettext("Something") != such gettext("Something")

i.e. if "Something" was (un)translated as "Something" in the MO file,
gettext would return a static pointer with the string "Something".  If
it was untranslated, it would return the passed pointer.

That can and was used to detect whether there is a translation in some
programs (I've seen it done), so, until gains are proven to be big
enough to warrant breaking a few programs in strange ways, I wouldn't
do it at packaging/build time.
I do not know whether GNOME applications do (or have the need to do) such a check. Can you give one example, in order to see why they need to find if there is a translation file?

A valid concern I have seen (and this has to do with correctness) is when people manually configure the LANGUAGE variable, with something like "es:fr:en". That is, pick the Spanish translation; if none is available for a message, pick French; otherwise pick English. If the Spanish translation of a specific message is identical to the English original (and is therefore stripped), but the French one is not, then the user will see the French translation (she should have seen the Spanish text, which happens to match the English).
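
The effect can be reproduced with a toy model. This is only a sketch of the fallback order, not how gettext is implemented: the "catalogs" are hypothetical text files of msgid=msgstr lines named after the language code, and the function name lookup is made up.

```shell
#!/bin/sh
# Toy model of LANGUAGE fallback; NOT gettext's real implementation.
# lookup MSGID LANGUAGE_LIST CATALOG_DIR
# Walks the colon-separated list in order and prints the first
# translation found in CATALOG_DIR/<lang>.txt, else MSGID itself.
lookup() {
    msgid=$1
    dir=$3
    old_ifs=$IFS
    IFS=:
    for lang in $2; do
        IFS=$old_ifs
        file="$dir/$lang.txt"
        if [ -f "$file" ]; then
            # Match the key before the first "=" exactly.
            hit=$(awk -F= -v k="$msgid" \
                '$1 == k { print substr($0, length(k) + 2); exit }' "$file")
            if [ -n "$hit" ]; then
                printf '%s\n' "$hit"
                return 0
            fi
        fi
    done
    IFS=$old_ifs
    # No catalog had the message: fall back to the untranslated string.
    printf '%s\n' "$msgid"
}
```

With es.txt containing Error=Error and fr.txt containing Error=Erreur, lookup Error es:fr:en DIR prints the Spanish "Error". Remove that line from es.txt, as stripping identical translations would, and the same call prints "Erreur".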

Danilo also gave an example with Serbian, if a user chooses something like "serbian_cyrillic:serbian_latin:en".

As far as I know, there is no UI tool (at least in GNOME) to set a triple LANGUAGE option.

For a general purpose system one may make the assumption that a single language is expected.
Of course, providing numbers to show what the gains are would help
make the decision.
Assuming my memory figures are correct (previous e-mail), I have provided file size and memory figures.

Simos


