Re: Low memory hacks

From: Simos Xenitellis <simos lists googlemail com>
To: Simos Xenitellis <simos lists googlemail com>, Bastien Nocera <hadess hadess net>, "Nikolay V. Shmyrev" <nshmyrev yandex ru>, Brian Nitz <Brian Nitz sun com>, desktop-devel-list gnome org
Subject: Re: Low memory hacks
Date: Tue, 18 Mar 2008 16:06:30 +0000

Danilo Šegan wrote:

Hi Simos,

Yesterday at 15:02, Simos Xenitellis wrote:

I'd like to see some real numbers on the memory usage instead of
numbers being thrown around.

In Ubuntu 7.10, the MO files for en_GB are
$ du -h /usr/share/locale/en_GB/LC_MESSAGES /usr/share/locale-langpack/en_GB/LC_MESSAGES/
2.3M    /usr/share/locale/en_GB/LC_MESSAGES
17M     /usr/share/locale-langpack/en_GB/LC_MESSAGES/

In Ubuntu 8.04 (alpha 6), the MO files for en_GB are
$ du -h /usr/share/locale/en_GB/LC_MESSAGES /usr/share/locale-langpack/en_GB/LC_MESSAGES/
84K     /usr/share/locale/en_GB/LC_MESSAGES
2.2M    /usr/share/locale-langpack/en_GB/LC_MESSAGES/

What I am missing here is when and how Ubuntu added this
functionality. It would benefit other distros as well. Did Debian
introduce this feature? Danilo, any links?


I am not handling Ubuntu packaging stuff; it'd be worth checking with
the Ubuntu guys instead.  Martin Pitt is probably the right person to ask
about it, but looking at the language pack source package should give a
clue as well.

However, I'd note that en_GB is not really the right locale to do
the metrics on.

Hi Danilo,
Why would en_GB not be the right locale to do metrics on?

From the 2.3M + 17M of MO files in Ubuntu 7.10, a typical GNOME session
loads up only a subset of the MO files,

# lsof | grep \.mo\$ | awk '{print $7,$9}' | sort -n | uniq

At this moment, my 7.10 is a bit messed up (I have en_GB.UTF-8 but most
apps have en_US?!?). The figures for 8.04 with el_GR should be
comparable to what you get now with 7.10 and en_GB:


They wouldn't be. A majority of el_GR probably uses two-byte UTF-8
sequences, while en_GB would use a majority of single byte UTF-8
sequences (i.e. ASCII).

Good point. I provided in the other email figures from the same locale.
Halving the figures from "el" should give a very rough estimate.
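
The byte-width point can be checked directly. This is only a sketch:
utf8_bytes is a hypothetical helper name, the two sample strings are
my own, and it assumes the script itself is saved as UTF-8.

```shell
# Count the bytes a string occupies when stored as UTF-8.
# tr strips the padding that some wc implementations print.
utf8_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

utf8_bytes 'File'     # 4 ASCII characters, one byte each
utf8_bytes 'Αρχείο'   # 6 Greek characters, two bytes each
```

Real catalogs mix ASCII (format specifiers, markup, untranslated
words) with Greek text, so halving stays a rough estimate.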

# lsof | grep \.mo\$ | awk '{print $7,$9}' | sort -n | uniq | awk '{printf "%d+",$1}' > /tmp/bc_sums

Using "bc" with /tmp/bc_sums gives the figure
3.6M (3624412) for a standard session. This figure is a bit
conservative, because en_GB probably did more work than el.
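
The same sum can be produced in one pass, without the temporary file
and "bc". This is a sketch only: sum_mo_sizes is a hypothetical helper
name, and it assumes the lsof column layout used above (size in field
7, path in field 9). It counts each path once, whereas the pipeline
above deduplicates on the size/path pair.

```shell
# Sum the sizes of the distinct .mo files named in lsof-style output on stdin.
# Assumption: field 7 is the file size and field 9 is the file path.
sum_mo_sizes() {
    awk '$9 ~ /\.mo$/ && !seen[$9]++ { total += $7 } END { print total + 0 }'
}

# Usage (run as root to see the files of every process):
#   lsof | sum_mo_sizes
```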

With Ubuntu 8.04 (alpha6) and en_GB, the figure for the MO files is
less than 600K (585375).
Bastien, could you provide the proper figure for your system?

That is a saving of at least 3M in memory.
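
For reference, the subtraction behind that figure, using the two byte
counts quoted above (the variable names are mine):

```shell
before=3624412   # bytes of MO files open in the session, first figure above
after=585375     # bytes of MO files open with Ubuntu 8.04 (alpha6) and en_GB
saved=$((before - after))
echo "$saved bytes"   # prints: 3039037 bytes
```

These are file sizes, so they are an upper bound on what actually ends
up resident in memory.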


As Bastien explained, mmap() doesn't read the entire file into memory,
but only reads it as needed.

The stripping of "unneeded" messages is good, and should happen at the
package generation level (not in GNOME, or when creating tarballs).


Technically, I've opposed introducing this in intltool because of
one incompatible difference:

  current gettext("Something") != such gettext("Something")

i.e. if "Something" was (un)translated as "Something" in the MO file,
gettext would return a static pointer with the string "Something".  If
it was untranslated, it would return the passed pointer.

That can be, and has been, used to detect whether there is a translation
in some programs (I've seen it done), so, until gains are proven to be big
enough to warrant breaking a few programs in strange ways, I wouldn't
do it at packaging/build time.

I do not know whether GNOME applications do (or have the need to do)
such a check. Can you give one example, in order to see why they need
to find if there is a translation file?

A valid concern I have seen (and this has to do with correctness) is
when people manually configure the LANGUAGE variable, with something
like "es:fr:en". That is, pick the Spanish translation; if it is not
available for a message, pick French; otherwise pick English. If the
Spanish translation for a specific message is the same as the English
(and so gets stripped), but the French one is not, then the user will
see the French translation (she should have seen the Spanish-English
translation instead).

Danilo also gave an example with Serbian, if a user chooses something
like "serbian_cyrillic:serbian_latin:en".
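
The fallback problem can be reproduced with a toy model. To be clear,
this is not real gettext: the lookup function, the .cat file format
(one msgid=msgstr pair per line) and the file names are all made up
for illustration. It only mimics "take the first language in the list
that has the message".

```shell
# Toy model of LANGUAGE fallback (NOT real gettext).
# lookup DIR MSGID LANG...: print the first translation found, else MSGID.
lookup() {
    dir=$1; msgid=$2; shift 2
    for lang in "$@"; do
        # Take the msgstr of the first entry whose msgid matches exactly.
        hit=$(awk -F= -v id="$msgid" '$1 == id { print $2; exit }' "$dir/$lang.cat" 2>/dev/null)
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
    done
    printf '%s\n' "$msgid"   # no catalog had it: fall back to the msgid
}

demo_dir=$(mktemp -d)
printf 'Error=Erreur\n' > "$demo_dir/fr.cat"
printf 'Error=Error\n'  > "$demo_dir/es_full.cat"   # Spanish equals the English msgid
: > "$demo_dir/es_stripped.cat"                     # same catalog, entry stripped

lookup "$demo_dir" Error es_full fr       # prints: Error
lookup "$demo_dir" Error es_stripped fr   # prints: Erreur
```

With the stripped catalog the Spanish user gets the French string,
which is exactly the behaviour change described above.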

As far as I know, there is no UI tool (at least in GNOME) to set a
triple LANGUAGE option.

For a general purpose system one may make the assumption that a single
language is expected.

Of course, providing numbers to show what the gains are would help
make the decision.

Assuming my memory figures are correct (previous e-mail), I have
provided file size and memory figures.


Simos

References:
- Re: Low memory hacks
  - From: Simos Xenitellis
- Re: Low memory hacks
  - From: Bastien Nocera
- Re: Low memory hacks
  - From: Simos Xenitellis
- Re: Low memory hacks
  - From: Danilo Šegan