Re: How many words does GNOME 1.2 has?
- From: Michael Twomey <michael twomey ireland sun com>
- To: Karl Eichwalder <keichwa gmx net>
- Cc: Jordi Mas <jmas softcatala org>, gnome-i18n gnome org
- Subject: Re: How many words does GNOME 1.2 has?
- Date: Tue, 19 Sep 2000 11:49:11 +0100
Karl Eichwalder wrote:
> Nice -- but it will not catch all strings; try
>
> echo '"one \"xxx\" three"' | \
> grep -v '#' $file | awk -F'"' '{print $2}' | wc -w
>
> or
>
> echo '"one \"xxx #\" three"' | \
> grep -v '#' $file | awk -F'"' '{print $2}' | wc -w
>
> I'd say my `grep' line does the job more reliably.
>
Good point! That's the problem with one-liners. However, the reason I
didn't use your grep is that on Solaris grep and sed behave a bit
differently (mainly in their regular expressions), so I get msgid and
msgstr in the word counts.
A quick fix for my one-liner:
echo '"one \"xxx #\" three"' | \
grep -v '#' $file | sed 's/\\"//g' | awk -F'"' '{print $2}' | wc -w
The sed strips out the escaped quotes (these shouldn't be counted
anyway).
With this change the word count rises to about 112000 (about 1000 words
were being missed because of the escaped quotes).
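
To make the effect of the sed step easier to check, here is a minimal sketch of the same pipeline wrapped in a function. The name count_quoted_words is made up for illustration, and it differs from the one-liner above in one respect that is my assumption rather than part of the original: it uses grep -v '^#' so that only lines starting with '#' are dropped. The unanchored grep -v '#' drops every line containing a '#', including message strings like the second test case.

```shell
# Hypothetical helper (not from the original one-liner): count the words
# inside the first quoted string on each line read from stdin.
count_quoted_words() {
    grep -v '^#' |            # assumption: drop only lines that START with '#'
    sed 's/\\"//g' |          # strip escaped quotes so they do not split fields
    awk -F'"' '{print $2}' |  # keep the text between the first pair of quotes
    wc -w | tr -d ' '         # count words; tr removes padding some wc versions add
}

# printf is used instead of echo so the backslashes are passed through untouched.
printf '%s\n' '"one \"xxx\" three"' | count_quoted_words
printf '%s\n' '"one \"xxx #\" three"' | count_quoted_words
```

The first call should print 3. The second should print 4, because the lone '#' inside the string is counted as a word by wc -w.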
> Yes, that's why I voted to use the POT files; by definition these files
> are pristine message string files without any translation (msgstr).
>
I agree. It is very important that the files used for translation are in
pristine condition. There is nothing worse than having to fix message
files before translating them.
> > I use this myself to get word counts. (In case you are interested from
> > the ftp://ftp.gnome.org/pub/GNOME/i18n/gnome-i18n-files.tar.gz tarball
> > there are about 111000 words in gnome .po messages alone).
>
> It's a good hint to recommend to use the tarball! I never thought that
> there are so many words :) (but not all strings are unique).
>
Another good point. Doing this sorts out all the non-unique lines:
# Collect the quoted message text from every .po file into one file.
for file in *.po
do
    echo -n "$file"; grep -v '#' "$file" | sed 's/\\"//g' \
        | awk -F'"' '{print $2}' >> all_messages.txt
done
then
sort -u all_messages.txt | wc -w
This produces a count of about 100000 (so about 12000 words were in
repeated lines).
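
As a rough illustration of what the sort -u step does, here is a small self-contained sketch. The helper name extract_messages and the two sample files are invented for the example, and mktemp -d is assumed to be available. It shows that sort -u removes repeated lines (whole message strings), not repeated words.

```shell
# Hypothetical helper: print the quoted message text from each file given.
extract_messages() {
    for file in "$@"; do
        grep -v '#' "$file" | sed 's/\\"//g' | awk -F'"' '{print $2}'
    done
}

# Throwaway sample data, invented purely for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' '# comment' 'msgid "Open file"' 'msgstr ""' \
    'msgid "Close file"' 'msgstr ""' > "$tmpdir/a.po"
printf '%s\n' 'msgid "Open file"' 'msgstr ""' \
    'msgid "Save"' 'msgstr ""' > "$tmpdir/b.po"

# Word count over all extracted lines, then over unique lines only.
total=$(extract_messages "$tmpdir"/*.po | wc -w | tr -d ' ')
unique=$(extract_messages "$tmpdir"/*.po | sort -u | wc -w | tr -d ' ')
echo "total=$total unique=$unique"

rm -rf "$tmpdir"
```

This should print total=7 unique=5: the duplicated "Open file" line is counted once, but the word "file" is still counted twice because "Close file" is a different line.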
Thanks for the good points,
Michael
--
Michael Twomey
Sun Microsystems
Dublin, 8199164, x19164
"Fly my little Makefiles! Fly!"