Re: How many words does GNOME 1.2 have?



Karl Eichwalder wrote:
> Nice -- but it will not catch all strings; try
> 
>     echo '"one \"xxx\" three"' | \
>     grep -v '#' $file | awk -F'"' '{print $2}' | wc -w
> 
> or
> 
>     echo '"one \"xxx #\" three"' | \
>     grep -v '#' $file | awk -F'"' '{print $2}' | wc -w
> 
> I'd say my `grep' line does the job more reliably.
> 

Good point! That's the problem with one-liners. However, the reason I
didn't use your grep is that on Solaris grep and sed behave a bit
differently (mainly in their regular expressions), so I get both msgid
and msgstr in the word counts.

A quick fix for my one-liner:
echo '"one \"xxx #\" three"' | \
grep -v '#' $file | sed 's/\\"//g' | awk -F'"' '{print $2}' | wc -w

The sed strips out the escaped quotes (these shouldn't be counted
anyway).
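To see what the sed step changes, the pipeline can be wrapped in a small
function and fed a test string on stdin. This is a sketch under two
assumptions of mine: count_quoted_words is a made-up name, and it uses
grep -v '^#' instead of grep -v '#', on the guess that only comment lines
should be dropped (the unanchored pattern also throws away any message
line that contains a '#').

```shell
# Hypothetical helper: count the words in the first quoted string of
# each line read from stdin.
count_quoted_words() {
    grep -v '^#' |            # drop comment lines only (assumption, see above)
    sed 's/\\"//g' |          # strip escaped quotes before splitting
    awk -F'"' '{print $2}' |  # keep the text between the first pair of quotes
    wc -w
}

# With the sed step: "one xxx three" is seen whole, so three words.
printf '%s\n' '"one \"xxx\" three"' | count_quoted_words

# Without it: awk stops at the first escaped quote and sees only 'one \'.
printf '%s\n' '"one \"xxx\" three"' | awk -F'"' '{print $2}' | wc -w
```

The exact spacing of the wc output differs between systems, so compare
the numbers rather than the raw text.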

Doing this gets the word count up to about 112000 (about 1000 words had
been cut off by the escaped quotes).

> Yes, that's why I voted to use the POT files; by definition these files
> are pristine message string files without any translation (msgstr).
> 

I agree. It is very important that the files used for translation are in
pristine condition. There is nothing worse than having to fix message
files before translating them.

> > I use this myself to get word counts. (In case you are interested from
> > the ftp://ftp.gnome.org/pub/GNOME/i18n/gnome-i18n-files.tar.gz tarball
> > there are about 111000 words in gnome .po messages alone).
> 
> It's a good hint to recommend to use the tarball!  I never thought that
> there are so many words :)  (but not all strings are unique).
> 

Another good point. This filters out all the non-unique lines:

# Collect the quoted strings from every .po file into one file.
for file in *.po
do
    echo "$file"    # progress indicator
    grep -v '#' "$file" | sed 's/\\"//g' \
        | awk -F'"' '{print $2}' >> all_messages.txt
done

then

sort -u all_messages.txt | wc -w

This produces a count of about 100000 (so about 12000 words were in
repeated lines).
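Note that sort -u removes duplicate lines, so the figure above counts the
words in the unique message strings, not the number of distinct words. A
sketch of both counts on a throwaway sample (the file contents and
variable names below are invented for illustration):

```shell
# Invented sample data standing in for all_messages.txt.
msgs=$(mktemp)
printf '%s\n' 'Open file' 'Save file' 'Open file' > "$msgs"

# Words in unique messages: duplicate lines are collapsed first.
unique_message_words=$(sort -u "$msgs" | wc -w)

# Distinct words: put one word per line, then collapse duplicates.
unique_words=$(tr -s ' ' '\n' < "$msgs" | sort -u | wc -l)

echo "words in unique messages: $unique_message_words"
echo "distinct words: $unique_words"
rm -f "$msgs"
```

For a translation estimate the first number is probably the more useful
one, since a repeated message only needs translating once while a
repeated word still has to be translated in every new message.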


thanks for the good points,
	Michael

-- 
Michael Twomey
Sun Microsystems
Dublin, 8199164, x19164
"Fly my little Makefiles! Fly!"



