po files: invalid multibyte sequence

From: "Fernando Apesteguía" <fernando apesteguia gmail com>
To: gnome-i18n gnome org
Subject: po files: invalid multibyte sequence
Date: Mon, 12 Jun 2006 19:07:44 +0200

Hi Christian (and others)

The file's header looks as follows:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2006-05-09 23:08+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL ADDRESS>\n"
"Language-Team: LANGUAGE <LL li org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

When I run the msgfmt command you suggested, I get the following for each of my three po files:

file.po: the file's header is fuzzy.
fatal error 1 encountered
(This is my translation of the error, since the original messages were in Spanish.)

For the Croatian file I also get the famous "invalid multibyte sequence" message, but not for the others. I end up with es.gmo and eu.gmo, but no hr.gmo.

I'm new to all this and have no idea how to go about fixing the problem.

As for how my translators send the .po files: they use ordinary mail (usually Gmail) and attach the .po file as is, without packaging or compression.

More ideas?

Best regards

---------- Forwarded message ----------
From: Christian Rose <menthos gnome org>
Date: Jun 12, 2006 12:43 AM
Subject: Re: po files: invalid multibyte sequence
To: Fernando Apesteguía <fernando apesteguia gmail com>
Cc: gnome-i18n gnome org

On 6/11/06, Fernando Apesteguía <fernando apesteguia gmail com> wrote:
> I am writing to explain my problem with some translation files. I'm using
> Anjuta. When some of my project's translators sent me their translation
> files, I added them to the project, but when I try to generate the
> distribution I get this error:
>
> "invalid multibyte sequence"

Most probably this problem is because somewhere along the way
something converted the po files from their original UTF-8 encoding to
some other legacy encoding (like latin1). As the files' headers still
specify that the files are in UTF-8 encoding, the gettext tools
rightfully complain that the files, although being specified as being
valid UTF-8, contain invalid UTF-8 characters.
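
One way to confirm this diagnosis is to look for lines that fail to decode as UTF-8. The following is only a minimal sketch in Python, not part of the gettext tools; the function name find_invalid_utf8_lines and the sample strings are made up for illustration.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_line) for every line that is not valid UTF-8.

    Splitting on b"\n" is safe here because the byte 0x0A never occurs
    inside a UTF-8 multibyte sequence.
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical sample: the msgstr line was re-encoded as Latin-1, so "í"
# became the single byte 0xED, which is not a complete UTF-8 sequence.
sample = 'msgid "Guide"\n'.encode("utf-8") + 'msgstr "Guía"\n'.encode("latin-1")

for number, line in find_invalid_utf8_lines(sample):
    print(f"line {number}: {line!r}")
```

To use it on a real file, read the file in binary mode (open(path, "rb").read()) and pass the bytes in; any line reported is a candidate for having been converted to a legacy encoding.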

You did not tell us *how* your translators send you the files; if they
send them to you by mail, it could be that some mailer software along
the way munged the encoding. A workaround is to send the files
compressed, as that will preserve the encoding, since they are then
treated as binary files and are hopefully left as-is by mailer
software.
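
As an illustration of that workaround, here is a small Python sketch that gzips a po file before it is attached and reads it back on the receiving side. The helper names gzip_po and gunzip_bytes are hypothetical; any compression tool would serve the same purpose.

```python
import gzip
import shutil


def gzip_po(po_path: str) -> str:
    """Write po_path + '.gz' next to the original and return the new path."""
    gz_path = po_path + ".gz"
    # Both files are opened in binary mode, so no encoding conversion happens.
    with open(po_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def gunzip_bytes(gz_path: str) -> bytes:
    """Return the decompressed content exactly as it was stored."""
    with gzip.open(gz_path, "rb") as src:
        return src.read()
```

The bytes that come out of gunzip_bytes are identical to the bytes that went in, which is the property the workaround relies on.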

I do not know if Anjuta supports working with UTF-8 files; if that's
not the case, that may very well be the problem.

> I noticed that some characters look odd under my Spanish configuration.
> This brings me to my question:
>
> How do I test the different translation files? Should I install every
> language I want to test in my distribution? Is there another way?
> If I don't fix this, I can't generate the distribution package

I'm not sure I understand your question.
If you want to test if the po files are syntactically valid, a simple
"msgfmt -cv file.po" should do.
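
For checking several files in one go, the same command can be wrapped in a short script. This is a sketch that assumes msgfmt is installed and on the PATH; the helper names are hypothetical. It adds "-o" with the null device so that the check does not leave a messages.mo file behind, which msgfmt writes by default when no output file is given.

```python
import os
import subprocess


def msgfmt_check_command(po_path: str) -> list:
    """Build the argument list for 'msgfmt -c -v' with the output discarded."""
    return ["msgfmt", "-c", "-v", "-o", os.devnull, po_path]


def check_po(po_path: str):
    """Run the check and return (ok, messages).

    Raises FileNotFoundError if msgfmt is not installed.
    """
    result = subprocess.run(
        msgfmt_check_command(po_path),
        capture_output=True,
        text=True,
        errors="replace",  # diagnostics may contain undecodable bytes
    )
    # msgfmt reports errors and statistics on stderr.
    return result.returncode == 0, result.stderr
```

For example, looping over es.po, eu.po and hr.po and printing the messages for each file whose check fails would show at a glance which translations need attention.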

Christian
