Source code encoding detection modifications for intltool



Hi, here is a patch that tries to detect the encoding of each source
file and then calls xgettext with the matching --from-code option.

Please test and review it. I don't know Perl, so my change is mainly
a cut & paste fix.

I will also prepare a check for gettext >= 0.12, so that new intltool
releases require it and can deal with UTF-8 source files.
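
A rough sketch of what such a check could look like, not the final
implementation. It assumes the first "major.minor" number in the
banner printed by "xgettext --version" is the gettext version; the
sample banner strings and the helper name gettext_at_least are made up
for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the first "major.minor" number found in
# $banner is at least $want_major.$want_minor.
sub gettext_at_least {
    my ($banner, $want_major, $want_minor) = @_;
    # No version number in the banner at all: treat it as too old.
    my ($major, $minor) = $banner =~ /(\d+)\.(\d+)/ or return 0;
    return ($major > $want_major
        || ($major == $want_major && $minor >= $want_minor)) ? 1 : 0;
}

# In the real script the banner would come from: `xgettext --version`
# The string below is an assumed example of that output.
my $banner = "xgettext (GNU gettext) 0.11.5";
print "gettext >= 0.12 is needed for UTF-8 source files\n"
    unless gettext_at_least($banner, 0, 12);
```

Comparing major and minor as numbers avoids the trap of a plain string
or float comparison, where 0.9 would look newer than 0.12.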

Some comments: I get the file encoding from the "file" command, so I
cannot determine the encoding of XML and yacc files. I therefore assume
that XML files are UTF-8 and that any other file not reported as UTF-8,
ISO* or XML is ASCII.
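
The rule above can be sketched as a small stand-alone function. This is
an illustration only, not the code in the patch: the helper name
encoding_from_description and the sample descriptions are assumptions
about what "file" prints for a given file.

```perl
use strict;
use warnings;

# Hypothetical helper: map the description printed by file(1) to a
# value for xgettext --from-code.
sub encoding_from_description {
    my ($description) = @_;
    # Descriptions that start with a charset name are used as they are.
    return $1      if $description =~ /^(UTF-8|ISO-8859(?:-\d+)?)\b/;
    # file(1) does not report a charset for XML, so assume UTF-8.
    return "UTF-8" if $description =~ /^XML\b/;
    # Everything else (yacc files included) is treated as ASCII.
    return "ASCII";
}

# Assumed sample descriptions, one per branch.
for my $description ("UTF-8 Unicode text", "ISO-8859 C program text",
                     "XML document text", "ASCII C program text") {
    print "$description => ", encoding_from_description($description), "\n";
}
```

Anchoring on whole words avoids matching any description that merely
starts with one of the letters I, S, O, U, T, F, X, M or L.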

Comments, ideas?

I will apply this patch to my intltool installation so that the status
pages start using it.

Cheers.

P.S.: This patch is an evolution of this one:
http://deaddog.org/~maddog/intltool-update.patch
-- 
Carlos Perelló Marín
Debian GNU/Linux Sid (PowerPC)
Linux Registered User #121232
mailto:carlos pemas net || mailto:carlos gnome org
http://carlos.pemas.net
Valencia - Spain
? .intltool-update.in.in.swp
Index: intltool-update.in.in
===================================================================
RCS file: /cvs/gnome/intltool/intltool-update.in.in,v
retrieving revision 1.85
diff -u -w -r1.85 intltool-update.in.in
--- intltool-update.in.in	25 May 2003 23:06:05 -0000	1.85
+++ intltool-update.in.in	14 Jul 2003 18:26:45 -0000
@@ -55,6 +55,7 @@
 my $conf_file; # remove later
 my %varhash = ();
 my %po_files_by_lang = ();
+my $encoding = "ASCII";
 
 # Regular expressions to categorize file types.
 # FIXME: Please check if the following is correct
@@ -251,6 +252,38 @@
    return "gettext\/$gettext_type";
 }
 
+sub determine_code ($) 
+{
+   my $comments = $_;
+   my $code = $_;
+   my $filename;
+   my $type;
+   my $gettext_code="ASCII"; # All files are ASCII by default
+
+   if ($comments =~ /^[^#]/)
+   {
+       $code =~ s/^\[[^\[].*]\s*//;
+       $filename = "../$code";
+       $type=`file $filename | cut -d ' ' -f 2`;
+       if ($? eq "0")
+       {
+           if (!($type =~ /^[^ISO]/) or !($type =~ /^[^UTF]/))
+           {
+               $gettext_code=$type;
+	       chomp $gettext_code;
+	   }
+	   elsif (!($type =~ /^[^XML]/))
+	   {
+	       $gettext_code="UTF-8"; # We assume that .glade and other .xml files are UTF-8
+	   }
+       }
+
+   }
+   
+   return $gettext_code;
+}
+
+
 sub find_leftout_files
 {
     my (@buf_i18n_plain,
@@ -517,14 +550,31 @@
     while (<INFILE>) 
     {
         chomp;
+
+	my $gettext_code;
+	
         if (/\.($xml_extension|$ini_extension)$/ || /^\[/) 
 	{
             s/^\[.*]\s*//;
 	    print OUTFILE "$_.h\n";
+	    $gettext_code = &determine_code ("$_.h");
         } 
         else 
         {
             print OUTFILE "$_\n";
+	    $gettext_code = &determine_code ("$_");
+        }
+
+	if ($gettext_code ne "" and $encoding ne $gettext_code)
+	{
+	    if ($encoding eq "ASCII")
+	    {
+	        $encoding=$gettext_code;
+	    }
+	    elsif ($gettext_code ne "ASCII")
+	    {
+	        print "Warning: You should use the same file encoding for all your project files, but you have $encoding and $gettext_code files ($_).\n";
+	    }
         }
     }
     
@@ -534,6 +584,7 @@
     system ("xgettext", "--default-domain\=$MODULE", 
 			"--directory\=\.\.",
 	   		"--add-comments", 
+			"--from-code\=$encoding", 
 			"--keyword\=\_", 
 			"--keyword\=N\_", 
 			"--keyword\=U\_",
@@ -675,7 +726,7 @@
 	}
     }
     
-    if ($str =~ /^(.*)\${?([A-Z]+)}?(.*)$/)
+    if ($str =~ /^(.*)\${?([A-Z_]+)}?(.*)$/)
     {
 	my $rest = $3;
 	my $untouched = $1;

Attachment: signature.asc
Description: This part of the message is digitally signed


