ChangeLog encodings (was Re: The first commit on behalf of the Icelandic team)



On Fri, 2003-01-24 at 09:30, Sammi wrote:
> Thanks for all previous help. 
> 
> I was just wondering how I can check if a file is in UTF-8 format.

You can often use the "file" utility for that. "file" tries to make an
educated guess at the file's type and also its encoding:

    file filename

Example:	file ChangeLog
Example output:	ChangeLog: UTF-8 Unicode English text
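
Since "file" only guesses, a stricter check is to try decoding the whole
file as UTF-8 and see whether that fails. The following is a minimal
sketch of that idea in Python, not something from the original tools;
the function name is_valid_utf8 is made up for illustration. Note that a
pure ASCII file also passes, since ASCII is a subset of UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# Usage sketch: read the file in binary mode so no decoding happens first.
# with open("ChangeLog", "rb") as fh:
#     print(is_valid_utf8(fh.read()))
```

A True result only means the bytes are well-formed UTF-8; it does not
prove that the text was actually written in UTF-8.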


> This is how I've been converting to UTF-8:
> msgconv -t UTF-8 is.po > is.po.new && mv is.po.new is.po 
> 
> When I tried to convert the ChangeLogs to UTF-8 format the same way,
> I got errors.

Yes, "msgconv" only works on po files (since it uses the encoding
information mentioned in the po header as the from-encoding). If you
want to convert other types of files, you should probably use the more
general utility "iconv":

    iconv -f fromencoding -t toencoding filename

Example: iconv -f iso-8859-1 -t UTF-8 myfile.txt > myfile.txt.new &&
         mv myfile.txt.new myfile.txt

The problem with using this on ChangeLogs is that the various
translators who have written in the ChangeLog might all have done so
with their own encoding, so that the file currently is a mix of
different encodings (a horrible mess in other words). Thus, it's not
trivial to decide on a from-encoding in the above command. In the end,
at least some translators' names will probably end up looking garbled
even after the conversion to UTF-8, since the file was mixed-encoding to
begin with.
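
One partial workaround might be to convert line by line instead of
picking a single from-encoding for the whole file. The sketch below
assumes that every line is either already valid UTF-8 or written in one
known legacy encoding (ISO-8859-1 here, which is only a guess); the
function name to_utf8_per_line is hypothetical. Lines in any third
encoding would still come out garbled, and silently so, because
ISO-8859-1 decoding never fails.

```python
def to_utf8_per_line(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Re-encode mixed-encoding text as UTF-8, one line at a time.

    Assumption: each line is either valid UTF-8 or in the fallback encoding.
    """
    converted = []
    for line in data.split(b"\n"):
        try:
            text = line.decode("utf-8")    # keep lines that are already UTF-8
        except UnicodeDecodeError:
            text = line.decode(fallback)   # otherwise assume the legacy encoding
        converted.append(text)
    return "\n".join(converted).encode("utf-8")


# Usage sketch: write to a new file first, then inspect it before replacing
# the original.
# with open("ChangeLog", "rb") as src:
#     result = to_utf8_per_line(src.read())
# with open("ChangeLog.new", "wb") as dst:
#     dst.write(result)
```

The result should still be reviewed by hand, since a wrong guess for the
fallback encoding produces readable-looking but incorrect characters.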

I really don't know what's the best solution here. Perhaps someone else
has an idea?


Christian



