Re: [gnome-cy] po file tools: msgconv



On Friday 04 April 2003 12:26 pm, Alan Cox wrote:
> That conversion is lossy. For non original C locale source you cannot do
> this. You must retain full encoding properties (i.e. UTF-8 is about the
> only choice). At the point you meet a non C or 8859-1 encoded file you
> can only safely convert it to/from Unicode space, not into ISO 8-bit
> encodings.
>
> For the GNOME case UTF-8 is mandatory. GNOME doesn't support files that
> are not UTF-8 encoded.

So presumably the revised files I sent you earlier were lossy, then?  You mean 
this in terms of "although they were in UTF-8 format when I got them, prior 
to that they went through some hoops which may have thrown away UTF-8 type 
info which is now non-recoverable"?

OK, the current process chain is:
(1) check out the relevant pot files from CVS
(2) read each file through Kartouche to give a MySQL table
(3) upload that to the Web and present it in a browser interface
(4) user inputs suggestions to the table via a browser
(5) user gets confirmation of suggestions added
(6) on completion, read each table through Kartouche to give a po file
(7) run msgconv on the file to convert it to UTF-8

Are you saying that at points (2) and (6) the conversion will be lossy because 
I personally am using an ISO-8859-1 charset on my PC and it should be UTF-8?  
And that the solution to this would be to use the UTF-8 charset on my PC?
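
To make "lossy" concrete, here is a minimal sketch in Python (an assumption 
for illustration only; Python is not part of the process above, and the sample 
word is arbitrary). Welsh needs w-circumflex and y-circumflex, which 
ISO-8859-1 has no code points for, so text forced through that charset cannot 
come back intact, whereas a UTF-8 round trip is exact.

```python
# Sketch: why passing Welsh text through ISO-8859-1 loses information.
text = "d\u0175r"  # "dŵr"; U+0175 (w-circumflex) is absent from ISO-8859-1

# UTF-8 can represent every Unicode character, so the round trip is exact.
utf8_round_trip = text.encode("utf-8").decode("utf-8")

# ISO-8859-1 has no slot for U+0175; with errors="replace" it becomes "?".
latin1_round_trip = text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

print(utf8_round_trip == text)  # the UTF-8 copy is unchanged
print(latin1_round_trip)        # the circumflexed letter has been replaced
```

Once the replacement has happened there is nothing left in the file to say 
which letter was originally there, which is why a later conversion to UTF-8 
cannot restore it.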

Also at (2), the table currently stores msgids and msgstrs in a text field, 
but this can be changed to a BLOB format easily, which is the only way MySQL 
can currently store UTF-8.  This would then ensure no loss in the db store.
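
The BLOB idea can be sketched as follows. This is only an illustration under 
stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for 
MySQL, and the table and column names are hypothetical. The point is simply 
that a binary column stores the UTF-8 bytes untouched, so the application 
encodes on the way in and decodes on the way out.

```python
import sqlite3

# Stand-in database (SQLite in memory, not MySQL); names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strings (msgid BLOB, msgstr BLOB)")

msgid, msgstr = "Screen", "Sgr\u00een"  # "Sgrîn"

# Encode explicitly to UTF-8 so the column holds raw bytes, not charset-tagged text.
conn.execute(
    "INSERT INTO strings VALUES (?, ?)",
    (msgid.encode("utf-8"), msgstr.encode("utf-8")),
)

# Reading back gives the same bytes; decoding them restores the string exactly.
row = conn.execute("SELECT msgid, msgstr FROM strings").fetchone()
restored = row[1].decode("utf-8")
print(restored == msgstr)
```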

At (3/4/5), does anything else need to be done to avoid people seeing 
gibberish in a browser (e.g. sgrÃ?n instead of sgrîn)?  Or perhaps worse, 
putting in what looks OK to them (e.g. using Alt+numpad entries, or using a 
font with ŵ) and having it returned as what looks like gibberish?
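
The gibberish in that example is what UTF-8 bytes look like when they are read 
as ISO-8859-1. A minimal Python sketch (Python being an assumption for 
illustration, not part of the toolchain) reproduces it:

```python
# Sketch: how correctly stored UTF-8 turns into gibberish on screen.
word = "sgr\u00een"                # "sgrîn"
utf8_bytes = word.encode("utf-8")  # î becomes two bytes: 0xC3 0xAE

# Software that assumes ISO-8859-1 shows each byte as its own character.
misread = utf8_bytes.decode("iso-8859-1")

# The damage is reversible only while the underlying bytes are untouched.
repaired = misread.encode("iso-8859-1").decode("utf-8")

print(misread)
print(repaired == word)
```

In this case the data itself is fine and only the display is wrong, which is a 
different problem from the lossy conversion discussed above.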

You'd mentioned in an earlier email that reporting a UTF-8 charset in the 
browser headers should enable most browsers to render it OK, but will this 
also apply to older Win PCs, which, as I understand it, were not UTF-8 
compliant?

Presumably (7) can then still be used to convert any lingering 8859 encodings 
in the file (eg input from a browser on a PC using the 8859 encoding) into 
the proper UTF-8 ones. 
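
As far as I understand it, msgconv converts a whole catalogue from its 
declared charset to another, so stray 8859-1 strings sitting among UTF-8 ones 
would probably need handling one string at a time. A rough Python sketch of 
that idea follows; the function name is hypothetical, and guessing ISO-8859-1 
for anything that is not valid UTF-8 is an assumption, not a reliable 
detection method.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8, guessing ISO-8859-1 if it is not valid UTF-8."""
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave it alone
        return raw
    except UnicodeDecodeError:
        # Assumption: anything else came from an ISO-8859-1 browser or PC.
        return raw.decode("iso-8859-1").encode("utf-8")


from_old_pc = "sgr\u00een".encode("iso-8859-1")  # one byte for î
from_new_pc = "sgr\u00een".encode("utf-8")       # two bytes for î
print(to_utf8(from_old_pc) == to_utf8(from_new_pc))
```

This only rescues characters that ISO-8859-1 could represent in the first 
place; anything already replaced or dropped upstream stays lost.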

There are no answers - only questions :-)  But it would be nice to be able to 
get this definitively sorted before I go any further.

Best wishes

Kevin





_______________________________________________
gnome-cy mailing list
gnome-cy pengwyn linux org uk
http://pengwyn.linux.org.uk/mailman/listinfo/gnome-cy


