Re: [gnome-cy] po file tools: msgconv

On Gwe, 2003-04-04 at 18:06, Kevin Donnelly wrote:
> So presumably the revised files I sent you earlier were lossy, then?  You mean 

Hard to tell because the originals were a little mangled

> OK - there is a process train currently of the form:
> (1) cvs down the relevant pot files

> (2) read each file through Kartouche to give a MySQL table

You want to turn it into UTF-8 first, or remember the string language in the

> (3) upload that to the Web and present it in a browser interface

And generate the right character set header (right now you don't)

> (4) user inputs suggestions to the table via a browser

Fine. Make the form UTF-8.

> (5) user gets confirmation of suggestions added


> (6) on completion read each table through Kartouche to give a po file
> (7) run msgconv on the file to convert it to UTF-8

You need to keep it in UTF-8

> Are you saying that at points (2) and (6) the conversion will be lossy because 
> I personally am using an ISO-8859-1 charset on my PC and it should be UTF-8?  
> And that the solution to this would be to use the UTF-8 charset on my PC?

If you take y^ and turn it from UTF-8 to ISO-8859-1 you get "y", and if
you turn it back it's still "y". In the Red Hat world since RH8
everything is UTF-8 based by default (en_GB.UTF-8 etc)
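
A minimal sketch of that round trip in Python, assuming the conversion tool transliterates unsupported letters to their base letter (some converters substitute "?" or fail instead). The helper name to_latin1_lossy is made up for illustration:

```python
import unicodedata


def to_latin1_lossy(text: str) -> bytes:
    """Encode to ISO-8859-1, stripping accents the charset cannot hold.

    Hypothetical helper: it mimics a transliterating converter. Characters
    that ISO-8859-1 can represent are kept exactly as they are.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            # Split into base letter + combining accent, then drop whatever
            # still cannot be encoded (the accent).
            decomposed = unicodedata.normalize("NFD", ch)
            out += decomposed.encode("iso-8859-1", errors="ignore")
    return bytes(out)


original = "t\u0177"                          # "ty" with a circumflex on the y
latin1 = to_latin1_lossy(original)            # the circumflex is dropped here
round_tripped = latin1.decode("iso-8859-1")   # converting back cannot restore it
```

Once the accent has been dropped there is nothing left in the 8-bit data to recover it from, which is why the file has to stay in UTF-8 throughout.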

> Also at (2), the table currently stores msgids and msgstrs in a text field, 
> but this can be changed to a BLOB format easily, which is the only way MySQL 
> can currently store UTF-8.  This would then ensure no loss in the db store.

UTF-8 is just a byte stream, and there are no embedded \0 bytes, so MySQL
may not always get the comparisons right but it's OK storing it. See the
MySQL manuals btw
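
A short Python sketch of both points, using made-up sample strings. It only looks at the raw bytes and says nothing about how any particular MySQL version compares or sorts them:

```python
# Sample text with Welsh accented letters (illustrative only).
text = "sgr\u00een, t\u0177, d\u0175r"
encoded = text.encode("utf-8")

# UTF-8 never produces a zero byte unless the text itself contains U+0000,
# so a store that treats data as NUL-terminated bytes will not truncate it.
contains_nul = b"\x00" in encoded

# Storing and reading back the bytes unchanged restores the text exactly.
restored = bytes(encoded).decode("utf-8")

# Comparing raw bytes gives code point order, not dictionary order:
# the accented word lands after "zinc" because its first byte is >= 0x80.
words = ["ysgol", "\u0177d", "zinc"]
byte_order = sorted(words, key=lambda w: w.encode("utf-8"))
```

So storage is safe, but any sorting or case-insensitive matching done by a layer that does not know the data is UTF-8 may not match what a reader of the language expects.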

> At (3/4/5), does anything else need to be done to avoid people seeing 
> gibberish in a browser (eg sgrÃ®n instead of sgrîn).  Or perhaps worse, 
> putting in what looks OK to them (eg using Alt+numpad entries, or using a 
> font with w^) and having it returned as what looks like gibberish?  

Output a character set of UTF-8 and the web browser will interpret it
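
As an illustration only, here is a Python sketch of a page that declares UTF-8 in its header and asks the browser to submit the form as UTF-8. It is not Kartouche's code; build_response, FORM_PAGE and the msgstr field name are hypothetical:

```python
def build_response(body_html: str) -> bytes:
    """Return a CGI-style response whose header declares UTF-8."""
    # The charset parameter tells the browser how to decode the body.
    header = "Content-Type: text/html; charset=utf-8\r\n\r\n"
    # The body must really be encoded the way the header claims.
    return header.encode("ascii") + body_html.encode("utf-8")


# accept-charset asks the browser to send the submitted text as UTF-8 too.
FORM_PAGE = (
    '<form method="post" accept-charset="UTF-8">'
    '<input type="text" name="msgstr" value="t\u0177">'
    "</form>"
)

response = build_response(FORM_PAGE)
```

The header and the actual bytes have to agree: declaring UTF-8 while sending ISO-8859-1 bytes is one way to get the gibberish described above.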

> You'd mentioned in an earlier email that reporting a UTF-8 charset in the 
> browser headers should enable most browsers to render it OK, but will this 
> also apply to older Win PCs, which, as I understand it, were not UTF-8 
> compliant?

Yes. IE knows about UTF-8 even if the underlying OS doesn't. As does
Netscape etc. You might end up seeing Ty not T^y, that is all.

> Presumably (7) can then still be used to convert any lingering 8859 encodings 
> in the file (eg input from a browser on a PC using the 8859 encoding) into 
> the proper UTF-8 ones. 


Basically the rule is

UTF-8 -> anything 8bit is lossy
anything 8bit -> UTF-8 is not lossy
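
The rule can be checked for ISO-8859-1 with a few lines of Python. This is a sketch that assumes the 8-bit data is correctly labelled as ISO-8859-1; if the source encoding is guessed wrongly the result is valid UTF-8 but the wrong characters:

```python
# Every possible ISO-8859-1 byte value.
all_bytes = bytes(range(256))

# 8bit -> UTF-8 -> 8bit: each byte maps to exactly one character and back.
as_utf8 = all_bytes.decode("iso-8859-1").encode("utf-8")
back_again = as_utf8.decode("utf-8").encode("iso-8859-1")
lossless = back_again == all_bytes


def fits_in_latin1(ch: str) -> bool:
    """Report whether a character can be stored in ISO-8859-1 at all."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# UTF-8 -> 8bit: i-circumflex fits, w-circumflex and y-circumflex do not.
fits = {ch: fits_in_latin1(ch) for ch in "\u00ee\u0175\u0177"}
```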

There is a whole separate story about upper/lower case converting that
may bite you with other languages (notably Turkish) but are safe on 
