Re: [gnome-cy] po file tools: msgconv
- From: Alan Cox <alan lxorguk ukuu org uk>
- To: kevin dotmon com
- Cc: Gnome Welsh List <gnome-cy www linux org uk>
- Subject: Re: [gnome-cy] po file tools: msgconv
- Date: 04 Apr 2003 16:51:12 +0100
On Fri, 2003-04-04 at 18:06, Kevin Donnelly wrote:
> So presumably the revised files I sent you earlier were lossy, then? You mean
Hard to tell because the originals were a little mangled
> OK - there is a process train currently of the form:
> (1) cvs down the relevant pot files
Fine
> (2) read each file through Kartouche to give a MySQL table
You want to convert it to UTF-8 first, or record each string's character
set in the table
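A rough sketch of "convert to UTF-8 before it goes in the table", in Python. The helper names (to_utf8_text, store_entries) and the table layout are made up for illustration, sqlite3 stands in for MySQL, and the source charset is assumed to come from the .pot header. The charset is stored as well, which covers the "or record it" option.

```python
import sqlite3

def to_utf8_text(raw: bytes, source_charset: str) -> str:
    """Decode bytes from the file's declared charset into a Unicode string."""
    return raw.decode(source_charset)

def store_entries(conn, entries, source_charset):
    """Store msgid/msgstr pairs as Unicode text, plus the charset they came from."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strings ("
        " msgid TEXT, msgstr TEXT, source_charset TEXT)"
    )
    for msgid, msgstr in entries:
        conn.execute(
            "INSERT INTO strings VALUES (?, ?, ?)",
            (to_utf8_text(msgid, source_charset),
             to_utf8_text(msgstr, source_charset),
             source_charset),  # kept so you can always tell where it came from
        )
    conn.commit()

conn = sqlite3.connect(":memory:")  # sqlite stores TEXT as UTF-8 by default
# b"Sgr\xeen" is "Sgrîn" in ISO-8859-1 (0xEE is î).
store_entries(conn, [(b"Screen", b"Sgr\xeen")], "iso-8859-1")
row = conn.execute("SELECT msgstr, source_charset FROM strings").fetchone()
print(row)  # ('Sgrîn', 'iso-8859-1')
```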
> (3) upload that to the Web and present it in a browser interface
And generate the right character set header (right now you don't)
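For illustration only, a minimal WSGI app in Python showing what "the right character set header" means: the charset goes in the Content-Type header, and the body really is encoded as UTF-8. The page content is hypothetical; the meta tag just repeats the header for pages saved to disk.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    body = (
        '<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=UTF-8"></head>'
        "<body>Tŷ</body></html>"
    )
    data = body.encode("utf-8")  # the bytes must match the declared charset
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]

captured = {}

def start_response(status, headers):
    # Record what the app sent so we can inspect it without a real server.
    captured["status"] = status
    captured["headers"] = dict(headers)

environ = {}
setup_testing_defaults(environ)
payload = b"".join(app(environ, start_response))
print(captured["headers"]["Content-Type"])  # text/html; charset=UTF-8
```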
> (4) user inputs suggestions to the table via a browser
Fine. Make the form UTF-8
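A sketch of the form side, assuming a plain POST form (the /suggest action and the msgstr field name are hypothetical). With the page served as UTF-8 and accept-charset="UTF-8", the browser percent-encodes the UTF-8 bytes, and the server decodes them as UTF-8 rather than guessing.

```python
from urllib.parse import parse_qs

FORM = (
    '<form method="post" action="/suggest" accept-charset="UTF-8">'
    '<input name="msgstr"><input type="submit"></form>'
)

def parse_suggestion(body: bytes) -> str:
    """Decode an application/x-www-form-urlencoded body sent from a UTF-8 form."""
    # The urlencoded body itself is plain ASCII; the %XX escapes are UTF-8 bytes.
    fields = parse_qs(body.decode("ascii"), encoding="utf-8", errors="strict")
    return fields["msgstr"][0]

# "Tŷ" as a browser would send it: ŷ is U+0177, UTF-8 bytes C5 B7.
print(parse_suggestion(b"msgstr=T%C5%B7"))  # Tŷ
```

errors="strict" makes a non-UTF-8 submission fail loudly instead of silently turning into replacement characters.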
> (5) user gets confirmation of suggestions added
Ok
> (6) on completion read each table through Kartouche to give a po file
> (7) run msgconv on the file to convert it to UTF-8
You need to keep it in UTF-8
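Keeping it in UTF-8 at step (6) mostly means writing the po file as UTF-8 bytes and having its header say so. A minimal Python sketch (build_po and declared_charset are made-up names, and the header is cut down to the two charset-related lines):

```python
import re

def po_quote(s: str) -> str:
    # Escape backslashes, quotes and newlines as PO string syntax expects.
    s = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + s + '"'

def build_po(entries) -> bytes:
    """Build a minimal PO file whose header declares UTF-8, encoded as UTF-8."""
    lines = [
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        '"Content-Transfer-Encoding: 8bit\\n"',
        "",
    ]
    for msgid, msgstr in entries:
        lines.append("msgid " + po_quote(msgid))
        lines.append("msgstr " + po_quote(msgstr))
        lines.append("")
    return "\n".join(lines).encode("utf-8")

def declared_charset(po_bytes: bytes):
    """Return the charset named in the PO header, or None if there is none."""
    m = re.search(rb"charset=([A-Za-z0-9_.-]+)", po_bytes)
    return m.group(1).decode("ascii") if m else None

data = build_po([("House", "Tŷ")])
print(declared_charset(data))  # UTF-8
```

If a file does turn up in some other charset, GNU gettext's `msgconv --to-code=UTF-8` re-encodes it and updates the header to match.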
> Are you saying that at points (2) and (6) the conversion will be lossy because
> I personally am using an ISO-8859-1 charset on my PC and it should be UTF-8?
> And that the solution to this would be to use the UTF-8 charset on my PC?
If you take y^ and turn it from UTF-8 to ISO-8859-1 you get "y", and if
you turn it back it's still "y". In the Red Hat world, since RH8
everything is UTF-8 based by default (en_GB.UTF8 etc)
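The y^ case in a few lines of Python. ŷ and ŵ are not in ISO-8859-1 at all (â, î and ô are), so something has to give. The to_latin1 helper is hypothetical and only loosely imitates transliteration of the iconv //TRANSLIT kind by keeping the base letter and dropping the accent.

```python
import unicodedata

def to_latin1(text: str) -> bytes:
    """Encode to ISO-8859-1, crudely transliterating characters it lacks.

    Keeps the base letter of an accented character (ŷ -> y); anything with
    no Latin-1 base letter becomes '?'.
    """
    out = []
    for ch in text:
        try:
            out.append(ch.encode("iso-8859-1"))
        except UnicodeEncodeError:
            base = unicodedata.normalize("NFD", ch)[0]  # "ŷ" -> "y" + U+0302
            out.append(base.encode("iso-8859-1", errors="replace"))
    return b"".join(out)

original = "Tŷ"
narrow = to_latin1(original)              # b'Ty'
round_trip = narrow.decode("iso-8859-1")  # 'Ty'
print(narrow, round_trip, round_trip == original)  # b'Ty' Ty False
```

Once the circumflex is gone, converting back cannot restore it, which is exactly the loss described above.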
> Also at (2), the table currently stores msgids and msgstrs in a text field,
> but this can be changed to a BLOB format easily, which is the only way MySQL
> can currently store UTF-8. This would then ensure no loss in the db store.
UTF-8 is just a byte stream and it never contains embedded \0 bytes, so
MySQL may not always get the comparisons right but it's fine for storing
it. See the MySQL manuals btw
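A small illustration of both halves of that, in Python. UTF-8 only produces a 0x00 byte for U+0000 itself, so ordinary text stores safely. A plain byte-by-byte comparison, used here as a stand-in for a binary or BLOB column (MySQL's actual collation rules are in its manual), puts ŵ after every ASCII letter instead of next to w. The sample words are arbitrary.

```python
words = ["ysgol", "ŵyl", "wyn"]

encoded = [w.encode("utf-8") for w in words]
# Storage is fine: none of these UTF-8 byte strings contains a NUL byte.
assert all(b"\x00" not in e for e in encoded)

# Byte comparison: "ŵ" starts with 0xC5, which sorts after every ASCII byte,
# so "ŵyl" lands after "ysgol" rather than beside "wyn".
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_bytes)  # ['wyn', 'ysgol', 'ŵyl']
```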
> At (3/4/5), does anything else need to be done to avoid people seeing
> gibberish in a browser (eg sgrÃ®n instead of sgrîn). Or perhaps worse,
> putting in what looks OK to them (eg using Alt+numpad entries, or using a
> font with w^) and having it returned as what looks like gibberish?
Output a character set of UTF-8 and the web browser will interpret it
correctly
> You'd mentioned in an earlier email that reporting a UTF-8 charset in the
> browser headers should enable most browsers to render it OK, but will this
> also apply to older Win PCs, which, as I understand it, were not UTF-8
> compliant?
Yes. IE knows about UTF-8 even if the underlying OS doesn't, as does
Netscape etc. You might end up seeing Ty not T^y, that is all
> Presumably (7) can then still be used to convert any lingering 8859 encodings
> in the file (eg input from a browser on a PC using the 8859 encoding) into
> the proper UTF-8 ones.
Yes
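As I understand it, msgconv converts the whole file from the charset its header declares, so strings that arrived in a different encoding are best normalised before that step. A hypothetical per-string pre-pass in Python: it assumes anything that is not valid UTF-8 came from an ISO-8859-1 browser, which is a heuristic, not something msgconv itself does.

```python
def normalise_to_utf8(raw: bytes) -> str:
    """Decode one submitted string, falling back to ISO-8859-1 if it isn't UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input came from a browser using ISO-8859-1.
        return raw.decode("iso-8859-1")

submissions = [b"Sgr\xc3\xaen",  # "Sgrîn" sent as UTF-8
               b"Sgr\xeen"]      # "Sgrîn" sent as ISO-8859-1
print([normalise_to_utf8(s) for s in submissions])  # ['Sgrîn', 'Sgrîn']
```

The fallback works because ISO-8859-1 text containing accented letters is almost never valid UTF-8 by accident; the reverse guess would not be safe.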
Basically the rule is:
UTF-8 -> anything 8-bit is lossy
anything 8-bit -> UTF-8 is not lossy
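The rule can be checked directly in Python, taking ISO-8859-1 as the example 8-bit charset. Every one of its 256 byte values survives a trip through UTF-8 and back, while Welsh letters outside it do not survive the trip the other way.

```python
# 8-bit -> UTF-8 -> 8-bit: every possible byte value comes back unchanged.
all_bytes = bytes(range(256))
as_utf8 = all_bytes.decode("iso-8859-1").encode("utf-8")
back = as_utf8.decode("utf-8").encode("iso-8859-1")
assert back == all_bytes

# UTF-8 -> 8-bit: characters the 8-bit set lacks are lost ('?' here).
lossy = "ŵŷ".encode("iso-8859-1", errors="replace")
print(lossy)  # b'??'
```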
There is a whole separate story about upper/lower case conversion that
may bite you with other languages (notably Turkish), but it is safe for
Welsh/English
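A quick Python illustration of the case-conversion point. Python's str.upper() and str.lower() follow the default Unicode mappings and ignore the locale, which is what makes the Turkish dotless and dotted i awkward, while the Welsh circumflexed vowels round-trip cleanly.

```python
welsh = "ŵŷâ"
print(welsh.upper())          # ŴŶÂ
assert welsh.upper().lower() == welsh  # Welsh survives a case round trip

dotless = "ı"                 # Turkish dotless i, U+0131
print(dotless.upper())        # I
print(dotless.upper().lower())  # i  (not ı: the round trip changes the letter)

# Turkish dotted capital İ lowercases to "i" plus a combining dot, two code points.
print(len("İ".lower()))       # 2
```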
_______________________________________________
gnome-cy mailing list
gnome-cy pengwyn linux org uk
http://pengwyn.linux.org.uk/mailman/listinfo/gnome-cy