Re: [gnome-cy] po file tools: msgconv

On Mon, 2003-04-07 at 13:19, Kevin Donnelly wrote:
> On Friday 04 April 2003 3:51 pm, Alan Cox wrote:
> >> So presumably the revised files I sent you earlier were lossy, then?  
> > Hard to tell because the originals were a little mangled
> Anything specific, mayhap, perchance?

They contained a mix of ISO 8859-14, UTF-8 and things like ^w to indicate
ŵ (which is fine, and I suggested to Chris to do that if it was a
problem). The cleaned-up ones are UTF-8.
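A minimal sketch of how a file in that state could be normalised, assuming
every line is either valid UTF-8 or ISO 8859-14. The function names are
hypothetical, and the ^w-style ASCII notation is deliberately left alone:

```python
def line_to_text(raw: bytes) -> str:
    """Decode one line: try UTF-8 first, fall back to ISO 8859-14."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is ISO 8859-14.
        return raw.decode("iso8859_14")


def to_utf8(data: bytes) -> bytes:
    """Re-encode a mixed-encoding file as UTF-8, line by line."""
    lines = data.splitlines(keepends=True)
    return "".join(line_to_text(line) for line in lines).encode("utf-8")
```

Working per line matters because the encodings were mixed within a file.
An ISO 8859-14 line whose bytes happen to form valid UTF-8 would be
misread, so the output still wants a look by eye.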

> > > (2) read each file through Kartouche to give a MySQL table
> > You want to turn it UTF8 first, or remember the string language in the
> > table
> But if I set my PC (and Kartouche - see below) to UTF-8 this should happen 
> automatically, no?  That is, the encoding will always be in UTF-8, so there 
> will be no danger of putative infoloss during the process?  

If you just copy bytes as is you don't change the format. UTF-8 is
defined so that ascii is unmodified and neither NUL nor / occur in
multibyte characters.
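
Those two properties can be checked directly. A small sketch (the function
name is made up for illustration):

```python
def utf8_properties_hold(text: str) -> bool:
    """Check the two UTF-8 guarantees for every character in `text`."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII is unmodified: one byte, same value as the code point.
            if encoded != bytes([ord(ch)]):
                return False
        else:
            # Multibyte sequences use only bytes >= 0x80, so no NUL, '/'
            # or any other ASCII byte can appear inside them.
            if any(b < 0x80 for b in encoded):
                return False
    return True
```

This is why byte-oriented tools that only care about NUL, / or newline can
pass UTF-8 through without understanding it.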

> Setting SuSE8.1 to UTF-8:
> - install package glibc-i18ndata (not installed by default) to get locales and 
> charmaps;
> - (as root) localedef -i cy_GB -f UTF-8 cy_GB.utf8 (stored in /usr/lib/locale)
> - (as root) pico /etc/sysconfig/language; amend RC_LANG="cy_GB.utf8"; run 
> SuSEconfig; reboot for good measure
> - locale charmap gives UTF-8
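
For a check from inside a program rather than the shell, here is a rough
Python equivalent of "locale charmap". It is only a sketch: the helper
names are hypothetical, and locale.nl_langinfo is Unix-only.

```python
import codecs
import locale


def is_utf8_codeset(name: str) -> bool:
    """True if `name` is any spelling of UTF-8 (e.g. 'UTF-8', 'utf8')."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        return False


def current_codeset() -> str:
    """Return the codeset of the locale selected by the environment."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # requested locale is not installed; keep the default
    return locale.nl_langinfo(locale.CODESET)
```

After the steps above, is_utf8_codeset(current_codeset()) would be expected
to be true in a fresh login session.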


> BUT - major woe!  Under 8859-1, I could use Sht+R.Ctrl to compose ô from 
> sequential o and ^.  Under UTF-8 I can't.  Is there something I'm missing 
> that needs to be done to re-enable this?  /usr/X11/lib/X11/locale/en_US.UTF-8 
> refers to a deadkey.

Under UTF-8 you should be getting ô just fine (I'm doing
shift-right-alt, ^o). It may depend on your keyboard mappings as to what
the compose key is.

> Various strings in the files now display oddly, so I'll change those as I meet 
> them, unless I come up with a cunning plan.  

Some were displaying oddly before

> From here (AJ Flavell's very interesting site):
> and here:
> I read that "the browser will return the data in the encoding of the original 
> form".  So if I have sent the UTF-8 header, there should be nothing more to 
> do - right?  This certainly seems to be the case - entering data into the 
> UTF-8 page via an 8859-1 Linux and then Windows PC kept the circumflexes OK, 
> and didn't turn the character into a ? or worse.  But is this just an 
> illusion of success?

Cool. I didn't know it was so simple with forms 8)
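
On the receiving side, this amounts to decoding the submitted form body as
UTF-8. A sketch using Python's standard library, not Kartouche's actual
code; the function name and the field name "gair" are made up:

```python
from urllib.parse import parse_qs, urlencode


def decode_form(body: str) -> dict:
    """Decode a form body on the assumption that the page was served as
    UTF-8, so the browser submitted the fields in UTF-8 as well."""
    return parse_qs(body, encoding="utf-8", errors="strict")


# What a browser would send for the field gair="tŷ" from a UTF-8 page.
body = urlencode({"gair": "tŷ"}, encoding="utf-8")
fields = decode_form(body)
```

Using errors="strict" is a design choice: a body that is not valid UTF-8
should then fail loudly instead of quietly turning into a ? or worse.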

> > > (7) run msgconv on the file to convert it to UTF-8
> > You need to keep it in UTF-8
> Above refers - if PC is set to UTF-8 the issue should not arise, no?


> So you're saying that a BLOB field is *not* necessary?  That would certainly 
> have other benefits.  Experiments on Win and Lin (with 8859-1 encoding) 
> putting circumflexed vowels into the db do seem to show no difference 
> whatever the field type.  But again, is this just an illusion of success?

BLOB is not needed. The unicode people wanted to be sure that as far as
possible dumb non UTF-8 aware apps did roughly the right thing with

> OK - it does make sense to keep the workflow in one format all the way 
> through.


> With your much greater knowledge of this area, do you think the revised system 
> should cover all bases?  (In theory, I mean - obviously from a practical 
> viewpoint the state of the actual output files may determine the need for 
> further work.)

Sounds right to me

> > There is a whole separate story about upper/lower case conversion that
> > may bite you with other languages (notably Turkish) but is safe for
> > Welsh/English
> Welsh and English are enough for the moment, thanks :-)  I will, however, wish 
> to come back to a detailed discussion of Turkish some time in 2006 ....

Grin.. it's just that in some languages toupper(tolower(x)) == x is not always true 8)
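
A quick illustration in Python, whose str.upper() and str.lower() apply the
default Unicode case mappings rather than locale rules. The helper names
are made up for this sketch:

```python
def upper_roundtrip_ok(ch: str) -> bool:
    """toupper(tolower(x)) == x, for an upper-case starting character."""
    return ch.lower().upper() == ch


def lower_roundtrip_ok(ch: str) -> bool:
    """tolower(toupper(x)) == x, for a lower-case starting character."""
    return ch.upper().lower() == ch


# Welsh circumflexed letters survive the round trip.
welsh_ok = all(upper_roundtrip_ok(c) for c in "ŴŶÂÊÎÔÛ")

# Turkish dotted capital I (U+0130) and dotless small i (U+0131) do not.
turkish_capital_ok = upper_roundtrip_ok("\u0130")
turkish_small_ok = lower_roundtrip_ok("\u0131")
```

Dotless ı upper-cases to a plain I, which then lower-cases to a plain i, so
the original letter is lost.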

gnome-cy mailing list
gnome-cy pengwyn linux org uk
