Re: I propose that GnomeVFS should support a variety of character set encoding for access to remote filesystem.



Hi,

On Sunday, 19.02.2006 at 18:12 +0900, Hyunsik Choi wrote:
> Hi GnomeVFS Team,
> 
> Gnome uses basically UTF8 as a character set encoding 

... to support multiple languages at once on one computer in one
session, without encoding nightmares for programmers...  (it is really
the only sane way to go)

> because of it's
> future-oriented characteristic, but many people in countries that use
> 2-byte language still use other character set encodings. 

That's unfortunate.

However, even Microsoft Windows did away with them and has used Unicode
throughout the operating system since Windows 2000, or did I miss
something?

So what still uses the other character set encodings? A
representative list would be nice :)

> Although Gnome
> has higher i18n and l10n qualities, it is not enough to support other
> character set encoding except UTF8. 

Note what Unicode strives to avoid: having tons of encodings, where
equal data means one thing in one encoding and another thing in another
encoding. It just goes downhill from there (guessing the encoding using
chicken bones, ...).
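
A small Python sketch of that problem, purely as an illustration (the
byte string is an example chosen here, not data from any real server):
the same four bytes decode to different text depending on which
encoding the reader assumes.

```python
# Four bytes as they might arrive from a server that uses EUC-KR.
raw = b"\xc7\xd1\xb1\xdb"

as_euc_kr = raw.decode("euc_kr")    # the interpretation the server intended
as_latin1 = raw.decode("latin-1")   # never fails, but yields unrelated characters

try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None                  # these bytes are not valid UTF-8 at all

print(repr(as_euc_kr), repr(as_latin1), as_utf8)
```

Nothing in the bytes themselves says which reading is the right one;
that information has to come from somewhere else.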

So _internally_, there should _only_ be one universal encoding. (I know
you are not suggesting that there shouldn't, but just to keep it in mind)

That doesn't rule out converting some oddball encoding to UTF-8, but I
think there have been problems with that in the past (not sure, but it
had something to do with losing information in the conversion).
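
As a rough sketch of such a conversion, assuming the source encoding is
already known (which is exactly the hard part): the helper names
`to_utf8` and `round_trips` below are made up for this example and are
not part of gnome-vfs. One way information can get lost is when invalid
bytes are silently replaced instead of rejected, so the sketch converts
strictly and checks that the name survives a round trip.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert strictly; raises UnicodeDecodeError if raw is not valid."""
    return raw.decode(source_encoding).encode("utf-8")


def round_trips(raw: bytes, source_encoding: str) -> bool:
    """True if decoding and re-encoding gives back exactly the same bytes."""
    try:
        text = raw.decode(source_encoding)
    except UnicodeDecodeError:
        return False
    return text.encode(source_encoding) == raw


good = b"\xc7\xd1\xb1\xdb"   # valid EUC-KR
bad = b"\xc7\xd1\xff"        # trailing byte is not valid EUC-KR

print(round_trips(good, "euc_kr"))
print(round_trips(bad, "euc_kr"))

# A lenient conversion "works", but the original byte is gone for good:
lossy = bad.decode("euc_kr", errors="replace")
print("\ufffd" in lossy)
```

If the round trip fails, the converted name can no longer be mapped
back to the name the server actually knows, so later operations on that
file would miss.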

Note also that there is no programming language (that I know of) which
stores the encoding alongside the string, so it would be a hell of a lot
of work to pass the encoding around, even _if_ it were a good idea
(which it isn't).

> In especial, this problem is that
> nautilus show broken filename when I try to access some remote
> filesystems, such as ftp and sftp, 
> that use other encoding via nautilus.

The FTP protocol is traditionally broken in that it just doesn't _have_
a way to find out the encoding (i.e. it doesn't transmit the encoding
name, which is the only safe way).

Also, FTP URIs don't carry the encoding name either, hence if they
were encoded in anything other than UTF-8 (also: is that mandated by
the standard? not sure), drag and drop wouldn't work anymore, ...

> For example, I'm korean and use UTF8 as default encoding. However, most
> servers that provide ftp or sftp service use EUC-KR as encoding. 

I wonder how other (usable) FTP clients handle this case (e.g. on
Windows / Mac)?
Perhaps there is some kind of extension used on the FTP server side to
signify the encoding somehow?

Try to find a way to tell the encoding by looking around on the affected
FTP server. If there is something that can be safely used to determine
it, that would be nice.
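
One candidate signal, offered as a sketch rather than a confirmed
answer: as far as I know, RFC 2640 has servers that use UTF-8 pathnames
list a "UTF8" line in their reply to the FEAT command (RFC 2389).
Whether the affected Korean servers do this is an open question. The
function name `advertises_utf8` and the sample replies below are
invented for illustration; the reply text would come from something like
`ftplib.FTP.sendcmd("FEAT")`.

```python
def advertises_utf8(feat_reply: str) -> bool:
    """Check a multi-line FEAT reply for a 'UTF8' feature line."""
    lines = feat_reply.splitlines()
    # The first and last lines are the 211 status lines; the feature
    # lines in between are indented by one space.
    for line in lines[1:-1]:
        if line.strip().upper() == "UTF8":
            return True
    return False


sample_with = "211-Extensions supported:\n MDTM\n SIZE\n UTF8\n211 End"
sample_without = "211-Extensions supported:\n MDTM\n SIZE\n211 End"

print(advertises_utf8(sample_with))
print(advertises_utf8(sample_without))
```

Note that a missing "UTF8" line only says the server does not claim
UTF-8; it still does not say which legacy encoding is in use.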

> For
> this reason, I have many experiences that nautilus showed a series of
> strange strings as filenames. This problem has been chronic since Gnome
> used UTF8 as default character set encoding. 

It isn't that bad for mounted filesystems since the kernel converts all
the filenames to UTF-8 if asked to. But I agree that remote filesystems
are a much harder nut to crack...

> Consequently, I strongly
> propose that GnomeVFS team make an issue of this problem. I expect that
> if you do that, many gnome hacker in 2-byte language countries will
> suggest a variety of good ideas.

Well, good ideas are needed, yes :) 

They should not involve chicken bones (wild guesses) or the user having
to know strange incantations (i.e. encoding names).

Ideally, the Korean FTP servers would already use some kind of scheme
to signal whether they use 1) EUC-KR (ksc5601, ks_c_5601-1987, is
there a difference?) or 2) UTF-8, and gnome-vfs would just adapt to that.
Is that possible? [yeah, right. as if it were that easy ... but I'm
asking just in case]

> 
> Cheers,
> Hyunsik Choi

cheers,
  Danny




