Re: lookup.jl google and koi8-r



On Sun, Jun 25, 2006 at 02:41:36PM +0300, Vladimir Zolotykh wrote:

> On Sun, 25 Jun 2006 08:33:08 +0100
> Ewan Mellor <sawfish ewanmellor org uk> wrote:
> 
> > On Sat, Jun 24, 2006 at 12:27:00PM +0300, Vladimir Zolotykh wrote:
> > 
> [snip]
> > > in sawfish-client and type the same koi8-r word as before I get
> > > 
> > >   user> (prompt-for-string "FOO:")
> > >   "\301\302\327\307\304\305\326"
> > >   user> 
> > > 
> > > What arrangements must be made to pass it properly to Google, so that
> > > Google interprets it as a koi8-r word rather than as don't-know-what?
> > 
> > Which koi8 word did you type here?  I don't know, but I'd guess that if those
> > aren't koi8 bytes then they are the UTF-8 equivalents instead.  I'm sure that
> > with a little help from someone who uses non-latin encodings we could get the
> > transcoding sorted out.
> It was the first seven lower-case letters of the Russian alphabet, i.e.
> 
>   абвгдеж
> 
> Their KOI8-r octal codes are
> 
>   301
>   302
>   327
>   307
>   304
>   305
>   326
> 
> Their Unicode code points (hex) are
> 
>   430
>   431
>   432
>   433
>   434
>   435
>   436
> 
> I'd say that Mozilla (the browser I'm using, version mozilla-firefox
> 1.0.4-2sarge7) expects UTF-8 but receives KOI8-r bytes, which it then
> treats as UTF-8, hence the confusion. In my opinion, either
> PROMPT-FOR-STRING should convert what it reads to UTF-8, or an explicit
> conversion from KOI8-r to UTF-8 should be done before passing the string
> to Mozilla. However, I'm not familiar enough with LIBREP to fix that
> myself.

Yes, what is happening is that prompt-for-string just passes through the bytes
that you give it, so it returns the KOI8-r bytes.  These then go to
url-escape-query in my url.jl module, which turns them into %c1 etc. for
sending to Google as the query string of the URL.  The receiving end then
evidently misinterprets them.
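
For illustration, the effect can be reproduced outside Sawfish.  The sketch
below is Python rather than librep, and it only mimics what url-escape-query
is described as doing above; it is not the url.jl code.  It percent-escapes
the seven bytes from the sawfish-client session and then checks whether a
receiver assuming UTF-8 could decode them.

```python
from urllib.parse import quote

# The bytes prompt-for-string returned in the thread, written in octal.
koi8_bytes = bytes([0o301, 0o302, 0o327, 0o307, 0o304, 0o305, 0o326])

# Percent-escape the raw bytes without any transcoding.
escaped_raw = quote(koi8_bytes)

# A receiver that assumes UTF-8 cannot decode this byte sequence at all.
try:
    koi8_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(escaped_raw)
print(valid_utf8)
```

The escaped string starts with %C1, which matches the %c1 mentioned above
apart from the case of the hex digits.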

I presume that the URL spec somewhere says that the characters must be
converted to UTF-8.  I can't find a reference for that, but that's what my
browser (Opera) seems to do.  For that to be done properly in the url module,
it would need to discover the character encoding being used for Sawfish
(presumably the one set by your locale) and then convert to UTF-8 correctly.
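
A minimal sketch of that two-step approach, again in Python rather than
librep.  The function name escape_query_utf8 and its source_encoding
parameter are hypothetical, and falling back to the locale's preferred
encoding is an assumption about how the encoding would be discovered, not
something Sawfish is known to do.

```python
import locale
from typing import Optional
from urllib.parse import quote


def escape_query_utf8(raw: bytes, source_encoding: Optional[str] = None) -> str:
    """Transcode raw input bytes to UTF-8, then percent-escape them."""
    if source_encoding is None:
        # Assumption: the input arrived in the locale's encoding.
        source_encoding = locale.getpreferredencoding(False)
    text = raw.decode(source_encoding)
    return quote(text.encode("utf-8"))


# The same seven KOI8-R bytes as in the sawfish-client session.
koi8_bytes = bytes([0o301, 0o302, 0o327, 0o307, 0o304, 0o305, 0o326])
escaped = escape_query_utf8(koi8_bytes, "koi8_r")
print(escaped)
```

Each Cyrillic letter becomes two escaped bytes here (for example %D0%B0 for
the first letter), which is the form a receiver expecting UTF-8 can decode.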

Someone some time ago posted some patches for making Sawfish work with
Japanese (IIRC).  Perhaps you could dig those out of the archives and see what
the design for that work was.

> I've got another question about using lookup.jl (and Sawfish or
> Librep?). I'm using two keyboard layouts, or groups in XKB
> terminology: US/ASCII and Russian. The locale, as you already know, is
> ru_RU.KOI8-R. Groups (or keyboard layouts) are changed on a per-window
> basis, i.e. each window has its own group, either US/ASCII or Russian.
> I'm concerned with the popup window produced by
> PROMPT-FOR-STRING. Does it have a name (or a class)?

Well, I get this:

ewan $ xwininfo 

xwininfo: Please select the window about which you
          would like information by clicking the
          mouse in that window.

xwininfo: Window id: 0x802696 (has no name)

so it looks like you are out of luck with a name, though I guess you might be
able to give it one (modifying Sawfish appropriately, of course).

The X input stuff is all voodoo to me I'm afraid, so I can't help there.  Try
digging in the archives again.

Ewan.


