Re: character set in URIs for drag and drop



On Fri, Aug 24, 2001 at 01:06:55PM -0700, Darin Adler wrote:
> When dealing with drag and drop of files, we often deal with URIs for 
> files on the local system. Currently, the URIs typically use the raw name 
> from the file system with URI encoding. For example, if a file has an 
> upside-down question mark in its name, and my file system has file names 
> encoded with the Latin-1 character set, then the file might have this URI:
> 
>      file:///home/darin/%BFQue%3F
> 
> But I think that this URI would not be what GNOME 2 programs would expect.
>   They would instead expect the URI to be encoded with UTF-8:

  This time the answer is not in RFC 2396; it is too old and may have been superseded:
  http://www.faqs.org/rfcs/rfc2396.html
  2.1 URI and non-ASCII characters

  "For original character sequences that contain non-ASCII characters,
   however, the situation is more difficult. Internet protocols that
   transmit octet sequences intended to represent character sequences
   are expected to provide some way of identifying the charset used, if
   there might be more than one [RFC2277].  However, there is currently
   no provision within the generic URI syntax to accomplish this
   identification. An individual URI scheme may require a single
   charset, define a default charset, or provide a way to indicate the
   charset used.

   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification."

  However, each time I have discussed this with people from the I18N
group at W3C, I was told that the URI should first be converted to
UTF-8, and then the normalization would occur. There is some recent
prose on this issue in the XPointer specification:

   http://www.w3.org/TR/xptr#uri-escaping

---------------------
   1.  Each disallowed character is converted to UTF-8 [IETF RFC 2279]
       as one or more bytes.
   2.  Any bytes corresponding to a disallowed character are escaped
       with the URI escaping mechanism (that is, converted to %HH,
       where HH is the hexadecimal notation of the byte value).
   3.  The original character is replaced by the resulting character sequence.
---------------------
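
  As a rough illustration of the three steps above, here is a minimal
Python sketch. The function name escape_path and the choice of allowed
characters (the RFC 2396 "unreserved" set plus "/") are assumptions made
for this example; the XPointer text defines its own set of disallowed
characters. The charset parameter exists only to contrast the UTF-8
result with the Latin-1 URI from the original message.

```python
# Characters left unescaped in this sketch: RFC 2396 "unreserved" plus "/".
# This set is an assumption for illustration, not the XPointer definition.
ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
    "/"
)


def escape_path(path, charset="utf-8"):
    """Escape a path following the three quoted steps (hypothetical helper)."""
    out = []
    for ch in path:
        if ch in ALLOWED:
            out.append(ch)
        else:
            # Step 1: convert the disallowed character to bytes (UTF-8 by default).
            raw = ch.encode(charset)
            # Step 2: escape every byte as %HH.
            escaped = "".join("%%%02X" % b for b in raw)
            # Step 3: the escaped sequence replaces the original character.
            out.append(escaped)
    return "".join(out)


name = "/home/darin/\u00bfQue?"
print("file://" + escape_path(name))             # UTF-8 form
print("file://" + escape_path(name, "latin-1"))  # raw Latin-1 form, for comparison
```

  With UTF-8 the upside-down question mark becomes the two bytes C2 BF,
so the path reads /home/darin/%C2%BFQue%3F, whereas the Latin-1 form
keeps the single byte and gives /home/darin/%BFQue%3F as in Darin's
example.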

  This part has had a lot of review, so I would trust it.

Daniel


-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



