Re: Unescaping uris



Hi Jesse

A string such as "file:///フ.txt" is not a valid URI. It is a valid IRI
(Internationalized Resource Identifier), but supporting those opens up a
whole new can of worms (think "IDN"). For user convenience, we "display"
file URIs as plain local paths, where URI percent-encoding is not
appropriate.

Use gnome_vfs_format_uri_for_display (gnomevfs.format_uri_for_display in
Python).
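
To illustrate the URI/IRI distinction and the "display as a local path"
idea, here is a rough sketch using only Python's standard urllib.parse
rather than gnome-vfs. The display_name helper is hypothetical and is not
how gnome-vfs implements anything. In particular, it decodes escapes for
remote schemes too, which differs from the format_uri_for_display output
quoted below. The result is meant for showing to a user only and should
not be fed back in as a URI.

```python
from urllib.parse import quote, unquote, urlsplit


def display_name(uri):
    """Hypothetical helper: return a human-readable form of a URI."""
    parts = urlsplit(uri)
    if parts.scheme == "file":
        # Local file: drop the scheme and show the decoded path.
        return unquote(parts.path)
    # Other schemes: keep the URI shape and only decode the escapes.
    return unquote(uri)


# Percent-encoding the UTF-8 bytes of the path turns the IRI into a valid URI.
encoded = "file://" + quote("/フ.txt")

local = display_name(encoded)
remote = display_name("sftp:///%E3%83%95.txt")
```

Here encoded holds the percent-encoded form of the file URI, local is the
bare path with the multibyte character restored, and remote keeps its
sftp scheme.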

On Tue, 2006-09-12 at 11:47 +0200, Jesse van den Kieboom wrote:
> Hi,
> 
> gedit recently received a bug
> (http://bugzilla.gnome.org/show_bug.cgi?id=355477) about multibyte
> characters not being displayed properly.
> 
> I've been looking into the problem and have run into some things about
> unescaping that I don't really understand. As I understand it, there are
> functions to unescape URIs and functions to format URIs for display that
> all do approximately the same thing, but differ in ways I don't fully
> understand.
> 
> gnome_vfs_unescape_string
> gnome_vfs_unescape_string_for_display
> gnome_vfs_format_uri_for_display
> 
> What I'd like to know is what to use when, because they differ in
> behavior (especially for non file:// schemes). Some examples (python)
> with a file called フ.txt:
> 
> gnomevfs.unescape_string('file:///%E3%83%95.txt', '')
> -> 'file:///%E3%83%95.txt'
> 
> gnomevfs.unescape_string('sftp:///%E3%83%95.txt', '')
> -> 'sftp:///%E3%83%95.txt'
> 
> So, what does unescape_string actually do? I read that I shouldn't use
> it on full URIs. Okay, so what should I use on full URIs? Over to the
> display functions:
> 
> gnomevfs.unescape_string_for_display('file:///%E3%83%95.txt')
> -> 'file:///\xe3\x83\x95.txt'
> 
> gnomevfs.unescape_string_for_display('sftp:///%E3%83%95.txt')
> -> 'sftp:///\xe3\x83\x95.txt'
> 
> Okay, so this one actually does what's expected, but what are these
> functions for in relation to format_uri_for_display:
> 
> gnomevfs.format_uri_for_display('file:///%E3%83%95.txt')
> -> '/\xe3\x83\x95.txt'
> 
> gnomevfs.format_uri_for_display('sftp:///%E3%83%95.txt')
> -> 'sftp:///%E3%83%95.txt'
> 
> 
> In short, unescape_string_for_display seems to do about the same as
> format_uri_for_display, except that format_uri_for_display removes the
> file scheme (which is what we are looking for in gedit). However, using
> format_uri_for_display on remote schemes does not unescape the URI
> (which unescape_string_for_display does correctly). Is this a bug? If
> not, what's the rationale for not unescaping remote URIs in
> format_uri_for_display? Should we use format_uri_for_display for local
> files and unescape_string_for_display for remote files?
> 
> What I'd like to know is what the differences between the functions are
> and when to use what.
> 
> With kind regards,
> 
> 
-- 
Alex Jones <alex weej com>



