Re: Unescaping uris



On Wednesday 13-09-2006 at 09:57 [timezone +0200], Alexander
Larsson wrote:
> On Tue, 2006-09-12 at 23:28 +0200, Jesse van den Kieboom wrote:
> > On Tuesday 12-09-2006 at 17:14 [timezone +0200], Alexander
> > Larsson wrote:
> > > > gnomevfs.unescape_string('file:///%E3%83%95.txt', '')
> > > > -> 'file:///%E3%83%95.txt'
> > > > 
> > > > gnomevfs.unescape_string('sftp:///%E3%83%95.txt', '')
> > > > -> 'sftp:///%E3%83%95.txt'
> > > 
> > > This looks like a bug in the python wrappers. The C functions return
> > > "file:///フ.txt" and "sftp:///フ.txt".
> > 
> > Well, here it doesn't :( I tried it in C and it still doesn't properly
> > unescape the %XX sequences when I use gnome_vfs_format_uri_for_display.
> > I'm using ubuntu and have gnomevfs 2.16.0-0ubuntu1. Is this a bug in the
> > latest gnomevfs?
> 
> Well, gnome_vfs_format_uri_for_display doesn't in general unescape URIs.
> That is dangerous and can cause them to be misinterpreted when you try
> to parse them again.
> 
> > I think I understand the differences now, thanks for explaining them to
> > me. What I want to do is display file URIs properly in gedit snippets
> > (substituting environment variables like GEDIT_FILENAME and
> > GEDIT_BASENAME). I think that what I need to use is
> > format_uri_for_display; the only problem is that at the moment it is not
> > working properly for non-file schemes.
> 
> It's not generally possible to unescape a URI and get a readable version
> of it. You have no idea what the encoding of remote filenames is, and
> even if you do you can't guarantee that the unescaped strings are valid
> in that encoding. The only thing that is guaranteed about URIs is that
> they are valid ASCII (since non-ASCII is escaped).

Yes, this is what we figured is the case, but we weren't sure.
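
As a small illustration of the re-parsing problem mentioned above, here
is a minimal sketch using only the Python 3 standard library (urllib.parse
is a stand-in here, not gnomevfs, and the example path is made up). A file
name containing an escaped '#' parses differently once it is unescaped:

```python
from urllib.parse import unquote, urlsplit

# Hypothetical URI: the '#' in the file name is escaped as %23.
escaped = "file:///tmp/a%23b.txt"
unescaped = unquote(escaped)  # the '#' is now a literal character

# Parsing the escaped form keeps the whole file name in the path.
path_before = urlsplit(escaped).path

# Parsing the unescaped form treats everything after '#' as a fragment.
parts_after = urlsplit(unescaped)
path_after = parts_after.path
fragment_after = parts_after.fragment

print(path_before)      # path of the escaped URI
print(path_after)       # path after unescaping and re-parsing
print(fragment_after)   # part of the file name that became a fragment
```

So the unescaped string is fine for showing to a user, but it should not
be fed back into anything that parses URIs.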

> If you're guaranteed to never roundtrip the string (i.e. try to parse
> the resulting URI) and want to display the string, I guess you could try
> unescaping and verifying it's valid UTF-8, which would help you on some
> URIs. If you don't need to display the string (i.e. it doesn't have to
> be UTF-8 or a known encoding) you can just unescape it.

> You talk about things like FILENAME and BASENAME above, so you should
> probably use gnome_vfs_uri_extract_short_name() (unescapes, no guarantee
> of encoding) or gnome_vfs_uri_extract_short_path_name() (returns escaped
> form, guaranteed to be ASCII).
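
To make the difference between the two forms concrete, here is a sketch
in plain Python 3. These helpers are hypothetical stand-ins written with
the standard library; they only mimic the behaviour described above and
are not the gnome-vfs implementations:

```python
import posixpath
from urllib.parse import unquote_to_bytes, urlsplit


def short_path_name(uri):
    """Last path component, still escaped, so always plain ASCII."""
    path = urlsplit(uri).path.rstrip("/")
    return posixpath.basename(path)


def short_name(uri):
    """Last path component, unescaped to raw bytes of unknown encoding."""
    return unquote_to_bytes(short_path_name(uri))


uri = "sftp:///dir/%E3%83%95.txt"  # hypothetical example URI
print(short_path_name(uri))  # escaped base name
print(short_name(uri))       # raw bytes; decoding is up to the caller
```

The unescaped version returns bytes on purpose, since nothing guarantees
what encoding the remote file name uses.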

Okay, thanks for all the explaining. We need to figure out whether we
just want to display pretty URIs for the user or give them a valid URI.
With the given explanations this shouldn't be a problem. Thanks!



-- 
Jesse van den Kieboom

Visit: http://www.icecrew.nl



