Re: Plans for gnome-vfs replacement

On Tue, 2006-09-19 at 14:54 +0200, Alexander Larsson wrote:

> Thinking more closely on this it seems that neither plain paths or URIs
> are really good enough. For instance, even local filenames need some
> sort of escaping for us to be able to display them, since they might
> contain binary data (everything but zero and slash is allowed), and we
> really want to display e.g. korean smb uris with readable text instead
> of crazy escape codes.
> I'll try to think some more on this. Hopefully we can come up with a
> good solution.

I've looked into this a bit more. Let's start with some examples of
filenames of various types:

Local filenames (in utf8 mode)
1) standard: /etc/passwd
2) utf8 and spaces: "/tmp/a åäö.txt" (encoding==utf8)
3) latin-1 and spaces: "/tmp/a åäö.txt" (encoding==iso8859-1)
4) filename without encoding: "/tmp/bad:\001\010\011\012\013" (as a C string)
5) mountpoint: /mnt/cdrom (cd has title "CD Title")

Ftp mount to
[where filenames are stored as utf8, this is detected by using
 ftp protocol extensions (there is an rfc) or by having the user
 specify the encoding at mount time]

6) normal dir: /pub/sources
7) valid utf8 name: /dir/a file öää.txt
8) latin-1 name: /dir/a file öää.txt

Ftp mount to (with filenames in latin-1)
9) latin-1 name: /dir/a file öää.txt

Backend that stores display name separate from real name. 
[Examples could be a flickr backend, a file backend that handles desktop
files, or a virtual location like computer:// (which is implemented
using virtual desktop files atm).]

10) /tmp/foo.desktop (with Name[en]="Display Name")

To complement this, here are the places where display filenames (i.e.
utf-8 strings) are used in the desktop:

A) Absolute filename, for editing (nautilus text entry, file selector)
B) Semi-Absolute filename, for display (nautilus window title)
C) Relative file name, for display (in nautilus/file selector icon/list
   views)
D) Relative file name, for editing (rename in nautilus)
E) Relative file name, for creating absolute name (filename completion
   for A). This needs to know the exact form of the parent (i.e. it
   differs for filename vs uri). I won't list this in the table below as
   it's always the same as A from the last slash to the end.

Using the current gnome-vfs uri method, this is how the various names
would look:

   A                                                     B                             C                             D        
1) file:///etc/passwd                                    passwd                        passwd                        passwd   
2) file:///tmp/a%20%C3%A5%C3%A4%C3%B6.txt                a åäö.txt                     a åäö.txt                     a åäö.txt
3) file:///tmp/a%20%E5%E4%F6.txt                         a ???.txt                     a ???.txt (invalid unicode)   a ???.txt
4) file:///tmp/bad%3A%01%08%09%0A%0B                     bad:?????                     bad:????? (invalid unicode)   bad:?????
5) file:///mnt/cdrom                                     CD Title (cdrom)              CD Title (cdrom)              CD Title
6)                       sources on      sources                       sources
7)    a åäö.txt on    a åäö.txt                     a åäö.txt
8)             a ???.txt on    a ???.txt (invalid unicode)   a ???.txt
9)             a åäö.txt on    a åäö.txt                     a åäö.txt
10)file:///tmp/foo.desktop                               Display Name                  Display Name                  Display Name

The stuff in column A is pretty insane. It works fine as an identifier
for the computer to use, but nobody would want to have to type that in
or look at that all the time. That is why Nautilus also allows
entering some filenames as absolute unix pathnames, although not all
filenames can be specified this way. If this is used whenever possible
column A looks like this:

1) /etc/passwd
2) /tmp/a åäö.txt
3) file:///tmp/a%20%E5%E4%F6.txt
4) file:///tmp/bad%3A%01%08%09%0A%0B
5) /mnt/cdrom

As we see this helps for most normal local paths, but it becomes
problematic when the filenames are in the wrong encoding. For
non-local files it doesn't help at all. We still have to look at these
horrible escapes, even when we know the encoding of the filename.
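
To make it concrete where the escapes in column A come from, here is a
minimal sketch in C. It percent-encodes every byte of an absolute path
that falls outside a conservative safe set. The name path_to_file_uri
and the exact safe set are assumptions made for illustration; this is
not the gnome-vfs implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of gnome-vfs: builds a file:// URI from
 * an absolute path by percent-encoding every byte outside a conservative
 * safe set. The caller frees the result. */
char *path_to_file_uri(const char *path)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char safe[] = "abcdefghijklmnopqrstuvwxyz"
                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                               "0123456789-._~/";
    const char *prefix = "file://";
    char *uri, *out;

    assert(path != NULL && path[0] == '/');

    /* Worst case: every byte becomes "%XX". */
    uri = malloc(strlen(prefix) + 3 * strlen(path) + 1);
    if (uri == NULL)
        return NULL;

    strcpy(uri, prefix);
    out = uri + strlen(prefix);
    for (; *path != '\0'; path++) {
        unsigned char c = (unsigned char)*path;
        if (strchr(safe, c) != NULL) {
            *out++ = (char)c;           /* safe byte: copy as-is */
        } else {
            *out++ = '%';               /* anything else: %XX */
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
    return uri;
}
```

The function never looks at the encoding of the filename, only at raw
bytes. That is why example 3 (latin-1) and example 4 (binary junk) both
yield valid URIs, and also why none of the results are pleasant to read.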

Examples 7-9 in this version show the problem with URIs. Suppose
we allowed an invalid URI like " åäö.txt"
(utf8-encoded string). Given the state inherent in the mountpoint we
know what encoding is used for the ftp server, so if someone types it
in we know which file they mean (either 7 or 9). However, if someone
pastes a URI like that into firefox, or mails it to someone, the
receiver doesn't have that state and can't reconstruct the real valid
URI anymore. If you drag and drop it, however, the code can send the
real valid uri so that firefox can load it correctly.

So, this introduces two kinds of URIs that are "mostly similar" but
break in many cases. This is very unfortunate, and imho not acceptable.
I think it's ok to accept a URI typed in like
" åäö.txt" and convert it to the right uri,
but it's not right to then display such a uri in the nautilus location
bar, as that can result in that invalid uri getting into other places.

Since I dislike showing invalid URIs in the UI I think it makes sense
to create a new absolute pathname display and entry format. Ideally
such a system should allow any ascii or utf8 local filename to be
represented as itself. Furthermore it would allow input of URIs, but
immediately convert them to the display format (similar to how
entering a file:// uri in nautilus displays it as a normal filename).

One solution would be to use some other prefix than / for
non-local files, and to use some form of escaping only for non-utf8
chars and non-printables. This works since we only handle absolute
pathnames, so anything not starting with / is out of band. Here is an
example we could use:

1) /etc/passwd
2) /tmp/a åäö.txt
3) /tmp/a \xE5\xE4\xF6.txt
4) /tmp/bad:\x01\x08\x09\x0A\x0B
5) /mnt/cdrom
7) åäö.txt
8) \xE5\xE4\xF6.txt
9) åäö.txt

Under the hood this would use proper, valid escaped URIs. However, we
would display things in the UI that made some sense to users, only
falling back to escaping in the last possible case.
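
As a rough illustration of the "escape only in the last possible case"
rule, the sketch below copies valid UTF-8 and printable ASCII through
unchanged and writes every other byte as \xHH. The function names are
made up for this example, and treating every valid multi-byte UTF-8
sequence as displayable is a simplifying assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length of the valid UTF-8 sequence starting at s, or 0 if the bytes
 * there are not valid UTF-8 (bad lead byte, bad continuation byte,
 * overlong form, surrogate, or beyond U+10FFFF). */
static size_t utf8_seq_len(const unsigned char *s)
{
    size_t n, i;
    unsigned long cp;

    if (s[0] < 0x80) return 1;
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else return 0;

    for (i = 1; i < n; i++) {
        /* A NUL terminator fails this check, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) ||
        (n == 4 && cp < 0x10000))
        return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;
    return n;
}

/* Hypothetical helper: turns raw filename bytes into a display string.
 * The caller frees the result. */
char *escape_for_display(const char *name)
{
    const unsigned char *s = (const unsigned char *)name;
    char *result, *out;

    assert(name != NULL);
    result = malloc(4 * strlen(name) + 1);  /* worst case: \xHH per byte */
    if (result == NULL)
        return NULL;

    out = result;
    while (*s != '\0') {
        size_t n = utf8_seq_len(s);
        int printable_ascii = (n == 1 && *s >= 0x20 && *s != 0x7F);

        if (n > 1 || printable_ascii) {
            memcpy(out, s, n);              /* readable: keep as-is */
            out += n;
            s += n;
        } else {
            out += sprintf(out, "\\x%02X", (unsigned)*s);
            s++;
        }
    }
    *out = '\0';
    return result;
}
```

Fed the raw bytes of examples 3 and 4, this produces the forms shown in
the list above, while examples 1, 2 and 5 come out unchanged.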

The API could look something like:

GFile *g_file_new_from_filename (const char *filename);
GFile *g_file_new_from_uri (const char *uri);
GFile *g_file_parse_display_name (const char *display_name);
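
For the parsing direction, something like the following sketch could
turn the \xHH escapes of a typed-in display name back into raw filename
bytes before a GFile is created. The helper name display_name_to_bytes
is hypothetical. How a literal backslash in a filename should be
handled is not settled above; this sketch simply leaves any backslash
that is not followed by xHH untouched, which is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: converts a display name containing \xHH escapes
 * back into raw filename bytes. The caller frees the result. */
char *display_name_to_bytes(const char *display_name)
{
    const char *s = display_name;
    char *result, *out;

    assert(display_name != NULL);
    /* The output is never longer than the input. */
    result = malloc(strlen(display_name) + 1);
    if (result == NULL)
        return NULL;

    out = result;
    while (*s != '\0') {
        /* The && chain stops at the first mismatch, so a NUL terminator
         * inside a truncated escape is never read past. */
        if (s[0] == '\\' && s[1] == 'x' &&
            hex_value(s[2]) >= 0 && hex_value(s[3]) >= 0) {
            int byte = hex_value(s[2]) * 16 + hex_value(s[3]);
            if (byte != 0) {            /* NUL can't occur in a filename */
                *out++ = (char)byte;
                s += 4;
                continue;
            }
        }
        *out++ = *s++;                  /* ordinary byte: copy as-is */
    }
    *out = '\0';
    return result;
}
```

With a helper like this, a display name such as
"/tmp/a \xE5\xE4\xF6.txt" typed into an entry maps back to the latin-1
bytes of example 3.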

Another approach (mentioned by Jürg Billeter on irc yesterday) is to
move from a pure textual representation of the full uri to a more
structured UI. For example the part of the URI
could be converted to a single item in the entry looking like
[] (where # is an ftp icon). Then the rest of the entry
would edit just the path on the ftp server, as a local filename. The
disadvantage here is that it's a bit harder to know how to type in a
full pathname including what method to use and what server (you'd type
in a URI). This isn't necessarily a huge problem if you rarely type in
remote URIs (you can follow links, browse the network, use favourites,
etc.).

I don't know how hard this is to implement from a Gtk+ perspective
though. It's somewhat similar to what the evolution address entry does,
so maybe the evolution hackers can give us some input here.

 Alexander Larsson                                            Red Hat, Inc 
                   alexl redhat com    alla lysator liu se 
He's a Nobel prize-winning shark-wrestling hairdresser who hides his scarred 
face behind a mask. She's a brilliant winged socialite in the wrong place at 
the wrong time. They fight crime! 
