Re: [gthumb-list] FileData: the untold story



Dr. Michael J. Chudobiak wrote:
> Handling filenames correctly is surprisingly hard, and anyone who wants
> to hack on gThumb should read this to learn the "right way".
>
> The basic problems are that filenames can be stored in an unescaped form
> or an escaped form, the filesystem may or may not be using UTF8, and
> some functions support remote schemes like "sftp://host/foo/bar" and
> some expect "normal" paths like "/foo/bar".
>
> There are quite a few bugs in gThumb 2.10.x relating to these issues, so
> a major goal of 2.11.x is to fix them.
>
> Most file information in gThumb is accessed using the FileData struct,
> which has been expanded and improved in trunk. A new FileData is
> generated using:
>
> fd = file_data_new (path)
>
> The "path" argument is very forgiving - it will accept escaped or
> unescaped local paths and URIs.
>
> The FileData struct contains SIX different representations of the
> filename! These are:
>
>
> fd->path: This is the path argument passed to file_data_new, with
> "file://" prepended to paths with no scheme. This may or may not be in
> UTF8, and may or may not be escaped. Because of this uncertainty, IT
> SHOULD NOT BE USED in new code.
>
> fd->name: This is a pointer to the last part of fd->path (the filename,
> with no directory names). For the same reasons as above, IT SHOULD NOT
> BE USED in new code.
>
>
> fd->utf8_path: This is the full filename, guaranteed to be in UTF8
> format, and guaranteed to be understandable by gfile functions. Thus,
> this is suitable for all display and most code use, except functions
> that only support local paths. This will always have a scheme.
>
> fd->utf8_name: This is a pointer to the last part of fd->utf8_path (the
> filename, with no directory names). This is suitable for all display
> and most code use.
>
>
> fd->local_path: Most external libraries take a local path as an
> argument, and do not accept arguments with gfile/gnomevfs schemes (like
> "sftp://"). If the gvfs daemon is active, it creates a local mount point
> for these remote files under ~/.gvfs/. fd->local_path is what you should
> pass to these functions, after checking that this mount point exists*
> with file_data_has_local_path.
>
>
> fd->uri: This is a symlink-resolved escaped URI. The only real use for
> this is for generating the names for shared thumbnails. DO NOT USE for
> anything else, unless there is a very good reason.
>
>
> * If the gvfs daemon has not mounted the remote location, we normally
> abort the operation and provide an error message. In the future, we
> could modify file_data_has_local_path to add code to try to mount the
> remote point. This would be useful for situations where a remote URI has
> been bookmarked, for example.
>
>
> Other wrinkles: If you pass a filename to a shell command, you may still
> need to perform shell escaping.
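
To illustrate the shell escaping point, here is a minimal sketch in plain C of single-quote escaping for a POSIX shell. The helper name shell_quote_single is hypothetical and not part of gThumb; in real GLib-based code, g_shell_quote() would probably be the better choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of gThumb): wrap a filename in single
 * quotes for a POSIX shell. Each embedded ' becomes '\'' (close the
 * quote, emit an escaped quote, reopen the quote). Returns a malloc'd
 * string, or NULL on allocation failure; the caller frees it. */
char *
shell_quote_single (const char *s)
{
	size_t      len = 2;  /* opening and closing quote */
	const char *p;
	char       *out, *q;

	for (p = s; *p != '\0'; p++)
		len += (*p == '\'') ? 4 : 1;

	out = malloc (len + 1);
	if (out == NULL)
		return NULL;

	q = out;
	*q++ = '\'';
	for (p = s; *p != '\0'; p++) {
		if (*p == '\'') {
			memcpy (q, "'\\''", 4);
			q += 4;
		} else {
			*q++ = *p;
		}
	}
	*q++ = '\'';
	*q = '\0';

	return out;
}
```

The quoted result is only safe as a single shell word; it should be built from whichever representation the external program expects (usually the local path), not from an escaped URI.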

Is there a reason for the existence of all those different
representations, other than historical ones? I think it would be a lot
easier (and thus less bug-prone) to keep only one representation and
generate the others on the fly when necessary. Or is the caching of the
alternative representations necessary for performance?
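
As a rough sketch of what generating a representation on the fly could look like, the plain C function below derives an escaped "file://" URI from an absolute local path. The name local_path_to_uri is hypothetical, and unlike fd->uri it does not resolve symlinks; GLib's g_filename_to_uri() does similar escaping and would likely be used in practice.

```c
#include <stdlib.h>
#include <string.h>

/* RFC 3986 "unreserved" characters, which never need escaping. */
static int
is_unreserved (unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') ||
	       c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper (not part of gThumb): build an escaped "file://"
 * URI from an absolute local path. '/' is kept as the separator; every
 * other byte outside the unreserved set becomes %XX. Returns a malloc'd
 * string, or NULL on allocation failure; the caller frees it. */
char *
local_path_to_uri (const char *path)
{
	static const char    hex[] = "0123456789ABCDEF";
	const char          *prefix = "file://";
	size_t               prefix_len = strlen (prefix);
	const unsigned char *p;
	char                *out, *q;

	/* Worst case: every byte expands to three characters. */
	out = malloc (prefix_len + 3 * strlen (path) + 1);
	if (out == NULL)
		return NULL;

	memcpy (out, prefix, prefix_len);
	q = out + prefix_len;
	for (p = (const unsigned char *) path; *p != '\0'; p++) {
		if (is_unreserved (*p) || *p == '/') {
			*q++ = (char) *p;
		} else {
			*q++ = '%';
			*q++ = hex[*p >> 4];
			*q++ = hex[*p & 0x0F];
		}
	}
	*q = '\0';

	return out;
}
```

Because the escaping works on raw bytes, it makes no assumption about the filename encoding, which is one argument for deriving the URI from the local path rather than storing it separately.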

I think the ones we should keep are the fd->utf8_path and fd->local_path
fields. Or maybe even better would be to use GFiles everywhere? That
would have the benefit that we could pass them directly to all GIO
functions.

Note that the filename encoding issue is quite complicated. Filenames as
stored in the filesystem are just byte strings (with or without a
particular charset encoding), while for display we always need a UTF-8
string. But the conversion is not always lossless, so you need to keep
both of them.
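
To make the lossiness concrete, here is a minimal sketch in plain C that turns a raw filename into something displayable by replacing bytes that are not valid UTF-8 with '?'. Both function names are hypothetical, and the validity check is deliberately simplified (it does not reject every overlong or surrogate form); GLib's g_filename_display_name() is the real tool for this job.

```c
#include <stdlib.h>
#include <string.h>

/* Length (1..4) of the UTF-8 sequence starting at p, or 0 if the bytes
 * do not form one. Simplified check: lead byte range plus continuation
 * bytes only. */
static size_t
utf8_seq_len (const unsigned char *p)
{
	size_t n, i;

	if (p[0] < 0x80)
		return 1;
	else if (p[0] >= 0xC2 && p[0] <= 0xDF)
		n = 2;
	else if (p[0] >= 0xE0 && p[0] <= 0xEF)
		n = 3;
	else if (p[0] >= 0xF0 && p[0] <= 0xF4)
		n = 4;
	else
		return 0;

	/* A terminating NUL is not a continuation byte, so this loop
	 * never reads past the end of the string. */
	for (i = 1; i < n; i++)
		if ((p[i] & 0xC0) != 0x80)
			return 0;

	return n;
}

/* Hypothetical helper: copy raw, replacing each invalid byte with '?'.
 * The result is never longer than the input. Returns a malloc'd string,
 * or NULL on allocation failure; the caller frees it. */
char *
make_display_name (const char *raw)
{
	const unsigned char *p = (const unsigned char *) raw;
	char                *out, *q;

	out = malloc (strlen (raw) + 1);
	if (out == NULL)
		return NULL;

	q = out;
	while (*p != '\0') {
		size_t n = utf8_seq_len (p);

		if (n == 0) {
			*q++ = '?';
			p++;
		} else {
			memcpy (q, p, n);
			q += n;
			p += n;
		}
	}
	*q = '\0';

	return out;
}
```

Two different Latin-1 filenames, for example the bytes "caf\xe9.jpg" and "caf\xe8.jpg", both come out as "caf?.jpg", so the original bytes cannot be recovered from the display string.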

This is definitely worth reading:

http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html#file-name-encodings
http://library.gnome.org/devel/gio/stable/GFile.html#GFile.description


