Re: Using "user.mime_type" xattr for MIME guessing



On Thu, Aug 9, 2018 at 10:47 AM Bastien Nocera <hadess hadess net> wrote:
On Thu, 2018-08-09 at 10:35 -0400, Colin Atkinson via gtk-devel-list
wrote:
> Hi everyone,
> I'm working on a FUSE file system which makes network requests
> whenever a file is read. So obviously, I would like to avoid excess
> read requests to files.
>
> The current implementation [0] of gio's MIME guessing seems to check
> the file extension, and then immediately fall back on reading magic
> bytes when this is not possible (i.e., files with an ambiguous extension
> or none at all). In my situation, this can potentially lead to many
> network requests any time a user opens a directory in Nautilus or a
> file selection dialog.
>
> According to the FreeDesktop specs [1], implementations may query the
> user.mime_type xattr for a given file's MIME. But the current version
> of glib seems not to make use of this.
>
> Would there be any interest in a patch to add this functionality? If
> so, I would be happy to work on it.
>
> Please let me know if there's anything I've missed/misunderstood.

It's probably something interesting to add to GIO, though checking
xattrs also has a cost, especially on local disks.

Depending on what your FUSE is, you might want to consider writing a
gvfs backend instead, where the backend is responsible for providing
the mime-type/content-type (and all the other metadata), so you can use
whichever method is the most useful to you, with no added costs because
the metadata and the enumeration can be done in one go.

Cheers

While reading an xattr does have some overhead, it can also avoid the cost of glob lookups and magic-byte sniffing. In fact, the user.mime_type attribute was introduced partly to avoid those very operations, since they were considered expensive.
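To make that lookup order concrete, here is a minimal C sketch of a guesser that consults user.mime_type first, falls back to a filename match, and only reports that magic sniffing is needed when both fail. It assumes Linux and the glibc `getxattr()` from `<sys/xattr.h>` (macOS's version takes extra arguments). `guess_mime_type`, `mime_source` and the three-entry glob table are hypothetical stand-ins; GIO itself uses the shared-mime-info glob and magic databases.

```c
#define _DEFAULT_SOURCE /* expose strcasecmp() under strict -std modes */
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Where the answer came from, cheapest first. */
typedef enum {
    MIME_FROM_XATTR,     /* user.mime_type extended attribute */
    MIME_FROM_EXTENSION, /* filename glob match */
    MIME_NEEDS_SNIFF     /* caller still has to read magic bytes */
} mime_source;

/* Hypothetical glob table; a real guesser would use shared-mime-info. */
static const struct { const char *suffix; const char *mime; } globs[] = {
    { ".txt", "text/plain" },
    { ".png", "image/png" },
    { ".pdf", "application/pdf" },
};

static const char *guess_from_extension(const char *path)
{
    const char *dot = strrchr(path, '.');
    if (dot == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof globs / sizeof globs[0]; i++)
        if (strcasecmp(dot, globs[i].suffix) == 0)
            return globs[i].mime;
    return NULL;
}

/* Hypothetical helper: fill buf with a MIME type without reading any
 * file contents, or report that sniffing is still required. */
mime_source guess_mime_type(const char *path, char *buf, size_t buflen)
{
    if (buflen == 0)
        return MIME_NEEDS_SNIFF;

    /* 1. One metadata syscall; no file data is read. Leave room for NUL.
     * With size 0, getxattr() only reports the length, hence the bound. */
    ssize_t n = getxattr(path, "user.mime_type", buf, buflen - 1);
    if (n > 0 && (size_t)n < buflen) {
        buf[n] = '\0';
        return MIME_FROM_XATTR;
    }

    /* 2. Pure string matching on the name, still no I/O. */
    const char *ext = guess_from_extension(path);
    if (ext != NULL) {
        strncpy(buf, ext, buflen - 1);
        buf[buflen - 1] = '\0';
        return MIME_FROM_EXTENSION;
    }

    /* 3. Only now would the caller open the file and read magic bytes. */
    buf[0] = '\0';
    return MIME_NEEDS_SNIFF;
}
```

Step 1 costs a single metadata call and never touches file contents, which is what matters for a network-backed FUSE file system; whether it is also cheap enough on local disks is what a benchmark would have to show.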

I'm hesitant to commit to writing a full gvfs backend. Correct me if I'm wrong, but from reading through some of the backends in the gvfs repo, it seems that writing one would essentially mean duplicating all of the work already done for the FUSE file system, then duplicating it again for KDE, all while still maintaining the FUSE version so that it remains usable on Windows and OS X.

This also isn't an isolated case. There are many situations where excess reads should be avoided (e.g. a slow disk with many files in one directory), or where a FUSE file system or an application has explicitly set the MIME type (e.g. curl setting user.mime_type based on the Content-Type header).
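On the writing side, a FUSE file system or a downloader that already knows the type could record it once. A minimal sketch, again assuming the Linux `setxattr()` from `<sys/xattr.h>`: the hypothetical `record_mime_type()` strips parameters such as `; charset=UTF-8` from a Content-Type value and stores the bare `type/subtype`. Dropping the parameters is a design choice made here, not behaviour claimed for curl or required by the spec.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Copy the bare "type/subtype" from a Content-Type value, dropping any
 * ";"-separated parameters and surrounding whitespace.
 * Returns 0 on success, -1 if the result is empty or does not fit. */
int mime_essence(const char *content_type, char *out, size_t outlen)
{
    while (isspace((unsigned char)*content_type))
        content_type++;
    size_t len = strcspn(content_type, ";");
    while (len > 0 && isspace((unsigned char)content_type[len - 1]))
        len--;
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, content_type, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: tag a file so later readers can skip sniffing.
 * Returns 0 on success, -1 on failure; errno is meaningful only when
 * setxattr() itself failed (e.g. ENOTSUP if user xattrs are unsupported). */
int record_mime_type(const char *path, const char *content_type)
{
    char mime[256];
    if (mime_essence(content_type, mime, sizeof mime) != 0)
        return -1;
    return setxattr(path, "user.mime_type", mime, strlen(mime), 0);
}
```

Storing only the essence keeps the attribute directly comparable to the MIME type names a guesser would otherwise produce.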

I may try to whip up a minimal proof-of-concept patch sometime in the next week (unless there's strong opposition to it). From there, it should be feasible to see how checking user.mime_type affects performance.

Cheers

