Re: Using "user.mime_type" xattr for MIME guessing



On Thu, 2018-08-09 at 13:09 -0400, Colin Atkinson wrote:
> On Thu, Aug 9, 2018 at 10:47 AM Bastien Nocera <hadess hadess net> wrote:
>> On Thu, 2018-08-09 at 10:35 -0400, Colin Atkinson via gtk-devel-list
>> wrote:
>>> Hi everyone,
>>> I'm working on a FUSE file system which makes network requests
>>> whenever a file is read. So obviously, I would like to avoid
>>> excess read requests to files.
>>>
>>> The current implementation [0] of GIO's MIME guessing seems to
>>> check the file extension, and then immediately fall back on reading
>>> magic bytes when this is not possible (i.e. files with an ambiguous
>>> or no extension). In my situation, this can potentially lead to
>>> many network requests any time a user opens a directory in Nautilus
>>> or a file selection dialog.
>>>
>>> According to the FreeDesktop specs [1], implementations may query
>>> the user.mime_type xattr for a given file's MIME type. But the
>>> current version of GLib seems not to make use of this.
>>>
>>> Would there be any interest in a patch to add this functionality?
>>> If so, I would be happy to work on it.
>>>
>>> Please let me know if there's anything I've missed/misunderstood.
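To make the proposal concrete, here is a minimal Python sketch of the lookup order being suggested. It consults `user.mime_type` first, then the file name, and reads the file's leading bytes only as a last resort. It is not GIO's implementation: GIO is C and uses the shared-mime-info database. The tiny `MAGIC` table and the helper names `read_mime_xattr` and `guess_content_type` are hypothetical. The sketch also assumes Linux, where `os.getxattr` is available.

```python
import mimetypes
import os

# Hypothetical, tiny magic table: (leading bytes, MIME type). Real
# implementations use the much larger shared-mime-info magic rules.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF89a", "image/gif"),
]


def read_mime_xattr(path):
    """Return the user.mime_type xattr as text, or None if it is unavailable."""
    try:
        raw = os.getxattr(path, "user.mime_type")
    except (OSError, AttributeError):
        # ENODATA (attribute not set), ENOTSUP (no user xattrs on this
        # filesystem), or a platform where os.getxattr does not exist.
        return None
    return raw.decode("utf-8", "replace").strip("\x00 \t\r\n") or None


def guess_content_type(path):
    """Return (mime_type, source), trying xattr, then file name, then magic."""
    mime = read_mime_xattr(path)  # one syscall, no file data read
    if mime:
        return mime, "xattr"

    mime, _encoding = mimetypes.guess_type(path)  # name only, no per-file I/O
    if mime:
        return mime, "extension"

    with open(path, "rb") as f:  # the step that costs a network read on FUSE
        head = f.read(64)
    for prefix, magic_mime in MAGIC:
        if head.startswith(prefix):
            return magic_mime, "magic"
    return "application/octet-stream", "fallback"
```

On a network FUSE filesystem, only the last branch forces a data read. The xattr lookup is still a syscall, though, and that cost is what the replies below debate.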

>> It's probably something interesting to add to GIO, though checking
>> xattrs also has a cost, especially on local disks.
>>
>> Depending on what your FUSE filesystem is, you might want to
>> consider writing a gvfs backend instead, where the backend is
>> responsible for providing the mime-type/content-type (and all the
>> other metadata). That way you can use whichever method is most
>> useful to you, with no added cost, because the metadata and the
>> enumeration can be done in one go.
>>
>> Cheers
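The "metadata and enumeration in one go" point can be illustrated with a language-neutral sketch, written here in Python. If the remote protocol's directory listing already carries a content type for each entry, the backend can fill in every file's type from a single request and never read file bodies. Real gvfs backends are C code built on gvfs's own backend classes, which this sketch does not model. `FakeRemote`, `list_dir`, and `list_with_metadata` are hypothetical names. Whether a server reports content types at all depends entirely on the protocol.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size: int
    content_type: str


class FakeRemote:
    """Hypothetical stand-in for a network API whose directory listing
    already returns each entry's name, size, and server-side content type."""

    def __init__(self, tree):
        self._tree = tree
        self.list_calls = 0  # directory listings requested
        self.read_calls = 0  # file bodies fetched (one round trip each)

    def list_dir(self, path):
        self.list_calls += 1
        return self._tree.get(path, [])

    def read(self, path):
        self.read_calls += 1
        raise NotImplementedError("file contents are not needed to enumerate")


def list_with_metadata(remote, path):
    """Build per-file info, including content type, from one listing request."""
    return [
        FileInfo(
            name=entry["name"],
            size=entry["size"],
            # Trust the server's type; only fall back when it sent none.
            content_type=entry.get("content_type") or "application/octet-stream",
        )
        for entry in remote.list_dir(path)
    ]
```

However many files the directory holds, the sketch makes one listing call and no content reads.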

> While there is an overhead for getting the value of an xattr, it also
> potentially prevents the expense of doing glob lookups

A glob lookup is essentially free. There's an mmapped cache of those,
and a lookup takes microseconds.

> and magic guessing. It was added partially to help avoid those very
> operations, as they were deemed expensive.

Magic guessing is only expensive because it needs to do I/O. So would
looking up the xattr.
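Here is a rough Python illustration of why a glob lookup costs no I/O. It inspects only the file name, so it works even for a path that does not exist, whereas magic sniffing must open and read the file. The table is a hypothetical, tiny stand-in for the shared-mime-info globs that GIO reads from an mmapped cache. Matching is case-insensitive here purely for simplicity.

```python
import fnmatch
import os

# Hypothetical, tiny glob table standing in for shared-mime-info's globs.
GLOB_PATTERNS = [
    ("*.tar.gz", "application/x-compressed-tar"),  # checked before plain suffixes
    ("makefile", "text/x-makefile"),
]
SUFFIXES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".pdf": "application/pdf",
}


def guess_from_glob(path):
    """Guess a MIME type from the file name alone; never opens or stats the file."""
    name = os.path.basename(path).lower()  # case-insensitive in this sketch
    for pattern, mime in GLOB_PATTERNS:
        if fnmatch.fnmatchcase(name, pattern):
            return mime
    _stem, ext = os.path.splitext(name)
    return SUFFIXES.get(ext)  # None when the name is ambiguous or has no extension
```

A `None` result corresponds to the "ambiguous or no extension" case from the original question. In that case, only magic sniffing (real I/O) or an xattr (a syscall) can help.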

> I'm hesitant to commit to writing a full gvfs backend. Correct me if
> I'm wrong, but from reading through some of the backends in the gvfs
> repo, it seems like writing one would require essentially duplicating
> all of that effort.

Well, yes and no. We've not seen your code, we just know that it's for
a network filesystem. A gvfs backend will integrate better with GNOME
in general, but most of the code should be trivial if you're using a
library to hide all the intricacies of the underlying protocol.

As an example of the benefits of writing a gvfs backend: there's an
"afc" (Apple File Conduit, for iOS devices) FUSE filesystem, but we also
wrote a gvfs backend. The gvfs backend integrates with a separate
backend that watches for plugged-in devices, and uses that to mount the
filesystems. It also integrates with GNOME: it knows how to tell you
about unlocking your device, and it can set thumbnails, icons, and
mime-types on files without doing extra I/O.

Using a file manager that speaks gvfs on top of a gvfs backend is just
going to be more efficient. A FUSE backend is nice when you're
prototyping and want people to test out the code and kick the tires.
It's just not a long-term solution for a lot of use cases. (It's
plenty fine, though, if the filesystem is a local one and the format
matches POSIX expectations, such as local filesystems that are
unsupported by the kernel. You'd then teach udisks how to mount them,
and integration would be good enough.)

> And then duplicating it again for KDE. All while maintaining support
> for the FUSE system so that it is usable on Windows/OS X.

It depends on what your target is. I doubt that Windows will read
xattrs, but then again, I don't know anything about FUSE under Windows.

> This also isn't an isolated situation. There are tons of situations
> where excess reads should be avoided (e.g. a slow disk with lots of
> files in a directory),

Again, xattr reads are not free. And there might not even be a
user.mime_type xattr in there!

> or where a FUSE filesystem or application has explicitly set the MIME
> type (e.g. curl setting user.mime_type based on the Content-Type
> header).

> I may try to whip up a minimal proof-of-concept patch sometime in the
> next week (unless there's strong opposition to it). From there, it
> should be feasible to see how checking user.mime_type affects
> performance.
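This sketch shows the producer side mentioned above: a FUSE filesystem or downloader recording the server's Content-Type. The Python helpers reduce a header value to its media type and store it in `user.mime_type`. The helper names are hypothetical. Stripping parameters such as `charset` is a choice made in this sketch, not a claim about what curl stores. The sketch assumes Linux `os.setxattr` and a filesystem that accepts `user.*` attributes.

```python
import os


def media_type_from_header(content_type):
    """Reduce a Content-Type value to 'type/subtype', lowercased, or None."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Drop values that are not of the form type/subtype.
    return media_type if "/" in media_type else None


def tag_downloaded_file(path, content_type):
    """Record the media type in user.mime_type; return True if it was stored."""
    media_type = media_type_from_header(content_type)
    if media_type is None:
        return False
    try:
        os.setxattr(path, "user.mime_type", media_type.encode("utf-8"))
    except (OSError, AttributeError):
        # No user xattrs on this filesystem, or no os.setxattr on this platform.
        return False
    return True
```

For example, `media_type_from_header("text/html; charset=UTF-8")` gives `"text/html"`. A consumer that checks the xattr would then see that value without reading the file.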

Alex Larsson looked into that, and reading xattrs when most files don't
have a user.mime_type xattr just wastes I/O. It might need to be
special-cased for specific FUSE filesystems, to avoid making every
directory read slower.

Cheers
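Here is one possible reading of the "special case for specific FUSE filesystems" idea, sketched in Python. It queries `user.mime_type` only for paths on filesystems known to publish it, so ordinary directories pay no extra syscall. The `XATTR_MIME_MOUNTS` list and the function names are hypothetical. A real implementation might key off the filesystem type rather than mount paths. The `getxattr` parameter is injectable so the number of lookups can be observed.

```python
import os

# Hypothetical opt-in list of mount points known to publish user.mime_type
# (for example a network FUSE filesystem). Everything else is skipped.
XATTR_MIME_MOUNTS = ("/mnt/netfs",)


def _is_under(path, root):
    path = os.path.normpath(path)
    root = os.path.normpath(root)
    return path == root or path.startswith(root + os.sep)


def mime_from_xattr_if_enabled(path, mounts=XATTR_MIME_MOUNTS, getxattr=None):
    """Query user.mime_type only on opted-in mounts; return None otherwise."""
    if not any(_is_under(path, root) for root in mounts):
        return None  # no syscall at all for ordinary directories
    getxattr = getxattr or os.getxattr
    try:
        return getxattr(path, "user.mime_type").decode("utf-8")
    except OSError:
        return None  # attribute missing even on an opted-in mount
```

When this returns `None`, the caller falls back to the usual glob and magic guessing. The net effect is that only the opted-in filesystem trades one xattr syscall for a possible network read.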

