Re: GLib file magics



On 27 Jul 2000, Maciej Stachowiak wrote:

> Tim Janik <timj@gtk.org> writes:
> 
> > On 27 Jul 2000, Maciej Stachowiak wrote:
> > 
> > 
> > the result is a user supplied gpointer data; if you care to look at the API ;)
> > besides, the magic matching code is required by BSE (depends on glib only),
> > gimp (doesn't depend on gnome-vfs (yet)) and gdk-pixbuf (doesn't have
> > gnome-vfs around either).
> 
> I am tempted to go into a Miguel-style flame now about foolish fear of
> dependencies being the enemy of sane code reuse, but that wasn't
> really my point.

thanks for resisting ;)

> > but it's pretty orthogonal to that anyways, you could even use it to
> > 1) validate your mime type lookups from pure file extensions
> > 2) figure mime matches for files where extension matches failed.
> 
> The gnome-vfs (formerly gnome-libs) system matches based on both magic
> and file extension (and also possibly file regexp pattern). Soon it
> will also support algorithmic detection of ascii text and mp3 files,
> since neither of these formats can be detected by magic number alone.
> 
> As I said, you should look at the existing code and see if you can
> either copy it into glib, or at least provide the same feature set so
> we can eliminate the duplication later.

ok, first, when i pondered a suitable solution for BSE, gnome-mime didn't
handle magics (and was sitting in gnome-libs, iirc).
so now, i had a look at gnome-vfs-mime-magic.c, and to not require the
rest of the gtk-devel audience to compare literal code as well, i've
compiled a rough pro/con list for both approaches:

gnome-vfs-mime-magic.c:
-       misses 'u' prefix for types
-       closely tied to file io (even so for the mime specification)
-       doesn't handle any of the >, <, x, &, ^ test checks
+       features offset ranges (non magic(5)), with NUM:NUM
+       can dump mime table
+       handles date, ledate types

GMagic:
+       implements the important subset of magic(5) including
        comprehensive numerical tests
+       provides size type extension (required by gimp)
+       accounts for match collisions with priorities
+       provides a generic interface (no global list, return
        values are user-defined)
+       does match attempts on byte streams
-       doesn't know about mime-types

for gnome-mime, i have to say that being unable to do >/< checks
on the numerical values is a very big minus, you're unable to
e.g. cover version ranges of file formats that way.

as for the + in handling dates, take a look at
/usr/share/misc/magic which only uses dates to improve
verbosity of the output, or ./data/mime/gnome-vfs-mime-magic
that doesn't use date/ledate for file checks either, so
the effective benefit of that rapidly approaches zero.
in any case, it'd be a 10 second thing to alias that to
long/lelong (+belong) for the GMagic code to allow for
numeric matches there as well.

as for the offset NUM:NUM extension, that seems to enable the
code to look for a given string in a specified range of the
file, so there can be magics like:
0:64    string          \<!DOCTYPE\ HTML                        text/html
0:64    string          \<HTML                                  text/html
0:32    string          \#include                               text/x-c
0:32    string          \#ifndef                                text/x-c
0:32    string          \#ifdef                                 text/x-c
that's definitely an interesting feature (though the gnome-mime
implementation looks awfully slow, with range_end-range_start number
of reattempts of full sub-pattern matches). i may consider something
like that for GMagic as well, but then i'd probably base it on
the wildcard pattern matching algorithm that we use for widget
path matches in rcfiles, and i'd use a new type name for that,
since that's a completely non-magic(5) thing ;)
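
to illustrate what such an offset range amounts to, here is a minimal
sketch in plain C. it is neither the gnome-vfs code nor anything from
GMagic; range_string_match() and its signature are made up for this
example. it simply retries the string comparison at every offset in the
range and reports the first hit:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical helper, not part of gnome-vfs or GMagic:
 * return the first offset in [range_start, range_end] at which
 * `pattern` occurs in `buf`, or -1 if it occurs at none of them.
 */
static long
range_string_match (const unsigned char *buf,
                    size_t               buf_len,
                    size_t               range_start,
                    size_t               range_end,
                    const char          *pattern)
{
  size_t pat_len = strlen (pattern);
  size_t offset;

  assert (range_start <= range_end);

  for (offset = range_start; offset <= range_end; offset++)
    {
      if (offset + pat_len > buf_len)
        break;  /* pattern would run past the data we have */
      if (memcmp (buf + offset, pattern, pat_len) == 0)
        return (long) offset;
    }

  return -1;
}
```

a "0:64 string \<HTML" entry would then correspond to calling this with
range_start 0, range_end 64 and the pattern "<HTML". the cost is one
comparison attempt per offset in the range, which is the overhead
mentioned above.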

oh btw, the match masks that can be appended to the types by using '&'
(e.g. "0 leshort&0xffaa >0xaa # test every second bit in lower byte")
have to be appended _without_ extra white spaces, the extra
eat_white_space() calls in gnome-vfs-mime-magic.c should prolly be
removed (that's to maintain field order).
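
for readers who don't have magic(5) at hand, this is roughly what a
masked numeric test such as the leshort example boils down to. it's a
sketch in plain C under my own assumptions; NumTest and leshort_test()
are hypothetical names, not GMagic or gnome-vfs API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical names, for illustration only */
typedef enum { TEST_EQUAL, TEST_GREATER, TEST_LESS } NumTest;

/* read a little-endian 16 bit value at `offset`, apply `mask`, then
 * compare the masked value against `value`. returns 1 on match, 0 otherwise.
 */
static int
leshort_test (const unsigned char *buf,
              size_t               buf_len,
              size_t               offset,
              uint16_t             mask,
              NumTest              test,
              uint16_t             value)
{
  uint16_t v;

  assert (buf != NULL);

  if (offset + 2 > buf_len)
    return 0;  /* not enough data for a short at this offset */

  /* low byte first, independent of host byte order */
  v = (uint16_t) (buf[offset] | (buf[offset + 1] << 8));
  v &= mask;

  switch (test)
    {
    case TEST_EQUAL:   return v == value;
    case TEST_GREATER: return v > value;
    case TEST_LESS:    return v < value;
    }
  return 0;
}
```

"0 leshort&0xffaa >0xaa" then reads as
leshort_test (buf, len, 0, 0xffaa, TEST_GREATER, 0xaa). the same
greater/less comparisons are what allow covering a version range of a
file format with a single entry.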

> The current code does assume a
> central list of magic info and filename/extension patterns, and always
> returns a mime type rather than user-defined data, but I see no
> particular reason to allow either the match set or the return value to
> be user-supplied.

all in all, i'll certainly not move that code into glib, it'd
not be suitable for gimp (doesn't feature normal magic(5) numeric
matches), pixbuf (and can't operate on byte streams) or bse (doesn't
provide the magic table in a file, gnome-mime relies on ordered
magic registration since it doesn't account for collisions).
it more looks like you'd get a huge win out of basing the
mime magic matching backend on GMagic and simply use GMagic's
data pointer for mime type specs. that is, once GMagic
features ranges, though i'll probably not add those if i don't
see backend reuse intent from the gnome-mime side.

as for why a user-supplied gpointer data; member is useful,
well, gnome-mime can stuff its mime-type string/struct there,
BSE can store a procedure type of a loader function there,
gimp can use that for a PDB proc identifier, and gmagictest
can use it for storing messages, i do see a use there ;)
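
to make the data pointer and priority idea concrete, here's a minimal
sketch in plain C. it is not the GMagic interface; MagicEntry and
magic_lookup() are hypothetical, matching is reduced to a string at
offset 0, and void* stands in for gpointer so the example doesn't need
glib:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical table entry, not the real GMagic structure */
typedef struct {
  const char *magic;     /* bytes expected at offset 0 */
  int         priority;  /* higher wins when several entries match */
  void       *data;      /* opaque to the matcher, owned by the caller */
} MagicEntry;

/* return the data pointer of the highest priority matching entry,
 * or NULL if no entry matches. the matcher never looks into `data`.
 */
static void *
magic_lookup (const MagicEntry    *entries,
              size_t               n_entries,
              const unsigned char *buf,
              size_t               buf_len)
{
  const MagicEntry *best = NULL;
  size_t i;

  assert (entries != NULL || n_entries == 0);

  for (i = 0; i < n_entries; i++)
    {
      size_t len = strlen (entries[i].magic);

      if (len <= buf_len &&
          memcmp (buf, entries[i].magic, len) == 0 &&
          (best == NULL || entries[i].priority > best->priority))
        best = &entries[i];
    }

  return best ? best->data : NULL;
}
```

a mime layer would put its type strings into data, a loader would put
a function or procedure identifier there. the matching code is the same
in both cases and the table is passed in by the caller rather than
kept in a global list.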

> 
>  - Maciej
> 

---
ciaoTJ




