Re: [GNOME VFS] Re: using deprecated gnome_mime instead of gnome_vfs_mime (was filemoniker patch)
- From: Pavel Cisler <pavel eazel com>
- To: Michael Meeks <michael helixcode com>
- Cc: Darin Adler <darin eazel com>, Miguel de Icaza <miguel helixcode com>, gnome-components-list gnome org, gnome-vfs helixcode com
- Subject: Re: [GNOME VFS] Re: using deprecated gnome_mime instead of gnome_vfs_mime (was filemoniker patch)
- Date: Thu, 30 Nov 2000 15:14:25 -0800
Michael Meeks wrote:
>
> Hi,
>
> On Thu, 30 Nov 2000, Darin Adler wrote:
> > Why do you call the MIME database and MIME sniffing a trivial hack?
> > It's a piece of code that's hard to get right, and we've been working
> > on making it better for months.
>
> This reminds me (on a totally different track); I was recently
> reading through some parts of the gnome-vfs api and I noticed these APIs:
>
> const char *gnome_vfs_get_mime_type_for_buffer
> (GnomeVFSMimeSniffBuffer *buffer);
>
> I assume this function uses the mime magic information to try and
> determine the type, and returns NULL if it can't characterize it
> precisely. Then I saw:
>
> gboolean gnome_vfs_sniff_buffer_looks_like_text
> (GnomeVFSMimeSniffBuffer *buffer);
>
> And I thought fair enough, perhaps text is a very special case for
> sniffing; then I saw:
>
> gboolean gnome_vfs_sniff_buffer_looks_like_mp3
> (GnomeVFSMimeSniffBuffer *buffer);
>
> And I was slightly worried. Are we going to gather a great series
> of looks_like_my_grandma type calls ? or should the (fuzzy) sniffing API
> be something like:
>
> gboolean gnome_vfs_sniff_buffer_looks_like (buffer, char *mime_type);
>
> So; then I looked at the implementation to try and work out what's
> going on and I see in the get_mime_type_for_buffer function:
>
>     /* if no match, try the algorithmic sniffers */
>     if (gnome_vfs_sniff_buffer_looks_like_mp3 (buffer)) {
>         return "audio/x-mp3";
>     }
>
> But no sniff for text.
>
> And it looks ( from the comment ) as if a more general framework
> was intended, with perhaps pluggable mime type sniffers; is this the
> case? it would seem more reasonable ( to me ), to register a list of
> sniffer functions that could be handled generically, perhaps with a degree
> of certainty ranking depending on how smelly the stream is to them ?
Michael,

There is a limited set of common file formats, in addition to text, that
cannot be covered by magic patterns. They include MP3, TrueType, and
possibly Targa images. You could argue that they are poorly designed
file formats, but they are mainstream enough that we need to do a good
job of recognizing them.

You are right that we need a better framework for these algorithmic
sniffers, and I was planning to add it at some point. Given that there
is only a handful of formats like this, and that adding an algorithmic
sniffer is hard (you have to write code instead of adding a pattern to a
database), I have not added the framework yet and have simply hardcoded
the sniffers for now.

The framework would also address the need to insert a given algorithmic
sniffer at the right point in the mime sniffer chain to give it the
proper priority. At the same time, it helps if we encourage folks
designing new file formats to use a header that contains a good magic
string, so that we can keep the number of algorithmic sniffers to a
minimum.
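
As a rough illustration of the kind of framework discussed above, here
is a minimal sketch of a sniffer chain kept sorted by priority, where
each sniffer reports a confidence value. Everything in it is an
assumption made for illustration: SniffBuffer, register_sniffer,
sniff_mime_type and the two toy heuristics are hypothetical names, not
part of the gnome-vfs API, and the confidence numbers are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GnomeVFSMimeSniffBuffer; not the real type. */
typedef struct {
    const unsigned char *data;
    size_t length;
} SniffBuffer;

/* A sniffer returns a confidence from 0 (no match) to 100 (certain). */
typedef int (*SnifferFunc) (const SniffBuffer *buffer);

typedef struct {
    const char *mime_type;
    int priority;            /* lower values sit earlier in the chain */
    SnifferFunc sniff;
} SnifferEntry;

#define MAX_SNIFFERS 16
static SnifferEntry sniffer_chain[MAX_SNIFFERS];
static size_t sniffer_count = 0;

/* Insert a sniffer so that the chain stays sorted by priority.
 * Returns 0 on success, -1 if the table is full. */
static int
register_sniffer (const char *mime_type, int priority, SnifferFunc sniff)
{
    size_t i;

    assert (mime_type != NULL && sniff != NULL);
    if (sniffer_count == MAX_SNIFFERS)
        return -1;

    /* Shift later entries up to open a slot at the right position. */
    i = sniffer_count;
    while (i > 0 && sniffer_chain[i - 1].priority > priority) {
        sniffer_chain[i] = sniffer_chain[i - 1];
        i--;
    }
    sniffer_chain[i].mime_type = mime_type;
    sniffer_chain[i].priority = priority;
    sniffer_chain[i].sniff = sniff;
    sniffer_count++;
    return 0;
}

/* Run every sniffer and return the type with the highest confidence.
 * On a tie the earlier (higher-priority) entry wins. NULL if no match. */
static const char *
sniff_mime_type (const SniffBuffer *buffer)
{
    const char *best_type = NULL;
    int best_confidence = 0;
    size_t i;

    for (i = 0; i < sniffer_count; i++) {
        int confidence = sniffer_chain[i].sniff (buffer);
        if (confidence > best_confidence) {
            best_confidence = confidence;
            best_type = sniffer_chain[i].mime_type;
        }
    }
    return best_type;
}

/* Toy heuristic: an MPEG audio frame starts with 11 set sync bits. */
static int
toy_looks_like_mp3 (const SniffBuffer *buffer)
{
    if (buffer->length >= 2 && buffer->data[0] == 0xFF
        && (buffer->data[1] & 0xE0) == 0xE0)
        return 60;
    return 0;
}

/* Toy heuristic: non-empty and every byte is printable ASCII or
 * common whitespace. */
static int
toy_looks_like_text (const SniffBuffer *buffer)
{
    size_t i;

    if (buffer->length == 0)
        return 0;
    for (i = 0; i < buffer->length; i++) {
        unsigned char c = buffer->data[i];
        if (!((c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t'))
            return 0;
    }
    return 40;
}
```

A caller would register each sniffer once at startup, for instance
register_sniffer ("audio/x-mp3", 10, toy_looks_like_mp3), and then call
sniff_mime_type on a buffer only after the magic patterns have failed.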

One other feature that I'd like to add eventually, related to
recognizing text files, is to have specific files use a combination of
mime magic and suffixes for more accurate type detection. Currently we
treat mime magic as preferred in all cases but text. For some mime types
a suffix may still be better than mime magic. Take .pdf.gz, for example:
mime magic will identify it as a gzip file, but we could have a
mechanism that, for gzip files, looks at the suffix and uses it to get a
more accurate type (in our case something like "zipped pdf"). My
thinking is that I will have a sniffer call:

const char *gnome_vfs_get_mime_type (GnomeVFSMimeSniffBuffer *buffer,
                                     const char *optional_suffix);

(I don't know what the name of the routine will be yet) and deprecate
the calls that detect a mime type based on only the data stream or only
the suffix.
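
To make the .pdf.gz case concrete, here is a minimal sketch of a lookup
that prefers magic but lets the suffix refine a gzip match. It is only
an illustration under stated assumptions: get_mime_type_combined,
toy_type_from_magic and has_suffix are hypothetical helpers, the magic
table covers just gzip and PDF, and the refined type string is a
made-up placeholder rather than a registered MIME type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny stand-in for the magic database: gzip (1F 8B) and PDF ("%PDF"). */
static const char *
toy_type_from_magic (const unsigned char *data, size_t length)
{
    if (length >= 2 && data[0] == 0x1F && data[1] == 0x8B)
        return "application/x-gzip";
    if (length >= 4 && memcmp (data, "%PDF", 4) == 0)
        return "application/pdf";
    return NULL;
}

/* Case-sensitive check that name ends with suffix. */
static int
has_suffix (const char *name, const char *suffix)
{
    size_t name_len = strlen (name);
    size_t suffix_len = strlen (suffix);

    return name_len >= suffix_len
        && strcmp (name + name_len - suffix_len, suffix) == 0;
}

/* Hypothetical combined lookup: magic is preferred, the suffix refines
 * a gzip match, and the suffix alone is the fallback when magic fails.
 * optional_suffix may be NULL. Returns NULL if nothing matches. */
static const char *
get_mime_type_combined (const unsigned char *data, size_t length,
                        const char *optional_suffix)
{
    const char *magic_type;

    assert (data != NULL || length == 0);
    magic_type = toy_type_from_magic (data, length);

    if (magic_type != NULL && optional_suffix != NULL
        && strcmp (magic_type, "application/x-gzip") == 0
        && has_suffix (optional_suffix, ".pdf.gz")) {
        /* Placeholder name for "zipped pdf"; not a real MIME type. */
        return "application/x-hypothetical-gzipped-pdf";
    }
    if (magic_type != NULL)
        return magic_type;

    /* No magic match: fall back to the suffix (text is the usual case). */
    if (optional_suffix != NULL && has_suffix (optional_suffix, ".txt"))
        return "text/plain";
    return NULL;
}
```

With this shape, a caller that has no file name simply passes NULL for
the suffix and gets the plain magic result.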
Pavel