Re: Mime type detection in beagle [Was: IndexHelper eating cpu]

Hi Debajyoti,

Debajyoti Bera wrote:
shared-mime-info.xml (FreeDesktop.Org spec) has the following to say "... There are several reasons for checking most of the glob patterns before the magic. Some applications don't check the magic at all, and this makes it more likely that both will get the same type. Users can easily understand why calling their text file <filename>README.mp3</filename> makes the system think it's an MP3, whereas they have trouble understanding why their computer thinks <filename>README.txt</filename> is a PostScript file. If the system guesses wrongly, the user can often rename the file to fix the problem...."
Please don't expect me to agree with this: "glob patterns" before "magic"?
Apparently they check only the glob pattern, and fall back to the "magic"
only if no glob pattern matches, probably.

As a user, however, I expect the mime type simply to be detected correctly,
independent of file names and extensions, or even of missing extensions. The
latter is not unlikely on Unix/Linux systems.

Over the last few weeks the two of us have discussed exceptions in jpeg files
which finally turned out to be png files. IIRC, during that time you also
added checks to the image filters to make sure that you really parse a jpeg,
gif, png, etc. - a task which doesn't belong in the beagle filters at all but
was obviously triggered by poor mime type detection.

An external pre-filter for mime type detection could improve this situation,
I think. As already pointed out, "file" and "identify" are much more reliable
in these cases.
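
To illustrate what I mean by an external pre-filter, here is a minimal sketch
in Python. It is only an illustration, not beagle code: the function name
mime_from_file_tool and its "tool" parameter are made up for this example, and
it assumes a "file" utility that understands --brief and --mime-type.

```python
import shutil
import subprocess


def mime_from_file_tool(path, tool="file"):
    """Ask an external `file`-style utility for the MIME type of `path`.

    Hypothetical helper for illustration only. Returns None when the tool
    is not installed, fails, or prints something that is not a MIME type.
    """
    exe = shutil.which(tool)
    if exe is None:
        return None  # tool not available on this system
    try:
        result = subprocess.run(
            [exe, "--brief", "--mime-type", "--", path],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    mime = result.stdout.strip()
    # Error messages such as "cannot open ..." contain spaces; a MIME type
    # does not, so reject anything that does not look like "type/subtype".
    if "/" in mime and " " not in mime:
        return mime
    return None
```

A filter would call this before looking at the file name and only fall back
to the extension when it returns None.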

All the major desktop environments are using/are moving to shared-mime-info and xdgmime. It would be pretty inconsistent if beagle indexes a file as jpeg but the user sees a pdf icon for that file in nautilus.
If the file is a jpeg it should be indexed as a jpeg regardless of the icon on
the desktop. And as long as xdgmime depends on the file name extension in the
first place it won't be fixed at all ..., right? Unless they start
cross-checking file extension _and_ magic, rather than relying on one _only_.
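
The cross-check I have in mind could look roughly like the following Python
sketch. It is an assumption of mine, not how xdgmime or beagle work: the
signature table is tiny and hand-picked, and the names MAGIC, mime_from_magic
and detect are invented for this example.

```python
import mimetypes

# Hand-picked leading byte signatures (illustrative, far from complete).
MAGIC = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"%!PS", "application/postscript"),
]


def mime_from_magic(head):
    """Return the MIME type whose signature starts `head`, or None."""
    for signature, mime in MAGIC:
        if head.startswith(signature):
            return mime
    return None


def detect(path):
    """Return (mime, conflict) for `path`.

    The content (magic) wins when it is recognised; the file name is only
    a fallback. `conflict` is True when name and content disagree.
    """
    by_name, _encoding = mimetypes.guess_type(path)
    with open(path, "rb") as fh:
        by_magic = mime_from_magic(fh.read(16))
    if by_magic is None:
        return by_name, False
    conflict = by_name is not None and by_name != by_magic
    return by_magic, conflict
```

With this, a png that was saved as "photo.jpg" is reported as image/png with
the conflict flag set, so the indexer can pick the right filter and still
note the mismatch.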

Kind regards,
