Re: Why file content sniffing sucks



> - The information generated by it is proven to not be accurate enough to
>   be used by a program to determine its actions (the above example and
>   the dozens of related bugs in bugzilla are sufficient). This data
>   should be used merely for informational purposes to the user in the
>   Properties dialog, for example.

The point I have been trying to make for a while is that a system should
have ONE magic database, shared by file, GNOME, KDE, etc.  Maintaining a
database that correctly identifies types is a big problem, given the
diverse range of files out there.  A single database maintained by
everyone would be as accurate as possible.

Note that there are GNOME bugzilla bugs that (as a sub-point) ask,
"I've gotten GNOME to identify file X as type Y, now how do I do the
same for the file command?"

> - It reduces performance of Nautilus to a dead turtle:

This is probably due to bad design rather than a consequence of the idea
itself.  The file command, for example, compiles its magic database into
a format that can be read more quickly.  Doing the same could speed
things up.  Some kind of intelligent cache system might help as well.

For what it's worth, here is what the author of the file command (the
version usually found in Linux distributions at least) said about
modifying file to use the freedesktop.org magic/mime database
specification:

	Yes, I've seen it. I don't know if parsing xml is really worth it.
	I'd rather have something compile xml into a format that can be
	parsed quickly. file even goes so far as to compile the magic entries so
	that it does not have to do any work reading them. The other
	issue I have with it is the priority code. I think that there
	should be something in file computing the strength of each
	magic number depending on the length and a frequency map,
	and auto-sorting magic entries. I am planning this for the
	next version of file. I don't like depending on the extensions
	of files. What I do like is the ability to utilize the file
	database to produce different kinds of output (mime, text, etc)
	which the xml stuff gives you and file kludges horribly. So yes,
	I am not happy about the format of file entries, but I am also
	not happy about the way the xml stuff was done. I mentioned this
	years ago to the shared-mime-info folks, but I don't think they
	understood what I was saying with respect to generalizing file
	to handle things such as tiff or jpg files properly and sorting
	magic according to strength.
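
To make the "strength" idea concrete, here is one possible reading of
it as a Python sketch.  This is my own guess, not the formula file
uses or plans to use: each byte of a magic pattern contributes more
the rarer it is according to a frequency map, so longer and more
unusual patterns rank first.  The byte_probability() map and the entry
list are made-up assumptions.

```python
import math


def byte_probability(b):
    """Hypothetical frequency map: chance of seeing byte b in a header.

    Assumption: NUL and printable ASCII are common, the rest are rare.
    """
    if b == 0 or 0x20 <= b <= 0x7E:
        return 1 / 64
    return 1 / 1024


def strength(magic):
    """Bits of evidence from a match: longer and rarer patterns score higher."""
    return sum(-math.log2(byte_probability(b)) for b in magic)


# Illustrative entries only: (magic bytes, MIME type).
entries = [
    (b"BM", "image/bmp"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]

# Auto-sort so the strongest magic is tried first and weak
# two-byte patterns such as "BM" are tried last.
ranked = sorted(entries, key=lambda e: strength(e[0]), reverse=True)
```

Sorting this way needs no hand-assigned priorities, which seems to be
the complaint about the priority code.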

-- 
Mike

:wq


