Re: Suggestion for file type detection approach

From: Jeffrey Stedfast <fejj ximian com>
To: Fabio Gomes <bugtraq gs2 com br>
Cc: ejk ucsc edu, gnome-devel-list gnome org
Subject: Re: Suggestion for file type detection approach
Date: Sat, 03 Jan 2004 10:28:45 -0500

Has anyone actually done any profiling, or even tested the sniffer
itself to see how fast it can detect all this? I seriously doubt it is
as slow as people are making it out to be.

Jeff
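
A rough way to get such a number is to time a sniffing pass over one
directory. The sketch below is only an illustration: the MAGIC table and
the sniff() and time_directory() names are hypothetical stand-ins, not
the gnome-vfs sniffer, so the result would only hint at the I/O cost of
opening each file and reading its first bytes.

```python
import os
import time

# Hypothetical magic-number table for illustration; the real gnome-vfs
# sniffer uses its own, much larger, rule set.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF8", "image/gif"),
    (b"OggS", "application/ogg"),
    (b"ID3", "audio/mpeg"),
]


def sniff(path, nbytes=256):
    """Guess a MIME type from the first bytes of the file at path."""
    with open(path, "rb") as f:
        head = f.read(nbytes)
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    return "application/octet-stream"


def time_directory(dirpath):
    """Sniff every regular file in dirpath; return (file_count, seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if entry.is_file():
                sniff(entry.path)
                count += 1
    return count, time.perf_counter() - start
```

Running time_directory() on a large local folder and again on an NFS
mount, with a cold and then a warm disk cache, would show how much of the
cost is the sniffing itself and how much is plain file access.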

On Sat, 2004-01-03 at 10:21, Fabio Gomes wrote:
> [ I am replying only to gnome-devel-list to reduce traffic ]
> 
> On Fri, 2004-01-02 at 19:46, Edward Jay Kreps wrote:
> 
> 
> > Sniffing is slow because it opens every file and reads some of it every
> > time you open a given directory. If you want to make this fast, cache
> > filetypes; now opening the huge mp3 folder is just a matter of reading a
> > single cache file and sniffing those files with a modification time
> > later than that of the cache file.  Naturally this would only need to be
> > done for those really huge directories, it would probably be a waste for
> > directories with only a hundred files or fewer.
> 
> This would be great. The cache could reside inside the directory
> metadata, since this would allow users to modify it to fix misidentified
> files. And this directory metadata API already exists. :)
> 
> > I don't think we need to worry about which approach is ultimately going
> > to perform faster. For a program like Nautilus either there is or is not
> > a human-noticeable lag time; improving performance when there is no lag
> > time is totally pointless.
> 
> Don't forget about the waste of traffic and server disk I/O that can be
> generated if content sniffing is used across the network by lots of
> users. :)
> 
> > 
> > A number of people have used Windows as an example of why we don't need
> > to sniff files; though there are a number of features in Windows worth
> > copying this is definitely not one of them.  I have worked on some
> > commercial software and the number one frivolous bug report or unfixable
> > user issue occurs when the user attempts to open a file that has an
> > incorrect filename extension.  
> 
> Hmm. What would you do if your customers had the same problem, but with
> content sniffing misidentification instead of wrong suffix? :)
> 
> > People on this list have suggested that
> > this is a user problem not a software problem (i.e. that the user was
> > stupid and beyond help), but I can assure you that however obvious the
> > connection between the hidden Windows filename extension and the error
> > message that our program gave is to me and you, it was not obvious to a
> > large number of otherwise very intelligent people who just weren't as
> > knowledgeable about computers.
> 
> It is possible to educate users about the fact that files must have
> suffixes to be properly identified by the system and that it is possible
> to fix wrong suffixes by right-clicking and running the filetype
> detection tool to rename the file properly. The system can also
> intelligently warn users when they are about to save or rename a
> file with the wrong suffix.
> 
> Besides performance, a big part of the discussion is about
> manageability. Content sniffing is not manageable by users or
> sysadmins. It can only be managed by programmers. IMHO, we should not
> impose on users (in the way it is currently implemented) a feature
> that, in some cases, gets in their way and that they cannot even call
> technical support to work around.
> 
> Imagine that you have a company with 100 machines running the GNOME
> Desktop and content-sniffing misidentifies some file type that is
> crucial to the work of this company. Some people suggested that this is
> merely a bug and should be reported accordingly. OK, let's follow that
> through: the 100 users must change the way they work (i.e., stop opening
> files from Nautilus) until 1) the bug is reported, 2) the bug is fixed,
> 3) a new stable version of gnome-vfs is released, and 4) the system
> administrator (or the outsourcing company) upgrades every station.
> 
> In the example above, will users return to Nautilus after everything is
> fixed?
> 
> I imagine that content sniffing is really useful for home users, who
> download multimedia stuff from P2P. These files really come with wrong
> suffixes all the time. But the most common ones are video or audio files
> that end up being opened with the same program, just as if they had the
> right suffixes.
> 
> But is content sniffing really useful at work? Corporate Linux desktop
> networks often consist of NFS mounts with lots of folders and
> internally-created documents. When a user cannot open a file on the
> network, she will probably ask technical support. If the problem is
> with a file that she has received by mail, she will probably ask the
> sender.
> 
> The current implementation of content sniffing is not 100% accurate.
> This reminds me of voice recognition, language translators and Optical
> Character Recognition (OCR). These technologies are great but computers
> still have keyboards and mice. It is not possible to enforce input
> technologies that are not 100% accurate/reliable. They cannot be used as
> a mandatory input source in applications. Instead, they are used as
> tools to reduce costs and make users' lives easier. The same should
> apply to content sniffing: It is not 100% accurate but it is currently
> being enforced for file type detection in Nautilus.
> 
> Happy new year!

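The per-directory cache Edward describes in the quoted text above could
be sketched roughly as follows. Everything here is an assumption made
for illustration: the cache file name, its JSON format, the
directory_types() function and the min_files threshold are hypothetical,
and the sniffer is passed in as a plain callable instead of calling
gnome-vfs or its directory metadata API.

```python
import json
import os

CACHE_NAME = ".filetype-cache.json"  # hypothetical cache file name


def directory_types(dirpath, sniff, min_files=100):
    """Map each file name in dirpath to a type, sniffing only stale files.

    sniff is any callable taking a path and returning a type string.
    Directories with min_files files or fewer are sniffed directly,
    since a cache would probably be a waste there.
    """
    with os.scandir(dirpath) as entries:
        names = [e.name for e in entries
                 if e.is_file() and e.name != CACHE_NAME]
    if len(names) <= min_files:
        return {n: sniff(os.path.join(dirpath, n)) for n in names}

    cache_path = os.path.join(dirpath, CACHE_NAME)
    try:
        cache_mtime = os.stat(cache_path).st_mtime
        with open(cache_path) as f:
            cached = json.load(f)
        if not isinstance(cached, dict):
            cached = {}
    except (OSError, ValueError):
        # Missing or unreadable cache: treat every file as stale.
        cache_mtime, cached = None, {}

    types = {}
    dirty = False
    for name in names:
        path = os.path.join(dirpath, name)
        # Stale means: unknown to the cache, or modified after the
        # cache file was last written.
        stale = (cache_mtime is None
                 or name not in cached
                 or os.stat(path).st_mtime > cache_mtime)
        if stale:
            types[name] = sniff(path)
            dirty = True
        else:
            types[name] = cached[name]

    # Rewrite the cache if anything was re-sniffed or files disappeared.
    if dirty or len(types) != len(cached):
        with open(cache_path, "w") as f:
            json.dump(types, f)
    return types
```

Because the cache is a plain file inside the directory, a user or
sysadmin could in principle edit an entry to fix a misidentified file,
which is the point Fabio raises about keeping it in the directory
metadata. Note that a hand-edited entry survives only as long as the
file's modification time stays older than the cache file's.
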
Follow-Ups:
- Content sniffing benchmarking script (Was: Re: Suggestion for file type detection approach)
  - From: Fabio Gomes
- Re: Suggestion for file type detection approach
  - From: Soeren Sandmann
- Re: Suggestion for file type detection approach
  - From: Dave Benson

References:
- Re: Suggestion for file type detection approach
  - From: Edward Jay Kreps
- Re: Suggestion for file type detection approach
  - From: Fabio Gomes
