Re: Suggestion for file type detection approach



On Sun, 2004-01-04 at 14:47, Rodney Dawes wrote:
> On Fri, 2004-01-02 at 17:46, Edward Jay Kreps wrote:
> > >Given the performance bottleneck imposed by sniffing, I suggest that it
> > >no longer be used in directory listing routines. It should be used only
> > >when the user tries to open an unknown file. Let's imagine this case:
> > 
> > I don't think this is the right way to frame the problem.  The questions are:
> > 1. Is sniffing a good idea?  
> > 2. If so does it work correctly?
> > 3. If (1) and (2), is it performing fast enough?
> > 
> > I think others have argued persuasively that sniffing is a good idea,
> > since Unix doesn't generally give file name suffixes to files (even
> > though GNOME does), and files often have incorrect suffixes.
> > 
> > A number of people have said that there are instances where sniffing is
> > not correctly determining the file type.  If true, this is an excellent
> > argument for fixing those cases, but not at all an argument for throwing
> > away sniffing altogether.
> 
> I've only seen one person say that, and as I understand it, no real
> information was provided, only a comment of "it broke on this one file."
> But yes, it is a good argument for improving the system, not removing
> the core functionality.
> 
> > From your benchmarking it is clear that (3) is a problem, and sniffing
> > is taking too long.  This is not an argument for getting rid of it,
> > just an argument for speeding it up.  Someone suggested running
> > it through a profiler, but I doubt that will be worthwhile; the problem
> > is almost certainly the multiple disk accesses (your disk has to
> > seek for each file).  As others have pointed out, a two-pass technique
> > based on extension and then sniffing is also a bad idea, since icons, etc.,
> > would change in the case of a discrepancy.
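To make the trade-off concrete, here is a minimal Python sketch of the two strategies being discussed. It is not gnome-vfs's actual code: the function names and the tiny extension and magic-number tables are hypothetical and cover only a handful of formats. The extension lookup touches only the file name, while the sniffer must open every file and read its first bytes, which is where the per-file disk access comes from.

```python
import os

# Hypothetical, deliberately tiny tables; real detectors (gnome-vfs, file(1))
# use far larger databases.
EXTENSION_TYPES = {
    ".png": "image/png",
    ".pdf": "application/pdf",
    ".gz": "application/x-gzip",
    ".txt": "text/plain",
}

# (leading bytes, MIME type) pairs checked against the start of the file.
MAGIC_TYPES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/x-gzip"),
]

SNIFF_BYTES = 16  # how much of each file the sniffer reads


def type_from_extension(path):
    """Guess from the name alone: no disk access beyond the directory listing."""
    _, ext = os.path.splitext(path)
    return EXTENSION_TYPES.get(ext.lower())


def type_from_content(path):
    """Sniff: open the file and compare its first bytes with known signatures."""
    with open(path, "rb") as f:
        head = f.read(SNIFF_BYTES)
    for magic, mime in MAGIC_TYPES:
        if head.startswith(magic):
            return mime
    return None


def detect(path):
    """Prefer the content; fall back to the extension when sniffing is inconclusive."""
    return type_from_content(path) or type_from_extension(path)
```

For a PNG image saved as "photo.txt", `type_from_extension` says "text/plain" while `type_from_content` says "image/png". A two-pass UI would first draw a text icon and then swap it, which is exactly the discrepancy objected to above.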
> 
> It is not clear that (3) is a problem. At most, (3) may be an issue
> on a certain machine with a certain configuration. I guarantee
> that the sniffing is not the bottleneck. Running it through a profiler
> will be much more worthwhile than sending mail to a list saying "it's
> slow." And yes, I am the one who suggested profiling. The disk I/O is
> most certainly not the bottleneck. Given the speed of today's hard disks,
> seek time is not an issue. Even if every file cost the 12 ms maximum
> seek time of a newer hard disk, loading a directory with 1000 files
> would take only 12 seconds (1000 x 12 ms). Given the size of the cache
> on newer hard disks, and the fact that people generally open the same
> few folders rather than opening / and traversing the tree looking for
> things, or opening odd folders at random, I would guess that the
> effective seek time would be around 3-4 milliseconds. It is much more
> likely to be some other problem specific to what Nautilus does to
> display the list of files. I also suggested things other than profiling,
> such as writing specific benchmarks to compare test-mime from gnome-vfs
> with file, which would be much more useful than saying "Nautilus must be
> slow because echo * is instantaneous" or other such nonsense. Real
> benchmarks are much more reliable than "it seemed slow" or "I used a
> stopwatch", as human error and perception can misjudge how long
> something actually took.
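Both the seek-time estimate and the call for real benchmarks can be made concrete. The Python sketch below uses hypothetical helper names that are not part of gnome-vfs or Nautilus. It reproduces the back-of-the-envelope estimate (files x seek time), then times two passes over a directory: reading the names only, as `echo *` does, and additionally opening each regular file to read its first bytes, as a sniffer must. It measures raw I/O cost only, not Nautilus itself, and it does nothing to flush the OS page cache, so repeated runs on the same directory will mostly measure warm-cache behaviour.

```python
import os
import time


def estimated_seek_cost(num_files, seek_ms):
    """Worst case if every file needs one disk seek: files x seek time, in seconds."""
    return num_files * seek_ms / 1000.0


def time_listing(directory):
    """Time reading only the names in a directory (roughly what `echo *` does)."""
    start = time.perf_counter()
    names = os.listdir(directory)
    return len(names), time.perf_counter() - start


def time_sniffing(directory, nbytes=16):
    """Time listing plus opening each regular file and reading its first bytes."""
    start = time.perf_counter()
    count = 0
    for entry in os.scandir(directory):
        if entry.is_file():
            with open(entry.path, "rb") as f:
                f.read(nbytes)
            count += 1
    return count, time.perf_counter() - start
```

With these helpers, `estimated_seek_cost(1000, 12)` gives the 12-second worst case quoted above, and `estimated_seek_cost(1000, 3.5)` gives 3.5 seconds for the guessed effective seek time. Running `time_listing` and `time_sniffing` on the same cold directory would show how much of the gap is disk access rather than anything Nautilus-specific.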

"ONLY" 12 seconds? Is that acceptable for a directory with 1000 files?

Can human error or misperception account for the difference between
<1 s and 20 s, measured repeatedly with a stopwatch? You must be
kidding. :-P

> Aye. In general, the speed issues have nothing to do with the way the
> MIME type is detected. Claims that one method is substantially faster
> than the other are generally due to misperception. The real issues
> generally seem to be at a lower level, or totally unrelated, such as the
> problems with some of the thumbnailers.

Test it by yourself:
http://mail.gnome.org/archives/gnome-devel-list/2004-January/msg00010.html

If your results confirm my "misperception", I'll blindly agree with
everything that you say. :-)

I'm not advocating the removal of functionality. People have already
pointed out how to fix the performance and misidentification issues
efficiently. What I am trying to do is show the non-believers
where the performance problem actually is.

Best regards,

-- 
Fabio Gomes de Souza <fabio gs2 com br> (+55 81 9127-0597)

.- GS2 TECNOLOGIA DA INFORMACAO LTDA :: www.gs2.com.br
|- IT Infrastructure :: Security :: Embedded systems :: Linux
`- Olinda, Brazil - +55 81 3492-7777 - negocios gs2 com br
