Re: Histogram "Bins" tab options problem

From: "Daniel P. Dougherty" <doughe57 msu edu>
To: "Andreas J. Guelzow" <aguelzow pyrshep ca>
Cc: gnumeric-list gnome org
Subject: Re: Histogram "Bins" tab options problem
Date: Tue, 8 Feb 2011 14:51:33 -0500

The other issue is that the bin labels created by the Histogram process appear 
as English words:

Histogram               
                0.22251744455154
above 0.02077103504058  up to 0.12736215860097  8.3%
above 0.12736215860097  up to 0.23395328216135  16.7%
above 0.23395328216135  up to 0.34054440572173  8.3%
above 0.34054440572173  up to 0.44713552928211  0.0%
above 0.44713552928211  up to 0.55372665284249  0.0%
above 0.55372665284249  up to 0.66031777640287  16.7%
above 0.66031777640287  up to 0.76690889996325  16.7%
above 0.76690889996325  up to 0.87350002352363  16.7%
above 0.87350002352363  up to 0.98009114708401  8.3%

Putting all of the "above ..." text in one column and the "up to ..." text in 
the next column makes it difficult to build decent X-axis labels for a "Column 
chart" (see attached image). You also can't apply the number formatting tools 
to this text (e.g. "Scientific notation" or a fixed number of decimals). This 
really needs to be fixed somehow. One possible solution is to label each bin 
by its center, perhaps by adding a third column holding the bin center so that 
the user can (at their option) use it as the label.
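
To make the bin-center idea concrete, here is a minimal Python sketch (not 
Gnumeric code). It takes the bin edges from the output above and computes one 
midpoint per bin; the name bin_centers is made up for illustration. Because 
the centers are plain numbers, they can be formatted freely, which the 
"above ..." / "up to ..." text cannot.

```python
# Bin edges copied from the histogram output quoted above.
edges = [
    0.02077103504058, 0.12736215860097, 0.23395328216135,
    0.34054440572173, 0.44713552928211, 0.55372665284249,
    0.66031777640287, 0.76690889996325, 0.87350002352363,
    0.98009114708401,
]

def bin_centers(edges):
    """Return the midpoint of each pair of consecutive edges (hypothetical helper)."""
    return [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]

centers = bin_centers(edges)

# Print the two existing text columns plus a numeric third column.
# The third column is shown in scientific notation to illustrate that
# ordinary number formatting applies to it.
for lo, hi, c in zip(edges, edges[1:], centers):
    print(f"above {lo:.14f}\tup to {hi:.14f}\t{c:.4e}")
```

With a column like this, the chart's X-axis could point at the numeric 
centers instead of the two text columns.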

My sense is that the current labeling scheme for the histogram is at odds 
with what a typical end user would expect or want.







On Tuesday, February 08, 2011 01:44:55 pm Andreas J. Guelzow wrote:

On Tue, 2011-02-08 at 13:32 -0500, Daniel P. Dougherty wrote:

Under the "Bins" tab of the histogram dialog there is a possible error in
the labels. The issue is how to capture the finite minimum and maximum
data values in a bin. I think it would be more correct to state the 8 rules
as:

(-inf,*),[*,*),...,[*,*),[*,+inf)
(-inf,*],(*,*],...,(*,*],(*,+inf)

[*,*),[*,*),...,[*,*),[*,+inf)
[*,*],(*,*],...,(*,*],(*,+inf]
(-inf,*),[*,*),...,[*,*),[*,*]
(-inf,*],(*,*],...,(*,*],(*,*]

[*,*),[*,*),...,[*,*),[*,*]
[*,*],(*,*],...,(*,*],(*,*]


Comparing the above with the current git version I see that you are
suggesting (with respect to the first interval):
[*,*],(*,*],...,(*,*],(*,+inf]
[*,*],(*,*],...,(*,*],(*,*]
instead of
(*,*],(*,*],...,(*,*],(*,+inf)
(*,*],(*,*],...,(*,*],(*,*]
and (with respect to the last interval):
(-inf,*),[*,*),...,[*,*),[*,*]
[*,*),[*,*),...,[*,*),[*,*]
instead of
(-inf,*),[*,*),...,[*,*),[*,*)
[*,*),[*,*),...,[*,*),[*,*)

Your suggestion would make the first or last interval significantly
different from the others.
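
The difference between the two schemes only shows up when a cutoff coincides 
with an observation. The following Python sketch (not Gnumeric's actual 
implementation; assign_bin and its parameters are invented for illustration) 
bins three values with cutoffs set exactly to the data minimum and maximum, 
once with uniformly half-open intervals and once with the outer interval 
closed at both ends.

```python
def assign_bin(x, edges, closed="right", include_extreme=False):
    """Return the index of the bin containing x, or None if x is outside.

    closed="right" uses (a, b] intervals; closed="left" uses [a, b).
    include_extreme=True additionally closes the otherwise open outer end:
    the first bin becomes [a, b] for closed="right", and the last bin
    becomes [a, b] for closed="left".
    """
    n = len(edges) - 1
    for i in range(n):
        lo, hi = edges[i], edges[i + 1]
        if closed == "right":
            inside = lo < x <= hi
            if include_extreme and i == 0:
                inside = lo <= x <= hi
        else:
            inside = lo <= x < hi
            if include_extreme and i == n - 1:
                inside = lo <= x <= hi
        if inside:
            return i
    return None

data = [1.0, 2.5, 4.0]
edges = [1.0, 2.0, 3.0, 4.0]  # cutoffs equal to min(data) and max(data)

# Uniform (*,*] intervals: the minimum 1.0 lands in no bin.
uniform = [assign_bin(x, edges, closed="right") for x in data]

# First interval changed to [*,*]: the minimum is captured by bin 0.
patched = [assign_bin(x, edges, closed="right", include_extreme=True)
           for x in data]

print(uniform)
print(patched)
```

The same thing happens at the other end with closed="left": the maximum is 
dropped unless the last interval is closed. Whether that case should occur in 
practice at all is exactly the point under discussion.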

Of course the probability that an observation happens to hit the border
is basically 0 (unless you really have a discrete distribution and
should be using frequency tables instead of histograms).

I don't think this maximum/minimum cutoff should really ever be equal to
the value of an observation.

Andreas


-- 
-----
Daniel P. Dougherty
W27 Holmes Hall
Michigan State University
East Lansing, MI 48827
Email: doughe57 msu edu
WWW: http://www.msu.edu/~doughe57

Attachment: hist_label_problem.png
Description: PNG image
