The other issue is that the bin labels created by the Histogram process appear as English words:

Histogram  0.22251744455154
above 0.02077103504058   up to 0.12736215860097    8.3%
above 0.12736215860097   up to 0.23395328216135   16.7%
above 0.23395328216135   up to 0.34054440572173    8.3%
above 0.34054440572173   up to 0.44713552928211    0.0%
above 0.44713552928211   up to 0.55372665284249    0.0%
above 0.55372665284249   up to 0.66031777640287   16.7%
above 0.66031777640287   up to 0.76690889996325   16.7%
above 0.76690889996325   up to 0.87350002352363   16.7%
above 0.87350002352363   up to 0.98009114708401    8.3%

Having all of the "above ..." text in one column and the "up to ..." text in the next column over makes it difficult to produce decent X-axis labels for a "Column chart" (see attached image). You also can't apply the number formatting tools to this text (e.g. "Scientific notation" or a fixed number of decimals).

This really needs to be fixed somehow. A possible solution might be to label the bin center. Or possibly add a third column with the bin center, so the user (at their option) can use the bin center as a label? My sense is that the current labeling scheme for the histogram will seem at odds with what a typical end user would expect or want.

On Tuesday, February 08, 2011 01:44:55 pm Andreas J. Guelzow wrote:
> On Tue, 2011-02-08 at 13:32 -0500, Daniel P. Dougherty wrote:
> > Under the bins tab on the histogram dialog there is a possible error
> > in the labels. The issue is how to capture the finite minimum and
> > maximum data values in a bin?? I think it is more correct that the 8
> > rules should each be stated as:
> >
> > (-inf,*),[*,*),...,[*,*),[*,+inf)
> > (-inf,*],(*,*],...,(*,*],(*,+inf)
> > [*,*),[*,*),...,[*,*),[*,+inf)
> > [*,*],(*,*],...,(*,*],(*,+inf]
> > (-inf,*),[*,*),...,[*,*),[*,*]
> > (-inf,*],(*,*],...,(*,*],(*,*]
> > [*,*),[*,*),...,[*,*),[*,*]
> > [*,*],(*,*],...,(*,*],(*,*]
>
> Comparing the above with the current git version I see that you are
> suggesting (with respect to the first interval):
>
> [*,*],(*,*],...,(*,*],(*,+inf]
> [*,*],(*,*],...,(*,*],(*,*]
>
> instead of
>
> (*,*],(*,*],...,(*,*],(*,+inf)
> (*,*],(*,*],...,(*,*],(*,*]
>
> and (with respect to the last interval):
>
> (-inf,*),[*,*),...,[*,*),[*,*]
> [*,*),[*,*),...,[*,*),[*,*]
>
> instead of
>
> (-inf,*),[*,*),...,[*,*),[*,*)
> [*,*),[*,*),...,[*,*),[*,*)
>
> Your suggestion would make the first or last interval significantly
> different from the others. Of course the probability that an
> observation happens to hit the border is basically 0 (unless you are
> really having a discrete distribution and should be using frequency
> tables instead of histograms.) I don't think this maximum/minimum
> cutoff should really ever be equal to the value of an observation.
>
> Andreas
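To make the bin-center idea concrete, here is a minimal sketch in Python. It is not Gnumeric code, and the names histogram_rows and include_min are made up for illustration. It assumes equal-width bins spanning the data minimum to the data maximum, with right-closed bins to match the "above ... up to ..." wording. Each row carries the lower edge, upper edge, bin center and percentage as plain numbers, so any number format can be applied to the labels. The include_min flag switches between closing the first bin on both sides and leaving the minimum uncounted.

```python
from bisect import bisect_left


def histogram_rows(values, n_bins, include_min=True):
    """Return one (lower, upper, center, percent) tuple per equal-width bin.

    Bins are right-closed, (lower, upper]. With include_min=True the first
    bin is closed on both sides, so the smallest observation is counted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Use hi itself as the last edge so rounding cannot exclude the maximum.
    edges = [lo + i * width for i in range(n_bins)] + [hi]

    counts = [0] * n_bins
    for v in values:
        i = bisect_left(edges, v) - 1  # edges[i] < v <= edges[i + 1]
        if i < 0:  # v sits exactly on the lowest edge
            if not include_min:
                continue
            i = 0
        counts[i] += 1

    total = len(values)
    return [
        (edges[i], edges[i + 1], (edges[i] + edges[i + 1]) / 2,
         100.0 * counts[i] / total)
        for i in range(n_bins)
    ]


if __name__ == "__main__":
    sample = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up data, for illustration only
    for lower, upper, center, pct in histogram_rows(sample, 2):
        # The center is numeric, so scientific notation is just a format.
        print(f"{center:.2e}  {pct:.1f}%")
```

With a numeric center column like this, the chart's X-axis could use the center directly, and the two edge columns would remain available for anyone who prefers interval labels.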
--
-----
Daniel P. Dougherty
W27 Holmes Hall
Michigan State University
East Lansing, MI 48827
Email: doughe57 msu edu
WWW: http://www.msu.edu/~doughe57
Attachment: hist_label_problem.png (Description: PNG image)