Re: strings in gnumeric / awk / etc.
- From: Leonard Mada <discoleo gmx net>
- To: gnumeric-list gnome org
- Subject: Re: strings in gnumeric / awk / etc.
- Date: Tue, 16 Jan 2007 20:20:56 +0200
Hello,
Prof J C Nash wrote:
> Some of the issues being raised suggest that a spreadsheet is not the
> right analytic tool. How about a data frame in R?
Well, this is difficult, too. When a bunch of diagnoses (or symptoms)
are lumped together in one single column, that won't be easy to work
with in R either.
A much more difficult case is when a patient stays for longer than
one day (which is usually the case) and I need a specific string (say
a diagnosis, a symptom, ...) that may occur on any of the days, BUT I
need either the first occurrence, or the number of days with this
diagnosis, or some more complex search. I do work extensively with R
(that is why I posted this OOo issue,
http://qa.openoffice.org/issues/show_bug.cgi?id=66589), but it is NO
substitute for a spreadsheet.
Actually, spreadsheets are still the most used application in the
life sciences. I find even Epi-Info NOT as good (though it has better
analysis capabilities than a spreadsheet, BUT - of course - it cannot
compete with R). Almost every doctor uses Excel, and it is the de
facto standard when doctors do research (I refuse to use it, and
some epidemiologists use Epi-Info, but I believe these are mere
exceptions).
I posted another use for gawk, see the OOo issue
http://qa.openoffice.org/issues/show_bug.cgi?id=66816, where I wanted to
create some dummy variables for the medical department:
GAWK SCRIPT
# $1 contains the input - the hospital unit
{
    $0 = tolower($0)  # normalize case before matching
    $2 = 0  # neurosurgery vs non-neurosurgery
    $3 = 0  # neurology vs non-neurology
    $4 = 0  # general surgery vs non-surgery
    $5 = 0  # internal medicine vs non-im
    $6 = 1  # ERROR var, stays 1 if the abbreviation is unknown
}
# NEUROSURGERY
$1 ~ /nch/ { $2 = 1; $6 = 0 }
# Neurology
$1 ~ /^n[ \t]*$|^ne/ { $3 = 1; $6 = 0 }
# General Surgery
$1 ~ /^ch/ { $4 = 1; $6 = 0 }
# INTERNAL MEDICINE
$1 ~ /mi|end|nut/ { $5 = 1; $6 = 0 }
# append the unit and its dummy variables to the output file
{ print $0 >> "out-file" }
### END SCRIPT
Try to do this with spreadsheet functions, and it will turn into a
nightmare.
gawk has many advantages, and I may point out another two:
- it is easy and simple, and very fast (both to write and to execute,
even on huge datasets)
- the code is structured and visible, so it is easy to understand what
it does (this is NOT always the case when you write complex formulas in
the spreadsheet)
I hope these are enough reasons to implement a simple menu entry in
gnumeric that runs awk/gawk scripts.
Specifically:
- the user selects some cells
- chooses Menu-Entry: RUN gawk-script (a dialog box opens allowing the
user to select the proper script)
- gnumeric should then open a bidirectional pipe to gawk
- it should supply default values for the FieldSeparator (FS) and
RecordSeparator (RS), which should also be used to join the cells
and rows of the worksheet when piping the data stream into gawk
- gawk's output should be split back into cells (using the same FS and
RS) (probably into a new sheet, like ANOVA)
I believe this is easy to code and quite useful.
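To make the round trip concrete, here is a minimal sketch in Python (not
gnumeric's own code, and not a proposal for its API). It assumes tab and
newline as the default FS and RS; the function names join_cells,
split_output and run_filter are hypothetical, chosen only for
illustration.

```python
import subprocess

FS = "\t"  # field separator: joins the cells of one row (assumed default)
RS = "\n"  # record separator: joins the rows (assumed default)


def join_cells(rows, fs=FS, rs=RS):
    """Serialize the selected cells into one text stream for the filter."""
    return "".join(fs.join(str(cell) for cell in row) + rs for row in rows)


def split_output(text, fs=FS, rs=RS):
    """Split the filter's output back into rows of cells."""
    records = text.split(rs)
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after the final RS
    return [record.split(fs) for record in records]


def run_filter(rows, command, fs=FS, rs=RS):
    """Pipe the cells through an external command and return the new cells."""
    result = subprocess.run(
        command,
        input=join_cells(rows, fs, rs),
        capture_output=True,
        text=True,
        check=True,  # raise if the external command fails
    )
    return split_output(result.stdout, fs, rs)
```

With gawk installed, a call could look like
run_filter(cells, ["gawk", "-F", "\t", "-v", "OFS=\t", "-f", "units.awk"]),
where units.awk is a hypothetical script file. A script used this way
would have to print to standard output rather than to a file, so that
its result can be read back and split into cells.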
Many thanks in advance,
Leonard Mada