Re: [guppi-list] Re: White-paper on ASCII import



> A nice Unixism that might be worth preserving would be to have
> lines preceded by # also be ignored as comments.

Yes, that makes sense.
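
A minimal sketch of that rule in C++, just to pin down what I mean.  The
function names are made up for illustration and are not part of Goose or
Guppi, and treating leading blanks before the # as allowed is my own
assumption:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if the first non-blank character is '#'.
// Allowing leading spaces/tabs before the '#' is an assumption.
bool is_comment_line(const std::string& line) {
    const std::string::size_type pos = line.find_first_not_of(" \t");
    return pos != std::string::npos && line[pos] == '#';
}

// Hypothetical helper: reads every line from `in` and keeps only the
// lines that are not comments.  Blank lines are passed through untouched.
std::vector<std::string> read_data_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!is_comment_line(line)) {
            lines.push_back(line);
        }
    }
    return lines;
}
```

Note that a # in the middle of a line is left alone here; only whole-line
comments are dropped.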

> This part is crucial.  The import engine should be pretty good at
> "doing the right thing" when confronted by a not-too-complicated
> situation.  This is key to having a nice "automatic" data import into
> Guppi.  

I agree.  I have collected a bunch of ASCII test files that I plan to run the
import engine on, including files exported from Excel.  However, I need more,
because I haven't yet seen all the ways people format ASCII files.
So if you have some, please mail them to me so I can use them in a test suite:
 alstrup@diku.dk.

> One of the things I've always liked about Excel is that the
> Import thingie is pretty good in figuring out the structure of the
> data you want to import.  Not great, but pretty good.  I want Goose
> (and hence Guppi) to, in the end, leave that damn little "Wizard" in
> the dust.

It will.

> On a related note, maybe it would make sense to allow for some sort
> of structured comments containing "import hints".  So if I was writing
> some little Perl script that spits out data, I could have it print
> something like this at the top:
> #*# Col=1 Type=Date(YYYY-MM-DD) Sep=\t Name="Date"
> #*# Col=2 Type=Percent Name="Bar"

John M. Wharington suggested something along the same lines:  To enable
comments along with the data, i.e. to stuff more information into the ASCII
files.  His suggestion is indirectly covered in the sense that we will be able
to import an ASCII file where any column is a string.  Then the comments should
be put in that column.  Of course Guppi (and Goose) do not yet support a data
source of strings, but one is almost trivial to add, e.g. as a vector<string>.

(Tangent:  This is a primary reason for my secret plan to make the DataSet
interface look as much like a vector as possible (with iterators and all):
Then we can exploit the advantages of generic programming.  Besides enabling a
common interface for the import engine, we also allow the use of all the nice
routines in <algorithm>.  Also, the interface will seem natural to any
programmer who knows the STL.)
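
To make the tangent concrete, here is a rough sketch of the idea.  The
DataSet below is a hypothetical stand-in written for this mail, not the
actual Goose class, and sum_of and largest are made-up helper names:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-in for a vector-like DataSet; NOT the real Goose class.
// It only exposes the small part of the vector interface used below.
template <typename T>
class DataSet {
public:
    typedef typename std::vector<T>::const_iterator const_iterator;

    void push_back(const T& value) { values_.push_back(value); }
    std::size_t size() const { return values_.size(); }
    const_iterator begin() const { return values_.begin(); }
    const_iterator end() const { return values_.end(); }

private:
    std::vector<T> values_;
};

// Generic code: works for anything that offers begin()/end(),
// whether it is a DataSet or a plain std::vector.
template <typename Container>
double sum_of(const Container& c) {
    return std::accumulate(c.begin(), c.end(), 0.0);
}

// Returns an iterator to the largest element, or end() if c is empty.
template <typename Container>
typename Container::const_iterator largest(const Container& c) {
    return std::max_element(c.begin(), c.end());
}
```

The same template then serves DataSet<double> and DataSet<std::string>
alike, and an import engine could fill either one through push_back without
knowing which it has.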

Anyway, returning to adding more stuff to the ASCII format:  I don't think we
want this.  It overlaps too much with the aim of a real file format.
Whatever is built will be an ugly hack.  When the first meta-comments have been
added, somebody will add even more, and after some time, we'll end up with a
thoroughly hacked-up and buggy format.

I agree with Havoc that doing an XML-based format is a better idea, simply
because XML is extensible by design, and thus prepared for the future.
Yes, it will make it more complicated to hack up a script that produces some
data, but the benefits are worth it, IMO:  You are guaranteed that no
information is lost, since the XML-import engine will know exactly how to parse
the information, and will not have to "automatically guess" twenty different
parameters.
This guarantee is difficult to provide when you use something as loose as
annotated ASCII.  In other words:  I won't implement meta-comments, but anybody
is of course welcome to do it if they feel it's worth it.

If you have any ASCII files lying around, *please* send a copy to me so I have
something to test this stuff with.  Again, the address is alstrup@diku.dk.

Greets,

Asger


