Re: More than you ever wanted to know about csv files (Re: to csv or not to csv)

From: John Denker <jsd av8n com>
To: Gnumeric Mailing List <gnumeric-list gnome org>
Subject: Re: More than you ever wanted to know about csv files (Re: to csv or not to csv)
Date: Sun, 11 Oct 2020 20:25:11 -0700

On 10/11/20 1:02 PM, Morten Welinder wrote:

The use of quotes in csv files is *not* an indication that a piece of data is a
string.  Several reasons: .......


That's fine.  I haven't heard anybody suggest otherwise during the current
discussion.

The Gnumeric solution here is to interpret the text as if it were
entered in a cell.


Also fine.

This has both good and bad aspects:

1. A single initial quote can be used to interpret the rest as text.
   Occasionally useful; rarely a problem.


Also fine.  Definitely useful.

2. An initial "=" can be used to force interpretation as a formula.
   Occasionally useful; rarely a problem.


Also fine.  Definitely useful.

If this is not the behaviour you want, then you must arrange for that particular
piece of data to be interpreted as a string, not a number or anything else.  This
is one of the major pitfalls of csv files and the blame falls squarely on the csv
format because it has no way of saying "that's a string!"


I'm confused.  I continue to think in terms of three layers:
-- The character-encoding layer.
-- The CSV layer, which deals with raw strings.
-- The semantic layer, which interprets things.

I'm not saying that everybody is obliged to think in terms of this model, but
I find it useful, and unless/until somebody suggests something better I will
continue to use it.

So, item 1. above means there *is* a way (at the semantic layer) of saying
"gnumeric should treat this as a string".  I don't see this as a pitfall
at the CSV layer.  It's not part of the CSV layer at all, and it's also not
a pitfall at the higher layer.
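
As a rough illustration of the three-layer model, here is a minimal Python
sketch.  The function names (decode_layer, csv_layer, semantic_layer) are
hypothetical, and the interpretation rules are a simplified stand-in for the
behaviour described above, not gnumeric's actual code.

```python
import csv
import io


def decode_layer(raw: bytes, encoding: str = "utf-8") -> str:
    """Character-encoding layer: bytes in, text out."""
    return raw.decode(encoding)


def csv_layer(text: str) -> list[list[str]]:
    """CSV layer: text in, rows of raw strings out.

    Quoting is consumed here, so "42" and 42 come out identical.
    """
    return list(csv.reader(io.StringIO(text, newline="")))


def semantic_layer(cell: str) -> tuple[str, object]:
    """Semantic layer: decide what a raw string means (simplified rules)."""
    if cell.startswith("'"):
        return ("text", cell[1:])        # leading quote forces text
    if cell.startswith("="):
        return ("formula", cell[1:])     # leading "=" forces a formula
    try:
        return ("number", float(cell))   # assumption: plain float parsing
    except ValueError:
        return ("text", cell)


raw = b"'007,=A1+1,\"42\",42,hello\r\n"
rows = csv_layer(decode_layer(raw))
cells = [semantic_layer(c) for c in rows[0]]
print(cells)
```

The leading-quote rule lives entirely in semantic_layer; csv_layer never needs
to know about it, which is the point of keeping the layers separate.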

We cannot have uninterpreted data in a
spreadsheet, so the semantic layer is not optional.


I don't recall anybody suggesting that the semantic layer was optional in
this context.  Obviously gnumeric needs a semantic layer.  I'm just saying
that I find it useful to think of the various layers separately.  Is there
ever a need to violate layer-separation?  I've never encountered one.

  There are other (non-gnumeric) contexts where the semantic layer would be
  wildly different, or absent entirely, but that's irrelevant to this thread.

   Foo,Bar"Baz,Bof       # Quote inside unquoted field


In my opinion, and according to the RFC, this is invalid syntax at
the CSV level.  If somebody wants to do this, that sounds like a good candidate
for the configurable text importer.  To say the same thing another way, if
supporting this within the .csv converter is causing problems, I would recommend
dropping support for it.  I cannot imagine this is very common.  If I emitted
something like this, I would have only myself to blame.
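
To make the "invalid at the CSV level" rule concrete, here is a minimal sketch
of a strict splitter for a single record.  The name split_strict is
hypothetical, and the sketch assumes a comma separator and no line breaks
inside quoted fields; it is not how gnumeric parses csv.

```python
def split_strict(line: str) -> list[str]:
    """Split one CSV record, rejecting a quote inside an unquoted field."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: read up to the closing quote; "" is an escaped quote.
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(line[i])
                i += 1
            if i < n and line[i] != ",":
                raise ValueError(f"text after closing quote at column {i}")
            fields.append("".join(buf))
        else:
            # Unquoted field: runs to the next comma and may not contain a quote.
            j = line.find(",", i)
            end = n if j == -1 else j
            raw = line[i:end]
            if '"' in raw:
                raise ValueError(f"quote inside unquoted field: {raw!r}")
            fields.append(raw)
            i = end
        if i >= n:
            return fields
        i += 1  # skip the separating comma


print(split_strict('Foo,"Bar""Baz",Bof'))   # the properly quoted spelling
try:
    split_strict('Foo,Bar"Baz,Bof')
except ValueError as err:
    print("rejected:", err)
```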

   1,22;;222,11          # (And, yes, still called csv even though the separator
                         # isn't comma!)


Ditto.
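
For input like this, the separator has to be stated explicitly, which is the
job of a configurable importer.  As a loose analogy only (this uses Python's
standard csv module, not gnumeric's text importer), the same line splits very
differently depending on the separator the reader is told to use:

```python
import csv
import io

line = "1,22;;222,11"

# Default reader: comma is the separator, semicolons are ordinary data.
as_comma = next(csv.reader(io.StringIO(line)))

# Explicitly configured reader: semicolon is the separator, commas are data.
as_semicolon = next(csv.reader(io.StringIO(line), delimiter=";"))

print(as_comma)
print(as_semicolon)
```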

There are so many ***conflicting*** csv implementations in the world


I don't think anybody is suggesting that support should be provided within the
gnumeric .csv importer for every crackpot format that has ever been called "csv".

The suggestion to use the configurable text importer is entirely reasonable for
users who want to do kooky stuff that conflicts with prosaic csv ... but I do
not see why it should be the reflex response to users who are doing non-kooky
stuff.

References:
- to csv or not to csv
  - From: John Denker
- More than you ever wanted to know about csv files (Re: to csv or not to csv)
  - From: Morten Welinder
