Re: More than you ever wanted to know about csv files (Re: to csv or not to csv)



By way of analogy, let me mention that in physics, there are five or
six inequivalent ways of defining "heat", each of which has some merit.
And that's just counting the technical meanings, not including innumerable
nontechnical and metaphorical meanings.  Also two inequivalent meanings of
"gravity".

Similarly, in mathematics, there are two inequivalent meanings of "primitive".
Also "field".  Also "linear".

In general, having multiple concepts masquerading behind a single name is
a source of endless confusion.

Our current discussion falls into this category.  At the very least we have
 -- CSV: values separated by actual commas
 -- SSV: values separated by semicolons
 -- ASV: values separated in some arbitrary way,
        to be decoded by some adaptive and/or configurable algorithm.
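
To make the distinction concrete, here is a minimal Python sketch of the
three cases.  The function names are mine, and the "adaptive" rule in
parse_asv (take the candidate separator that splits every line into the same
number of fields, preferring more fields) is only an illustrative assumption,
not what Gnumeric or any other program actually does.

```python
import csv
import io


def parse_csv(text):
    """CSV in the strict sense: fields separated by actual commas."""
    return list(csv.reader(io.StringIO(text), delimiter=","))


def parse_ssv(text):
    """SSV: fields separated by semicolons."""
    return list(csv.reader(io.StringIO(text), delimiter=";"))


def parse_asv(text, candidates=(",", ";", "\t", "|")):
    """ASV: guess the separator first, then parse.

    Heuristic (an assumption for illustration): keep the candidate that
    splits every line into the same number of fields, preferring the
    candidate that yields the most fields.
    """
    best_rows, best_width = None, 1
    for delim in candidates:
        rows = list(csv.reader(io.StringIO(text), delimiter=delim))
        widths = {len(row) for row in rows}
        if len(widths) == 1:
            width = widths.pop()
            if width > best_width:
                best_rows, best_width = rows, width
    if best_rows is None:
        raise ValueError("could not determine a separator")
    return best_rows


sample = "name;price\nbread;1,50\n"
print(parse_csv(sample))  # splits on the decimal comma, not on ";"
print(parse_ssv(sample))
print(parse_asv(sample))
```

On the sample, parse_csv and parse_ssv disagree about the same bytes, which
is the whole point: the three are different operations and deserve
different names.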

At this stage it is not important to argue about which of these is better
or worse.  Each has some merit.  My point is simply this:
  THESE ARE NOT ALL THE SAME.

I don't even care what names you give them, so long as the names are not
the same.

=================

Tomorrow we can ask whether we agree with the following thesis:

  "Improving support for ASV should not degrade support for CSV,
   or vice versa."

Whether you agree with that or not, my point for today is that we cannot
even formulate the question unless we have reasonable names for what we
are talking about.



On 10/13/20 5:42 PM, Morten Welinder wrote:
Mostly for entertainment purposes, here's a tour of Northern Europe as
told by random csv files.  (I am ignoring xls files named *.csv -- there
are quite a few of those too.  Also ignored are completely vanilla ","
separated files which are also used in these countries.)

TL;DR: ";" separation is quite common.  Decimal comma is common.

If you wonder why ";" is so common: it's what Excel does in locales
that use decimal comma.  Gnumeric cannot ignore this fact.
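
A rough Python sketch of what reading such a file involves: split on ";"
and treat "," as the decimal mark.  The helper names are hypothetical, and
the conversion is deliberately naive: a value with thousands separators
such as "1.234,56" is left as a string rather than guessed at.

```python
import csv
import io


def parse_decimal_comma(field):
    """Turn '123,45' into 123.45; return the field unchanged if not numeric."""
    try:
        return float(field.replace(",", "."))
    except ValueError:
        return field


def read_semicolon_file(text):
    """Parse ';'-separated text, converting decimal-comma numbers to floats."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return [[parse_decimal_comma(field) for field in row] for row in reader]


rows = read_semicolon_file("item;amount\nbread;123,45\n")
print(rows)
```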

Let's start in Germany.  Here's a list of German doctors.  Note that
the separator is ";":
http://www.stadtmagazinverlag.de/orte/senftenberg09/Aerzte.csv

Here's a csv file using [tab] as separators:
https://gitlab.lrz.de/ru49qap/paradiso/blob/master/kiva_locations.csv

Here's "|" as separator.  That's a new one! Note also the "123.45 €" amounts.
https://www.smarthome.de/feed/exagon-smarthome.csv

Here's a ';' separated file with "123,56" numbers.
https://wahlergebnis.duisburg.de/Buergerentscheid/05112000/html5/Buergerentscheid_NRW6.csv

Moving on to Finland, here we see ";" separated data in some non-UTF8
encoding.  It looks like a bunch of names and addresses.  Or maybe it's
the local butcher's price list -- I can't tell.
https://www.graafinenteollisuus.fi/files/149/7_saraketta_09.csv

Denmark.  The Education and Research Ministry uses ";" separated data
with "123,45" numbers:
https://ufm.dk/uddannelse/statistik-og-analyser/uddannelseszoom/ufm_samlet_02sep2020.csv

Sweden, [tab] separated:
https://panglaodb.se/csvs/f658ebfb.csv

Norway: ";" separated with what appears to be an html header:
https://www.feva.no/wp-content/uploads/2013/08/resultater1.csv

