Hi Open Data folks,
I'm wondering if anyone would be willing to put together a short guide,
aimed at data publishing agencies, on how to create CSV datasets.
I know many of us take the CSV format totally for granted, but I'm
increasingly finding CSV datasets whose publishers don't seem to
understand the field/record/header/table relationship. I'll lightly pick
on Victoria
here, but only because they're doing an awesome job in actually
publishing data. Take the Councillor expense datasets they recently
published, for example:
http://www.victoria.ca/assets/City~Hall/Open~Data/C%20Thorton%20Joe%202013.csv
While well-meaning, they clearly don't understand the formatting
expected of a CSV file. Each Councillor is in a separate datafile, none
of which are machine readable due to offset multi-line headers. We're
seeing this type of thing all over the place in different data catalogs
and from a ton of different issuing agencies -- so it's not just
Victoria. I think it's a lack of data format education, and that perhaps
as a tech-focused group we've taken the formats for granted.
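To make the field/record/header/table relationship concrete, here is a minimal sketch in Python. The rows are hypothetical and are not taken from the Victoria files. It assumes a single header row, one record per line, and the councillor stored as a column so that everyone fits in one table.

```python
import csv
import io

# Hypothetical data: one header row, then one record per line.
# The councillor is a column, so all councillors can share one file.
TIDY_CSV = """councillor,date,category,amount
Councillor A,2013-01-15,Travel,125.50
Councillor B,2013-02-03,Meals,42.00
"""

def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(TIDY_CSV)
for record in records:
    print(record["councillor"], record["date"], record["amount"])
```

With that layout a standard CSV reader needs no special handling: each field is addressed by its header name, and no rows have to be skipped or realigned.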
In these situations, I'd like a guide I could send out that explains how
to format CSV data: what a "field", "row", "table", "header", etc. are in
plain language, how to quote fields, standards for dates and money
values, and how to handle "null" values. Data normalization and normal
forms would also be a good addition for datasets that reference one
another.
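As a rough illustration of the kind of rules such a guide might cover, here is a small Python sketch. The column names, the sample rows, and the specific conventions are assumptions of mine, not an agreed standard: ISO 8601 dates, plain decimal amounts with no currency symbol, and an empty field for a missing value. Quoting is left to the standard csv module.

```python
import csv
import io
from datetime import date
from decimal import Decimal

def format_row(description, spent_on, amount):
    """Turn one record into CSV-ready strings (assumed conventions)."""
    return [
        description,
        spent_on.isoformat(),                       # ISO 8601: YYYY-MM-DD
        "" if amount is None else f"{amount:.2f}",  # empty field means "no value"
    ]

def to_csv(rows):
    """Write a header row plus one line per record, returning the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["description", "date", "amount"])
    for row in rows:
        writer.writerow(format_row(*row))
    return buffer.getvalue()

# Hypothetical sample records.
output = to_csv([
    ('Conference fee, "early bird"', date(2013, 3, 1), Decimal("1250.5")),
    ("Taxi", date(2013, 3, 2), None),
])
print(output)
```

The writer wraps the first description in double quotes because it contains a comma, and doubles the embedded quote characters, which is the usual CSV escaping. The missing amount on the second row comes out as an empty trailing field rather than a placeholder such as "N/A".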
I'd write the guide myself, but I think I'm too close to the programming
for the language to translate well. Is anyone else equipped to translate
these concepts into everyday language?
--
Kevin