On 16.06.2016 01:42, Kenny McCormack wrote:
> In article <881169da-4e5d-4abd...@googlegroups.com>,
> Robert Mesibov <robert....@gmail.com> wrote:
>>> (The ASCII Tab, BTW, has a lot of other disadvantages, and is also generally
>>> not a good choice. For example you can't see multi-Tab sequences, and it
>>> provokes hard-to-detect bugs if one line has, say, two Tabs to align data
>>> and the program thus interprets it as a spurious additional empty field.)
>>
>> I don't see other disadvantages,
You may not see them, but there are some. Besides the multi-Tab issue already
mentioned, there is the dependency on editors in hand-created files, where
editors expand a Tab to Spaces or to mixtures of Tab and Space, even depending
on the specific Tab-width setting.
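The multi-Tab problem is easy to reproduce. A minimal sketch, assuming a POSIX shell and any POSIX awk; the sample record is made up for illustration:

```sh
# Made-up record: "name" and "value" separated by TWO Tabs (e.g. for alignment).
line=$(printf 'name\t\tvalue')

# With Tab as the field separator, the double Tab yields an extra, empty field.
nf=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')
second=$(printf '%s\n' "$line" | awk -F'\t' '{ print "[" $2 "]" }')

echo "fields: $nf, second field: $second"
```

This reports three fields with an empty second one, although on screen the line looks like a two-column record.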
>> and I agree with Marc that tab is the most recommendable field separator.
Any unique _visible_ separator is preferable. YMMV. A typical choice is the
pipe symbol which is rarely used inside the payload data and clearly visible.
I've seen a lot of advanced software products in the past three decades that
have used that choice.
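As a small sketch of that choice (the sample data is made up, and a POSIX shell and awk are assumed), the payload may contain Spaces and commas without disturbing the field splitting:

```sh
# Made-up sample data; the payload contains Spaces and commas, but no pipe.
data='1|Smith, John|New York
2|Doe, Jane|Los Angeles'

# Print the third field of each record, using "|" as the field separator.
cities=$(printf '%s\n' "$data" | awk -F'|' '{ print $3 }')
echo "$cities"
```

A single-character FS like this is taken literally, so the pipe needs no escaping here.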
Nonetheless it always depends on the application data context. So any
religious or dogmatic statement about the One True Separator is nonsense.
The reaction to my hint and explanations about inherent problems with Tab,
and about it being unsuited as a general recommendation (remember, I replied
to the statement: "I [Marc] always use (and would recommend it to everyone)
tab (ASCII 9) as Field separator"), shows me that this thread is going to
become a religious war again. I will abstain.
>> Among other advantages, tab is the default separator for cut, paste and nl,
Which is irrelevant for Awk (and comp.lang.awk), since in Awk Tab is not
the default. <OT> Actually, the way that e.g. cut handles delimiters is
extremely primitive and (as opposed to awk) doesn't work for arbitrary
delimiters; it is more of a pain than an enlightenment. Obviously there *is*
an argument for Tab: to be able to process the same files with tools other
than Awk as well, and since those tools are so primitive you need to resort
to a very primitive one-character-wide delimiter. </OT> But one insight
that experienced Awk users should have is that if you use Awk you can avoid
single invocations or pipelines of those primitive tools and be better off.
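A sketch of that last point, under the assumption of a POSIX shell and awk and with made-up sample data: numbering the records and swapping two Tab-separated columns would typically take a couple of cut calls plus paste and nl (cut alone cannot reorder fields), whereas a single awk process does it:

```sh
# Made-up Tab-separated sample data (two columns).
data=$(printf 'apple\tred\nbanana\tyellow')

# Number the records and swap the two columns in a single awk process.
out=$(printf '%s\n' "$data" | awk -F'\t' -v OFS='\t' '{ print NR, $2, $1 }')
printf '%s\n' "$out"
```

The same program works with any other separator by changing only the -F and OFS values.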
>> and in a terminal or text editor it is simple to
>> display tab-separated columns clearly (with 'tabs' command, or with tab length in
>> an editor).
Since there's no single static configuration across editors, that is a
problem rather than an advantage (as illustrated above).
<OT>
>> The hardest part in converting a CSV to a TSV is working out how double quotes
>> were used in the CSV - there are too many different cases!
There's no doubt about the problems with CSV; we seem to have agreed on that.
There are even more problems with it; e.g. it's non-standardized, there are
many "CSV" formats, and escaping (of quotes or delimiters) inside the data is
inconsistent across the existing variants. Substituting a Tab for the comma or
semicolon delimiter in CSV data does not solve the inherent problems of those
CSV formats.
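To make the quoting problem concrete, a minimal sketch (made-up record; POSIX shell and awk assumed) of what plain delimiter splitting does with a quoted field:

```sh
# Made-up CSV record: two logical fields, the first one quoted and
# containing the delimiter itself.
line='"Smith, John",42'

# Splitting on the comma alone ignores the quoting and miscounts the fields.
nf=$(printf '%s\n' "$line" | awk -F',' '{ print NF }')
echo "fields seen: $nf"
```

Awk sees three fields here instead of two, and the quote characters remain part of the data whichever delimiter character is used.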
I don't think that converting CSV is the way to go; since if you can convert
it reliably you can also just process it the same way the converter does.
With an explicit converter you may hope to avoid duplicating effort, but
given that there's no single standard (as mentioned already) it's better to
create data in advance in an appropriate format.
</OT>
Luckily we have (as opposed to those primitive Unix tools) the option to
specify powerful separators in Awk to handle most cases of Real World Data.
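For instance, a sketch of a regular-expression separator (the record is made up; a POSIX shell and awk are assumed), where a semicolon is surrounded by a varying amount of Spaces or Tabs:

```sh
# Made-up record: fields separated by ";" with varying blanks around it.
line=$(printf 'alpha ;beta\t;  gamma')

# FS is a regular expression: a semicolon plus any surrounding Spaces or Tabs.
out=$(printf '%s\n' "$line" |
      awk -F'[ \t]*;[ \t]*' '{ print NF ": " $1 "|" $2 "|" $3 }')
echo "$out"
```

The fields come out already trimmed ("3: alpha|beta|gamma"), which a one-character delimiter as used by cut cannot express.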
Janis