On 2016-06-29, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> On 29.06.2016 12:36, Ben Bacarisse wrote:
>> Janis Papanagnou <janis_pa...@hotmail.com> writes:
>>
>>> On 29.06.2016 05:59, Sivaram Neelakantan wrote:
>>>>
>>>> I have data that looks like this
>>>>
>>>> c1 c2
>>>> ------
>>>> 1 4
>>>> 1 2
>>>> 1 5
>>>> 2 1
>>>> 4 1
>>>>
>>>> The pairwise combo of (1,4) and (4,1) are considered duplicates. As
>>>> are (1,2) and (2,1). How do I delete one of the duplicates? Doesn't
>>>> matter which row is deleted.
>>>
>>> One possibility with awk...
>>>
>>> awk 'NR<=2 || !(($1,$2) in a); { a[$1,$2] ; a[$2,$1] }' your_data_file
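The one-liner above consists of two rules. Here is a minimal sketch that spreads it over several lines with comments. It assumes a POSIX `sh` and any POSIX awk, and `sample` is a hypothetical helper that only replays the rows from the question.

```shell
#!/bin/sh
# Hypothetical helper: replays the sample rows from the question.
sample() {
  printf '%s\n' 'c1 c2' '------' '1 4' '1 2' '1 5' '2 1' '4 1'
}

# The same awk program as the one-liner, one rule per line group.
result=$(sample | awk '
  # Rule 1: a pattern with no action prints the line when the pattern
  # is true: always for the two header lines, otherwise only if this
  # pair (in either order) has not been recorded yet.
  NR <= 2 || !(($1, $2) in a)

  # Rule 2: an action with no pattern runs for every line. Merely
  # referencing a[x, y] creates the element, so both orderings of the
  # current pair are recorded as seen.
  { a[$1, $2]; a[$2, $1] }
')

printf '%s\n' "$result"
```

Because the `{` of the second rule starts on its own line, awk treats it as a separate rule with no pattern. In the one-liner, the `;` after the pattern does the same job. For the sample data, the rows `2 1` and `4 1` are dropped and everything else is printed.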
>>
>> Do you really need the NR<=2 test?
>
> Its purpose is to keep it obvious that the first two lines should always
> be taken unchanged. Technically speaking, you don't need it, because the
> second condition would be effective as well, but stating that condition
> explicitly is more robust in case of changes and smells less like a hack.
> I prefer to have conditions for syntactically different header lines
> semantically separated in the awk code.
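For the sample data, you can check the point that the `NR<=2` test is not technically needed by running both variants side by side. This is a minimal sketch under the same assumptions: a POSIX `sh`, any POSIX awk, and a hypothetical `sample` helper that replays the rows from the question.

```shell
#!/bin/sh
# Hypothetical helper: replays the sample rows from the question.
sample() {
  printf '%s\n' 'c1 c2' '------' '1 4' '1 2' '1 5' '2 1' '4 1'
}

# As posted, with the explicit header test.
with_nr=$(sample | awk 'NR<=2 || !(($1,$2) in a); { a[$1,$2] ; a[$2,$1] }')

# Without it: the header lines are still printed, because the keys
# ("c1","c2") and ("------","") have not been recorded yet when
# those lines are read.
without_nr=$(sample | awk '!(($1,$2) in a); { a[$1,$2] ; a[$2,$1] }')

if [ "$with_nr" = "$without_nr" ]; then
  echo "same output"    # prints this for the sample data
else
  echo "outputs differ"
fi
```

This only demonstrates equivalence for this input. The explicit test is kept for readability and for robustness if the header format changes.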