Dear Owen and Vladimir,
Thanks a lot for your guidance.
The "blank down" route is giving me an out of heap memory issue every time I try it though I'm using an i7/16 GB RAM/1TB SSD laptop (as apprehended by Owen in his reply).
The Unix command-line solution, on the other hand, does the trick within a few minutes. Amazingly, the 15.41 million rows boil down to just 19,312 unique rows.
I would like to know a little more about this approach -
My final work will involve 20 sets of large TSV files (1.5 GB to 2.5 GB each), each with two columns (URI and Descriptors) and a header row, and I need to figure out how to handle these 20 TSV files with their headers.
In this experiment I first exported two sets in TSV format without column headers, then used the 'cat' command to merge them, and finally ran sort <file> | uniq > <uniq-file>.
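For reference, the exact commands were along these lines (file names are illustrative):

    cat set1.tsv set2.tsv > merged.tsv
    sort merged.tsv | uniq > unique.tsv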
Is there any way to merge all 20 large TSV files with headers, and then use the uniq command?
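My guess is that the header could be kept from the first file and stripped from the rest with tail before sorting. Would something like this work (assuming the files are named part01.tsv through part20.tsv; those names are just placeholders)?

    # keep the header row from the first file
    head -n 1 part01.tsv > unique.tsv
    # append the deduplicated data rows from all 20 files
    for f in part*.tsv; do tail -n +2 "$f"; done | sort | uniq >> unique.tsv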
Second -
Is there any way to display/store the output of sort <file> | uniq -c sorted by the counts (a reverse sort would be very useful in my case)?
The man page for uniq doesn't give any clue about this.
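Would piping the counted output through sort again do it? Something like this (the output file name is just an example):

    sort <file> | uniq -c | sort -rn > counts.txt

where sort -rn sorts numerically on the leading count produced by uniq -c, largest first.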
Best regards,