I have two tab-delimited flat files. Both have column x coordinate and y
coordinate. I want to join them by using their coordinate as the key
(which should include both x and y coordinate). Is there way to do this
using shell commands sort and join, etc? Thanks a lot! (only those
coordinates in file1 will be appeared in final merged results). -Yong
File 1:
x y number1
1 2 343.5
1 3 444.2
0 1 33.4
File 2:
x y number2
0 1 333
1 2 444
1 3 555
0 8 666
Merge results:
x y number1 number2
1 2 343.5 444
1 3 444.2 555
0 1 33.4 333
1. Shell:
- read and print 'x', 'y', 'number1'
- grep "^$x[ \t]+$y[ \t]" file2
- print out 'number2'
2. Awk:
- read file1 and store 'x,y':'number1' (key:value) in associative
array 'var1'
- read file2 and store 'x,y':'number2' (key:value) in associative
array 'var2'
- for each 'x,y' key in var1, print var1["x,y"] and var2["x,y"]
3. Python:
- read file1 and file2, and store '(x,y)':'number' in dictionary
'var1' and 'var2'.
- for each '(x,y)' key in var1, print var1[(x,y)] and var2[(x,y)]
--
William Park, Open Geometry Consulting, <openge...@yahoo.ca>
Linux solution for data management and processing.
Assume 2 files are sorted, you can try
grep "^`cut -d' ' -f 1,2 FILE1`" | awk '{print $3}' > FILE3
paste FILE1 FILE3
You'll have to first merge the key fields
T=`printf '\t'`
join -t"$T" <(tail +2 $file1 | sed "s/$T/,/" | sort) \
<(tail +2 $file2 | sed "s/$T/,/" | sort) \
| sed "s/,/$T/"
if your shell doesn't have process substitution (<(...)), use
fifos or tempfiles.
--
Stéphane
#!/usr/local/bin/perl
open(FA,"< fileA");
open(FB,"< fileB");
chomp(my @A=<FA>);
chomp(my @B=<FB>);
$f{join ",",(split)[0,1]} = (split)[-1] for (@B);
print map {$_,"\t",$f{join ",",(split)[0,1]},"\n"} grep {exists
$f{join ",",(split)[0,1]}} @A;
perl -lane '$f{"$F[0],$F[1]"}=$F[2] if(@ARGV);
$k="$F[0],$F[1]";print$_,"\t",$f{$k} if(exists$f{$k}&&!@ARGV);' fileB fileA