Hi all, I am a biologist trying to learn Python. Please help me with the code below.
The number of files can be more than 50.
>>> file1
cNo gene year
1 A,B 2004
2 C,D 2008
3 K,L 2011
>>> file2
cNo gene year
1 a,e 2001
2 d,c,p 2003
3 x,y,x 2000
4 m,n 1988
>>> file3
cNo gene year
1 R,S 2002
2 X 2005
3 A,Q 2002
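Condition: compare the 'gene' column of every file against all the other files and find the rows whose genes are fully or partially in common (ignoring case, e.g. 'A' matches 'a'). For each match, report the file 'name', the 'cNo' and the genes.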
final_output
name cNo genes
file1,file2,file3 1,1,3 [A,B], [a,e], [A,Q]
file1,file2 2,2 [C,D], [d,c,p]
file2,file3 3,2 [x,y,x], [X]
import pandas as pd
import functools
file1 = pd.DataFrame({'cNo':[1,2,3], 'gene': ['A,B','C,D','K,L'],'year':[2004,2008,2011]})
file2 = pd.DataFrame({'cNo':[1,2,3,4],'gene':['a,e','d,c,p','x,y,x','m,n'],'year':[2001,2003,2000,1988]})
file3 = pd.DataFrame({'cNo':[1,2,3],'gene':['R,S','X','A,Q'],'year':[2002,2005,2002]})
files = [file1, file2, file3]
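# I don't know how to merge on a partial match of the 'gene' column.
# This attempt only matches the whole 'gene' string exactly, so it does not give the output above.
df = functools.reduce(lambda left, right: pd.merge(left, right, on='gene'), files)
print(df)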
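One possible way to get the partial matching, as a rough sketch rather than a definitive solution: stack all files into one long table, split each 'gene' string into single genes, and group the rows by gene. The assumptions here are mine and not from the data itself: matching ignores case, genes are separated by commas, and the helper names (`rows`, `token`, `n_files`, `grouped`) are made up for illustration. The files are kept in a dict so that each file name travels with its rows.

```python
import pandas as pd

file1 = pd.DataFrame({'cNo': [1, 2, 3], 'gene': ['A,B', 'C,D', 'K,L'], 'year': [2004, 2008, 2011]})
file2 = pd.DataFrame({'cNo': [1, 2, 3, 4], 'gene': ['a,e', 'd,c,p', 'x,y,x', 'm,n'], 'year': [2001, 2003, 2000, 1988]})
file3 = pd.DataFrame({'cNo': [1, 2, 3], 'gene': ['R,S', 'X', 'A,Q'], 'year': [2002, 2005, 2002]})
files = {'file1': file1, 'file2': file2, 'file3': file3}

# Stack all files into one table and remember which file each row came from.
rows = pd.concat([df.assign(name=name) for name, df in files.items()], ignore_index=True)

# One row per single gene; lower-cased so that 'A' matches 'a' (assumption).
rows['token'] = rows['gene'].str.lower().str.split(',')
rows = rows.explode('token').reset_index(drop=True)
rows['token'] = rows['token'].str.strip()

# A gene repeated inside one row (e.g. 'x,y,x') should count only once.
rows = rows.drop_duplicates(subset=['name', 'cNo', 'token'])

# For every single gene, collect the file names, cNo values and original gene strings.
grouped = rows.groupby('token', sort=False).agg(
    n_files=('name', 'nunique'),
    name=('name', lambda s: ','.join(s)),
    cNo=('cNo', lambda s: ','.join(map(str, s))),
    genes=('gene', lambda s: ', '.join(f'[{g}]' for g in s)),
)

# Keep genes seen in at least two files, and drop repeated hits
# (e.g. 'c' and 'd' both point at the same pair of rows).
final_output = (
    grouped[grouped['n_files'] > 1]
    .drop(columns='n_files')
    .drop_duplicates(subset=['name', 'cNo'])
    .reset_index(drop=True)
)
print(final_output)
```

With more than 50 files, only the `files` dict would need to change, for example by filling it in a loop. Note that this sketch reports one group per shared gene; it does not chain groups together when row 1 shares a gene with row 2 and row 2 shares a different gene with row 3.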