How to combine more than one df on the basis of partial match of column values

4 views
Skip to first unread message

himan...@gmail.com

unread,
Feb 7, 2020, 8:50:36 AM2/7/20
to Python Programming for Autodesk Maya


Number of files can be more than 50

file1 = pd.DataFrame({'cNo':[1,2,3], 'gene': ['A,B','C,D','K,L'],'year':[2004,2008,2011]})
file2 = pd.DataFrame({'cNo':[1,2,3,4],'gene':['a,e','d,c,p','x,y,x','m,n'],'year':[2001,2003,2000,1988]})
file3 = pd.DataFrame({'cNo':[1,2,3],'gene':['R,S','X','A,Q'],'year':[2002,2005,2002]})

Condition: compare gene of each file among all and find common/partial common 'genes' and respective 'name' and 'cNo'?

final_output

name cNo      genes
file1,file2,file3 1,1,3     [A,B], [a,e], [A,Q]
file1,file2 2,2      [C,D], [d,c,p]
file2,file3 3,2      [x,y,z], [X]


<My code>
import pandas as pd
import functools

files = [file1, file2, file3]

#I don't know how to merge by partial matching of column
df = functools.reduce(lambda left,right: pd.merge(left,right,on='genes'), files)
print (df)
Reply all
Reply to author
Forward
0 new messages