Hey guys,
I would love to use mlxtend's SFS (sequential feature selection) in a distributed way.
I saw that it can already run in parallel on a single machine across its cores (n_jobs=-1, as in scikit-learn).
But is there a way or framework to run mlxtend SFS across a number of different machines?
As part of my thesis I need to run SFS on a data set of about 10K genes (features) and about 100K samples, and I am looking for a way to do it on many low-cost machines via AWS.
Thanks in advance for your help and thoughts.
(A solution based on PySpark and UDFs would also be fine, as I can scale out using AWS EMR.)
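
To make concrete what I am hoping to distribute, here is a minimal sketch of one way to look at it. This is not mlxtend's implementation. In each round of forward selection, the candidate subsets can be scored independently of each other, so that inner loop is the part I would like to farm out to other machines. The names `forward_select`, `toy_score` and `WEIGHTS` are made up for illustration, the scoring function is a toy stand-in for a cross-validated model score, and the thread pool is only a placeholder for whatever cluster backend would actually dispatch the work.

```python
from concurrent.futures import ThreadPoolExecutor


def forward_select(n_features, k, score_subset, executor):
    """Greedy forward selection of k feature indices.

    score_subset takes a tuple of feature indices and returns a float
    (higher is better). executor only needs a .map() method, so a cluster
    client with the same interface could in principle replace the thread pool.
    """
    selected = []
    remaining = set(range(n_features))
    while len(selected) < k and remaining:
        candidates = sorted(remaining)
        subsets = [tuple(selected + [f]) for f in candidates]
        # The evaluations in this round do not depend on each other,
        # so this map is the step that could run on separate workers.
        scores = list(executor.map(score_subset, subsets))
        best_pos = max(range(len(candidates)), key=lambda i: scores[i])
        best = candidates[best_pos]
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature usefulness, standing in for a real model score.
WEIGHTS = [0.1, 0.9, 0.3, 0.7, 0.2]


def toy_score(subset):
    return sum(WEIGHTS[i] for i in subset)


with ThreadPoolExecutor(max_workers=4) as pool:
    chosen = forward_select(len(WEIGHTS), 3, toy_score, pool)

print(chosen)
```

With 10K features, the first round alone has about 10K independent model fits, which is why I think spreading each round over many small machines could help.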