Hello,
First off, thanks to the creators for implementing cMonkey2 as a Python package (and for the beautiful graphical front-end); I appreciate how easy the software is to use.
I have a question about running the algorithm on treatment groups that have vastly different numbers of biological replicates.
Specifically, I have ~150 RNA-seq datasets for a microbe grown in 17 different media compositions; however, for 3 of the growth media I have 20-30 replicates each, while for the remaining 14 I have only 3-5 replicates each.
My question is: will the large number of replicates bias the algorithm toward correlations that hold only across the 3 heavily represented growth media? If so, what would you recommend to reduce this bias in the sample set?
Thanks,
Joseph Peterson