Dear Yuchao Jiang,
I have come across your MARATHON pipeline and think that this will be a very useful tool for studying tumor evolution in our setting. We have exome sequencing data from several sections of the same tumors and want to check them for similar subclones. The Canopy tool looks very promising and we would like to use it for our analysis.
However, I am struggling to get all the numbers that I need for running Canopy out of my files.
I have followed the MARATHON documentation as far as I could. However, I am still missing some steps.
I have the BAM files, the somatic SNV calls in VCF format, and the germline calls in VCF format. I also have separate copy-number variation calls in BED format, if they are of any use.
I am sure that VariantAnnotation and other Bioconductor packages can be used to extract the numbers that I need for Canopy, but this seems a bit tricky and does not appear to be explained in the MARATHON documentation.
It is clear that MARATHON is still in development, but I think it would be very useful if the documentation contained a complete description, or even a script, showing how to obtain the required numbers for Canopy when only these files are available.
So any hints or suggestions on how to obtain the required numbers from the VCF files would be very useful.
Thank you very much in advance,
Best regards,
Joern Toedling
--
Joern Toedling, PhD
Charité - Universitätsmedizin Berlin
CC17, AG Schulte
CVK, Forum 4, Raum 0.0207
Augustenburger Platz 1, D-13353 Berlin
Tel: +49 30 450 616198