Hi Jérôme,
Thanks for your interest in MARATHON.
The falcon package requires four columns as input, so relapse.demo needs to be preprocessed before it can be used with falcon. The intermediate steps are QC procedures, which can be reproduced using the output of readVCFforCanopy. None of the QC steps requires stop_position.
> help(getChangepoints)
getChangepoints {falcon}
Usage
getChangepoints(readMatrix, verbose=TRUE, pOri=c(0.49,0.51), error=1e-5, maxIter=1000)
Arguments
. . .
readMatrix
A data frame with four columns: AN, BN, AT, BT.
Following the MARATHON tutorial, the input to falcon looks like this:
> head(readMatrix)
Tumor_ReadCount_Ref Tumor_ReadCount_Alt Normal_ReadCount_Ref Normal_ReadCount_Alt
3360063 18 35 13 26
3360064 52 13 47 11
3360067 25 7 35 6
3360068 22 18 32 13
3360069 31 10 38 13
3360070 40 11 36 18
If you use readVCFforCanopy and subset to the first two samples, you get a similar read matrix.
ReadsMatrix is a matrix in which rows are variant positions and each sample has two columns: first the reference allele count, then the alternate allele count.
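Because each sample contributes two adjacent columns, the counts for the i-th sample sit in columns 2i - 1 (Ref) and 2i (Alt). A minimal sketch of picking them out by position; the helper sampleCounts is hypothetical (not part of MARATHON), and the toy matrix just reuses two rows of the readVCFforCanopy output in this reply:

```r
# Toy matrix with the same layout as canopyInput$ReadsMatrix:
# for each sample, the Ref count column is followed by the Alt count column.
readsMatrix <- matrix(
  c(9, 54, 1, 192,
    0, 59, 0, 241),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("chrM:152_T/C", "chrM:195_C/T"),
                  c("SRR5906250_Ref", "SRR5906250_Alt",
                    "SRR5906251_Ref", "SRR5906251_Alt")))

# Hypothetical helper: return the Ref and Alt columns of the i-th sample,
# i.e. columns 2i - 1 and 2i.
sampleCounts <- function(readsMatrix, i) {
  readsMatrix[, c(2 * i - 1, 2 * i), drop = FALSE]
}

sampleCounts(readsMatrix, 2)
```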
> vcfFile = system.file("extdata", "sample_w_header.vcf", package="MARATHON")
> canopyInput = readVCFforCanopy(vcfFile)
> head(canopyInput$ReadsMatrix[,1:4])
SRR5906250_Ref SRR5906250_Alt SRR5906251_Ref SRR5906251_Alt
chrM:152_T/C 9 54 1 192
chrM:195_C/T 0 59 0 241
chrM:410_A/T 0 210 0 250
chrM:2354_C/T 0 108 0 250
chrM:2485_C/T 0 170 0 250
chrM:2759_A/G 5 85 4 244
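To hand the first two samples to falcon, those four columns can be reordered and renamed to the AN, BN, AT, BT layout documented in the getChangepoints help page above. This is a minimal sketch under stated assumptions: SRR5906250 is taken as the normal and SRR5906251 as the tumor sample (swap the arguments if your design is the reverse), Ref is treated as allele A and Alt as allele B, the helper toFalconMatrix is hypothetical, and the QC steps mentioned above are not included:

```r
# Toy copy of the first rows of canopyInput$ReadsMatrix[, 1:4].
readsMatrix <- matrix(
  c(9, 54, 1, 192,
    0, 59, 0, 241,
    0, 210, 0, 250),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("chrM:152_T/C", "chrM:195_C/T", "chrM:410_A/T"),
                  c("SRR5906250_Ref", "SRR5906250_Alt",
                    "SRR5906251_Ref", "SRR5906251_Alt")))

# Hypothetical helper: build falcon's four-column input from one normal and
# one tumor sample (assumption: Ref = allele A, Alt = allele B).
toFalconMatrix <- function(readsMatrix, normal, tumor) {
  # c("_Ref", "_Alt") is recycled, giving normal_Ref, normal_Alt,
  # tumor_Ref, tumor_Alt in that order.
  cols <- paste0(c(normal, normal, tumor, tumor), c("_Ref", "_Alt"))
  out <- as.data.frame(readsMatrix[, cols, drop = FALSE])
  colnames(out) <- c("AN", "BN", "AT", "BT")
  out
}

falconInput <- toFalconMatrix(readsMatrix, normal = "SRR5906250",
                              tumor = "SRR5906251")
# After QC, falconInput could be passed as getChangepoints(readMatrix = falconInput)
```

Row names (the variant IDs) are kept, so each falcon row can still be traced back to its position.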
Please let me know if you have any additional questions,
Best,
Gene