Falcon


jwa...@uliege.be

Aug 5, 2019, 9:37:48 AM
to marathon...@googlegroups.com
Dear Marathon Team,

I'm currently trying to use MARATHON, following section 4.3.2 of your very useful tutorial, "Running FALCON for allele-specific copy number profiling".

I generated my VCF files with GATK as shown, and I read them with readVCFforCanopy without any issues. As explained in the tutorial, these files are supposed to be the input for Falcon.
However, when I look at the input you use for Falcon (relapse.demo), it doesn't look at all like what readVCFforCanopy produces.
For example, the output of readVCFforCanopy doesn't contain stop_position, but this information is required for the Falcon output.

How do you generate your Falcon input (relapse.demo)?

Thank you very much for your help and have a nice day,


Jérôme Wayet
PhD Student
BLV Group, Animal Genomic
GIGA
+32 4 366 26 24

Yuchao Jiang

Aug 7, 2019, 1:00:27 AM
to jwa...@uliege.be, Gene Urrutia, marathon...@googlegroups.com
Hi Jérôme,

Thanks for your interest. Gene Urrutia, cc'ed here, developed that part of the script. Gene, could you please help with this?

Thanks,
Yuchao

Gene Urrutia

Aug 7, 2019, 8:05:51 AM
to Yuchao Jiang, jwa...@uliege.be, marathon...@googlegroups.com
Yes, I'll take a look shortly.

Gene Urrutia

Aug 8, 2019, 3:04:12 PM
to jwa...@uliege.be, marathon...@googlegroups.com
Hi Jérôme,

Thanks for your interest in MARATHON.

Falcon requires four columns as input, so relapse.demo needs to be preprocessed before it is used with Falcon. The intermediate steps are QC procedures, which can be reproduced using the output of readVCFforCanopy. None of the QC steps requires stop_position.

> help(getChangepoints)

getChangepoints {falcon}
Usage
getChangepoints(readMatrix, verbose=TRUE, pOri=c(0.49,0.51), error=1e-5, maxIter=1000)
Arguments
. . .
readMatrix
A data frame with four columns: AN, BN, AT, BT.

Following the MARATHON tutorial, we see that the input to Falcon is:

> head(readMatrix)
        Tumor_ReadCount_Ref Tumor_ReadCount_Alt Normal_ReadCount_Ref Normal_ReadCount_Alt
3360063                  18                  35                   13                   26
3360064                  52                  13                   47                   11
3360067                  25                   7                   35                    6
3360068                  22                  18                   32                   13
3360069                  31                  10                   38                   13
3360070                  40                  11                   36                   18


Now, if you use readVCFforCanopy and subset to the first two samples, you will get a similar readMatrix.

ReadsMatrix is a matrix in which rows are positions and each sample has two columns: first the reference allele count, then the alternate allele count.

> vcfFile = system.file("extdata", "sample_w_header.vcf", package="MARATHON")
> canopyInput = readVCFforCanopy(vcfFile)
> head(canopyInput$ReadsMatrix[,1:4])
              SRR5906250_Ref SRR5906250_Alt SRR5906251_Ref SRR5906251_Alt
chrM:152_T/C               9             54              1            192
chrM:195_C/T               0             59              0            241
chrM:410_A/T               0            210              0            250
chrM:2354_C/T              0            108              0            250
chrM:2485_C/T              0            170              0            250
chrM:2759_A/G              5             85              4            244
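As a minimal sketch (not a MARATHON or falcon function), the two-sample subset of ReadsMatrix can be relabelled into the four columns that help(getChangepoints) lists. The helper name toFalconMatrix is hypothetical, and so are two choices it encodes: treating SRR5906250 as the normal and SRR5906251 as the tumor, and mapping the reference count to allele A and the alternate count to allele B. The toy matrix only reuses the first three rows printed above.

```r
# Hypothetical helper (assumption, not part of MARATHON): select one normal and
# one tumor sample from a readVCFforCanopy-style ReadsMatrix, whose columns are
# named <sample>_Ref and <sample>_Alt, and return the four-column data frame
# (AN, BN, AT, BT) described in help(getChangepoints).
toFalconMatrix <- function(readsMatrix, normal, tumor) {
  cols <- c(paste0(normal, c("_Ref", "_Alt")),
            paste0(tumor,  c("_Ref", "_Alt")))
  missing <- setdiff(cols, colnames(readsMatrix))
  if (length(missing) > 0) {
    stop("columns not found: ", paste(missing, collapse = ", "))
  }
  out <- as.data.frame(readsMatrix[, cols, drop = FALSE])
  # Assumption: reference allele -> A, alternate allele -> B
  colnames(out) <- c("AN", "BN", "AT", "BT")
  out
}

# Toy input: the first three rows of ReadsMatrix shown above
readsMatrix <- matrix(
  c(9,  54, 1, 192,
    0,  59, 0, 241,
    0, 210, 0, 250),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("chrM:152_T/C", "chrM:195_C/T", "chrM:410_A/T"),
                  c("SRR5906250_Ref", "SRR5906250_Alt",
                    "SRR5906251_Ref", "SRR5906251_Alt")))

# Assumption: SRR5906250 is the normal sample, SRR5906251 the tumor sample
readMatrix <- toFalconMatrix(readsMatrix, normal = "SRR5906250", tumor = "SRR5906251")
print(readMatrix)

# With falcon installed, this data frame has the shape getChangepoints() expects:
# tauhat <- falcon::getChangepoints(readMatrix)
```

Because the columns are selected by name rather than by position, the sketch does not depend on the sample order in the VCF; any QC filtering from the tutorial would still need to be applied before calling getChangepoints.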

Please let me know if you have any additional questions,
Best,
Gene

Gene Urrutia

Aug 9, 2019, 9:24:17 AM
to jwa...@uliege.be, marathon...@googlegroups.com
Hi Jérôme, I looked more closely, and I agree that it would be helpful to generate a dataset from the VCF that looks like relapse.demo, so that it could plug into all of the QC procedures. I will work on this.
Gene

Gene Urrutia

Aug 12, 2019, 11:20:06 AM
to jwa...@uliege.be, marathon...@googlegroups.com
I've added readVCFforFalcon, which should be sufficient to run the QC steps.

Please update to the latest version of MARATHON from GitHub:
devtools::install_github("yuchaojiang/MARATHON/package")

Documentation is available via help(readVCFforFalcon) and in section 4.3.1 of the notebook.

Please let me know if there's anything else needed.

Gene
