Allele frequency not seen


Tulasi S

Oct 27, 2022, 11:38:34 AM
to cBioPortal for Cancer Genomics Discussion Group
Hello Team,
The allele frequency that should be shown in the Mutations tab is not shown for our dataset. We would like to know what format the VCF should be in, and where the allele frequency values should be placed so that they are shown in the Mutations tab. The MAF generated by the vcf2maf conversion also has a column with the header AF, but the allele frequency is still not shown in our local installation of cBioPortal. Kindly help me with this.

Thanks and Regards,
Tulasi S

Tulasi S

Nov 1, 2022, 9:17:34 AM
to cBioPortal for Cancer Genomics Discussion Group
Hello team,
Is there any update or insight on this?

Thanks and Regards
Tulasi S

--
You received this message because you are subscribed to a topic in the Google Groups "cBioPortal for Cancer Genomics Discussion Group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cbioportal/qpL-qv3cX5c/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cbioportal+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cbioportal/9b93bf03-a066-42ab-89cd-b77be44f5a08n%40googlegroups.com.

JJ Gao

Nov 2, 2022, 7:29:21 PM
to Tulasi S, cBioPortal for Cancer Genomics Discussion Group
Hi Tulasi,

Sorry for the late reply.

The t_alt_count and t_ref_count columns are needed to show allele frequencies, as described here: https://docs.cbioportal.org/file-formats/#minimal-maf-file-format
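To check a MAF before loading it, a minimal Python sketch like the one below reports, per row, whether both count columns are filled. The `maf_allele_frequencies` name is hypothetical, and the alt / (ref + alt) formula is an assumption about how a frequency is derived from these two columns, not a statement of cBioPortal's internal code.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple


def maf_allele_frequencies(lines: Iterable[str]) -> List[Tuple[str, Optional[float]]]:
    """Return (Hugo_Symbol, allele frequency or None) for each MAF data row.

    None means t_ref_count or t_alt_count is blank or non-numeric, i.e. the
    case in which no allele frequency can be derived for that mutation.
    """
    # vcf2maf output begins with a "#version" comment line; skip all comments.
    data = [line for line in lines if not line.startswith("#")]
    results: List[Tuple[str, Optional[float]]] = []
    # MAF is plain tab-separated text, so disable CSV quote handling.
    for row in csv.DictReader(data, delimiter="\t", quoting=csv.QUOTE_NONE):
        gene = row.get("Hugo_Symbol") or ""
        ref = (row.get("t_ref_count") or "").strip()
        alt = (row.get("t_alt_count") or "").strip()
        if not (ref.isdigit() and alt.isdigit()):
            results.append((gene, None))
            continue
        total = int(ref) + int(alt)
        # Assumed formula: alt / (ref + alt); undefined when both counts are 0.
        results.append((gene, int(alt) / total if total else None))
    return results


# Illustrative MAF fragment with made-up values: KRAS has empty count columns.
example = "#version 2.4\nHugo_Symbol\tt_ref_count\tt_alt_count\nTP53\t450\t16\nKRAS\t\t\n"
for gene, af in maf_allele_frequencies(io.StringIO(example)):
    print(gene, "missing counts" if af is None else round(af, 4))
```

For a real file, pass an open file handle instead of the `io.StringIO` example; any row reported as missing counts is one for which no allele frequency can be computed.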

Best,
-JJ


JJ Gao

Nov 4, 2022, 4:51:35 PM
to Tulasi S, cBioPortal for Cancer Genomics Discussion Group, Cyriac Kandoth
Hi Tulasi,

I am cc'ing Cyriac, the author of vcf2maf, to see if he has an answer to this.

-JJ

On Fri, Nov 4, 2022 at 2:43 AM Tulasi S <tula...@strandls.com> wrote:
Hello JJ,
Thanks for the reply. Our VCF files have values for all of the following tags (GT:GQ:AD:DP:VF:NL:SB:NC:US:AQ:LQ), but the t_ref_count and t_alt_count columns are still empty. What should I do to get those columns filled?

Below is the call we use for vcf2maf conversion:
perl $vcftomaf --input-vcf $path --output-maf $output/$filename.vep.maf --tmp-dir /data/shrewd/data-files/tmp --tumor-id STRAN-2022-29744 --normal-id CNTRL-0000103-PRC-0022961-S16 --vcf-tumor-id TUMOR --vcf-normal-id NORMAL

The VCF is converted to a MAF, but the t_ref_count and t_alt_count columns are still not filled. Can you please advise me on how to get them filled?

Thanks and regards
Tulasi S

Cyriac Kandoth

Nov 4, 2022, 6:11:35 PM
to JJ Gao, Tulasi S, cBioPortal for Cancer Genomics Discussion Group
Hi Tulasi, the following arguments you used tell vcf2maf to look for genotype columns named "TUMOR" and "NORMAL" in your VCF, and to parse allele depths from those columns, usually from the AD and DP fields.

--vcf-tumor-id TUMOR --vcf-normal-id NORMAL

Make sure that these are the correct names of the genotype columns.
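One way to confirm the genotype column names is to read the VCF's `#CHROM` header line: every column after FORMAT is a genotype column, and those names are what `--vcf-tumor-id` and `--vcf-normal-id` must match. A minimal Python sketch; the function name is hypothetical, and it reads plain-text VCFs only (a `.vcf.gz` would need `gzip.open` instead of `open`).

```python
import io
from typing import Iterable, List


def vcf_sample_names(lines: Iterable[str]) -> List[str]:
    """Return the genotype (sample) column names from a VCF header."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\r\n").split("\t")
            # Columns 1-8 are fixed, column 9 is FORMAT; the rest are samples.
            return fields[9:]
        if not line.startswith("#"):
            break  # reached data lines without finding the #CHROM header
    raise ValueError("no #CHROM header line found; is this a plain-text VCF?")


header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL\n"
print(vcf_sample_names(io.StringIO(header)))  # ['TUMOR', 'NORMAL']
```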

If this does not solve the problem, please include a sample of your VCF to help us debug.

~Cyriac

Tulasi S

Nov 7, 2022, 2:17:06 AM
to Cyriac Kandoth, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Hello Cyriac,
Thanks for your reply. After the vcf2maf conversion, I'm not getting values in the t_depth, t_ref_count, t_alt_count, n_depth, n_ref_count, and n_alt_count columns. For your reference, please find sample VCFs at the Drive link given below: one for the tumor sample (vcf1) and one for the control sample (vcf2). Kindly help me get these columns filled, as they are required for calculating the allele frequency.



Thanks and Regards,
Tulasi S 

Tulasi S

Nov 14, 2022, 6:41:16 AM
to Cyriac Kandoth, cBioPortal for Cancer Genomics Discussion Group
Hello team,
This is a gentle reminder. I also wanted to check whether there is any update on how to get the t_ref_count and t_alt_count columns filled.

Thanks & Regards,
Tulasi S

On Thu, Nov 10, 2022, 1:26 PM Tulasi S <tula...@strandls.com> wrote:
Hello Cyriac,
This is a gentle reminder about my previous email. Is there any update on it?
I look forward to hearing from you.

Thanks and Regards,
Tulasi S

Cyriac Kandoth

Nov 14, 2022, 1:08:42 PM
to Tulasi S, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Hi Tulasi. The files you shared are not VCFs; they are in "Open Document Spreadsheet" (ODS) format. Please confirm that you are using proper plain-text "Variant Call Format" (VCF) files with vcf2maf.
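A quick way to tell a spreadsheet from a plain-text VCF is to look at the first bytes: the VCF specification requires the first line to be `##fileformat=VCFv4.x`, while ODS (and XLSX) files are ZIP archives that begin with `PK\x03\x04`. A minimal Python sketch; `sniff_vcf` is a hypothetical helper, not part of vcf2maf.

```python
import sys


def sniff_vcf(head: bytes) -> str:
    """Classify a file by its first bytes: "vcf", "gzip", "zip" or "unknown"."""
    if head.startswith(b"##fileformat=VCF"):
        return "vcf"  # the VCF spec requires this as the first header line
    if head.startswith(b"\x1f\x8b"):
        return "gzip"  # gzip/bgzip magic number, e.g. a compressed .vcf.gz
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # ZIP container, which is how ODS and XLSX files are stored
    return "unknown"


if __name__ == "__main__":
    # Usage: python sniff_vcf.py vcf1.vcf vcf2.vcf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, sniff_vcf(fh.read(16)))
```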

I renamed your ".vcf" files to ".ods" and opened them in Excel. There are several issues here, but most importantly, you do not have a field named "AD", which usually contains the values for "t_ref_count" and "t_alt_count". Instead, "t_alt_count" is in an INFO field named "ADP", and "t_ref_count" must be calculated manually by subtracting "ADP" from "DP". For the sake of time, you can do something like this to create a MAF compatible with cBioPortal:

perl vcf2maf.pl --input-vcf vcf1.vcf --output-maf vcf1.vep.maf --tumor-id XYZ --retain-info ADP

This should produce MAFs with an extra column named "ADP", whose values you will need to manually move into the "t_alt_count" column. Then you must also calculate "t_ref_count" by subtracting "t_alt_count" from "t_depth". If you want to avoid this manual work, you need to modify the upstream pipeline that generates your VCFs to include an "AD" field with allele depths.
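The manual step can be scripted. A minimal Python sketch, assuming the MAF produced with `--retain-info ADP` has an `ADP` column holding a single integer and that `t_depth` is filled; `fill_counts_from_adp` is a hypothetical name, and rows with missing, multi-valued, or inconsistent values (ADP greater than t_depth) are left untouched rather than guessed.

```python
from typing import Dict, Iterable, List


def fill_counts_from_adp(lines: Iterable[str]) -> List[str]:
    """Return MAF lines with t_alt_count = ADP and t_ref_count = t_depth - ADP.

    Assumes the header has "ADP", "t_depth", "t_ref_count" and "t_alt_count"
    columns (a missing column raises KeyError). Other rows are copied unchanged.
    """
    out: List[str] = []
    col: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            out.append(line)  # keep the "#version" comment line as-is
            continue
        fields = line.split("\t")
        if not col:
            col = {name: i for i, name in enumerate(fields)}  # header row
            out.append(line)
            continue
        adp = fields[col["ADP"]].strip()
        depth = fields[col["t_depth"]].strip()
        if adp.isdigit() and depth.isdigit() and int(adp) <= int(depth):
            fields[col["t_alt_count"]] = adp
            fields[col["t_ref_count"]] = str(int(depth) - int(adp))
        out.append("\t".join(fields))
    return out
```

To use it, pass the lines of `vcf1.vep.maf` (an open file handle works) and write the returned lines back out joined with newlines.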

~Cyriac

Tulasi S

Nov 14, 2022, 1:30:12 PM
to Cyriac Kandoth, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Hello Cyriac,
I can confirm that we are using proper VCFs. Since I only needed to share a few lines of each VCF, I edited them in Excel, which is why I uploaded them in that format.

Thanks for the clarification on the sample VCFs I shared; I will try what you suggested. We actually have several panels, and the VCFs I shared earlier are from one of them. Another panel has VCFs with the GT:GQ:AD:DP:VF:NL:SB:NC:US:AQ:LQ tags. I will add sample VCFs for this panel to the same Drive folder (https://drive.google.com/drive/folders/1kViq3E6ZDCd6SQjzc8SKUm5bTjhu-km_?usp=share_link). Can you please help me fill t_alt_count using the values of these tags?

Thanks and Regards,
Tulasi S

Cyriac Kandoth

Nov 14, 2022, 2:03:19 PM
to Tulasi S, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
This new VCF you sent contains the "AD" field, which lists "t_ref_count" and "t_alt_count" separated by a comma. For vcf2maf to parse this out, you need to specify the name of the genotype column it should look under, i.e. "--tumor-id TGR_2141S_DNA-DNA". Note that VCFs can list any number of genotype columns. Ideally, the same data for your normal control sample would also be in the same input VCF, but allele depths from the normal don't drive any useful feature in cBioPortal.
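vcf2maf does this parsing itself once it is pointed at the right genotype column; the sketch below only illustrates what the AD field holds. It pairs the FORMAT keys with the values in one genotype column and splits AD into the reference depth and the first alternate depth. The function name and the genotype values are made up for illustration.

```python
from typing import Dict, Tuple


def counts_from_ad(format_field: str, sample_field: str) -> Tuple[int, int]:
    """Return (t_ref_count, t_alt_count) from one VCF genotype column.

    format_field is the FORMAT column (e.g. "GT:GQ:AD:DP") and sample_field the
    matching genotype column. Only the first ALT allele's depth is used.
    """
    # Pair each FORMAT key with its value; trailing keys may be omitted in a sample.
    values: Dict[str, str] = dict(zip(format_field.split(":"), sample_field.split(":")))
    depths = values.get("AD", ".").split(",")
    if len(depths) < 2 or not all(d.isdigit() for d in depths[:2]):
        raise ValueError(f"no usable AD value in {sample_field!r}")
    return int(depths[0]), int(depths[1])


# Made-up genotype column using the FORMAT tags listed in this thread.
print(counts_from_ad("GT:GQ:AD:DP:VF:NL:SB:NC:US:AQ:LQ", "0/1:99:90,10:100"))  # (90, 10)
```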

~Cyriac

Tulasi S

Nov 15, 2022, 2:16:06 AM
to Cyriac Kandoth, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Thanks a lot, Cyriac. I tried the vcf2maf conversion on the VCF that has AD values, and I was able to get the t_ref_count and t_alt_count columns filled.

I would just like to clarify one more thing about the other panel I sent earlier without AD values. I also don't have ADP values that could be retained using --retain-info. Will I still be able to fill the t_ref_count and t_alt_count columns without AD or ADP values? If so, can you please help me with this?

Thanks and Regards,
Tulasi S

Cyriac Kandoth

Nov 15, 2022, 2:24:38 AM
to Tulasi S, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
I noticed those VCFs have a field named "SR" under the genotype column. You can extract it into a new MAF column using "--retain-fmt SR". Its values are the VAF, aka variant allele fraction, i.e. t_alt_count / t_depth. So you can multiply SR by DP to get t_alt_count, and then subtract that from DP to get t_ref_count.

~C

Tulasi S

Nov 15, 2022, 2:35:00 AM
to Cyriac Kandoth, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Sorry about my previous email; I had missed the ADP values present in those VCFs. However, we do have another panel of VCFs that has neither AD nor ADP values, and for those I can try the command you suggested. Thank you very much for the clarification.

Regards,
Tulasi S

Tulasi S

Nov 17, 2022, 2:38:56 AM
to Cyriac Kandoth, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Hello Cyriac,
Thanks for suggesting the methods for calculating t_alt_count and t_ref_count. I was able to calculate the values for the two panels, but we also have another panel that has neither AD nor ADP values.
Please find attached the Excel sheet where I performed the calculation you suggested using the SR tag. I tried calculating t_alt_count and t_ref_count for the row 1 variant in column J, and I get a negative value for t_ref_count. Can you please check and let me know whether it's correct or where I'm going wrong?

Thanks and Regards,
Tulasi S

Example vcf.ods

Cyriac Kandoth

Nov 19, 2022, 3:21:29 PM
to Tulasi S, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Your VCF header describes "SR" as "Supporting Reads". However, its values are not whole numbers, so I can only guess that it is actually a percentage. This means that if SR is 3.43%, you would use 0.0343 in the math below, and also round the result to a whole number.

t_alt_count = SR × DP = 0.0343 × 466 = 15.98, rounded to 16
t_ref_count = t_depth - t_alt_count = 466 - 16 = 450
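The same calculation as a minimal Python sketch, assuming (as guessed above) that SR is a percentage. Dividing by 100 is the key step: using 3.43 directly as a fraction would give about 1598 alternate reads, more than the depth of 466, and hence a negative t_ref_count. `counts_from_sr_percent` is a hypothetical name.

```python
from typing import Tuple


def counts_from_sr_percent(sr_percent: float, depth: int) -> Tuple[int, int]:
    """Return (t_ref_count, t_alt_count), treating SR as a percentage VAF."""
    if not 0 <= sr_percent <= 100:
        raise ValueError(f"SR={sr_percent} is outside 0-100, so not a percentage")
    # Convert percent to a fraction before multiplying by depth; Python's round()
    # rounds exact .5 ties to the nearest even integer.
    alt = round(sr_percent / 100 * depth)
    return depth - alt, alt


print(counts_from_sr_percent(3.43, 466))  # (450, 16), matching the arithmetic above
```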

~Cyriac

Tulasi S

Nov 19, 2022, 7:42:13 PM
to Cyriac Kandoth, JJ Gao, cBioPortal for Cancer Genomics Discussion Group
Thanks a lot for your clarification, @Cyriac Kandoth.