Can DeepVariant be used to analyze viral sequences (viruses or RNA)?


hze...@gmail.com

Feb 12, 2021, 3:03:53 PM
to GCP Life Sciences Discuss
Dear GCP Life Sciences team,
I have a question about DeepVariant: can it be used on viral sequence data?
If so, what are the complete steps? Is the tutorial named "Running DeepVariant" the one to follow,
or is there another life sciences analysis pipeline for viruses,
for example Clair or Clairvoyante?
Please advise.

BIN_VERSION="1.1.0"
BASE="${HOME}/deepvariant-run"
INPUT_DIR="${BASE}/input"
REF="GRCh38_no_alt_analysis_set.fasta"
BAM="HG003.novaseq.pcr-free.35x.dedup.grch38_no_alt.chr20.bam"

OUTPUT_DIR="${BASE}/output"
DATA_DIR="${INPUT_DIR}/data"
OUTPUT_VCF="HG003.output.vcf.gz"
OUTPUT_GVCF="HG003.output.g.vcf.gz"


# Input BAM and BAI files:
gsutil cp gs://deepvariant/case-study-testdata/"${BAM}" "${DATA_DIR}"
gsutil cp gs://deepvariant/case-study-testdata/"${BAM}".bai "${DATA_DIR}"

# GRCh38 reference FASTA file:
FTPDIR=ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids
curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > "${DATA_DIR}/${REF}"
curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai > "${DATA_DIR}/${REF}".fai


So if we have a reference FASTA file, a BAM file and its .bam.bai index file, can we run DeepVariant?
What is REF="GRCh38_no_alt_analysis_set.fasta"?
Is "no_alt_analysis_set" just a name?
And what about
"HG003.novaseq.pcr-free.35x.dedup.grch38_no_alt.chr20.bam"?
How can files like these be created when viral data is given?
What do "novaseq", "pcr-free", "35x", "dedup", "grch38_no_alt" and "chr20" in that file name mean?

BAM and BAI files can be created with samtools,
and FASTA files are also available.
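
For reference, a rough sketch of how those input files might be produced for a viral sample. This assumes paired-end reads aligned with bwa, which is only one possible aligner and is not named anywhere in this thread. All file names, the `run` helper, the `prepare_inputs` function and the `DRY_RUN` variable are hypothetical.

```shell
# Sketch only: build the reference index, the sorted BAM and the BAI.
# Assumes samtools and bwa are installed; file names are placeholders.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

prepare_inputs() {
  pi_ref="$1"; pi_r1="$2"; pi_r2="$3"; pi_bam="$4"
  run samtools faidx "${pi_ref}"     # writes ${pi_ref}.fai
  run bwa index "${pi_ref}"          # builds the bwa index files
  # Align the reads and write a coordinate-sorted BAM.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "bwa mem ${pi_ref} ${pi_r1} ${pi_r2} | samtools sort -o ${pi_bam} -"
  else
    bwa mem "${pi_ref}" "${pi_r1}" "${pi_r2}" | samtools sort -o "${pi_bam}" -
  fi
  run samtools index "${pi_bam}"     # writes ${pi_bam}.bai
}
```

Calling `prepare_inputs virus.fasta reads_1.fastq reads_2.fastq sample.bam` with `DRY_RUN=1` set only prints the four commands, which is a cheap way to check the file names before running anything.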

But the question remains: can DeepVariant be used for viral sequences,
given a FASTA reference sequence,
a BAM file
and a BAI file,
or any BAM file, irrespective of chr20?


Thanks in advance,
Haroon Zeb







hze...@gmail.com

Feb 12, 2021, 3:13:59 PM
to Saman Vaisipour, gcp-life-sci...@googlegroups.com

Andrew Carroll

Feb 13, 2021, 5:29:47 AM
to GCP Life Sciences Discuss
Hello Haroon,

Thank you for your question. DeepVariant was developed with diploid variant calling in mind, so use on viral sequences is somewhat different from its intended use case. Whether DeepVariant can be used will depend on whether you want to identify major variants (those present in multiple samples), or whether you want to identify subclonal variants. DeepVariant will be able to identify major variants, but has not been designed to identify subclonal variants.

I have not conducted benchmarking on this problem. My instinct is that if you are working with PacBio HiFi data or Oxford Nanopore data, DeepVariant should do better on this problem (mostly because some factors, such as mappability and duplicated regions, are more of an issue with Illumina calling, and DeepVariant would have learned how to deal with some of those factors, which don't apply to viral calling).


For Nanopore, please use PEPPER-DeepVariant (https://github.com/kishwarshafin/pepper/blob/r0.1/docs/PEPPER_variant_calling.md). In addition to Clair, which you mention, this paper (https://www.nature.com/articles/s41467-020-20075-6) assesses Medaka (https://github.com/nanoporetech/medaka) and Nanopolish (https://github.com/jts/nanopolish) specifically for the viral calling problem.

For Illumina data, DeepVariant may give competitive results for this problem. You may also want to consider viral-ngs (https://github.com/broadinstitute/viral-ngs), which is an analytical pipeline developed by the Broad Institute specifically for viral analysis. I am not sure whether it will generalize to all viruses, and it might be a little complicated to use. However, a large amount of specialized viral genomics knowledge has gone into its development.
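
As a rough, unbenchmarked sketch, an Illumina run on a viral BAM might reuse the Docker invocation from the DeepVariant quick start with your own reference and reads swapped in. The choice of the WGS model here is an assumption (it was trained on human diploid data), and the file names, output names, the `run_deepvariant_viral` function and the `DRY_RUN` and `NUM_SHARDS` variables are all hypothetical.

```shell
# Sketch only: run DeepVariant on a viral BAM through Docker.
# Assumes the reference (.fasta + .fai) and the BAM (+ .bai) sit in data_dir.
BIN_VERSION="${BIN_VERSION:-1.1.0}"

run_deepvariant_viral() {
  dv_data_dir="$1"; dv_out_dir="$2"; dv_ref="$3"; dv_bam="$4"
  # Build the full command line as the positional parameters.
  set -- docker run \
    -v "${dv_data_dir}:/input" \
    -v "${dv_out_dir}:/output" \
    "google/deepvariant:${BIN_VERSION}" \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    "--ref=/input/${dv_ref}" \
    "--reads=/input/${dv_bam}" \
    --output_vcf=/output/viral.output.vcf.gz \
    --output_gvcf=/output/viral.output.g.vcf.gz \
    "--num_shards=${NUM_SHARDS:-1}"
  # Print the command when DRY_RUN=1, otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}
```

For example, `run_deepvariant_viral "${DATA_DIR}" "${OUTPUT_DIR}" virus.fasta sample.bam` would write the VCF and gVCF into the output directory. Because DeepVariant reports diploid genotypes, the calls should be read as candidates for major variants only, in line with the caveats above.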

I hope this information is helpful. Please let me know if you have other questions.

Thank you,
Andrew