Issue with vcfanno


Moiz Bootwalla

Jul 3, 2018, 5:59:19 PM
to biovalidation
Hello bcbio team,

I am trying to run a workflow through bcbio and it keeps stalling at the vcfanno step. I have been trying to figure out what the issue might be, and it seems to be a combination of a couple of things. Before I list what I found, here's the error log from bcbio:

Traceback (most recent call last):
  File "/shared/software/bcbio_nextgen/1.0.9/bin/bcbio_nextgen.py", line 241, in <module>
    main(**kwargs)
  File "/shared/software/bcbio_nextgen/1.0.9/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 53, in run_main
    fc_dir, run_info_yaml)
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 89, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 164, in variant2pipeline
    samples = run_parallel("postprocess_variants", samples)
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/bcbio/distributed/ipython.py", line 137, in run
    for data in view.map_sync(fn, items, track=False):
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/view.py", line 344, in map_sync
    return self.map(f,*sequences,**kwargs)
  File "<decorator-gen-137>", line 2, in map
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/view.py", line 52, in sync_results
    ret = f(self, *args, **kwargs)
  File "<decorator-gen-136>", line 2, in map
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/view.py", line 37, in save_ids
    ret = f(self, *args, **kwargs)
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/view.py", line 1114, in map
    return pf.map(*sequences)
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/remotefunction.py", line 299, in map
    return self(*sequences, __ipp_mapping=True)
  File "<decorator-gen-119>", line 2, in __call__
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/remotefunction.py", line 80, in sync_view_results
    return f(self, *args, **kwargs)
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/remotefunction.py", line 285, in __call__
    return r.get()
  File "/gpfs/fs1/data/bcbio_data_1.0.9/anaconda/lib/python2.7/site-packages/ipyparallel/client/asyncresult.py", line 169, in get
    raise self.exception()
ipyparallel.error.CompositeError: one or more exceptions from call to method: postprocess_variants
[9:apply]: CalledProcessError: Command 'set -o pipefail; /gpfs/fs1/data/bcbio_data_1.0.9/anaconda/bin/vcfanno -p 16 -lua /gpfs/fs1/data/bcbio_data_1.0.9/genomes/Hsapiens/hs37d5/config/vcfanno/gemini.lua -base-path /gpfs/fs1/data/bcbio_data_1.0.9/genomes/Hsapiens/hs37d5 /gpfs/fs1/data/bcbio_data_1.0.9/genomes/Hsapiens/hs37d5/config/vcfanno/gemini.conf /gpfs/fs1/SES/Validation_HiSeq/180410_K00187_0035_AHMFM3BBXX/bcbio_20180410/work/mutect2/CPM00004824-F-D_20171030-annotated.vcf.gz |  bgzip -c > /gpfs/fs1/SES/Validation_HiSeq/180410_K00187_0035_AHMFM3BBXX/bcbio_20180410/work/bcbiotx/tmp_jeUXd/CPM00004824-F-D_20171030-annotated-annotated-gemini.vcf.gz
=============================================
vcfanno version 0.2.8 [built with go1.8]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:112: [Flatten] unable to open file: /gpfs/fs1/data/bcbio_data_1.0.9/genomes/Hsapiens/hs37d5//gpfs/fs1/data/bcbio_data_1.0.9/genomes/Hsapiens/hs37d5/ExAC.r0.3.sites.vep.tidy.vcf.gz in /gpfs/fs1/data/bcbio_data_1.0.9/genomes/Hsapiens/hs37d5
' returned non-zero exit status 1



It looks like vcfanno is looking for annotation files under the genomes/Hsapiens/GENOME directory, but all of these annotation files are under the bcbio_data_1.0.9/gemini_data directory. Can you kindly help me fix this error?


Thanks,


Moiz

Brad Chapman

Jul 4, 2018, 11:25:06 AM
to Moiz Bootwalla, biovalidation

Moiz;
Thanks for the detailed report and apologies about the issue. It looks like
you're using a custom installed human build 37 genome (hs37d5) and bcbio isn't
correctly detecting that it's build 37 to insert the location of the gemini data.
GEMINI in hg19/GRCh37 gets treated specially, while in hg38 it's more of a
standard part of the integration.

I've pushed a fix that I hope will resolve this by expanding the current checks
(which just look for genome names) to also look for any of the hg19- or
GRCh37-specific GL contigs within the reference.
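
A rough sketch of the kind of check described above, for illustration only; it is not bcbio's actual code. The function names, the name list, and the example contig names are all assumptions (the real lists bcbio checks are not shown in this thread), so verify any contig names against your own reference before relying on them.

```python
# Assumed example values, NOT the lists bcbio really uses.
BUILD37_NAMES = {"hg19", "GRCh37"}
BUILD37_ONLY_CONTIGS = {
    "GL000191.1", "GL000192.1",                      # GRCh37-style naming (assumed examples)
    "chr1_gl000191_random", "chr1_gl000192_random",  # hg19-style naming (assumed examples)
}


def contigs_from_fai(lines):
    """Return contig names from the lines of a FASTA index (.fai).

    The contig name is the first tab-separated column of each line.
    """
    return [line.split("\t")[0] for line in lines if line.strip()]


def looks_like_build37(genome_name, contigs):
    """Return True if the genome name or any contig suggests human build 37."""
    if genome_name in BUILD37_NAMES:
        return True
    # Fall back to the reference itself when the name is custom (e.g. hs37d5).
    return any(contig in BUILD37_ONLY_CONTIGS for contig in contigs)
```

With a name-only check, a custom genome called `hs37d5` would be missed; adding the contig check lets it be recognized from the reference regardless of what the genome directory is called.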

If you update to the latest development (`bcbio_nextgen.py upgrade -u
development`) and re-run I hope this will correctly insert the gemini_data
directory as part of the vcfanno run.

Thanks again,
Brad
> --
> You received this message because you are subscribed to the Google Groups "biovalidation" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to biovalidatio...@googlegroups.com.
> To post to this group, send email to bioval...@googlegroups.com.
> Visit this group at https://groups.google.com/group/biovalidation.
> To view this discussion on the web visit https://groups.google.com/d/msgid/biovalidation/94dbc55f-71a5-434a-a3ac-7d1bf8d24681%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Moiz Bootwalla

Jul 6, 2018, 5:14:24 PM
to Brad Chapman, biovalidation
Hi Brad,

Thank you for providing the fix so quickly. I restarted my job and it looks like the fix is working. The job is still running. I will report back if it runs into any further errors.

Best Regards,

Moiz

Moiz Bootwalla

Jul 9, 2018, 2:49:48 PM
to biovalidation
Hi Brad,

Following up on this: the fix you implemented worked great. The annotation and prioritization steps ran without problems and I was able to get the final output as well. While going through the logs, though, I kept seeing the following error/warning from vcfanno over and over again, and I am trying to figure out what it means. I'm pasting the output below:

[2018-07-06T21:04Z] chla-cpm-cn11.chla-cpm.cluster: vcfanno.go:115: found 49 sources from 17 files
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: =============================================
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: vcfanno version 0.2.8 [built with go1.8]
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: see: https://github.com/brentp/vcfanno
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: =============================================
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: vcfanno.go:115: found 49 sources from 17 files
[2018-07-06T21:04Z] chla-cpm-cn05.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn02.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn01.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn10.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn07.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn05.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn11.chla-cpm.cluster: vcfanno.go:187: Info Error: max_aaf_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn11.chla-cpm.cluster: vcfanno.go:187: Info Error: clinvar_sig not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn03.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn03.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn10.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn01.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn05.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn12.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn06.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn06.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn07.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: vcfanno.go:187: Info Error: clinvar_sig not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: vcfanno.go:187: Info Error: max_aaf_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn11.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn11.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: vcfanno.go:187: Info Error: af_esp_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn02.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn12.chla-cpm.cluster: vcfanno.go:187: Info Error: af_1kg_all not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn10.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn03.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn07.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn02.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn12.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn06.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn09.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn11.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...
[2018-07-06T21:04Z] chla-cpm-cn01.chla-cpm.cluster: vcfanno.go:187: Info Error: af_adj_exac_sas not found in INFO >> this error/warning may occur many times. reporting once here...


Any idea as to what this error/warning means? 

Thanks,

Moiz


Brad Chapman

Jul 9, 2018, 4:34:36 PM
to Moiz Bootwalla, biovalidation

Moiz;
Glad to hear it worked for you. Sorry for any confusion with the warnings.
Those are expected warnings for positions where a variant does not overlap
with ESP, 1000 genomes or other resources. There is a post-annotation,
`max_aaf_all`, that looks across all of these fields and reports the maximum.
This is useful as a general metric, but unfortunately it makes vcfanno noisy.
So, nothing to worry about here and sorry about the noise,
Brad
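
To make the explanation above concrete, here is a minimal sketch of a "maximum across sources" metric. It is an illustration only and not the code in `gemini.lua`; the function name and the `-1.0` default for variants with no overlapping source are assumptions. Fields that are absent for a variant are simply skipped, which is the situation the vcfanno warnings report.

```python
def max_alt_allele_freq(info, fields, default=-1.0):
    """Return the largest allele frequency found among `fields` in `info`.

    `info` maps INFO keys to values for one variant. Fields missing from
    `info` (no overlap with that resource) are ignored. `default` is an
    assumed placeholder returned when none of the fields is present.
    """
    values = [float(info[name]) for name in fields if name in info]
    return max(values) if values else default


# Field names taken from the warnings in the log above.
fields = ["af_esp_all", "af_1kg_all", "af_adj_exac_sas"]

# This variant overlaps 1000 genomes and ExAC but not ESP.
variant_info = {"af_1kg_all": "0.02", "af_adj_exac_sas": "0.15"}
print(max_alt_allele_freq(variant_info, fields))
```

A variant that overlaps none of the sources would trigger a "not found in INFO" message for each field in the log, yet the metric is still well defined, so the output is unaffected.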

Moiz Bootwalla

Jul 9, 2018, 4:35:31 PM
to Brad Chapman, biovalidation
That is perfect.

Thank You,

Moiz