Error annotating with HGMD PRO

P V Jithesh

Jun 23, 2018, 11:22:21 AM
to gemini-variation
Hello,

I have loaded a VCF into gemini after annotating it with SnpEff, and I can query it without any problem. I then tried to use gemini annotate to add annotations from HGMD PRO. All fields load except PHEN, which throws the following error. Any suggestions would be highly appreciated.

Thanks
Jithesh

gemini annotate -f HGMD_PRO_2017.4_hg19.vcf.gz -o list -e PHEN -c hgmd_phen -t text test.db

Traceback (most recent call last):
  File "/usr/local/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1248, in main
    args.func(parser, args)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 637, in annotate_fn
    gemini_annotate.annotate(parser, args)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_annotate.py", line 362, in annotate
    annotate_variants_extract(args, conn, metadata, col_names, col_types, col_ops, col_idxs)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_annotate.py", line 283, in annotate_variants_extract
    col_names, col_types, col_ops)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_annotate.py", line 116, in _annotate_variants
    _update_variants(metadata, to_update, col_names, cursor)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_annotate.py", line 137, in _update_variants
    cursor.execute(stmt, [mkdict(v) for v in to_update])
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1170, in _execute_context
    context)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 504, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. [SQL: u'UPDATE variants SET hgmd_phen=? WHERE variants.variant_id = ?'] [parameters: (('"Ovarian_cancer_epithelial_reduced_risk"', '6174'), ('"Inflammatory_bowel_disease_association_with"', '6184'), ('"Myocardial_infarction_protection_against_association"', '7568'), ('"Differential_p53_binding"', '8330'), ('"Taste_sensitivity_to_sucrose_association_with"', '8480'), ('"Taste_sensitivity_to_sucrose_association_with"', '8484'), ('"Insensitivity_to_glutamate_taste_association_with"', '8505'), ('"Gastropathy_M\xc3\xa9n\xc3\xa9trier-like"', '10667')  ... displaying 10 of 5001 total bound parameter sets ...  ('"Crohn\'s_disease_susceptibility_to_association_with"', '13476488'), ('"Crohn\'s_disease_susceptibility_to_association"', '13476501'))] (Background on this error at: http://sqlalche.me/e/f405)
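
The `\xc3\xa9` sequences in the bound parameters are UTF-8 bytes, and, as the message itself says, sqlite3 under GEMINI's Python 2.7 will not bind such 8-bit byte strings with its default text_factory. A small Python 3 check (run on its own, outside GEMINI), using the Ménétrier value copied from the error, illustrates what those bytes are:

```python
# The PHEN value quoted in the ProgrammingError, as raw bytes.
raw = b'"Gastropathy_M\xc3\xa9n\xc3\xa9trier-like"'

# Each \xc3\xa9 pair is the two-byte UTF-8 encoding of a single character, é.
text = raw.decode("utf-8")

# Any byte >= 0x80 is what makes this an "8-bit bytestring" in sqlite3's terms.
has_high_bytes = any(b >= 0x80 for b in raw)

print(len(raw), len(text))  # prints: 30 28 (two bytes per é, one character each)
```

So the PHEN values that trigger the error are, presumably, the ones containing any non-ASCII character.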

Andrew O

May 24, 2019, 6:27:17 PM
to gemini-variation
Hi,

Not sure if you solved this already, but I recently ran into the same problem. The culprit in my case was a variant whose PHEN value contained a prime symbol (′, U+2032):

PHEN=Altered_2′-O-methyluridine_ratio
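
To list every record like this in an HGMD file at once, rather than finding them one error at a time, a minimal Python 3 sketch can scan the INFO column. It assumes PHEN is written in INFO as PHEN=value with `;` between keys (as in the error message and the example above) and that the file is UTF-8; `non_ascii_phen` and `scan_vcf` are hypothetical helper names.

```python
import gzip
import re

# Assumption: PHEN is an INFO key written as PHEN=value, with ';' separating keys.
PHEN_RE = re.compile(r"(?:^|;)PHEN=([^;]*)")


def non_ascii_phen(lines):
    """Yield (chrom, pos, phen) for VCF records whose PHEN value contains non-ASCII text."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line; INFO is column 8
        match = PHEN_RE.search(fields[7])
        if match and not match.group(1).isascii():  # str.isascii needs Python 3.7+
            yield fields[0], fields[1], match.group(1)


def scan_vcf(path):
    """Print offending records from a bgzipped VCF; gzip can read bgzip output."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for chrom, pos, phen in non_ascii_phen(fh):
            print(chrom, pos, phen, sep="\t")
```

For example, `scan_vcf("hgmd_pro_2019.1_hg19.vcf.gz")` would print one tab-separated line per offending record.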

Here's what I did to fix it:

zcat hgmd_pro_2019.1_hg19.vcf.gz | perl -C -MText::Unidecode -ne 'print unidecode($_)' | bgzip > hgmd_pro_2019.1_hg19.tidy.vcf.gz && tabix hgmd_pro_2019.1_hg19.tidy.vcf.gz

If you annotate with the "tidy" VCF file instead, the error goes away.
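
Where the Text::Unidecode Perl module is not available, a rough Python 3 stand-in for the perl stage is sketched here. This is a different, cruder technique than Text::Unidecode: it decomposes text with Unicode NFKD, drops the leftover accent marks, maps a few characters through a small hand-written `FALLBACK` table (hypothetical; extend it as needed), and writes `?` for anything else. Like the perl command, it would fold every line of the file, not just PHEN.

```python
import unicodedata

# Hypothetical fallback table for characters that NFKD cannot reduce to ASCII.
# U+2032 PRIME has no decomposition, so it needs an explicit mapping.
FALLBACK = {"\u2032": "'"}


def to_ascii(text):
    """Fold text to ASCII: decompose with NFKD, drop accents, map the rest."""
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if ch.isascii():
            out.append(ch)
        elif unicodedata.combining(ch):
            continue  # e.g. the U+0301 left over after NFKD splits é into e + U+0301
        else:
            out.append(FALLBACK.get(ch, "?"))  # unmapped characters become '?'
    return "".join(out)
```

In a small script placed between zcat and bgzip, `for line in sys.stdin: sys.stdout.write(to_ascii(line))` would apply it line by line, assuming stdin is decoded as UTF-8 (for example with PYTHONIOENCODING=utf-8).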

If this doesn't help you, hopefully it helps someone else who reads this.

Andrew