P V Jithesh

Jun 23, 2018, 11:22:21 AM
to gemini-variation

I have loaded a VCF into gemini after annotating it with SnpEff, and I can query it without any problem. I then tried to use gemini annotate to add annotations from HGMD PRO. I can extract every field except PHEN, which throws the following error. Any suggestions would be highly appreciated.


gemini annotate -f HGMD_PRO_2017.4_hg19.vcf.gz -o list -e PHEN -c hgmd_phen -t text test.db

Traceback (most recent call last):
  File "/usr/local/bin/gemini", line 7, in <module>
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/", line 1248, in main
    args.func(parser, args)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/", line 637, in annotate_fn
    gemini_annotate.annotate(parser, args)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/", line 362, in annotate
    annotate_variants_extract(args, conn, metadata, col_names, col_types, col_ops, col_idxs)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/", line 283, in annotate_variants_extract
    col_names, col_types, col_ops)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/", line 116, in _annotate_variants
    _update_variants(metadata, to_update, col_names, cursor)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/", line 137, in _update_variants
    cursor.execute(stmt, [mkdict(v) for v in to_update])
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/sql/", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/", line 1200, in _execute_context
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/", line 1413, in _handle_dbapi_exception
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/util/", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/", line 1170, in _execute_context
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/", line 504, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. [SQL: u'UPDATE variants SET hgmd_phen=? WHERE variants.variant_id = ?'] [parameters: (('"Ovarian_cancer_epithelial_reduced_risk"', '6174'), ('"Inflammatory_bowel_disease_association_with"', '6184'), ('"Myocardial_infarction_protection_against_association"', '7568'), ('"Differential_p53_binding"', '8330'), ('"Taste_sensitivity_to_sucrose_association_with"', '8480'), ('"Taste_sensitivity_to_sucrose_association_with"', '8484'), ('"Insensitivity_to_glutamate_taste_association_with"', '8505'), ('"Gastropathy_M\xc3\xa9n\xc3\xa9trier-like"', '10667')  ... displaying 10 of 5001 total bound parameter sets ...  ('"Crohn\'s_disease_susceptibility_to_association_with"', '13476488'), ('"Crohn\'s_disease_susceptibility_to_association"', '13476501'))] (Background on this error at:

Andrew O

May 24, 2019, 6:27:17 PM
to gemini-variation

Not sure if you solved this already, but I recently ran into the same problem. The culprit in my case was a variant whose PHEN value contained a prime symbol:


Here's what I did to fix it:

zcat hgmd_pro_2019.1_hg19.vcf.gz | perl -C -MText::Unidecode -n -i -e'print unidecode( $_)' | bgzip > hgmd_pro_2019.1_hg19.tidy.vcf.gz && tabix hgmd_pro_2019.1_hg19.tidy.vcf.gz

If you pass the "tidy" VCF file to gemini annotate instead of the original, the error goes away.
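
For anyone who wants to see which records are affected before rewriting the file, here is a rough Python 3 sketch of the same idea. It is not what Text::Unidecode does internally: it strips accents with the standard library's NFKD normalization, maps a couple of symbols through a small hand-written table, and silently drops anything else that is not ASCII. The names FALLBACKS, to_ascii and non_ascii_lines are made up for this example and are not part of gemini.

```python
import unicodedata

# Hypothetical fallback table: symbols that NFKD cannot decompose to ASCII.
# Extend it with whatever characters turn up in your own file.
FALLBACKS = {u"\u2032": u"'", u"\u2033": u'"'}


def to_ascii(text):
    """Fold text to plain ASCII: map known symbols, strip accents, drop the rest."""
    for src, dst in FALLBACKS.items():
        text = text.replace(src, dst)
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with "ignore" discards the combining marks and any other non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def non_ascii_lines(lines):
    """Yield (line_number, line) for every line containing a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        if any(ord(ch) > 127 for ch in line):
            yield number, line
```

To check a compressed VCF, the lines can be read with gzip.open(path, "rt", encoding="utf-8") and passed to non_ascii_lines; each offending line can then be run through to_ascii before the file is recompressed with bgzip and indexed with tabix as above.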

If this isn't helpful for you, hopefully it can help someone else who reads this.

