Issue connecting to UCSC genome browser


Ryan Kelly

Aug 1, 2022, 6:20:40 PM
to gen...@soe.ucsc.edu
Hi,

I'm attempting to use genomepy to download the genome reference and annotation for mm10 but have run into an issue. I thought this was a connection issue on UCSC's part, but recent posts suggest that issue has been resolved.

Could some clarity be provided? Is this a connection issue on my end and, if so, how could I resolve it?

Thanks,

Ryan

CODE:

if not genome_installation:
    import genomepy
    genomepy.install_genome(name=ref_genome, provider="UCSC")
else:
    print(ref_genome, "is installed.")

FULL ERROR:
16:20:44 | INFO | Downloading assembly summaries from UCSC
---------------------------------------------------------------------------
MySQLInterfaceError                       Traceback (most recent call last)
File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/mysql/connector/connection_cext.py:263, in CMySQLConnection._open_connection(self)
    262 try:
--> 263     self._cmysql.connect(**cnx_kwargs)
    264     self._cmysql.converter_str_fallback = self._converter_str_fallback

MySQLInterfaceError: Unknown MySQL server host 'genome-mysql.soe.ucsc.edu' (-2)

The above exception was the direct cause of the following exception:

DatabaseError                             Traceback (most recent call last)
Input In [10], in <cell line: 1>()
      1 if not genome_installation:
      2     import genomepy 
----> 3     genomepy.install_genome(name=ref_genome, provider="UCSC")
      4 else:
      5     print(ref_genome, "is installed.")

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/functions.py:203, in install_genome(name, provider, genomes_dir, localname, mask, keep_alt, regex, invert_match, bgzip, annotation, only_annotation, skip_matching, skip_filter, threads, force, **kwargs)
    201 out_dir = os.path.join(genomes_dir, localname)
    202 genome_file = os.path.join(out_dir, f"{localname}.fa")
--> 203 provider = _provider_selection(name, localname, genomes_dir, provider)
    205 # check which files need to be downloaded
    206 genome_found = _is_genome_dir(out_dir)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/functions.py:359, in _provider_selection(name, localname, genomes_dir, provider)
    356     if p in ["ensembl", "ucsc", "ncbi"]:
    357         provider = p
--> 359 return _lazy_provider_selection(name, provider)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/functions.py:330, in _lazy_provider_selection(name, provider)
    328 """return the first PROVIDER which has genome NAME"""
    329 providers = []
--> 330 for p in online_providers(provider):
    331     providers.append(p.name)
    332     if name in p.genomes:

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/providers/__init__.py:104, in online_providers(provider)
    102 for provider in providers:
    103     try:
--> 104         yield create(provider)
    105     except ConnectionError as e:
    106         logger.warning(str(e))

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/providers/__init__.py:60, in create(name)
     58 p = PROVIDERS[name]
     59 p.download_assembly_report = staticmethod(download_assembly_report)
---> 60 return p()

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/providers/ucsc.py:51, in UcscProvider.__init__(self)
     49 self._provider_status()
     50 # Populate on init, so that methods can be cached
---> 51 self.genomes = get_genomes("http://api.genome.ucsc.edu/list/ucscGenomes")

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/contextlib.py:75, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     72 @wraps(func)
     73 def inner(*args, **kwds):
     74     with self._recreate_cm():
---> 75         return func(*args, **kwds)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/diskcache/core.py:1877, in Cache.memoize.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
   1874 result = self.get(key, default=ENOVAL, retry=True)
   1876 if result is ENOVAL:
-> 1877     result = func(*args, **kwargs)
   1878     if expire is None or expire > 0:
   1879         self.set(key, result, expire, tag=tag, retry=True)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/providers/ucsc.py:419, in get_genomes(rest_url)
    417     genomes[genome]["annotations"] = []
    418 # add accession IDs (self.assembly_accession will try to fill in the blanks)
--> 419 genomes = add_accessions1(genomes)
    420 genomes = add_accessions2(genomes)
    421 genomes = add_annotation_links(genomes)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/providers/ucsc.py:446, in add_accessions1(genomes)
    443 ret = query_ucsc(command, database)
    445 # convert to dataframe
--> 446 df = pd.DataFrame.from_records(ret)
    447 df.columns = ["name", "accession_name2", "match"]
    448 df.set_index("name", inplace=True)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/pandas/core/frame.py:2181, in DataFrame.from_records(cls, data, index, exclude, columns, coerce_float, nrows)
   2178     return cls()
   2180 try:
-> 2181     first_row = next(data)
   2182 except StopIteration:
   2183     return cls(index=index, columns=columns)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/genomepy/providers/ucsc.py:515, in query_ucsc(command, database)
    510 def query_ucsc(command: str, database: str = None) -> Generator:
    511     """
    512     Execute a single MySQL query on the UCSC database.
    513     Streams the output into a generator.
    514     """
--> 515     cnx = mysql.connector.connect(
    516         host="genome-mysql.soe.ucsc.edu",
    517         user="genome",
    518         port=3306,
    519         database=database,
    520     )
    521     try:
    522         cur = cnx.cursor(buffered=False, raw=False)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/mysql/connector/pooling.py:286, in connect(*args, **kwargs)
    283         raise ImportError(ERROR_NO_CEXT)
    285 if CMySQLConnection and not use_pure:
--> 286     return CMySQLConnection(*args, **kwargs)
    287 return MySQLConnection(*args, **kwargs)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/mysql/connector/connection_cext.py:101, in CMySQLConnection.__init__(self, **kwargs)
     98 super().__init__()
    100 if kwargs:
--> 101     self.connect(**kwargs)

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/mysql/connector/abstracts.py:1095, in MySQLConnectionAbstract.connect(self, **kwargs)
   1092     self.config(**kwargs)
   1094 self.disconnect()
-> 1095 self._open_connection()
   1096 # Server does not allow to run any other statement different from ALTER
   1097 # when user's password has been expired.
   1098 if not self._client_flags & ClientFlag.CAN_HANDLE_EXPIRED_PASSWORDS:

File /gpfs/gsfs6/users/kellyrc/conda/envs/celloracle_env/lib/python3.8/site-packages/mysql/connector/connection_cext.py:268, in CMySQLConnection._open_connection(self)
    266         self.converter.str_fallback = self._converter_str_fallback
    267 except MySQLInterfaceError as err:
--> 268     raise get_mysql_exception(
    269         msg=err.msg, errno=err.errno, sqlstate=err.sqlstate
    270     ) from err
    272 self._do_handshake()

DatabaseError: 2005 (HY000): Unknown MySQL server host 'genome-mysql.soe.ucsc.edu' (-2)

Daniel Schmelter

Aug 1, 2022, 6:49:51 PM
to Ryan Kelly, gen...@soe.ucsc.edu

Hello Ryan,

Thank you for contacting the UCSC Genome Browser support team.

Unfortunately, we cannot help you with genomepy, since it is an external tool and we only support the UCSC Genome Browser site directly. As a guess, you could try updating genomepy, retrying, or downloading our data directly. The host address, user, and port shown in the traceback match our public SQL server settings.
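
For readers who hit the same traceback: error 2005 ("Unknown MySQL server host") generally means the hostname could not be resolved on the client machine, before any connection to the server was attempted. The following is a minimal diagnostic sketch, not an official UCSC tool, using only the Python standard library. The helper names `can_resolve` and `can_connect` are hypothetical; the host and port are the ones shown in the traceback. Whether the machine in question (for example a cluster compute node) has outbound DNS or network access is an assumption to check with the local system administrators.

```python
import socket

# Host and port taken from the traceback above.
UCSC_MYSQL_HOST = "genome-mysql.soe.ucsc.edu"
UCSC_MYSQL_PORT = 3306


def can_resolve(host: str) -> bool:
    """Return True if the local resolver can turn `host` into an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Name resolution failed; this is the situation error 2005 reports.
        return False


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def diagnose(host: str = UCSC_MYSQL_HOST, port: int = UCSC_MYSQL_PORT) -> str:
    """Summarize where the connection attempt fails, if it does."""
    if not can_resolve(host):
        return f"DNS lookup for {host} failed on this machine"
    if not can_connect(host, port):
        return f"{host} resolves, but port {port} is not reachable"
    return f"{host}:{port} is reachable from this machine"
```

Calling `print(diagnose())` in the same environment that runs genomepy should indicate whether the failure is in name resolution or in reaching the port. If the lookup fails there but succeeds on another machine, the cause is likely local network configuration rather than the UCSC server.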

We have not experienced any connection issues on our end, and our tools appear to be working as expected. There are three ways we offer downloads: data files, our API, or our SQL server. Depending on your desired format, you can download the mm10 reference genome in any of those three ways. Here is the link to our downloads page for mouse, my preferred method of data gathering:

https://hgdownload.soe.ucsc.edu/downloads.html#mouse

Furthermore, here is the documentation for the API and SQL tools:

https://genome.ucsc.edu/goldenPath/help/api.html
https://genome.ucsc.edu/goldenPath/help/mysql.html
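
As a rough illustration of the data-file and API routes mentioned above, here is a short standard-library sketch. The helper names (`fasta_url`, `list_assemblies`, `download`) are hypothetical. The `goldenPath/<assembly>/bigZips/<assembly>.fa.gz` layout is an assumption based on how mm10 is published on the download server and may not hold for every assembly, and the `ucscGenomes` key is assumed from the API's `/list/ucscGenomes` response. Please confirm both against the downloads page and the API documentation.

```python
import json
import shutil
import urllib.request

DOWNLOAD_ROOT = "https://hgdownload.soe.ucsc.edu/goldenPath"
API_LIST_URL = "https://api.genome.ucsc.edu/list/ucscGenomes"


def fasta_url(assembly: str) -> str:
    """Build the assumed URL of the gzipped FASTA for an assembly."""
    return f"{DOWNLOAD_ROOT}/{assembly}/bigZips/{assembly}.fa.gz"


def list_assemblies(timeout: float = 30.0) -> list:
    """Return assembly names reported by the REST API (needs network access)."""
    with urllib.request.urlopen(API_LIST_URL, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: assemblies are keyed by name under "ucscGenomes".
    return sorted(payload["ucscGenomes"])


def download(url: str, dest: str, timeout: float = 60.0) -> None:
    """Stream a remote file to `dest` without loading it all into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, `download(fasta_url("mm10"), "mm10.fa.gz")` would fetch the mouse reference over HTTPS, which avoids the MySQL connection entirely. The file is large, so running it on a machine with sufficient disk space and outbound network access is advisable.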

I hope this was helpful and wish you the best! If you have any more questions, please reply-all to our public support email at gen...@soe.ucsc.edu. For private communication, please reply-all to genom...@soe.ucsc.edu.

