running ldsc on the Broad cluster

Gerald Quon

Sep 7, 2017, 4:20:30 PM
to ldsc_users
Hi,

I'm trying to run ldsc on the Broad cluster. I cloned the repository with "git clone https://github.com/bulik/ldsc.git", but running ./munge_sumstats.py -h gives the error pasted at the bottom. I already ran "use Python-2.7" and "use Anaconda", and have also tried "use Python-3.4" and "use Anaconda3", but I still run into problems. Any suggestions? (Should I install pandas myself, or did I just "use" the wrong package?)

Also, the tutorial at the bottom of this page (https://github.com/bulik/ldsc/wiki/LD-Score-Estimation-Tutorial), which builds on the Finucane et al. model, says "Start by downloading 1000G.mac5eur.*, hm.*.snp, and CNS.*.annot.gz from this directory." But I can't see those files in that directory (https://data.broadinstitute.org/alkesgroup/LDSCORE/). Where should I grab them from? Thanks!

gquon@silver:~/gquon/ldsc$ python munge_sumstats.py  -h
/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib/python2.7/site-packages/pytz/__init__.py:29: UserWarning: Module hashlib was already imported from /broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib/python2.7/hashlib.pyc, but /broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib/python2.7/site-packages/hashlib-20081119-py2.7-linux-x86_64.egg is being added to sys.path
  from pkg_resources import resource_stream
Traceback (most recent call last):
  File "munge_sumstats.py", line 13, in <module>
    from ldsc import MASTHEAD, Logger, sec_to_str
  File "/broad/compbio/gquon/ldsc/ldsc.py", line 27, in <module>
    raise ImportError('LDSC requires pandas version >= 0.17.0')
ImportError: LDSC requires pandas version >= 0.17.0
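
The ImportError above means the interpreter that ran the script found a pandas older than 0.17.0. A minimal sketch for checking which interpreter and which pandas are actually being picked up after a given set of "use" commands follows. The helper names `version_tuple` and `module_report` are hypothetical and not part of ldsc; the 0.17.0 threshold is taken from the error message.

```python
from __future__ import print_function

import importlib
import re
import sys


def version_tuple(version):
    """Turn '0.17.0' into (0, 17, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def module_report(name):
    """Return (version, file) for an importable module, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return (getattr(module, "__version__", None), getattr(module, "__file__", None))


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    report = module_report("pandas")
    if report is None:
        print("pandas: not importable")
    else:
        version, path = report
        # Threshold copied from the ldsc error message, not from the ldsc source.
        new_enough = version is not None and version_tuple(version) >= (0, 17, 0)
        print("pandas:", version, "from", path,
              "(new enough)" if new_enough else "(too old or unknown)")
```

Running this once per dotkit combination shows directly whether the pandas on the path changed, rather than inferring it from the ldsc error.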

Raymond Walters

Sep 7, 2017, 4:42:28 PM
to Gerald Quon, ldsc_users
Hi Gerald,
“use Anaconda” should be sufficient by itself, and the result is potentially order-sensitive if you also “use Python-2.7”. Broad’s Python dotkits get ugly fast, and seem to interfere with each other with some regularity (running “which python” can sometimes help in diagnosing how they’re interacting). You definitely don’t want Python-3.4 or Anaconda3, since ldsc doesn’t support Python 3.

In case it helps, my standard stack of relevant dotkits is:
use Python-2.7
use Anaconda
use .gsl-1.14
(though there are enough other things in my bashrc that your mileage may vary)

As far as the reference files, they’re contained in the compressed directories on that download page:
- 1000G.mac5eur.* is in 1000G_Phase1_frq.tgz
- hm*.snp is in hapmap3_snps.tgz
- CNS*annot.gz is in 1000G_Phase1_cell_type_groups.tgz

Cheers,
Raymond
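
Once the three archives are downloaded, their contents can be checked against the patterns named in the reply above. This is a minimal sketch that assumes the .tgz files sit in the current directory; `find_members` and `WANTED` are hypothetical names, and the archive names and patterns are copied from the message rather than verified against the download page.

```python
from __future__ import print_function

import fnmatch
import os
import tarfile

# Archive names and glob patterns as given in the reply above.
WANTED = {
    "1000G_Phase1_frq.tgz": ["1000G.mac5eur.*"],
    "hapmap3_snps.tgz": ["hm*.snp"],
    "1000G_Phase1_cell_type_groups.tgz": ["CNS*annot.gz"],
}


def find_members(archive_path, patterns):
    """Return names of archive members whose basename matches any glob pattern."""
    matches = []
    with tarfile.open(archive_path, "r:*") as archive:
        for name in archive.getnames():
            basename = name.rsplit("/", 1)[-1]
            if any(fnmatch.fnmatchcase(basename, pattern) for pattern in patterns):
                matches.append(name)
    return matches


if __name__ == "__main__":
    for archive_path, patterns in sorted(WANTED.items()):
        if not os.path.exists(archive_path):
            print(archive_path, "not downloaded yet")
            continue
        for name in find_members(archive_path, patterns):
            print(archive_path, "->", name)
```

Listing members this way avoids unpacking every archive just to find out where a file lives.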

-- 
You received this message because you are subscribed to the Google Groups "ldsc_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ldsc_users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ldsc_users/953b2bc4-84a7-492a-8a8f-0e77a3dd210c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gerald Quon

Sep 7, 2017, 4:51:46 PM
to ldsc_users
Hi Raymond,

I tried switching the order to use Anaconda after Python-2.7 and it gives the following error (and also gives the same error even if I don't use Python-2.7):

gquon@silver:~/gquon/ldsc$ ./munge_sumstats.py  -h
Traceback (most recent call last):
  File "./munge_sumstats.py", line 11, in <module>
    from scipy.stats import chi2
  File "/home/unix/gquon/.local/lib/python2.7/site-packages/scipy/stats/__init__.py", line 324, in <module>
    from .stats import *
  File "/home/unix/gquon/.local/lib/python2.7/site-packages/scipy/stats/stats.py", line 242, in <module>
    import scipy.special as special
  File "/home/unix/gquon/.local/lib/python2.7/site-packages/scipy/special/__init__.py", line 531, in <module>
    from ._ufuncs import *
ImportError: /home/unix/gquon/.local/lib/python2.7/site-packages/scipy/special/_ufuncs.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8

gquon@silver:~/gquon/ldsc$ which python
/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/bin/python

Verneri Anttila

Sep 7, 2017, 4:59:04 PM
to Gerald Quon, ldsc_users
I had the same issue (pandas version in Anaconda dotkit being too old; something changed in late spring/early summer). 

"use .anaconda-2.3.0-jupyter" resolved the issue for me (along with "unuse" for Python-2.7 and Anaconda).

Best regards,
-Verneri Anttila
---
Post-doctoral Research Fellow, 
Broad Institute & Massachusetts General Hospital / Harvard Medical School
---



Raymond Walters

Sep 7, 2017, 4:59:55 PM
to Gerald Quon, ldsc_users
Hi Gerald,
Do you have other python dependencies set up? That “which python” looks right, but the error appears to be loading scipy from a personal directory (/home/unix/gquon/.local/...) rather than from the Broad anaconda install location:
/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/lib/python2.7/site-packages/scipy/__init__.pyc

Cheers,
Raymond

Gerald Quon

Sep 7, 2017, 5:10:29 PM
to ldsc_users
I managed to resolve the issue by moving the .local directory somewhere else, thanks. I also tried "use .anaconda-2.3.0-jupyter" (without using any other package) and it still gave the same error.
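
Moving .local helps because Python adds the per-user site-packages directory to sys.path, typically ahead of the interpreter's own site-packages, so a scipy installed there can shadow the one shipped with the Anaconda dotkit. The undefined symbol PyUnicodeUCS2_DecodeUTF8 points to an extension compiled for a narrow-Unicode (UCS2) Python 2 build being loaded by a wide (UCS4) one. A minimal sketch for spotting both conditions follows; `is_under`, `user_site_modules` and `unicode_build` are hypothetical helper names.

```python
from __future__ import print_function

import os
import site
import sys


def is_under(path, directory):
    """True if path is located inside directory (after normalising both)."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return path == directory or path.startswith(directory + os.sep)


def user_site_modules(user_site=None):
    """List (name, file) for already-imported modules that live in the user site dir."""
    if user_site is None:
        user_site = site.getusersitepackages()
    found = []
    for name, module in sorted(list(sys.modules.items()), key=lambda item: item[0]):
        module_file = getattr(module, "__file__", None)
        if module_file and is_under(module_file, user_site):
            found.append((name, module_file))
    return found


def unicode_build():
    """'UCS2' for a narrow Python 2 build, otherwise 'UCS4' (always on Python 3.3+)."""
    return "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"


if __name__ == "__main__":
    print("interpreter:", sys.executable, "(" + unicode_build() + ")")
    print("user site:", site.getusersitepackages(), "enabled:", site.ENABLE_USER_SITE)
    for name, module_file in user_site_modules():
        print("loaded from user site:", name, module_file)
```

As an alternative to moving the directory, Python skips the user site-packages directory when started with the -s flag or with the PYTHONNOUSERSITE environment variable set. That option was not tried in this thread.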

mh...@broadinstitute.org

Nov 3, 2017, 3:16:49 PM
to ldsc_users
FYI for other Broad users: I couldn't clone ldsc using the `Anaconda` dotkit (currently .anaconda-2.3.0-jupyter). I had to use an older version of Anaconda to clone/install ldsc, then change back to the current Anaconda to run it:

-bash:platinum:~ 1002 $ use Anaconda
Prepending: Anaconda (ok)
-bash:platinum:~ 1003 $ which python
/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/bin/python
-bash:platinum:~ 1004 $ git clone https://github.com/bulik/ldsc.git
Initialized empty Git repository in ~/ldsc/.git/
error: error setting certificate verify locations:
  CAfile: /broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/etc/pki/tls/certs/cacert.pem
  CApath: none while accessing https://github.com/bulik/ldsc.git/info/refs
fatal: HTTP request failed
-bash:platinum:~ 1005 $ unuse Anaconda
Dropping: Anaconda (ok)
-bash:platinum:~ 1006 $ which python
/usr/bin/python
-bash:platinum:~ 1007 $ use .anaconda-2.1.0
Prepending: .anaconda-2.1.0 (ok)
-bash:platinum:~ 1008 $ git clone https://github.com/bulik/ldsc.git
Initialized empty Git repository in ~/ldsc/.git/
remote: Counting objects: 7573, done.
remote: Total 7573 (delta 0), reused 0 (delta 0), pack-reused 7573
Receiving objects: 100% (7573/7573), 56.41 MiB | 7.17 MiB/s, done.
Resolving deltas: 100% (2666/2666), done.
-bash:platinum:~ 1009 $ cd ldsc
-bash:platinum:~/ldsc 1010 $ ./ldsc.py -h
Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 30 days
Traceback (most recent call last):
  File "./ldsc.py", line 27, in <module>
    raise ImportError('LDSC requires pandas version >= 0.17.0')
ImportError: LDSC requires pandas version >= 0.17.0
-bash:platinum:~/ldsc 1011 $ unuse .anaconda-2.1.0
Dropping: .anaconda-2.1.0 (ok)
-bash:platinum:~/ldsc 1012 $ use Anaconda
Prepending: Anaconda (ok)
-bash:platinum:~/ldsc 1013 $ ./ldsc.py -h
usage: ldsc.py [-h] [--out OUT] [--bfile BFILE] [--l2] [--extract EXTRACT]
               [--keep KEEP] [--ld-wind-snps LD_WIND_SNPS]

Raymond Walters

Nov 3, 2017, 6:47:29 PM
to mh...@broadinstitute.org, ldsc_users
Interesting, and good to know, thanks!

Can I ask which git version you’re using? I’m currently unable to recreate this error with "use Anaconda" and either my normal “use Git-2.0” (version 2.0.5 at /broad/software/free/Linux/redhat_6_x86_64/pkgs/git_2.0.5/bin/git) or the default git (version 1.7.1 at /usr/bin/git).

Cheers,
Raymond



mh...@broadinstitute.org

Nov 3, 2017, 7:27:48 PM
to ldsc_users
That's weird - I'm also just using the default git:

-bash:platinum:~ 1001 $ use Anaconda
Prepending: Anaconda (ok)
-bash:platinum:~ 1002 $ which python
/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/bin/python
-bash:platinum:~ 1003 $ which git
/usr/bin/git
-bash:platinum:~ 1004 $ git --version
git version 1.7.1
-bash:platinum:~ 1005 $ git clone https://github.com/bulik/ldsc.git
Initialized empty Git repository in /home/unix/mhaas/ldsc/.git/
error: error setting certificate verify locations:
  CAfile: /broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/etc/pki/tls/certs/cacert.pem
  CApath: none while accessing https://github.com/bulik/ldsc.git/info/refs
fatal: HTTP request failed



Raymond Walters

Nov 3, 2017, 7:50:11 PM
to mh...@broadinstitute.org, ldsc_users
Ok, managed to recreate this by unloading all other dotkits from my normal environment. 

The solution is to run “use .curl-7.47.1” after loading Anaconda.

(In retrospect, this was among my default dotkits because I had solved exactly this issue previously. Broad users: if you want more background on this, see the conversation from March 2nd in #bits on the Broad Institute Slack.)

Cheers,
Raymond
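
The git failure earlier in the thread names a CA bundle path inside the Anaconda install. A minimal sketch for checking whether such a path is actually a readable file follows. `ca_bundle_status` is a hypothetical helper, and the reading that git picks up its SSL configuration from the Anaconda dotkit is an inference from the error message and the .curl-7.47.1 fix, not something confirmed in the thread.

```python
from __future__ import print_function

import os


def ca_bundle_status(path):
    """Classify a CA bundle path as 'ok', 'missing', 'not a file', or 'unreadable'."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a file"
    if not os.access(path, os.R_OK):
        return "unreadable"
    return "ok"


if __name__ == "__main__":
    # Path copied from the error message in the thread.
    candidates = [
        "/broad/software/free/Linux/redhat_6_x86_64/pkgs/"
        "anaconda_2.3.0-jupyter/etc/pki/tls/certs/cacert.pem",
    ]
    # Environment variables that can point git (GIT_SSL_CAINFO), the curl tool
    # (CURL_CA_BUNDLE) or OpenSSL (SSL_CERT_FILE) at a CA bundle, if any are set.
    for variable in ("GIT_SSL_CAINFO", "CURL_CA_BUNDLE", "SSL_CERT_FILE"):
        value = os.environ.get(variable)
        if value:
            candidates.append(value)
    for path in candidates:
        print(ca_bundle_status(path), path)
```

A "missing" result for the reported path would be consistent with the "error setting certificate verify locations" message, though it would not by itself identify which library supplied that path.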


