error in munge_sumstats.py

studen...@hotmail.com

Jul 16, 2018, 1:52:35 PM

to ldsc_users

When I run this pipeline on the example (schizophrenia and bipolar disorder), I always get this error.

Here is my log file:

> --sumstats pgc.cross.SCZ17.2013-05.txt \

> --N 17115 \

> --out scz \

> --merge-alleles w_hm3.snplist

*********************************************************************

* LD Score Regression (LDSC)

* Version 1.0.0

* Broad Institute of MIT and Harvard / MIT Department of Mathematics

* GNU General Public License v3

*********************************************************************

Call:

./munge_sumstats.py \

--out scz \

--merge-alleles w_hm3.snplist \

--N 17115.0 \

--sumstats pgc.cross.SCZ17.2013-05.txt

Interpreting column names as follows:

info: INFO score (imputation quality; higher --> better imputation)

snpid: Variant ID (e.g., rs number)

a1: Allele 1, interpreted as ref allele for signed sumstat.

pval: p-Value

a2: Allele 2, interpreted as non-ref allele for signed sumstat.

or: Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)

Reading list of SNPs for allele merge from w_hm3.snplist

Read 1217311 SNPs for allele merge.

Reading sumstats from pgc.cross.SCZ17.2013-05.txt into memory 5000000 SNPs at a time.

. done

Read 1237958 SNPs from --sumstats file.

Removed 137131 SNPs not in --merge-alleles.

Removed 0 SNPs with missing values.

Removed 256286 SNPs with INFO <= 0.9.

Removed 0 SNPs with MAF <= 0.01.

Removed 0 SNPs with out-of-bounds p-values.

Removed 2 variants that were not SNPs or were strand-ambiguous.

844539 SNPs remain.

Removed 0 SNPs with duplicated rs numbers (844539 SNPs remain).

Using N = 17115.0

Median value of or was 1.0, which seems sensible.

Removed 39 SNPs whose alleles did not match --merge-alleles (844500 SNPs remain).

ERROR converting summary statistics:

Traceback (most recent call last):

File "../munge_sumstats.py", line 707, in munge_sumstats

dat = allele_merge(dat, merge_alleles, log)

File "../munge_sumstats.py", line 445, in allele_merge

dat.loc[~jj, [i for i in dat.columns if i != 'SNP']] = float('nan')

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 193, in __setitem__

indexer = self._get_setitem_indexer(key)

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 171, in _get_setitem_indexer

return self._convert_tuple(key, is_setter=True)

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 242, in _convert_tuple

idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1269, in _convert_to_indexer

.format(mask=objarr[mask]))

KeyError: '[-1 -1 -2 ... -1 -1 -1] not in index'

Conversion finished at Sat Jul 14 19:40:33 2018

Total time elapsed: 1.0m:58.13s

Traceback (most recent call last):

File "../munge_sumstats.py", line 746, in <module>

munge_sumstats(parser.parse_args(), p=True)

File "../munge_sumstats.py", line 707, in munge_sumstats

dat = allele_merge(dat, merge_alleles, log)

File "../munge_sumstats.py", line 445, in allele_merge

dat.loc[~jj, [i for i in dat.columns if i != 'SNP']] = float('nan')

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 193, in __setitem__

indexer = self._get_setitem_indexer(key)

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 171, in _get_setitem_indexer

return self._convert_tuple(key, is_setter=True)

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 242, in _convert_tuple

idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)

File "/home/ys/software/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1269, in _convert_to_indexer

.format(mask=objarr[mask]))

KeyError: '[-1 -1 -2 ... -1 -1 -1] not in index'

If I remove --merge-alleles, it runs without error. But that is not right when I use the sumstats to calculate the genetic correlation.
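A note on the KeyError itself: the values it lists (-1 and -2) are what Python's bitwise NOT returns for plain bool objects, so one plausible reading is that the mask `jj` was no longer a true boolean array when `~jj` was evaluated. This is an assumption drawn from the traceback, not something confirmed in this thread. A minimal sketch in plain Python, with a made-up three-element mask:

```python
# Hypothetical mask values; in the traceback the real mask is `jj`.
# Assumption: the mask held plain Python bools (e.g. an object-dtype array)
# instead of a proper boolean array.
mask = [False, False, True]

# Bitwise NOT treats a bool as an int: ~False == -1 and ~True == -2.
# (Python 3.12+ also emits a DeprecationWarning for ~ on bool.)
inverted = [~flag for flag in mask]

# Logical negation is what a row filter actually needs.
negated = [not flag for flag in mask]

print(inverted)
print(negated)
```

Integers like these would be treated as row labels rather than as a boolean filter, which would be consistent with the "not in index" message.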

Raymond Walters

Jul 16, 2018, 2:05:29 PM

to studen...@hotmail.com, ldsc_users

Hi,

This error is caused by an incompatible version of pandas. We currently recommend using the provided conda environment, as described in the GitHub README. Otherwise, you'll need to modify your Python environment to use a pandas version earlier than 0.21.
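To see which pandas version the interpreter is actually picking up, a quick check along these lines may help. It is only a sketch: the helper names `major_minor` and `is_compatible` are made up for illustration, and the (0, 21) cutoff is taken from the advice above.

```python
import re

def major_minor(version):
    """Return (major, minor) as ints from a version string such as '0.20.3'."""
    nums = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 2:
        nums.append(0)
    return tuple(nums)

def is_compatible(version, first_incompatible=(0, 21)):
    """True if the version is earlier than the first incompatible release."""
    return major_minor(version) < first_incompatible

try:
    import pandas as pd
    status = "OK" if is_compatible(pd.__version__) else "too new for this munge_sumstats.py"
    print("pandas %s: %s" % (pd.__version__, status))
except ImportError:
    print("pandas is not installed in this environment")
```

Run it with the same interpreter that runs munge_sumstats.py, since a different Python on the PATH may see a different pandas.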

Cheers,

Raymond
