IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match


Leon Hubbard

Aug 29, 2018, 10:50:31 AM
to ldsc_users
Hi, I am getting a strange error when running the partitioned heritability function in LDSC: IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match. I have started again from scratch but still get this error no matter what. The error is at the bottom, but I have included the code and output below in case they help. The error happens regardless of which --ref-ld-chr annotation set I give it (i.e. baseline V2, any of the QTL annotations, etc.), so it isn't specific to a particular annotation. Is there anything obvious I am doing that would cause this, and do you have any suggestions?

Many thanks, 
Leon

The server OS info:
lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.9 (Final)
Release: 6.9
Codename: Final

My conda version:
conda --version
conda 4.3.14

conda env create --file environment.yml

Using Anaconda API: https://api.anaconda.org
Fetching package metadata .............
Solving package specifications: .
bedtools-2.27. 100% |##############################################################################################################################| Time: 0:00:03 218.40 kB/s
nose-1.3.7-py2 100% |##############################################################################################################################| Time: 0:00:00   9.08 MB/s
numpy-1.12.1-p 100% |##############################################################################################################################| Time: 0:00:00  46.84 MB/s
scipy-0.18.1-n 100% |##############################################################################################################################| Time: 0:00:00  54.83 MB/s
pybedtools-0.7 100% |##############################################################################################################################| Time: 0:00:06   2.12 MB/s

export PATH=/share/apps/anaconda2/bin:$PATH
source activate ldsc

### GENERATE sumstats.gz
*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.0
* (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./munge_sumstats.py \
--out GWAS \
--merge-alleles w_hm3.snplist \
--N 417508.0 \
--sumstats GWAS.assoc 

Interpreting column names as follows:
A1: Allele 1, interpreted as ref allele for signed sumstat.
P: p-Value
BETA: [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
A2: Allele 2, interpreted as non-ref allele for signed sumstat.
SNP: Variant ID (e.g., rs number)

Reading list of SNPs for allele merge from w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from GWAS.assoc into memory 5000000 SNPs at a time.
.. done
Read 6761246 SNPs from --sumstats file.
Removed 5603635 SNPs not in --merge-alleles.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= 0.9.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 7 variants that were not SNPs or were strand-ambiguous.
1157604 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (1157604 SNPs remain).
Using N = 417508.0
Median value of BETA was 0.0, which seems sensible.
Removed 26 SNPs whose alleles did not match --merge-alleles (1157578 SNPs remain).
Writing summary statistics for 1217311 SNPs (1157578 with nonmissing beta) to GWAS.sumstats.gz.

Metadata:
Mean chi^2 = 1.148
Lambda GC = 1.078
Max chi^2 = 334.385

### RUN PARTITIONED H2 ON BASELINE V1.1
./ldsc.py --h2 GWAS.sumstats.gz --w-ld-chr 1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. --ref-ld-chr baselineLD_v1.1/baselineLD. --overlap-annot --out GWAS --frqfile-chr 1000G_Phase3_frq/1000G.EUR.QC. --print-coefficients

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.0
* (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./ldsc.py \
--h2 GWAS.sumstats.gz \
--ref-ld-chr baselineLD_v1.1/baselineLD. \
--out GWAS \
--overlap-annot  \
--frqfile-chr 1000G_Phase3_frq/1000G.EUR.QC. \
--w-ld-chr 1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC. \
--print-coefficients  

Beginning analysis at Wed Aug 29 15:25:00 2018
Reading summary statistics from GWAS.sumstats.gz ...
Read summary statistics for 1157578 SNPs.
Reading reference panel LD Score from baselineLD_v1.1/baselineLD.[1-22] ...
Read reference panel LD Scores for 1190321 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from 1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.[1-22] ...
Read regression weight LD Scores for 1187349 SNPs.
After merging with reference panel LD, 1145731 SNPs remain.
After merging with regression SNP LD, 1142898 SNPs remain.
Removed 0 SNPs with chi^2 > 417.508 (1142898 SNPs remain)
Total Observed scale h2: 0.0147 (0.0034)
Categories: baseL2_0 Coding_UCSCL2_0 Coding_UCSC.extend.500L2_0 Conserved_LindbladTohL2_0 Conserved_LindbladToh.extend.500L2_0 CTCF_HoffmanL2_0 CTCF_Hoffman.extend.500L2_0 DGF_ENCODEL2_0 DGF_ENCODE.extend.500L2_0 DHS_peaks_TrynkaL2_0 DHS_TrynkaL2_0 DHS_Trynka.extend.500L2_0 Enhancer_AnderssonL2_0 Enhancer_Andersson.extend.500L2_0 Enhancer_HoffmanL2_0 Enhancer_Hoffman.extend.500L2_0 FetalDHS_TrynkaL2_0 FetalDHS_Trynka.extend.500L2_0 H3K27ac_HniszL2_0 H3K27ac_Hnisz.extend.500L2_0 H3K27ac_PGC2L2_0 H3K27ac_PGC2.extend.500L2_0 H3K4me1_peaks_TrynkaL2_0 H3K4me1_TrynkaL2_0 H3K4me1_Trynka.extend.500L2_0 H3K4me3_peaks_TrynkaL2_0 H3K4me3_TrynkaL2_0 H3K4me3_Trynka.extend.500L2_0 H3K9ac_peaks_TrynkaL2_0 H3K9ac_TrynkaL2_0 H3K9ac_Trynka.extend.500L2_0 Intron_UCSCL2_0 Intron_UCSC.extend.500L2_0 PromoterFlanking_HoffmanL2_0 PromoterFlanking_Hoffman.extend.500L2_0 Promoter_UCSCL2_0 Promoter_UCSC.extend.500L2_0 Repressed_HoffmanL2_0 Repressed_Hoffman.extend.500L2_0 SuperEnhancer_HniszL2_0 SuperEnhancer_Hnisz.extend.500L2_0 TFBS_ENCODEL2_0 TFBS_ENCODE.extend.500L2_0 Transcr_HoffmanL2_0 Transcr_Hoffman.extend.500L2_0 TSS_HoffmanL2_0 TSS_Hoffman.extend.500L2_0 UTR_3_UCSCL2_0 UTR_3_UCSC.extend.500L2_0 UTR_5_UCSCL2_0 UTR_5_UCSC.extend.500L2_0 WeakEnhancer_HoffmanL2_0 WeakEnhancer_Hoffman.extend.500L2_0 Super_Enhancer_VahediL2_0 Super_Enhancer_Vahedi.extend.500L2_0 Typical_Enhancer_VahediL2_0 Typical_Enhancer_Vahedi.extend.500L2_0 GERP.NSL2_0 GERP.RSsup4L2_0 MAFbin1L2_0 MAFbin2L2_0 MAFbin3L2_0 MAFbin4L2_0 MAFbin5L2_0 MAFbin6L2_0 MAFbin7L2_0 MAFbin8L2_0 MAFbin9L2_0 MAFbin10L2_0 MAF_Adj_Predicted_Allele_AgeL2_0 MAF_Adj_LLD_AFRL2_0 Recomb_Rate_10kbL2_0 Nucleotide_Diversity_10kbL2_0 Backgrd_Selection_StatL2_0 CpG_Content_50kbL2_0
Lambda GC: 1.0772
Mean Chi^2: 1.1427
Intercept: 0.9922 (0.0112)
Ratio < 0 (usually indicates GC correction).
Reading annot matrix from baselineLD_v1.1/baselineLD.[1-22] ...
ldsc/ldscore/parse.py:120: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df_annot = df_annot[(.95 > df_frq.FRQ) & (df_frq.FRQ > 0.05)]
Error parsing .annot file.
Traceback (most recent call last):
  File "./ldsc.py", line 644, in <module>
    sumstats.estimate_h2(args, log)
  File "/home/c1002680/LDSC/ldsc/ldscore/sumstats.py", line 369, in estimate_h2
    overlap_matrix, M_tot = _read_annot(args, log)
  File "/home/c1002680/LDSC/ldsc/ldscore/sumstats.py", line 96, in _read_annot
    'annot matrix', ps.annot, frqfile=args.frqfile_chr)
  File "/home/c1002680/LDSC/ldsc/ldscore/sumstats.py", line 152, in _read_chr_split_files
    out = parsefunc(_splitp(chr_arg), _N_CHR, **kwargs)
  File "/home/c1002680/LDSC/ldsc/ldscore/parse.py", line 197, in annot
    for i, fh in enumerate(fh_list)]
  File "/home/c1002680/LDSC/ldsc/ldscore/parse.py", line 120, in annot_parser
    df_annot = df_annot[(.95 > df_frq.FRQ) & (df_frq.FRQ > 0.05)]
  File "/home/c1002680/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/home/c1002680/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/frame.py", line 1998, in _getitem_array
    key = check_bool_indexer(self.index, key)
  File "/home/c1002680/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 1939, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series provided as '
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Analysis finished at Wed Aug 29 15:26:24 2018
Total time elapsed: 1.0m:23.24s
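The failing line (parse.py line 120) filters the annotation rows with a boolean mask built from the frequency table's FRQ column. The minimal pandas sketch below uses invented toy data to show how pandas behaves when that mask's index does not cover every row of the frame being filtered. Only the `FRQ` column name and the filter condition come from the traceback; everything else is assumed. It is written for Python 3 and a recent pandas, not the Python 2.7 environment above. Assuming both tables carry pandas' default 0..n-1 row index, pandas first emits the same "will be reindexed" UserWarning and then raises IndexingError whenever the frequency table has fewer rows than the annotation table.

```python
import warnings

import pandas as pd

# Invented toy data: five annotation rows but only four frequency rows,
# as if one chromosome's .frq file were truncated. Both frames get pandas'
# default 0..n-1 row index (an assumption about how the files are read).
df_annot = pd.DataFrame({"base": [1.0] * 5, "Coding": [0.0, 1.0, 0.0, 0.0, 1.0]})
df_frq = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3", "rs4"],
                       "FRQ": [0.10, 0.50, 0.02, 0.30]})

# Same condition as the failing line in parse.py.
mask = (.95 > df_frq.FRQ) & (df_frq.FRQ > 0.05)


def try_filter(annot, mask):
    """Apply the mask; return (kept rows or None, exception class name or None)."""
    with warnings.catch_warnings():
        # Silence the "Boolean Series key will be reindexed" UserWarning.
        warnings.simplefilter("ignore", UserWarning)
        try:
            return annot[mask], None
        except Exception as exc:  # IndexingError's module differs across pandas versions
            return None, type(exc).__name__


kept, err = try_filter(df_annot, mask)
print(err)  # IndexingError: annotation row 4 has no counterpart in the mask

kept, err = try_filter(df_annot.iloc[:4], mask)  # lengths now match
print(len(kept), err)  # 3 None (rs3, with FRQ 0.02, is filtered out)
```

If the frequency table were longer than the annotation table instead, the same line would only warn and would not fail. Matching row counts alone therefore do not show that the two files list their SNPs in the same order.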

Raymond Walters

Aug 30, 2018, 5:44:10 PM
to Leon Hubbard, ldsc_users
Hi Leon,
That is a strange error; I don't think I've seen that one before. From the error trace, it looks like an issue with lining up the allele frequencies (from --frqfile-chr) with the annotations. I wonder whether your download of those frequency files got corrupted. It would be useful to compare the frqfile files to the weights to see if there's an issue with matching the SNP names, or simply to re-download a fresh copy of the frequency files.
Cheers,
Raymond
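One way to act on the SNP-name check suggested above is to compare, chromosome by chromosome, each frequency file with the annotation file it is filtered against in parse.py. This is a rough sketch written for Python 3. It assumes that both files are whitespace-delimited with a header containing an `SNP` column, and that they are named `<prefix><chr>.annot.gz` and `<prefix><chr>.frq`. That naming is an assumption about the downloaded files, and the helper names `compare_snp_order` and `check_chromosomes` are hypothetical, not part of LDSC.

```python
import pandas as pd


def compare_snp_order(frq_snps, annot_snps):
    """Compare the SNP lists of one chromosome's .frq and .annot files.

    Returns both row counts, the first row at which the lists disagree
    (None if they agree everywhere they overlap), and whether they match
    exactly in both length and order.
    """
    frq_snps = list(frq_snps)
    annot_snps = list(annot_snps)
    first_diff = next(
        (i for i, (f, a) in enumerate(zip(frq_snps, annot_snps)) if f != a), None
    )
    return {
        "n_frq": len(frq_snps),
        "n_annot": len(annot_snps),
        "first_diff_row": first_diff,
        "aligned": first_diff is None and len(frq_snps) == len(annot_snps),
    }


def check_chromosomes(annot_prefix, frq_prefix, chroms=range(1, 23)):
    """Run compare_snp_order on each chromosome's file pair.

    The '<prefix><chr>.annot.gz' / '<prefix><chr>.frq' naming is assumed;
    adjust the suffixes if your copies are compressed differently.
    """
    results = {}
    for c in chroms:
        annot = pd.read_csv(f"{annot_prefix}{c}.annot.gz", sep=r"\s+", usecols=["SNP"])
        frq = pd.read_csv(f"{frq_prefix}{c}.frq", sep=r"\s+", usecols=["SNP"])
        results[c] = compare_snp_order(frq["SNP"], annot["SNP"])
    return results
```

With the prefixes from the command in the original post, `check_chromosomes("baselineLD_v1.1/baselineLD.", "1000G_Phase3_frq/1000G.EUR.QC.")` would return one report per chromosome. Any entry with `"aligned": False` points to the file pair worth re-downloading.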



