Cicero reduceDimension() returns "Error: all rows have standard deviation zero"

74 views
Skip to first unread message

mattre...@gmail.com

unread,
Jan 2, 2020, 4:08:33 PM1/2/20
to cicero-users
To whom it may concern,

I am following the Cicero tutorial found here (https://cole-trapnell-lab.github.io/cicero-release/docs/) to work with my own 10x scATAC-seq data. Has anyone received this error? It seems to be pretty consistent regardless of changing the reduction_method or num_dim parameters. Any thoughts/feedback/ideas would be greatly appreciated. Thanks in advance! 

input_cds_0 <- reduceDimension(input_cds_0, max_components = 2, num_dim=6,
+                                reduction_method = 'tSNE', norm_method = "none",verbose = T)
Error in reduceDimension(input_cds_0, max_components = 2, num_dim = 6,  : 
  Error: all rows have standard deviation zero


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      splines   stats4    parallel  stats     graphics  grDevices utils     datasets 
[10] methods   base     

other attached packages:
 [1] cicero_1.0.15                     Gviz_1.26.5                      
 [3] monocle_2.10.1                    DDRTree_0.1.5                    
 [5] irlba_2.3.3                       VGAM_1.1-1                       
 [7] Matrix_1.2-15                     chromVAR_1.4.1                   
 [9] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.50.0                  
[11] rtracklayer_1.42.2                TFBSTools_1.20.0                 
[13] JASPAR2018_1.1.1                  ggplot2_3.2.1                    
[15] EnsDb.Hsapiens.v86_2.99.0         ensembldb_2.6.8                  
[17] AnnotationFilter_1.6.0            GenomicFeatures_1.34.8           
[19] AnnotationDbi_1.44.0              Biobase_2.42.0                   
[21] Seurat_3.1.2                      Signac_0.1.6                     
[23] WhopGenome_0.9.7                  Rsamtools_1.34.1                 
[25] Biostrings_2.50.2                 XVector_0.22.0                   
[27] GenomicRanges_1.34.0              GenomeInfoDb_1.18.2              
[29] IRanges_2.16.0                    S4Vectors_0.20.1                 
[31] BiocGenerics_0.28.0              

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1              GGally_1.4.0                R.methodsS3_1.7.1          
  [4] tidyr_1.0.0                 acepack_1.4.1               bit64_0.9-7                
  [7] knitr_1.21                  multcomp_1.4-11             DelayedArray_0.8.0         
 [10] R.utils_2.9.2               data.table_1.12.8           rpart_4.1-13               
 [13] KEGGREST_1.22.0             RCurl_1.95-4.12             metap_1.2                  
 [16] cowplot_1.0.0               TH.data_1.0-10              RSQLite_2.1.1              
 [19] RANN_2.6.1                  combinat_0.0-8              future_1.15.1              
 [22] bit_1.1-14                  mutoss_0.1-12               httpuv_1.5.2               
 [25] SummarizedExperiment_1.12.0 assertthat_0.2.1            DirichletMultinomial_1.24.1
 [28] viridis_0.5.1               xfun_0.5                    hms_0.4.2                  
 [31] promises_1.1.0              fansi_0.4.0                 progress_1.2.0             
 [34] caTools_1.17.1.3            igraph_1.2.4.2              DBI_1.0.0                  
 [37] htmlwidgets_1.5.1           sparsesvd_0.2               reshape_0.8.8              
 [40] purrr_0.3.3                 dplyr_0.8.3                 backports_1.1.5            
 [43] annotate_1.60.1             gbRd_0.4-11                 RcppParallel_4.4.4         
 [46] biomaRt_2.38.0              vctrs_0.2.1                 ROCR_1.0-7                 
 [49] withr_2.1.2                 checkmate_1.9.1             sctransform_0.2.1          
 [52] GenomicAlignments_1.18.1    prettyunits_1.0.2           mnormt_1.5-5               
 [55] cluster_2.0.7-1             ape_5.3                     lazyeval_0.2.2             
 [58] seqLogo_1.48.0              crayon_1.3.4                pkgconfig_2.0.3            
 [61] slam_0.1-47                 nlme_3.1-137                ProtGenerics_1.14.0        
 [64] nnet_7.3-12                 rlang_0.4.2                 globals_0.12.5             
 [67] miniUI_0.1.1.1              lifecycle_0.1.0             sandwich_2.5-1             
 [70] rsvd_1.0.2                  dichromat_2.0-0             matrixStats_0.55.0         
 [73] lmtest_0.9-37               graph_1.60.0                ggseqlogo_0.1              
 [76] zoo_1.8-6                   base64enc_0.1-3             ggridges_0.5.1             
 [79] pheatmap_1.0.12             png_0.1-7                   viridisLite_0.3.0          
 [82] bitops_1.0-6                R.oo_1.23.0                 KernSmooth_2.23-15         
 [85] blob_1.1.1                  stringr_1.4.0               readr_1.3.1                
 [88] CNEr_1.18.1                 scales_1.1.0                memoise_1.1.0              
 [91] magrittr_1.5                plyr_1.8.4                  ica_1.0-2                  
 [94] gplots_3.0.1.1              bibtex_0.4.2                gdata_2.18.0               
 [97] zlibbioc_1.28.0             compiler_3.5.2              HSMMSingleCell_1.2.0       
[100] lsei_1.2-0                  RColorBrewer_1.1-2          plotrix_3.7-7              
[103] fitdistrplus_1.0-14         cli_2.0.0                   listenv_0.8.0              
[106] pbapply_1.4-2               htmlTable_1.13.1            Formula_1.2-3              
[109] MASS_7.3-51.1               tidyselect_0.2.5            stringi_1.4.3              
[112] densityClust_0.3            yaml_2.2.0                  latticeExtra_0.6-28        
[115] ggrepel_0.8.1               VariantAnnotation_1.28.13   tools_3.5.2                
[118] future.apply_1.3.0          rstudioapi_0.9.0            TFMPvalue_0.0.8            
[121] foreign_0.8-71              gridExtra_2.3               Rtsne_0.15                 
[124] digest_0.6.23               BiocManager_1.30.10         shiny_1.4.0                
[127] FNN_1.1.3                   qlcMatrix_0.9.7             motifmatchr_1.4.0          
[130] Rcpp_1.0.3                  SDMTools_1.1-221.2          later_1.0.0                
[133] RcppAnnoy_0.0.14            OrganismDbi_1.24.0          httr_1.4.1                 
[136] ggbio_1.30.0                biovizBase_1.30.1           npsurv_0.4-0               
[139] Rdpack_0.11-1               colorspace_1.4-1            XML_3.98-1.18              
[142] reticulate_1.14             uwot_0.1.5                  RBGL_1.58.2                
[145] sn_1.5-4                    multtest_2.38.0             plotly_4.9.1               
[148] xtable_1.8-4                jsonlite_1.6                poweRlaw_0.70.2            
[151] zeallot_0.1.0               R6_2.4.1                    TFisher_0.2.0              
[154] Hmisc_4.2-0                 mime_0.8                    pillar_1.4.3               
[157] htmltools_0.4.0             DT_0.5                      fastmap_1.0.1              
[160] glue_1.3.1                  BiocParallel_1.16.6         codetools_0.2-16           
[163] tsne_0.1-3                  mvtnorm_1.0-11              lattice_0.20-38            
[166] tibble_2.1.3                numDeriv_2016.8-1.1         curl_4.3                   
[169] leiden_0.3.1                gtools_3.8.1                GO.db_3.7.0                
[172] survival_2.43-3             limma_3.38.3                docopt_0.6.1               
[175] fastICA_1.2-2               munsell_0.5.0               GenomeInfoDbData_1.2.0     
[178] reshape2_1.4.3              gtable_0.3.0  

Hannah Pliner

unread,
Jan 3, 2020, 5:26:15 AM1/3/20
to cicero-users
Hello,

Can you check that you don't have any rows or columns in you expression matrix that are all zeroes?
input_cds <- input_cds[Matrix::rowSums(exprs(input_cds)) != 0,] 

mattre...@gmail.com

unread,
Jan 3, 2020, 10:06:43 AM1/3/20
to cicero-users
Hannah,

Thanks for the quick response. Yes, I ran this line of code before detectGenes(), estimateSizeFactors(), and reduceDimension(). 

Alternatively, is there a workaround for reduceDimension() if I have a processed Seurat object that already has UMAP coords? I have performed Seurat's scRNA-seq/scATAC-seq integration workflow and I would like to run Cicero on each ATAC cluster with transferred labels from scRNA-seq (this is a UMAP plot based on LSI reduction). If you could point me in the right direction of a tutorial/script/post that shows how to run Cicero after the Seurat workflow, that would be greatly appreciated. 

Below is my current code for extracting the UMAP coords from my processed Seurat object used for making the CDS object. However, I am having trouble calculating the distance parameters regardless of increasing the window size or decreasing the distance constraint: 

library(Signac)
library(Seurat)
library(JASPAR2018)
library(TFBSTools)
library(BSgenome.Hsapiens.UCSC.hg38)
library(chromVAR)
library('EnsDb.Hsapiens.v86')
library(cicero)
library(monocle)
library(SummarizedExperiment)

######### read in processed Seurat object with UMAP coords #############
WHIM11.atac.filtered <- readRDS("WHIM11_atac_filtered_transfer_labels_nCountRNA_regress.rds")

library(Seurat)
WHIM11_0 <- subset(WHIM11.atac.filtered,subset = predicted.id == "0")

WHIM11_0_sce <- as.SingleCellExperiment(WHIM11_0)

umap_coords <- as.data.frame(WHIM11_0_sce@reducedDims[["UMAP"]])

########## Load the original 10x ATAC-seq data and subset for cells that have UMAP coords ##########

# read in matrix data using the Matrix package
indata <- Matrix::readMM("filtered_peak_bc_matrix/matrix.mtx") 
# binarize the matrix
indata@x[indata@x > 0] <- 1

# format cell info
cellinfo <- read.table("filtered_peak_bc_matrix/barcodes.tsv")
row.names(cellinfo) <- cellinfo$V1
names(cellinfo) <- "cells"

# format peak info
peakinfo <- read.table("filtered_peak_bc_matrix/peaks.bed")
names(peakinfo) <- c("chr", "bp1", "bp2")
peakinfo$site_name <- paste(peakinfo$chr, peakinfo$bp1, peakinfo$bp2, sep="_")
row.names(peakinfo) <- peakinfo$site_name

row.names(indata) <- row.names(peakinfo)
colnames(indata) <- row.names(cellinfo)

# make CDS
fd <- methods::new("AnnotatedDataFrame", data = peakinfo)
pd <- methods::new("AnnotatedDataFrame", data = cellinfo)
input_cds <-  suppressWarnings(newCellDataSet(indata,
                                              phenoData = pd,
                                              featureData = fd,
                                              expressionFamily=VGAM::negbinomial.size(),
                                              lowerDetectionLimit=0))
input_cds@expressionFamily@vfamily <- "negbinomial.size"
input_cds <- monocle::detectGenes(input_cds)

#Ensure there are no peaks included with zero reads
input_cds <- input_cds[Matrix::rowSums(exprs(input_cds)) != 0,] 


#Subset input_cds for cells that have UMAP coords
input_cds_0 <-  input_cds[,rownames(WHIM11.ata...@meta.data[WHIM11.ata...@meta.data$predicted.id == "0",])]

# set.seed(2017)
# input_cds <- detectGenes(input_cds)
# input_cds <- estimateSizeFactors(input_cds)

#Make cicero CellDataSet object 
cicero_cds <- make_cicero_cds(input_cds_0, reduced_coordinates = umap_coords)

#Read in chromosome sizes file 
sample_genome <- read.table("hg38.chrom.sizes")

estimate_distance_parameter(cicero_cds,
                            genomic_coords = sample_genome,
                            window = 500000,
                            sample_num = 100,
                            distance_constraint = 250000,
                            max_elements = 100,
                            s = 0.75)




Hannah Pliner

unread,
Jan 3, 2020, 10:11:47 AM1/3/20
to cicero-users
Looks like you have it right as far as using previously computed UMAP coords - assuming that the umap_coords is in the same order as the cells in the input cds. 

What issue are you having when calculating distance param?
input_cds_0 <-  input_cds[,rownames(WHIM11.atac.f...@meta.data[WHIM11.ata...@meta.data$predicted.id == "0",])]

mattre...@gmail.com

unread,
Jan 3, 2020, 10:15:26 AM1/3/20
to cicero-users
I will check the order of the cells next. Thanks! Here is the error from estimate_distance_parameter(): 
Error in estimate_distance_parameter(cicero_cds, genomic_coords = sample_genome,  : 
  No distance_parameters calculated
In addition: Warning message:
In estimate_distance_parameter(cicero_cds, genomic_coords = sample_genome,  :
  Could not calculate sample_num distance_parameters - see documentation details

mattre...@gmail.com

unread,
Jan 3, 2020, 10:26:32 AM1/3/20
to cicero-users
Also, it may be worth noting my median fragments per cell is only ~6,000. This is from the summary html file as output from Cell Ranger. Could this be an issue? 

Hannah Pliner

unread,
Jan 3, 2020, 11:30:07 AM1/3/20
to cicero-users
That could be the issue. Basically that error shows up when the function can't find sufficient example windows to calculate distance parameters (or any). The example windows have to have more than one pair of sites that are greater than distance_constraint/2 apart - so if there are very view peaks that could be problematic. 

Before concluding that though, check that your sample_genome is correct and that the format matches your peaks (i.e. should be in the chr#_#_# format (the underscores can be swapped for colon or dash))


On Friday, January 3, 2020 at 3:26:32 PM UTC, wrote:
Also, it may be worth noting my median fragments per cell is only ~6,000. This is from the summary html file as output from Cell Ranger. Could this be an issue? 

Matt R

unread,
Jan 3, 2020, 11:47:15 AM1/3/20
to cicero-users
Thanks for the help. I really appreciate it. 

Formatting is the likely cause here. My peaks have the GRCh38 prefix. I added this prefix to the sample_genome chromosome column and run_cicero() appears to be running! 
Reply all
Reply to author
Forward
0 new messages