running error with function “reduceDimension”


chao lu

May 31, 2021, 7:43:51 AM
to cicero-users
Hi Hannah,
I recently ran Cicero for Monocle/Monocle 2 to analyze my 10x Genomics scATAC-seq data. When I ran the “reduceDimension” function, I got the following error:
Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In int2i(as.integer(i), n) : NAs introduced by coercion to integer range
The cicero R package on my computer is built on monocle2, and the R version is 4.0.3.
I have also taken a screenshot of the error message and attached it to the email. To help troubleshoot the problem, my R code is as follows:
--------------------------------------------------------------------------------------------------
library(cicero)
setwd("F:/results")
peak_annotation_file <- "F:/20210307/monocle2/input_JGI/peak_annotation.tsv"
# direct loading of the original format
df_peakanno <- readr::read_tsv(peak_annotation_file)

# read in matrix data using the Matrix package
indata <- Matrix::readMM("F:/20210307/monocle2/input_JGI/matrix.mtx") 
# binarize the matrix
indata@x[indata@x > 0] <- 1

cellinfo <- read.csv("F:/20210307/monocle2/input_JGI/LibraryID.csv", header = TRUE)
row.names(cellinfo) <- cellinfo$barcode
peakinfo <- read.table("F:/20210307/monocle2/input_JGI/peaks.bed")
names(peakinfo) <- c("chr", "bp1", "bp2")
peakinfo$gene_short_name <- df_peakanno$gene
peakinfo$site_name <- paste(peakinfo$chr, peakinfo$bp1, peakinfo$bp2, sep="_")
row.names(peakinfo) <- peakinfo$site_name
row.names(indata) <- row.names(peakinfo)
colnames(indata) <- row.names(cellinfo)

fd <- methods::new("AnnotatedDataFrame", data = peakinfo)
pd <- methods::new("AnnotatedDataFrame", data = cellinfo)
input_cds <-  suppressWarnings(newCellDataSet(indata,
                            phenoData = pd,
                            featureData = fd,
                            expressionFamily=VGAM::binomialff(),
                            lowerDetectionLimit=0))
input_cds@expressionFamily@vfamily <- "binomialff"
input_cds <- monocle::detectGenes(input_cds)

#Ensure there are no peaks included with zero reads
input_cds <- input_cds[Matrix::rowSums(exprs(input_cds)) != 0,] 

set.seed(2017)
input_cds <- detectGenes(input_cds)
input_cds <- estimateSizeFactors(input_cds)

input_cds <- reduceDimension(input_cds, max_components = 2, num_dim=6,
                      reduction_method = 'tSNE', norm_method = "none")
reduceDimension error.png

hpl...@gmail.com

Jun 3, 2021, 9:33:26 AM
to cicero-users
Hello,

Can you provide some more information about input_cds? For example the output of input_cds, head(pData(input_cds)), head(fData(input_cds)) and head(exprs(input_cds)).
Could you also translate the error message for me? :) 

Best,
Hannah

chao lu

Jun 10, 2021, 9:02:08 AM
to cicero-users
Hi Hannah,
All the requested information is provided in the screenshots attached to the email. As you noted, part of the error message was in Chinese, so I changed the language settings on my laptop and the error messages now appear in English. The error from calling the "reduceDimension" function is:
Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In int2i(as.integer(i), n) : NAs introduced by coercion to integer range
You can also see it in the attached screenshot named "reduceDimension_error.png". To speed up troubleshooting, I have also uploaded the input files for the cicero analysis: "matrix.mtx" (peak matrix data), "peaks.bed" (peak info), "LibraryID.csv" (cell barcode info), and "peak_annotation.tsv" (peak annotation info). These files are compressed into one archive named "input_dataset.rar" and shared on Google Drive (https://drive.google.com/file/d/1J3UjAdehPVFYGV0Y1gx0ZYqbS32ZNd15/view?usp=sharing). I should also mention that my scATAC-seq data comes from the 10x Genomics platform. The operating system, R version, and related R package information are as follows:
--------------------------------------------------------------------------------------------------
R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

other attached packages:
 [1] cicero_1.9.1         Gviz_1.34.1          GenomicRanges_1.42.0
 [4] GenomeInfoDb_1.26.7  IRanges_2.24.1       S4Vectors_0.28.1    
 [7] monocle_2.18.0       DDRTree_0.1.5        irlba_2.3.3        
[10] VGAM_1.1-5           ggplot2_3.3.3        Biobase_2.50.0      
[13] BiocGenerics_0.36.1  Matrix_1.3-3  
----------------------------------------------------------------------------------------------------
Lastly, since the above problems come from cicero for monocle/monocle2, I also tried to install cicero for monocle3 on my laptop under Windows 10. I set up an R environment with Conda and activated it to install the packages. Following the tutorial, I first tried to install monocle3, since cicero depends on it. However, the installation always fails with "ERROR: compilation failed for package leidenbase". The Installation troubleshooting section of the monocle3 tutorial does cover "ERROR: compilation failed for package 'leidenbase'", but it says "The above error indicates that you need to install gfortran on your computer (for Mac users only).", which does not apply to me because my operating system is Windows 10. I have searched online for solutions but could not find one. A screenshot of the error is attached to the email as "leidenbase error.png". The code I used to install monocle3 is as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(version = "3.10")

BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment', 'batchelor', 'Matrix.utils'))

install.packages("devtools")
devtools::install_github('cole-trapnell-lab/leidenbase') #error appeared saying " compilation failed for package leidenbase "
devtools::install_github('cole-trapnell-lab/monocle3')
inputcds_info.png
reduceDimension_error.png
head(exprs(input_cds))_screenshot.png
leidenbase error.png

hpl...@gmail.com

Jun 12, 2021, 7:41:44 AM
to cicero-users
Thanks for the information. First on the installation, it looks like you don't have a g++ compiler installed on your computer. You can try installing with these instructions: https://www3.cs.stonybrook.edu/~alee/g++/g++.html If that doesn't work, go ahead and open an issue on the leidenbase github page https://github.com/cole-trapnell-lab/leidenbase with the details of your installation.
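
Before installing anything, it may help to confirm which build tools R can actually see. The following is a minimal sketch in base R, not an official check: the two tool names are simply the ones mentioned in this thread (an assumption, not a complete list of leidenbase requirements), and on Windows the compilers R uses are typically supplied by Rtools rather than installed separately.

```r
# Build tools named in this thread (assumed list, not an official
# leidenbase requirement list).
build_tools <- c("g++", "gfortran")

# Sys.which() returns the full path of each program, or "" when the
# program is not found on the PATH of the current R session.
tool_paths <- Sys.which(build_tools)
missing_tools <- names(tool_paths)[!nzchar(tool_paths)]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All listed build tools were found.")
}
```

If a tool is reported as missing from inside the Conda environment but is installed elsewhere on the machine, the PATH of that environment is probably the thing to look at first.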

I took a look at your script, and it looks like you're hitting the size limits that are inherent in monocle/monocle2. I was successfully able to generate a reduced dimension with a subset of cells and features, so I think you're going to have to upgrade to monocle3 in order to use monocle for data of this size.
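
As a rough illustration of what such a size limit can look like, the sketch below uses base R only. It assumes, as one plausible reading of the warning and not a confirmed diagnosis, that the number of matrix entries exceeds what a 32-bit integer index can address. The dimensions are made up for illustration and `exceeds_integer_index` is a hypothetical helper, not part of monocle or cicero.

```r
# Hypothetical helper (not part of monocle or cicero): TRUE when a matrix
# has more entries than a 32-bit integer index can address.
exceeds_integer_index <- function(n_rows, n_cols) {
  # Multiply as doubles so the product itself cannot overflow.
  as.numeric(n_rows) * as.numeric(n_cols) > .Machine$integer.max
}

# Illustrative dimensions only (assumed, not taken from the uploaded data).
n_peaks <- 200000
n_cells <- 20000
exceeds_integer_index(n_peaks, n_cells)

# Coercing such a position to integer yields NA; without suppressWarnings()
# R reports "NAs introduced by coercion to integer range".
too_big <- suppressWarnings(as.integer(n_peaks * n_cells))
is.na(too_big)

# A random subset of peaks and cells brings the product back into range.
set.seed(2017)
keep_peaks <- sort(sample.int(n_peaks, 50000))
keep_cells <- sort(sample.int(n_cells, 5000))
exceeds_integer_index(length(keep_peaks), length(keep_cells))

# With a monocle2 CellDataSet the same indices would be applied as
# input_cds[keep_peaks, keep_cells] before calling reduceDimension.
```

The subset sizes here are arbitrary; how many cells and peaks to keep depends on the data and on how much of the signal needs to be preserved.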

Hope this helps,
Hannah

chao lu

Jun 24, 2021, 1:32:27 PM
to cicero-users
Hi Hannah,
I am trying to subset my dataset using the "choose_cells" function. However, a new problem has come up: "Error in utils::browseURL(appUrl):  'browser' must be a non-empty character string". I have searched online but could not find a workable solution. To speed up troubleshooting, my R script is provided below.

--------------------------------------------------------------------------------------------------------------------------------------------------------
rm(list = ls())
setwd("/home/luchao/scATAC/cicero_input/JGI/merge_123/results")
library(cicero)
library(tidyverse)
library(Matrix)

peak_annotation_file <- '/home/luchao/scATAC/cicero_input/JGI/merge_123/peak_annotation.tsv'
df_peakanno <- readr::read_tsv(peak_annotation_file)

sparseMatrix_peakanno <- readr::read_tsv(peak_annotation_file) %>%
  dplyr::mutate(peak = factor(peak, levels = peak)) %>%
  tidyr::separate_rows(gene, distance, peak_type, sep = ';') %>%
  dplyr::filter(!is.na(gene)) %>%
  dplyr::mutate(gene = factor(gene)) %>%
  dplyr::group_by(peak, gene) %>%
  dplyr::summarise(value = as.integer(n() > 0)) %>%
  stats::xtabs(value ~ peak + gene, data = ., sparse = T)

indata <- Matrix::readMM("/home/luchao/scATAC/cicero_input/JGI/merge_123/matrix.mtx") 
indata@x[indata@x > 0] <- 1
cellinfo <- read.table("/home/luchao/scATAC/cicero_input/JGI/merge_123/Graph_Based.csv", header = TRUE)
row.names(cellinfo) <- cellinfo$barcode
peakinfo <- read.table("/home/luchao/scATAC/cicero_input/JGI/merge_123/peaks.bed")
names(peakinfo) <- c("chr", "bp1", "bp2")
peakinfo$gene_short_name <- df_peakanno$gene
peakinfo$site_name <- paste(peakinfo$chr, peakinfo$bp1, peakinfo$bp2, sep="_")
row.names(peakinfo) <- peakinfo$site_name
row.names(indata) <- row.names(peakinfo)
colnames(indata) <- row.names(cellinfo)
   
input_cds <-  suppressWarnings(new_cell_data_set(indata,cell_metadata = cellinfo,gene_metadata = peakinfo))
input_cds <- monocle3::detect_genes(input_cds)
input_cds <- input_cds[Matrix::rowSums(exprs(input_cds)) != 0,]    
set.seed(2017)
input_cds <- detect_genes(input_cds)
input_cds <- estimate_size_factors(input_cds)
input_cds <- preprocess_cds(input_cds, method = "LSI")
input_cds <- reduce_dimension(input_cds, reduction_method = 'UMAP', 
                              preprocess_method = "LSI")
   
plot_cells(input_cds)
umap_coords <- reducedDims(input_cds)$UMAP
cicero_cds <- make_cicero_cds(input_cds, reduced_coordinates = umap_coords)

input_cds <- cluster_cells(input_cds)
input_cds <- learn_graph(input_cds)

cds <- input_cds
get_earliest_principal_node <- function(cds, time_bin="NEC"){
  cell_ids <- which(colData(cds)[, "timepoint"] == time_bin)
  
  closest_vertex <-
    cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  root_pr_nodes <-
    igraph::V(principal_graph(cds)[["UMAP"]])$name[
      as.numeric(names(which.max(table(closest_vertex[cell_ids, ]))))]
  
  root_pr_nodes
}
cds<- order_cells(cds, root_pr_nodes=get_earliest_principal_node(cds))   
plot_cells(cds,
           color_cells_by = "pseudotime",
           label_cell_groups=FALSE,
           label_leaves=TRUE,
           label_branch_points=TRUE,
           graph_label_size=2.0)

plot_cells(cds, color_cells_by = "timepoint", group_label_size=4, graph_label_size=1.5)

pdf("/home/luchao/scATAC/cicero_input/JGI/merge_123/results/pseudotime.pdf",width=7, height=6)
plot_cells(cds, color_cells_by = "pseudotime", group_label_size=4, graph_label_size=1.5)
dev.off()
 
pdf("/home/luchao/scATAC/cicero_input/JGI/merge_123/results/timepoint.pdf",width=7, height=6)
plot_cells(cds, color_cells_by = "timepoint", group_label_size=4, graph_label_size=1.5)
dev.off()

cds_subset <- choose_cells(cds)  # error: Error in utils::browseURL(appUrl):  'browser' must be a non-empty character string

chao lu

Jun 24, 2021, 1:36:28 PM
to cicero-users
Hi Hannah,
The system information and related package information is also provided as follows.
cicero for monocle3
R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

hpl...@gmail.com

Aug 2, 2021, 10:32:37 AM
to cicero-users
Hello, apologies for the delay. Are you running your script on a cluster? Unfortunately, the choose_cells function does not work if you're running the script on a compute cluster or from within a Python notebook. If you download the data locally and run it in R or RStudio, it should work.

chao lu

Aug 7, 2021, 6:17:41 AM
to hpl...@gmail.com, cicero-users
Hi Hannah,
The R script is running on my laptop (Dell), not on a cluster, and not within a Python notebook either. It works well on Windows 10, but on Ubuntu it fails with "Error in utils::browseURL(appUrl):  'browser' must be a non-empty character string".


Hannah A Pliner

Aug 24, 2021, 1:04:36 PM
to chao lu, hpl...@gmail.com, cicero-users

Hello,

Can you provide the output of traceback() after the error shows up? Sorry for summer slowness, I'll try to get back to you quicker on this round.
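
While collecting the traceback, it may also be worth checking R's `browser` option. `utils::browseURL` takes its default browser from `getOption("browser")`, and on Unix-alikes it stops with this message when that value is neither a function nor a non-empty string, which can happen when R is started in an environment with no default browser configured. The sketch below is a guess at the cause, not a confirmed fix: `browser_is_configured` is a hypothetical helper, and `xdg-open` is only an assumption about what is installed on the Ubuntu machine.

```r
# Hypothetical helper: TRUE when the value is something utils::browseURL
# can use on a Unix-alike, i.e. a function or a single non-empty string.
browser_is_configured <- function(browser = getOption("browser")) {
  is.function(browser) ||
    (is.character(browser) && length(browser) == 1L && nzchar(browser))
}

if (!browser_is_configured()) {
  # Assumption: xdg-open exists on this system; any installed browser
  # command (for example "firefox") could be used instead.
  options(browser = "xdg-open")
}

getOption("browser")
```

If the option was empty and choose_cells works after setting it, adding the `options(browser = ...)` line to `~/.Rprofile` would make the setting persistent.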

Best,
Hannah

