running error with function “reduceDimension”

chao lu

May 31, 2021, 7:43:51 AM

to cicero-users

Hi Hannah,

I recently ran Cicero for Monocle/Monocle 2 to analyze my 10x Genomics scATAC-seq data. However, when I ran the “reduceDimension” function, I got the following error:

“Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed

In addition: Warning message:

In int2i(as.integer(i), n) : NAs introduced by coercion to integer range”
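Both parts of this message can be reproduced in plain base R. as.integer() returns NA with exactly this warning when a number exceeds .Machine$integer.max (2^31 - 1), and an `if (any(i < 0L))` test on that NA index then stops with the "missing value where TRUE/FALSE needed" error. The peak and cell counts below are made up, and it is only an assumption that a product of the two matrix dimensions is what overflows inside reduceDimension; the sketch is consistent with the size-limit diagnosis later in the thread but does not pinpoint the exact step.

```r
# R's integer type is 32-bit signed; this is the largest value it can hold.
max_int <- .Machine$integer.max   # 2147483647

# Hypothetical dataset size (illustrative numbers, not the poster's data).
n_peaks <- 150000
n_cells <- 20000
n_entries <- n_peaks * n_cells    # stored as a double: 3e9
print(n_entries > max_int)        # TRUE

# Coercing an out-of-range double to integer gives NA plus the warning
# "NAs introduced by coercion to integer range".
idx <- withCallingHandlers(
  as.integer(n_entries),
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(is.na(idx))                 # TRUE

# A test such as `if (any(i < 0L))` on that NA index then fails with
# "missing value where TRUE/FALSE needed".
err <- tryCatch(
  if (any(idx < 0L)) "negative index" else "ok",
  error = conditionMessage
)
print(err)
```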

The Cicero package on my computer is the version built on Monocle 2, and my R version is 4.0.3.

I have also attached a screenshot of the error message to this email. To help troubleshoot the problem, my R code is provided below:

————————————————————————————————————————————————

library(cicero)
setwd("F:/results")

peak_annotation_file <- "F:/20210307/monocle2/input_JGI/peak_annotation.tsv"
# direct loading of the original format
df_peakanno <- readr::read_tsv(peak_annotation_file)

# read in matrix data using the Matrix package
indata <- Matrix::readMM("F:/20210307/monocle2/input_JGI/matrix.mtx")
# binarize the matrix
indata@x[indata@x > 0] <- 1

cellinfo <- read.csv("F:/20210307/monocle2/input_JGI/LibraryID.csv", header = TRUE)
row.names(cellinfo) <- cellinfo$barcode

peakinfo <- read.table("F:/20210307/monocle2/input_JGI/peaks.bed")
names(peakinfo) <- c("chr", "bp1", "bp2")
peakinfo$gene_short_name <- df_peakanno$gene
peakinfo$site_name <- paste(peakinfo$chr, peakinfo$bp1, peakinfo$bp2, sep = "_")
row.names(peakinfo) <- peakinfo$site_name

row.names(indata) <- row.names(peakinfo)
colnames(indata) <- row.names(cellinfo)

fd <- methods::new("AnnotatedDataFrame", data = peakinfo)
pd <- methods::new("AnnotatedDataFrame", data = cellinfo)

input_cds <- suppressWarnings(newCellDataSet(indata,
                                             phenoData = pd,
                                             featureData = fd,
                                             expressionFamily = VGAM::binomialff(),
                                             lowerDetectionLimit = 0))
input_cds@expressionFamily@vfamily <- "binomialff"
input_cds <- monocle::detectGenes(input_cds)

# Ensure there are no peaks included with zero reads
input_cds <- input_cds[Matrix::rowSums(exprs(input_cds)) != 0,]

set.seed(2017)
input_cds <- detectGenes(input_cds)
input_cds <- estimateSizeFactors(input_cds)
input_cds <- reduceDimension(input_cds, max_components = 2, num_dim = 6,
                             reduction_method = 'tSNE', norm_method = "none")

reduceDimension error.png

hpl...@gmail.com

Jun 3, 2021, 9:33:26 AM

to cicero-users

Hello,

Can you provide some more information about input_cds? For example, the output of input_cds, head(pData(input_cds)), head(fData(input_cds)), and head(exprs(input_cds)).

Could you also translate the error message for me? :)

Best,

Hannah

chao lu

Jun 10, 2021, 9:02:08 AM

to cicero-users

Hi Hannah,

All of the requested information is provided in the screenshots attached to this email. Because some of the error messages were shown in Chinese, I changed the language settings on my laptop, and all error messages are now in English. Calling the "reduceDimension" function gives:

"Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In int2i(as.integer(i), n) : NAs introduced by coercion to integer range"

You can also see this in the attached screenshot "reduceDimension_error.png". To speed up troubleshooting, I have also uploaded the input files for the Cicero analysis: "matrix.mtx" (peak matrix), "peaks.bed" (peak info), "LibraryID.csv" (cell barcode info), and "peak_annotation.tsv" (peak annotation). They are compressed into a single file, "input_dataset.rar", shared on Google Drive (https://drive.google.com/file/d/1J3UjAdehPVFYGV0Y1gx0ZYqbS32ZNd15/view?usp=sharing). I should also mention that my scATAC-seq data comes from the 10x Genomics platform. To help you better understand the problem, my operating system, R version, and related R package information are provided below:

--------------------------------------------------------------------------------------------------
R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

other attached packages:
[1] cicero_1.9.1 Gviz_1.34.1 GenomicRanges_1.42.0
[4] GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[7] monocle_2.18.0 DDRTree_0.1.5 irlba_2.3.3
[10] VGAM_1.1-5 ggplot2_3.3.3 Biobase_2.50.0
[13] BiocGenerics_0.36.1 Matrix_1.3-3

----------------------------------------------------------------------------------------------------

Lastly, because the problems above come from Cicero for Monocle/Monocle 2, I also tried to install Cicero for Monocle 3 on my laptop under Windows 10. I set up an R environment with Conda and activated it to install the Cicero for Monocle 3 packages. Following the tutorial, I first tried to install the monocle3 package, since Cicero depends on it. However, the installation always fails with "ERROR: compilation failed for package leidenbase". The Installation troubleshooting section of the Monocle 3 tutorial does cover "ERROR: compilation failed for package 'leidenbase'", but it says "The above error indicates that you need to install gfortran on your computer (for Mac users only).", which does not apply to me because my operating system is Windows 10. I searched for solutions but could not find a suitable answer. The error message is attached to this email as "leidenbase error.png". The code I used to install monocle3 is shown below:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Note: Bioconductor 3.10 is the release for R 3.6; under R 4.0.x the
# matching release (3.11 or 3.12) is needed instead.
BiocManager::install(version = "3.10")

BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment', 'batchelor', 'Matrix.utils'))

install.packages("devtools")
# fails with: ERROR: compilation failed for package 'leidenbase'
devtools::install_github('cole-trapnell-lab/leidenbase')

devtools::install_github('cole-trapnell-lab/monocle3')

inputcds_info.png

reduceDimension_error.png

head(exprs(input_cds))_screenshot.png

leidenbase error.png

hpl...@gmail.com

Jun 12, 2021, 7:41:44 AM

to cicero-users

Thanks for the information. First, regarding the installation: it looks like you don't have a g++ compiler installed on your computer. You can try installing one with these instructions: https://www3.cs.stonybrook.edu/~alee/g++/g++.html If that doesn't work, go ahead and open an issue on the leidenbase GitHub page https://github.com/cole-trapnell-lab/leidenbase with the details of your installation.
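Before following the g++ instructions, it may help to confirm whether R can see a compiler at all. The sketch below is not from the monocle3 documentation; it uses base R's Sys.which(), and the optional pkgbuild::has_build_tools() call only runs if the pkgbuild package (a devtools dependency) is installed. On Windows, R packages with C/C++ code are normally built with the Rtools tool chain, so empty results there would point to a missing or unconfigured Rtools.

```r
# Sys.which() returns "" for any program it cannot find on the PATH.
tools_found <- Sys.which(c("g++", "gcc", "make"))
print(tools_found)

missing_tools <- names(tools_found)[!nzchar(tools_found)]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("g++, gcc and make were all found on the PATH")
}

# Optional: pkgbuild (installed alongside devtools) reports whether
# R considers build tools available; debug = TRUE prints details.
if (requireNamespace("pkgbuild", quietly = TRUE)) {
  print(pkgbuild::has_build_tools(debug = TRUE))
}
```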

I took a look at your script, and it looks like you're hitting the size limits that are inherent in monocle/monocle2. I was successfully able to generate a reduced dimension with a subset of cells and features, so I think you're going to have to upgrade to monocle3 in order to use monocle for data of this size.
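Hannah's workaround of running on a subset of cells and features can be sketched on the binarized peak-by-cell matrix before the CellDataSet is built. The helper subset_peak_matrix() is hypothetical (not a monocle or cicero function), the toy matrix stands in for `indata`, and the thresholds are illustrative; the thread does not say how small the subset has to be for monocle 2.

```r
library(Matrix)

# Hypothetical helper (not a monocle or cicero function): keep peaks that are
# open in at least `min_cells` cells, then randomly keep at most `max_cells` cells.
subset_peak_matrix <- function(mat, max_cells, min_cells = 1, seed = 2017) {
  keep_peaks <- Matrix::rowSums(mat > 0) >= min_cells
  mat <- mat[keep_peaks, , drop = FALSE]
  set.seed(seed)
  n_keep <- min(max_cells, ncol(mat))
  keep_cells <- sort(sample.int(ncol(mat), n_keep))
  mat[, keep_cells, drop = FALSE]
}

# Toy binary peak x cell matrix standing in for the real `indata`.
set.seed(1)
toy <- Matrix::rsparsematrix(nrow = 1000, ncol = 500, density = 0.02,
                             rand.x = function(n) rep(1, n))

small <- subset_peak_matrix(toy, max_cells = 200, min_cells = 2)
print(dim(small))
```

Because sampling cells can leave some peaks with no reads, the zero-row filter from the original script (`Matrix::rowSums(exprs(input_cds)) != 0`) should still be applied afterwards. The same filtering logic can also be applied to an existing CellDataSet with `[` subsetting, as the script already does for `input_cds`.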

Hope this helps,

Hannah

chao lu

Jun 24, 2021, 1:32:27 PM

to cicero-users

Hi Hannah,

I am trying to subset my dataset using the "choose_cells" function. However, a new problem came up: "Error in utils::browseURL(appUrl): 'browser' must be a non-empty character string". I searched for solutions but could not find a workable fix. To speed up troubleshooting, my R script is provided below.

--------------------------------------------------------------------------------------------------------------------------------------------------------

rm(list = ls())
setwd("/home/luchao/scATAC/cicero_input/JGI/merge_123/results")

library(cicero)
library(tidyverse)
library(Matrix)

peak_annotation_file <- '/home/luchao/scATAC/cicero_input/JGI/merge_123/peak_annotation.tsv'
df_peakanno <- readr::read_tsv(peak_annotation_file)

sparseMatrix_peakanno <- readr::read_tsv(peak_annotation_file) %>%
  dplyr::mutate(peak = factor(peak, levels = peak)) %>%
  tidyr::separate_rows(gene, distance, peak_type, sep = ';') %>%
  dplyr::filter(!is.na(gene)) %>%
  dplyr::mutate(gene = factor(gene)) %>%
  dplyr::group_by(peak, gene) %>%
  dplyr::summarise(value = as.integer(n() > 0)) %>%
  stats::xtabs(value ~ peak + gene, data = ., sparse = TRUE)

indata <- Matrix::readMM("/home/luchao/scATAC/cicero_input/JGI/merge_123/matrix.mtx")
indata@x[indata@x > 0] <- 1

cellinfo <- read.table("/home/luchao/scATAC/cicero_input/JGI/merge_123/Graph_Based.csv", header = TRUE)
row.names(cellinfo) <- cellinfo$barcode

peakinfo <- read.table("/home/luchao/scATAC/cicero_input/JGI/merge_123/peaks.bed")
names(peakinfo) <- c("chr", "bp1", "bp2")
peakinfo$gene_short_name <- df_peakanno$gene
peakinfo$site_name <- paste(peakinfo$chr, peakinfo$bp1, peakinfo$bp2, sep = "_")
row.names(peakinfo) <- peakinfo$site_name

row.names(indata) <- row.names(peakinfo)
colnames(indata) <- row.names(cellinfo)

input_cds <- suppressWarnings(new_cell_data_set(indata,
                                                cell_metadata = cellinfo,
                                                gene_metadata = peakinfo))
input_cds <- monocle3::detect_genes(input_cds)
input_cds <- input_cds[Matrix::rowSums(exprs(input_cds)) != 0,]

set.seed(2017)
input_cds <- detect_genes(input_cds)
input_cds <- estimate_size_factors(input_cds)
input_cds <- preprocess_cds(input_cds, method = "LSI")
input_cds <- reduce_dimension(input_cds, reduction_method = 'UMAP',
                              preprocess_method = "LSI")
plot_cells(input_cds)

umap_coords <- reducedDims(input_cds)$UMAP
cicero_cds <- make_cicero_cds(input_cds, reduced_coordinates = umap_coords)

input_cds <- cluster_cells(input_cds)
input_cds <- learn_graph(input_cds)
cds <- input_cds

get_earliest_principal_node <- function(cds, time_bin = "NEC") {
  cell_ids <- which(colData(cds)[, "timepoint"] == time_bin)
  closest_vertex <-
    cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  root_pr_nodes <-
    igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names(
      which.max(table(closest_vertex[cell_ids, ]))))]
  root_pr_nodes
}

cds <- order_cells(cds, root_pr_nodes = get_earliest_principal_node(cds))

plot_cells(cds,
           color_cells_by = "pseudotime",
           label_cell_groups = FALSE,
           label_leaves = TRUE,
           label_branch_points = TRUE,
           graph_label_size = 2.0)
plot_cells(cds, color_cells_by = "timepoint", group_label_size = 4, graph_label_size = 1.5)

pdf("/home/luchao/scATAC/cicero_input/JGI/merge_123/results/pseudotime.pdf", width = 7, height = 6)
plot_cells(cds, color_cells_by = "pseudotime", group_label_size = 4, graph_label_size = 1.5)
dev.off()

pdf("/home/luchao/scATAC/cicero_input/JGI/merge_123/results/timepoint.pdf", width = 7, height = 6)
plot_cells(cds, color_cells_by = "timepoint", group_label_size = 4, graph_label_size = 1.5)
dev.off()

cds_subset <- choose_cells(cds) # error: Error in utils::browseURL(appUrl): 'browser' must be a non-empty character string
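The `sparseMatrix_peakanno` pipeline in the script above turns the semicolon-separated `gene` column of peak_annotation.tsv into a binary peak-by-gene matrix (the object is not used later in the script). A rough base R plus Matrix sketch of the same idea follows; the three annotation rows are invented, and the `distance` and `peak_type` columns that the original also splits are ignored here.

```r
library(Matrix)

# Toy stand-in for peak_annotation.tsv: one row per peak, genes separated
# by ";" and NA for peaks without a gene (all names are made up).
anno <- data.frame(
  peak = c("chr1_100_600", "chr1_900_1400", "chr2_50_550"),
  gene = c("GeneA;GeneB", NA, "GeneC"),
  stringsAsFactors = FALSE
)

# Split each gene list and repeat the peak name once per gene.
genes_per_peak <- strsplit(anno$gene, ";", fixed = TRUE)
pairs <- data.frame(
  peak = rep(anno$peak, lengths(genes_per_peak)),
  gene = unlist(genes_per_peak),
  stringsAsFactors = FALSE
)
pairs <- unique(pairs[!is.na(pairs$gene), ])

# Keep every peak as a row (in file order), including peaks with no gene.
peak_f <- factor(pairs$peak, levels = anno$peak)
gene_f <- factor(pairs$gene)
peak_gene <- Matrix::sparseMatrix(
  i = as.integer(peak_f), j = as.integer(gene_f), x = 1,
  dims = c(nlevels(peak_f), nlevels(gene_f)),
  dimnames = list(levels(peak_f), levels(gene_f))
)
print(peak_gene)
```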

chao lu

Jun 24, 2021, 1:36:28 PM

to cicero-users

Hi Hannah,

The system and package information is also provided below.

cicero for monocle3

R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

hpl...@gmail.com

Aug 2, 2021, 10:32:37 AM

to cicero-users

Hello, and apologies for the delay. Are you running your script on a cluster? Unfortunately, the choose_cells function does not work if you're running the script on a compute cluster or from within a Python notebook. If you download the data and run it locally in R or RStudio, it should work.

chao lu

Aug 7, 2021, 6:17:41 AM

to hpl...@gmail.com, cicero-users

Hi Hannah,

The R script is running locally on my laptop (Dell), not on a cluster, and not within a Python notebook. It works on Windows 10, but on Ubuntu it fails with "Error in utils::browseURL(appUrl): 'browser' must be a non-empty character string".
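The error text comes from utils::browseURL(), whose `browser` argument defaults to getOption("browser"). On Windows a NULL value falls back to shell.exec(), while on Linux the option must be a non-empty string, which would be consistent with choose_cells() working on Windows 10 but failing on Ubuntu. The sketch below reproduces the error and sets the option only when it is unset; using xdg-open as the browser command is an assumption for a standard Ubuntu desktop, and this is a possible workaround rather than a fix confirmed in this thread.

```r
# browseURL() stops with this error whenever `browser` is an empty string,
# before it tries to launch anything.
msg <- tryCatch(
  utils::browseURL("http://127.0.0.1:8888", browser = ""),
  error = conditionMessage
)
print(msg)

# What would browseURL() receive in this session?
current <- getOption("browser")
browser_is_set <- is.function(current) ||
  (is.character(current) && length(current) == 1L && nzchar(current))
print(browser_is_set)

if (.Platform$OS.type == "unix" && !browser_is_set) {
  # Assumption: xdg-open is installed, as on a standard Ubuntu desktop.
  options(browser = "xdg-open")
}
print(getOption("browser"))
```

If this helps, the `options(browser = ...)` line can go in ~/.Rprofile so that it applies to every session; on Unix-alikes R otherwise takes its default from the R_BROWSER environment variable.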

On Mon, Aug 2, 2021 at 10:32 PM, hpl...@gmail.com <hpl...@gmail.com> wrote:

--
You received this message because you are subscribed to a topic in the Google Groups "cicero-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cicero-users/jXqe4cdZD54/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cicero-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cicero-users/dfa98387-ee95-4231-bd8d-ecd1d788df81n%40googlegroups.com.

Hannah A Pliner

Aug 24, 2021, 1:04:36 PM

to chao lu, hpl...@gmail.com, cicero-users

Hello,

Can you provide the output of traceback() after the error shows up? Sorry for the summer slowness; I'll try to get back to you more quickly this round.

Best,

Hannah
