Ambiguous dimensionality error while training classifier

david.sanin

Jul 26, 2019, 11:56:19 AM

to garnett-users

Hey,

I was trying out Garnett on a dataset I had analysed with Monocle3, and it is not working. This is what I am running; everything goes smoothly until I try to train the classifier:

library(monocle3)
library(garnett)
library(org.Mm.eg.db)

cds <- load_cellranger_data("path/to/data")
# I added this after seeing that there were ~9000 rows with only zeros in my data
cds <- cds[Matrix::rowSums(exprs(cds)) != 0, ]
counts(cds) <- as(counts(cds), "dgCMatrix")
cds <- estimate_size_factors(cds)

# Sanity checks: all return 0
sum(is.na(exprs(cds)))
sum(!is.finite(exprs(cds)))
sum(Matrix::rowSums(exprs(cds)) == 0)
sum(Matrix::colSums(exprs(cds)) == 0)

# Process
cds <- preprocess_cds(cds, num_dim = 100)
cds <- reduce_dimension(cds, cores = 10)
cds <- reduce_dimension(cds, reduction_method = "tSNE")
cds <- cluster_cells(cds, resolution = c(10^seq(-6, -1)))
colData(cds)$garnett_cluster <- clusters(cds)

marker_check <- check_markers(cds, "markers/marker_file.txt",
                              db = org.Mm.eg.db,
                              cds_gene_id_type = "ENSEMBL",
                              marker_file_gene_id_type = "ENSEMBL")
plot_markers(marker_check)

cds_classifier <- train_cell_classifier(cds = cds,
                                        marker_file = "markers/marker_file.txt",
                                        db = org.Mm.eg.db::org.Mm.eg.db,
                                        cds_gene_id_type = "ENSEMBL",
                                        num_unknown = 500,
                                        marker_file_gene_id_type = "ENSEMBL",
                                        cores = 50)

The error I get is this:

There are 10 cell type definitions
Error in `$<-.data.frame`(`*tmp*`, "cds", value = character(0)) :
  replacement has 0 rows, data has 82

I'm not sure what the problem is, especially since check_markers (marker_check) runs without any problems.

Thanks in advance!!

Some more info:

> cds
class: cell_data_set
dim: 18845 10375
metadata(1): cds_version
assays(1): counts
rownames(18845): ENSMUSG00000051951 ENSMUSG00000025900 ...
  ENSMUSG00000063897 ENSMUSG00000095742
rowData names(3): id gene_short_name num_cells_expressed
colnames(10375): AAACCTGAGACTAGGC-1 AAACCTGAGAGTAATC-1 ...
  TTTGTCATCTATCCTA-1 TTTGTCATCTCTTATG-1
colData names(5): barcode Size_Factor UMI num_genes_expressed
  garnett_cluster
reducedDimNames(3): PCA UMAP tSNE
spikeNames(0):

other attached packages:
[1] org.Mm.eg.db_3.7.0          AnnotationDbi_1.44.0
[3] garnett_0.2.2               monocle3_0.1.1
[5] SingleCellExperiment_1.4.1  SummarizedExperiment_1.12.0
[7] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2
[9] Biobase_2.42.0              DelayedArray_0.8.0
[11] BiocParallel_1.16.6        IRanges_2.16.0
[13] S4Vectors_0.20.1           BiocGenerics_0.28.0
[15] matrixStats_0.54.0         cowplot_1.0.0
[17] ggplot2_3.2.0              dplyr_0.8.3
[19] Seurat_3.0.2

Hannah Pliner

Jul 26, 2019, 11:04:54 PM

to garnett-users

That is strange, especially if marker_check is working. Can you run it again and then post the output of traceback()? That will help me see where the error is occurring. Also, can you confirm that the markers in your file are actually ENSEMBL IDs? (Most people use SYMBOL, so I wanted to double-check!)
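One quick way to answer the ENSEMBL question is to check the shape of the IDs. The sketch below assumes the marker IDs have already been copied out of the marker file into a character vector; `looks_like_mouse_ensembl()` is a hypothetical helper (not part of Garnett), and the pattern assumes mouse gene IDs, i.e. `ENSMUSG` followed by 11 digits, optionally with a version suffix.

```r
# Hypothetical helper (not a Garnett function): TRUE for strings shaped like
# mouse Ensembl gene IDs, e.g. "ENSMUSG00000051951" or "ENSMUSG00000051951.3".
looks_like_mouse_ensembl <- function(ids) {
  grepl("^ENSMUSG[0-9]{11}(\\.[0-9]+)?$", ids)
}

# Example IDs: two Ensembl IDs from this dataset plus one gene symbol.
marker_ids <- c("ENSMUSG00000051951", "ENSMUSG00000025900", "Cd3e")
looks_like_mouse_ensembl(marker_ids)
# [1]  TRUE  TRUE FALSE
```

With the real data one could also check `marker_ids %in% rownames(cds)`, since a correctly formatted ID can still be absent from the dataset.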

Best,

Hannah

david.sanin

Jul 29, 2019, 8:08:07 AM

to garnett-users

Hi Hannah,

Thanks for your quick reply.

The genes in my marker file are definitely ENSEMBL IDs. I've shared the output of plot_markers below in case it helps.

This is the output of traceback():

5: stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
       "replacement has %d rows, data has %d"), N, nrows), domain = NA)
4: `$<-.data.frame`(`*tmp*`, "cds", value = character(0))
3: `$<-`(`*tmp*`, "cds", value = character(0))
2: make_name_map(parse_list, as.character(row.names(rowData(norm_cds))),
       classifier_gene_id_type, marker_file_gene_id_type, db)
1: train_cell_classifier(cds = cds, marker_file = "markers/marker_file.txt",
       db = org.Mm.eg.db::org.Mm.eg.db, cds_gene_id_type = "ENSEMBL",
       num_unknown = 500, marker_file_gene_id_type = "ENSEMBL",
       cores = 50)
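For readers unfamiliar with this base R error: frames 3 and 4 show that a zero-length character vector was assigned as a new column of a data frame that has rows, which base R rejects. The toy reproduction below only illustrates that mechanism; `name_map` and `mapped_ids` are hypothetical stand-ins, not Garnett's actual internals (the real table inside make_name_map() has 82 rows).

```r
# Hypothetical stand-in for the table built inside make_name_map().
name_map <- data.frame(marker = c("ENSMUSG00000051951", "ENSMUSG00000025900"),
                       stringsAsFactors = FALSE)

# An ID mapping step that finds nothing can return an empty character vector.
mapped_ids <- character(0)

# Assigning a 0-length vector as a column of a 2-row data frame fails in
# `$<-.data.frame`, just as in the traceback; capture the message instead.
err <- tryCatch({
  name_map$cds <- mapped_ids
  NULL
}, error = function(e) conditionMessage(e))

print(err)
```

In an English locale this prints the same message as the report, only with 2 instead of 82 rows, which points at the ID mapping step returning nothing rather than at the expression data.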

Thanks again for your help.

David

Hannah A Pliner

Jul 30, 2019, 1:56:45 PM

to david.sanin, garnett-users

Hi David,

Good catch. There was a bug in the name mapping when marker files used ENSEMBL IDs. I just pushed a fix to the monocle3 branch; go ahead and reinstall, and hopefully that will solve it!

Best,

Hannah

Hannah Pliner, Ph.D.
Lead Data Scientist for Single Cell Genomics
Brotman Baty Institute for Precision Medicine
Health Sciences Building (HSB) H564E
Seattle, WA
206-616-0454
hpl...@uw.edu

--
You received this message because you are subscribed to the Google Groups "garnett-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to garnett-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/garnett-users/a0863c7e-2b69-48fb-b2e1-714c37a03b6e%40googlegroups.com.

david.sanin

Jul 31, 2019, 8:35:19 AM

to garnett-users

That fixed the problem. Now it runs smoothly to the end!

Thanks for your help!

David