Ambiguous dimensionality error while training classifier

david.sanin

Jul 26, 2019, 11:56:19 AM
to garnett-users
Hey, 

I was trying out Garnett on a dataset I had analysed with Monocle3, and it is not working. This is what I am running - and it all goes smoothly until I try to train the classifier:

cds <- load_cellranger_data("path/to/data")
cds <- cds[!Matrix::rowSums(exprs(cds)) == 0,] #I added this after seeing that there were ~9000 rows with only 0 in my data.
counts(cds) <- as(counts(cds),"dgCMatrix")
cds <- estimate_size_factors(cds)

#Sanity checks - all show "0"
sum(is.na(exprs(cds)))
sum(!is.finite(exprs(cds)))
sum(Matrix::rowSums(exprs(cds)) == 0)
sum(Matrix::colSums(exprs(cds)) == 0)

#Process
cds = preprocess_cds(cds, num_dim = 100)
cds = reduce_dimension(cds, cores = 10)
cds = reduce_dimension(cds, reduction_method="tSNE")
cds = cluster_cells(cds, resolution=c(10^seq(-6,-1)))
colData(cds)$garnett_cluster = clusters(cds)
marker_check <- check_markers(cds, "markers/marker_file.txt",
                              db=org.Mm.eg.db,
                              cds_gene_id_type = "ENSEMBL",
                              marker_file_gene_id_type = "ENSEMBL")
plot_markers(marker_check)
cds_classifier <- train_cell_classifier(cds = cds,
                                         marker_file = "markers/marker_file.txt",
                                         db=org.Mm.eg.db::org.Mm.eg.db,
                                         cds_gene_id_type = "ENSEMBL",
                                         num_unknown = 500,
                                         marker_file_gene_id_type = "ENSEMBL",
                                         cores=50)

The error I get is this:

There are 10 cell type definitions
Error in `$<-.data.frame`(`*tmp*`, "cds", value = character(0)) :
  replacement has 0 rows, data has 82

I am not sure what the problem is, especially as the "marker_check" step goes well.

Thanks in advance!!

Some more info:
> cds
class: cell_data_set
dim: 18845 10375
metadata(1): cds_version
assays(1): counts
rownames(18845): ENSMUSG00000051951 ENSMUSG00000025900 ...
  ENSMUSG00000063897 ENSMUSG00000095742
rowData names(3): id gene_short_name num_cells_expressed
colnames(10375): AAACCTGAGACTAGGC-1 AAACCTGAGAGTAATC-1 ...
  TTTGTCATCTATCCTA-1 TTTGTCATCTCTTATG-1
colData names(5): barcode Size_Factor UMI num_genes_expressed
  garnett_cluster
reducedDimNames(3): PCA UMAP tSNE
spikeNames(0):

other attached packages:
 [1] org.Mm.eg.db_3.7.0          AnnotationDbi_1.44.0
 [3] garnett_0.2.2               monocle3_0.1.1
 [5] SingleCellExperiment_1.4.1  SummarizedExperiment_1.12.0
 [7] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2
 [9] Biobase_2.42.0              DelayedArray_0.8.0
[11] BiocParallel_1.16.6         IRanges_2.16.0
[13] S4Vectors_0.20.1            BiocGenerics_0.28.0
[15] matrixStats_0.54.0          cowplot_1.0.0
[17] ggplot2_3.2.0               dplyr_0.8.3
[19] Seurat_3.0.2

Hannah Pliner

Jul 26, 2019, 11:04:54 PM
to garnett-users
That is strange, especially if marker_check is working. Can you run it again and then post the output of traceback()? That'll help me see where the error is occurring. And can you confirm that the markers in your file are actually in ENSEMBL format? (Most people use SYMBOL, so I wanted to double check!)
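
One way to double check the ID format is sketched below in base R. It is only an illustration: marker_ids and cds_ids are hypothetical stand-ins for the IDs listed in the marker file and for rownames(cds), and the pattern assumes mouse Ensembl gene IDs of the form ENSMUSG followed by 11 digits, as in the rownames shown earlier in this thread.

```r
# Hypothetical marker IDs; in practice, collect them from the marker file.
marker_ids <- c("ENSMUSG00000051951", "ENSMUSG00000025900", "Cd3e")

# Hypothetical stand-in for rownames(cds).
cds_ids <- c("ENSMUSG00000051951", "ENSMUSG00000025900", "ENSMUSG00000063897")

# Flag entries that do not look like mouse Ensembl gene IDs.
looks_ensembl <- grepl("^ENSMUSG[0-9]{11}$", marker_ids)
not_ensembl <- marker_ids[!looks_ensembl]

# Among the Ensembl-looking IDs, find any that are absent from the CDS.
missing_from_cds <- setdiff(marker_ids[looks_ensembl], cds_ids)

print(not_ensembl)
print(missing_from_cds)
```

Any entry reported in not_ensembl is probably a gene symbol or a typo, and any entry in missing_from_cds is an Ensembl-style ID that the dataset does not contain.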

Best,
Hannah

david.sanin

Jul 29, 2019, 8:08:07 AM
to garnett-users
Hi Hannah, 

Thanks for your quick reply.

The genes in my marker file are definitely ENSEMBL IDs. I have shared the output of plot_markers in case that helps (below).

This is the output of traceback():

5: stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
       "replacement has %d rows, data has %d"), N, nrows), domain = NA)
4: `$<-.data.frame`(`*tmp*`, "cds", value = character(0))
3: `$<-`(`*tmp*`, "cds", value = character(0))
2: make_name_map(parse_list, as.character(row.names(rowData(norm_cds))),
       classifier_gene_id_type, marker_file_gene_id_type, db)
1: train_cell_classifier(cds = cds, marker_file = "markers/marker_file.txt",
       db = org.Mm.eg.db::org.Mm.eg.db, cds_gene_id_type = "ENSEMBL",
       num_unknown = 500, marker_file_gene_id_type = "ENSEMBL",
       cores = 50)
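
Frames 2 to 4 of the traceback suggest that, inside make_name_map, an empty character vector was assigned to a column of an 82-row data frame. The base R sketch below reproduces that kind of failure on its own. The table name_map and its marker column are hypothetical stand-ins, not Garnett's actual internals.

```r
# Hypothetical stand-in for the internal table: 82 rows, as in the error message.
name_map <- data.frame(marker = paste0("gene", seq_len(82)),
                       stringsAsFactors = FALSE)

# Assigning a zero-length vector to a column of an 82-row data frame fails,
# because a zero-length value cannot be recycled to fill 82 rows.
result <- tryCatch(
  {
    name_map$cds <- character(0)
    "assignment succeeded"
  },
  error = function(e) conditionMessage(e)
)

# Prints the captured error message instead of stopping the script.
print(result)
```

If the ID lookup returns nothing for every marker, an assignment like this is where the run would stop, which is consistent with the "replacement has 0 rows, data has 82" message.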

Thanks again for your help.

David


Marker_Check.png

Hannah A Pliner

Jul 30, 2019, 1:56:45 PM
to david.sanin, garnett-users
Hi David,

Good catch. There was a bug in the name mapping when marker files used ENSEMBL IDs. I just pushed a fix to the monocle3 branch, so go ahead and reinstall, and hopefully that'll solve it!

Best,
Hannah


Hannah Pliner, Ph.D.
Lead Data Scientist for Single Cell Genomics
Brotman Baty Institute for Precision Medicine
Health Sciences Building (HSB) H564E
Seattle, WA


david.sanin

Jul 31, 2019, 8:35:19 AM
to garnett-users
That fixed the problem. Now it runs smoothly to the end!

Thanks for your help!

David