I am trying out Garnett on a dataset I had analysed with Monocle3, and training the classifier fails. This is what I am running; everything goes smoothly until I call train_cell_classifier:
cds <- load_cellranger_data("path/to/data")
cds <- cds[Matrix::rowSums(exprs(cds)) != 0, ] # Added after seeing that ~9000 rows in my data contained only zeros.
counts(cds) <- as(counts(cds),"dgCMatrix")
cds <- estimate_size_factors(cds)
# Sanity checks: all three return 0
sum(!is.finite(exprs(cds)))
sum(Matrix::rowSums(exprs(cds)) == 0)
sum(Matrix::colSums(exprs(cds)) == 0)
# Processing
cds <- preprocess_cds(cds, num_dim = 100)
cds <- reduce_dimension(cds, cores = 10)
cds <- reduce_dimension(cds, reduction_method = "tSNE")
cds <- cluster_cells(cds, resolution = c(10^seq(-6, -1)))
colData(cds)$garnett_cluster <- clusters(cds)
marker_check <- check_markers(cds, "markers/marker_file.txt",
                              db = org.Mm.eg.db,
                              cds_gene_id_type = "ENSEMBL",
                              marker_file_gene_id_type = "ENSEMBL")
plot_markers(marker_check)
cds_classifier <- train_cell_classifier(cds = cds,
                                        marker_file = "markers/marker_file.txt",
                                        db = org.Mm.eg.db::org.Mm.eg.db,
                                        cds_gene_id_type = "ENSEMBL",
                                        num_unknown = 500,
                                        marker_file_gene_id_type = "ENSEMBL",
                                        cores = 50)
There are 10 cell type definitions
Error in `$<-.data.frame`(`*tmp*`, "cds", value = character(0)) :
replacement has 0 rows, data has 82
I'm not sure what the problem is, especially since the "marker_check" step runs without errors.
> cds
class: cell_data_set
dim: 18845 10375
metadata(1): cds_version
assays(1): counts
rownames(18845): ENSMUSG00000051951 ENSMUSG00000025900 ...
ENSMUSG00000063897 ENSMUSG00000095742
rowData names(3): id gene_short_name num_cells_expressed
colnames(10375): AAACCTGAGACTAGGC-1 AAACCTGAGAGTAATC-1 ...
TTTGTCATCTATCCTA-1 TTTGTCATCTCTTATG-1
colData names(5): barcode Size_Factor UMI num_genes_expressed
garnett_cluster
reducedDimNames(3): PCA UMAP tSNE
spikeNames(0):
other attached packages:
[1] org.Mm.eg.db_3.7.0 AnnotationDbi_1.44.0
[3] garnett_0.2.2 monocle3_0.1.1
[5] SingleCellExperiment_1.4.1 SummarizedExperiment_1.12.0
[7] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[9] Biobase_2.42.0 DelayedArray_0.8.0
[11] BiocParallel_1.16.6 IRanges_2.16.0
[13] S4Vectors_0.20.1 BiocGenerics_0.28.0
[15] matrixStats_0.54.0 cowplot_1.0.0
[17] ggplot2_3.2.0 dplyr_0.8.3
[19] Seurat_3.0.2