I’ve asked Aaron Lun on the singlecell-queries Slack. He was looking for data sets for his data package.
Jenny
--
You received this message because you are subscribed to the Google Groups "bioconductor-teaching" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
bioconductor-tea...@googlegroups.com.
To view this discussion on the web, visit
https://groups.google.com/d/msgid/bioconductor-teaching/8bfedb37-1348-4e1c-a9f3-307ec3325dd7n%40googlegroups.com.
Aaron pointed me to a promising data set: scRNAseq::BachMammaryData(). It has ~25K cells in total from 4 mammary developmental stages, with 2 replicates each! I took a first pass at reading in the data, computing log-normalized counts with logNormCounts(), pulling out 13 genes I picked from skimming the paper (https://www.nature.com/articles/s41467-017-02001-5), transposing the normalized values to cells x genes, and then adding on the colData. This keeps all 25,806 cell barcodes from Cell Ranger. The paper clearly lays out its extra cell-filtering thresholds, but they only ended up filtering out ~800 cells, so I did not bother applying them. I also did not attempt any plotting to check whether these genes are suitable; time for me to go to bed.
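A minimal base-R sketch of the steps described above (normalize, pull selected genes, transpose to cells x genes, attach cell metadata), run on a made-up toy matrix rather than the real BachMammaryData() object. The gene names, the `stage` column, and the helpers `log_norm_counts()` and `cells_by_genes()` are hypothetical. The normalization assumes the default behaviour of scuttle::logNormCounts() as I understand it (library-size factors scaled to a mean of 1, then log2 with a pseudo-count of 1); the actual script in the repo may differ.

```r
# Library-size normalization then log2(x + 1); assumed to match the
# default behaviour of scuttle::logNormCounts().
log_norm_counts <- function(counts) {
  lib_sizes <- colSums(counts)
  size_factors <- lib_sizes / mean(lib_sizes)  # centred to mean 1
  log2(sweep(counts, 2, size_factors, "/") + 1)
}

# Hypothetical helper: subset genes, transpose so rows = cells and
# columns = genes, then bind the per-cell metadata on the left.
cells_by_genes <- function(counts, genes, cell_meta) {
  normed <- log_norm_counts(counts)
  expr <- t(normed[genes, , drop = FALSE])
  # Guard against the metadata and expression rows being out of order.
  stopifnot(identical(rownames(expr), rownames(cell_meta)))
  cbind(cell_meta, as.data.frame(expr))
}

# Hypothetical toy data, NOT the real counts: 4 genes x 3 cells.
counts <- matrix(c(10, 0, 5, 5,
                   20, 2, 8, 10,
                    0, 1, 3, 6),
                 nrow = 4,
                 dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                                 c("cell1", "cell2", "cell3")))
# Hypothetical stand-in for colData().
cell_meta <- data.frame(stage = c("stage1", "stage2", "stage3"),
                        row.names = c("cell1", "cell2", "cell3"))

res <- cells_by_genes(counts, c("GeneA", "GeneC"), cell_meta)
print(res)
```

Keeping the cell barcodes as row names on both objects makes the `stopifnot()` check a cheap way to catch a mismatch before the table is written out.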
Knitting takes ~5 min for me, mostly due to the normalization, though downloading the data the first time could add more. I’m fairly old-school with base R coding, so feel free to update it to “tidyr” ways. I did put it in a tibble at the end! The final .csv and .rds are < 1500 KB, an easy size for downloading.
I pushed everything to https://github.com/Bioconductor/bioconductor-teaching/tree/master/data/BachMammaryData
Goodnight,
Jenny
Good, I am glad that this data set fits our requirements. I just found that one sample from the data set has its own chapter in OSCA: http://bioconductor.org/books/release/OSCA/bach-mouse-mammary-gland-10x-genomics.html. I’m not sure we need to do any of the QC filtering I skipped before, but the chapter has code to calculate the mitochondrial (MT) percentage and discard cells with high MT content.
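For reference, a base-R sketch of the kind of MT-based QC mentioned above: compute each cell's percentage of counts from mitochondrial genes and flag cells more than 3 median absolute deviations (MADs) above the median. This is a stand-in for scuttle's perCellQCMetrics()/isOutlier(), not the OSCA chapter's exact code; in particular, the chapter may identify MT genes differently, whereas here they are assumed to be the genes whose names start with "mt-" (the mouse naming convention). The toy counts and the `mito_outliers()` helper are hypothetical.

```r
# Flag cells whose MT percentage is more than `nmads` MADs above the
# median (base-R stand-in for scuttle's isOutlier(type = "higher")).
mito_outliers <- function(counts, is_mito, nmads = 3) {
  pct_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
  # mad() applies the usual 1.4826 scaling constant by default.
  cutoff <- median(pct_mito) + nmads * mad(pct_mito)
  list(pct_mito = pct_mito, cutoff = cutoff, discard = pct_mito > cutoff)
}

# Hypothetical toy counts: 4 genes x 6 cells, 100 counts per cell.
counts <- matrix(c( 3,  2, 50, 45,
                    4,  2, 50, 44,
                    2,  2, 50, 46,
                    3,  2, 40, 55,
                    3,  3, 60, 34,
                   40, 20, 20, 20),
                 nrow = 4,
                 dimnames = list(c("mt-Co1", "mt-Nd1", "GeneA", "GeneB"),
                                 paste0("cell", 1:6)))
# Assumption: MT genes carry the mouse "mt-" prefix.
is_mito <- grepl("^mt-", rownames(counts))

qc <- mito_outliers(counts, is_mito)
# Only cell6 (60% MT counts) exceeds the cutoff here.
filtered <- counts[, !qc$discard, drop = FALSE]
print(qc$pct_mito)
print(qc$cutoff)
```

A MAD-based cutoff adapts to each data set, which is why it is often preferred over a fixed MT threshold; with several samples it would normally be computed per sample.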
Jenny