A handful of questions


Robert Alpin

Oct 2, 2019, 7:14:38 PM
to garnett-users
Hello,
I've been playing around with Garnett with mixed success to identify cells in a dataset. I'm left with a few questions as to what the best practices are.

Most marker files I've seen use only a very small number of genes for 'expressed'. However, I'm trying to differentiate two similar cell subtypes from a larger population, and using the bare minimum of markers hasn't proven very useful for telling the two apart. This has led me to add more and more marker genes. Is it counterproductive to use so many genes, or do more genes help build a more accurate picture?

Also on subtypes: when I set a cell type as a subtype of another, check_markers adds all the subtype markers to the main type's markers. Does this mean that Garnett is using the subtype markers to search for the main type?

Does Garnett use the main type markers to search for the subtype? For example, if I specify that atrial is a subtype of CM, does it determine which cells are CMs first then decide which ones are atrial? Does it search for atrial as the list of markers for CM + the markers for atrial?

How do I add a key to plot_cells? I have a lot of colors on this plot, but I can't tell which color corresponds to which cell type. Also, can I change the assigned colors? One cell group in particular has been assigned a shade of pinkish purple that's too close to Unknown's purplish pink.

Thanks,
Robert Alpin

Hannah Pliner

Oct 7, 2019, 7:44:51 AM
to garnett-users
Hello,

I'll respond in order:

Usually a handful of 'good' markers is superior to a lot of mediocre ones. 'Good' in this context mostly means specific. Where I find lots of markers useful is when a cell type has many very lowly expressed but highly specific markers: because of high dropout, using only a few of them won't capture enough training cells. I tend to start with a few markers and, if not enough training cells are chosen, add more.

Yes, by default, Garnett assumes that if a gene is expressed in the subtype, it is also expressed in the supertype. You can turn off this behavior by setting propogate_markers = FALSE.

When there are multiple levels (i.e. subtypes), Garnett will train models in order: first the highest level, then successive subtypes. For subtypes, it will run a lenient classification (i.e. a lower evidence threshold for calling a cell a member of the group) for the parent type, and then use those cells to train the subtype model. So in your example, it will first train a model to find CMs and the other top-level types, then leniently classify cells as CMs, and then use those putative CMs to train a model for atrial. In that subsequent classification, it will not use the parent (CM) markers.
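
For reference, a two-level hierarchy like the one in this example is declared in the marker file with a 'subtype of:' line. The fragment below is only a minimal sketch: the gene names are illustrative placeholders, not a recommended marker set, and a real file would list the other top-level types as well.

```
>CM
expressed: TNNT2, MYH6

>Atrial
expressed: NPPA, MYL7
subtype of: CM
```

With a file like this, the top-level model would be trained to find CM among the top-level types, and the atrial model would then be trained only on the cells leniently classified as CM.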

To add a key to plot_cells, use label_cell_groups = FALSE. This tells the plot not to label the clusters themselves, but to use a key instead. To change the colors, you can use any of the standard ggplot color methods, e.g. a palette from the RColorBrewer package: plot_cells(cds, label_cell_groups=FALSE) + scale_color_brewer(palette="Dark2"). There's a nice post about the options here: http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually
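
If one specific group needs a different color (for instance, to separate it from Unknown), a manual scale with a named vector is one option. The sketch below uses a toy data frame in place of a real cds so that it runs on its own; the object names (toy, type_colors) and the colors themselves are made up for illustration.

```r
library(ggplot2)

# Toy stand-in for the coordinates and cell type labels that plot_cells() draws.
set.seed(1)
toy <- data.frame(
  umap_1 = rnorm(90),
  umap_2 = rnorm(90),
  cell_type = rep(c("CM", "Atrial", "Unknown"), each = 30)
)

# Named vector: every cell type gets an explicit, clearly distinct color.
# The names must match the labels in the column being colored.
type_colors <- c(CM = "#1b9e77", Atrial = "#d95f02", Unknown = "grey70")

p <- ggplot(toy, aes(umap_1, umap_2, color = cell_type)) +
  geom_point() +
  scale_color_manual(values = type_colors, name = "Cell type")
```

Since plot_cells returns a ggplot object, the same scale should attach in the same way, e.g. plot_cells(cds, color_cells_by = "cell_type", label_cell_groups = FALSE) + scale_color_manual(values = type_colors), assuming the Garnett classifications are stored in a column named cell_type.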

Best,
Hannah