Classification weighted by number of marker genes?

alex.ta...@gmail.com

May 2, 2019, 1:03:11 PM
to garnett-users
Hi,

I'm trying to classify PBMCs, including subpopulations. My T cell marker list is ~10x longer than my CD4 and CD8 lists. I don't get any subsets when using this marker file, but I do when I shrink the T cell list to an equivalent size.

Can you help me understand whether this is correct, and provide any guidance on usage? For instance, what's the minimum/maximum number of marker genes for each cell type?

Thanks,
Alex

Hannah A Pliner

May 3, 2019, 6:44:02 PM
to alex.ta...@gmail.com, garnett-users
Hi Alex,

The number of markers you use doesn't directly weight the classification. However, with a large number of markers you're more likely to confuse the classifier, because you're more likely to end up with ambiguously called cells and therefore not enough training data.

We have found that Garnett works best with fewer, higher-quality markers rather than more, lower-quality ones (my classifiers generally have no more than 10-20 markers per cell type, and often many fewer). Check out the PBMC marker file on the website for an example: https://cole-trapnell-lab.github.io/garnett/classifiers/. The exception to this rule is that when markers are very lowly expressed (and therefore very likely to drop out), you may need more markers in order to find enough training cells.

Hope this helps,
Hannah




Hannah Pliner, Ph.D.
Lead Data Scientist for Single Cell Genomics
Brotman Baty Institute for Precision Medicine
Health Sciences Building (HSB) H564E
Seattle, WA
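
The point about ambiguous cells shrinking the training set can be illustrated with a toy model. The sketch below is not Garnett's algorithm: it assumes a simplified rule in which a cell is "nominated" for a type if it expresses any of that type's markers, and is "ambiguous" if nominated for more than one type. The genes BROAD1 and BROAD2, the cell ids, and the function names are all hypothetical.

```python
# Toy model only, NOT Garnett's implementation. Assumed rule: a cell is
# nominated for a type if it expresses any marker of that type, and it is
# ambiguous (unusable for training) if nominated for more than one type.

def nominate(cells, markers):
    """Map each cell id to the set of cell types whose markers it expresses."""
    return {
        cell_id: {ctype for ctype, genes in markers.items() if expressed & set(genes)}
        for cell_id, expressed in cells.items()
    }

def training_cells(cells, markers):
    """Return (unambiguous {cell: type}, sorted ambiguous cell ids)."""
    calls = nominate(cells, markers)
    unambiguous = {c: next(iter(t)) for c, t in calls.items() if len(t) == 1}
    ambiguous = sorted(c for c, t in calls.items() if len(t) > 1)
    return unambiguous, ambiguous

# Hypothetical expressed-gene sets for six cells.
cells = {
    "c1": {"CD4", "BROAD1"},
    "c2": {"CD4", "BROAD2"},
    "c3": {"CD8A", "BROAD2"},
    "c4": {"CD8A", "BROAD1"},
    "c5": {"CD4"},
    "c6": {"CD8A"},
}

# A short, specific marker list versus the same list padded with two
# broadly expressed (lower-quality) markers on one cell type.
short_list = {"CD4 T": ["CD4"], "CD8 T": ["CD8A"]}
long_list = {"CD4 T": ["CD4", "BROAD1", "BROAD2"], "CD8 T": ["CD8A"]}

for name, marker_list in [("short", short_list), ("long", long_list)]:
    unambiguous, ambiguous = training_cells(cells, marker_list)
    print(name, "usable:", len(unambiguous), "ambiguous:", ambiguous)
```

In this toy data the padded list turns two of the three CD8A-expressing cells into ambiguous calls, leaving a single usable CD8 T training cell. A real marker check would run on expression data with Garnett itself.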


--
You received this message because you are subscribed to the Google Groups "garnett-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to garnett-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/garnett-users/53a17f83-eb81-4391-9688-054b94650034%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hannah Pliner

May 6, 2019, 7:48:15 PM
to alex.ta...@gmail.com, garnett-users
Hi Alex,

Yes, it would be reasonable to use check_markers for that. I would look most closely at the ambiguity score, with one caveat: if you have a marker that is widely and non-specifically expressed, you will get lots of high ambiguity scores. Sometimes, with a highly expressed gene, the "bad" gene itself will not look as ambiguous, because the score is a ratio of cells made ambiguous to total nominated cells. If that happens, I recommend looking for the marker that nominates the most cells and checking whether removing it lowers everyone else's ambiguity scores.

Best,
Hannah
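
The caveat above can be made concrete with a small worked example. This is a toy sketch, not the code behind check_markers: it assumes ambiguity for a marker is the share of the cells it nominates that also express a marker of another cell type, following the ratio described in the reply. The gene BROAD1, the cell ids, and both function names are hypothetical.

```python
# Toy model only, NOT Garnett's check_markers. Assumed definition:
# ambiguity(marker) = cells it nominates that also express another type's
# marker, divided by all cells it nominates.

def marker_stats(cells, markers):
    """Per marker: how many cells it nominates and its ambiguity ratio."""
    stats = {}
    for ctype, genes in markers.items():
        # Markers belonging to every other cell type.
        other = {g for t, gs in markers.items() if t != ctype for g in gs}
        for gene in genes:
            nominated = [c for c, expr in cells.items() if gene in expr]
            ambiguous = [c for c in nominated if cells[c] & other]
            ratio = len(ambiguous) / len(nominated) if nominated else 0.0
            stats[gene] = {"nominates": len(nominated), "ambiguity": ratio}
    return stats

def drop_top_nominator(markers, stats):
    """Remove the marker that nominates the most cells."""
    top = max(stats, key=lambda g: stats[g]["nominates"])
    pruned = {t: [g for g in gs if g != top] for t, gs in markers.items()}
    return top, pruned

# Hypothetical data: BROAD1 is expressed in most cells regardless of type.
cells = {
    "c1": {"CD4", "BROAD1"}, "c2": {"CD4", "BROAD1"},
    "c3": {"CD8A", "BROAD1"}, "c4": {"CD8A", "BROAD1"},
    "c5": {"CD4"}, "c6": {"CD8A"},
    "c7": {"BROAD1"}, "c8": {"BROAD1"},
}
markers = {"CD4 T": ["CD4", "BROAD1"], "CD8 T": ["CD8A"]}

before = marker_stats(cells, markers)
top, pruned = drop_top_nominator(markers, before)
after = marker_stats(cells, pruned)

for gene, s in before.items():
    print(gene, s["nominates"], round(s["ambiguity"], 2))
print("removed:", top, "-> CD8A ambiguity:", after["CD8A"]["ambiguity"])
```

Here the broad marker nominates six cells, so its own ratio (2 of 6) looks better than that of the specific CD8A marker (2 of 3), even though the broad marker causes the ambiguity. Removing the top nominator and recomputing brings the CD8A ratio down to zero, which is the check recommended above.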

On Fri, May 3, 2019 at 4:30 PM <alex.ta...@gmail.com> wrote:
Hi Hannah,

Thanks for the reply and explanation. Yes, I noticed your classifier had fewer high quality markers - now I understand the rationale.

I'm also trying to create classifiers for minor cell types, starting with ~20 markers with low expression/confidence. Is it reasonable to use the check_markers function to down-select this list to fewer high-confidence markers? If so, what's the best metric to sort/gate on? marker_check$marker_score?

Best,
Alex

alex.ta...@gmail.com

May 7, 2019, 2:15:30 PM
to garnett-users
Thanks Hannah. I'm finding Garnett very valuable and I really appreciate your help with my questions.

Alex

