Many cell definitions dropped


S Hui

Oct 17, 2019, 11:08:19 AM
to garnett-users

Hi, 
I'm new to Garnett and have a few questions after trying to run it on my own data set, which is 33694 genes x 2059 cells.

My marker file, which I have formatted to the required Garnett format, contains 20 cell definitions (after removing several that overlapped or were not found in db).
When I run train_cell_classifier, half of the cell types are dropped because there are not enough training cells.

1) Is there anything I should do so that I can retain more of these cell types? The cells in my data set were previously labelled manually using the same cell type definitions, so I am wondering why so many were dropped.

2) I read in a previous post that the same input data set used for the training step can also be used for the classifying step, and I wanted to confirm that this is appropriate (normally, shouldn't the training and classifying data sets be separate?).

Thanks for your help,
shui

Hannah Pliner

Oct 25, 2019, 7:46:55 AM
to garnett-users
Hi Shui,

In general, Garnett will drop a lot of cell definitions when the markers aren't specific enough, because most cells then end up labelled as ambiguous. There is a new parameter in train_cell_classifier, return_initial_assign, which will return a data frame of the initial assignments of your cells - this, along with the marker plots, may help you identify the genes that are problematic.
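
A minimal sketch of that debugging call, in case it is useful. The wrapper name `get_initial_assignments` and its argument names are hypothetical, and the gene ID types are assumptions to adjust for your data; `db` should be the annotation database for your species (for example `org.Hs.eg.db` for human). As far as I understand, with return_initial_assign = TRUE no classifier is built, so the function has to be rerun with it set to FALSE before classifying.

```r
# Hypothetical helper: ask Garnett for the initial marker-based assignments
# instead of a trained classifier, so problematic markers can be inspected.
get_initial_assignments <- function(cds, marker_file, db,
                                    cds_gene_id_type = "ENSEMBL") {
  garnett::train_cell_classifier(
    cds = cds,                           # CellDataSet holding your expression data
    marker_file = marker_file,           # path to the Garnett-format marker file
    db = db,                             # species annotation database
    cds_gene_id_type = cds_gene_id_type, # assumption: ID type used in the CDS
    marker_file_gene_id_type = "SYMBOL", # assumption: markers given as symbols
    return_initial_assign = TRUE         # return a data frame, not a classifier
  )
}

# Example use (object names are placeholders):
# initial_assign <- get_initial_assignments(my_cds, "markers.txt", org.Hs.eg.db)
# str(initial_assign)   # check the structure before tabulating per cell type
```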

The reason that you can use the same data for training and classifying in this scenario is that we aren't attempting to prove the accuracy of the model like we would usually be doing in machine learning - instead Garnett is just designed to classify your cells!
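
As a rough illustration of training and classifying on the same data set, something like the following should work. The wrapper name `train_and_classify` is hypothetical, the gene ID types are assumptions, and cluster_extend = TRUE follows the usage shown in the Garnett tutorial.

```r
# Hypothetical helper: train a classifier on a CDS and then label the same CDS.
train_and_classify <- function(cds, marker_file, db,
                               cds_gene_id_type = "ENSEMBL") {
  classifier <- garnett::train_cell_classifier(
    cds = cds,
    marker_file = marker_file,
    db = db,
    cds_gene_id_type = cds_gene_id_type, # assumption: ID type used in the CDS
    marker_file_gene_id_type = "SYMBOL"  # assumption: markers given as symbols
  )
  # classify_cells returns the CDS with the predicted types added to its metadata
  garnett::classify_cells(
    cds, classifier,
    db = db,
    cluster_extend = TRUE,
    cds_gene_id_type = cds_gene_id_type
  )
}

# Example use (object names are placeholders; pData needs monocle loaded):
# my_cds <- train_and_classify(my_cds, "markers.txt", org.Hs.eg.db)
# table(pData(my_cds)$cell_type)
```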

Hope this helps,
Hannah

Robert Alpin

Oct 25, 2019, 3:29:19 PM
to garnett-users
I was also confused about the train/classify aspect of Garnett. It feels to me like the tutorial implies that the same dataset should be used to train and classify, especially in the line "If you haven't loaded your data yet (because you're using a pre-trained classifier), now is the time!"