Testing different tissues with a training dataset


Jordi Camps


Sep 16, 2020, 4:19:42 AM

to garnett-users

Hi Garnett team,

I've been trying for a while to train a Garnett classifier for the tumor microenvironment (as described in the Garnett article).

Because I wanted more resolution, I added some extra cell types, and the annotation works well on the training dataset.

But on test datasets it doesn't work as well, and I'm wondering whether missing cell types are the main problem: an unbiased lung tumor sample contains different cell types than an unbiased brain tumor sample, and so on.

Do you have any recommendations for how I could handle this? Perhaps I could make an integrated training dataset that contains all cell types, so that the unknown type is trained better?

Thanks in advance,

Jordi

hpl...@gmail.com


Sep 20, 2020, 7:32:28 AM

to garnett-users

Hi Jordi,

It's tough to give you exact recommendations without knowing a bit more context, but a couple of thoughts:

- If by 'not working so well' you mean that new cell types are being classified as one of your defined types when they should be called unknown (which sometimes happens), then as a first pass I recommend increasing the num_unknown parameter (maybe to 1000, depending on the size of your dataset) so that the unknown cells have enough representation to prevent overfitting.

- Another thing to check in this case is that you don't accidentally have a promiscuous marker in your marker file, i.e. one that's actually expressed widely. You can check this with the marker-checking functionality, or just by plotting the markers' expression with plot_cells from monocle3 and confirming that expression is restricted to the expected cell type.
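
The two suggestions above could be sketched roughly as follows. This is only an illustration: `cds`, `marker_file`, `db`, and the wrapper function names are placeholders rather than anything from the thread, and the gene ID types are assumptions you would need to match to your own data.

```r
# Sketch only. Placeholders (assumptions, not from the thread):
#   cds         - your monocle3 cell_data_set
#   marker_file - path to your Garnett marker file
#   db          - a Bioconductor annotation database, e.g. org.Hs.eg.db

# Score every marker in the marker file so that ambiguous or widely
# expressed (promiscuous) markers can be spotted.
check_marker_file <- function(cds, marker_file, db,
                              cds_gene_id_type = "SYMBOL",
                              marker_file_gene_id_type = "SYMBOL") {
  marker_check <- garnett::check_markers(
    cds, marker_file,
    db = db,
    cds_gene_id_type = cds_gene_id_type,
    marker_file_gene_id_type = marker_file_gene_id_type
  )
  print(garnett::plot_markers(marker_check))
  marker_check
}

# Plot selected markers on the low-dimensional embedding to see whether
# expression is restricted to one population. This assumes that
# monocle3::reduce_dimension() has already been run on `cds`.
plot_marker_expression <- function(cds, genes) {
  monocle3::plot_cells(cds, genes = genes)
}

# Retrain with more unknown cells than the default; 1000 is the value
# floated above and may need adjusting to the size of the dataset.
train_with_more_unknowns <- function(cds, marker_file, db,
                                     num_unknown = 1000,
                                     cds_gene_id_type = "SYMBOL",
                                     marker_file_gene_id_type = "SYMBOL") {
  garnett::train_cell_classifier(
    cds = cds,
    marker_file = marker_file,
    db = db,
    cds_gene_id_type = cds_gene_id_type,
    marker_file_gene_id_type = marker_file_gene_id_type,
    num_unknown = num_unknown
  )
}
```

With human data you might load `org.Hs.eg.db` and then call, for example, `train_with_more_unknowns(cds, "markers.txt", org.Hs.eg.db)`, where `"markers.txt"` is a hypothetical file name.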

If you know you're going to be testing on a dataset with a lot of new cell types, and you have access to that data, then it would definitely be a good idea to train on an integrated dataset.
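
One possible way to build such an integrated training set is to merge the individual cell_data_set objects with monocle3's combine_cds before training. This is a sketch under assumptions: the function name, `cds_list`, `marker_file`, and `db` are placeholders, and whether any batch handling is needed depends on your data.

```r
# Sketch only. Placeholders (assumptions, not from the thread):
#   cds_list    - list of monocle3 cell_data_set objects, one per tissue
#   marker_file - path to a marker file covering the cell types of all tissues
#   db          - a Bioconductor annotation database, e.g. org.Hs.eg.db
train_on_integrated <- function(cds_list, marker_file, db,
                                num_unknown = 1000,
                                cds_gene_id_type = "SYMBOL",
                                marker_file_gene_id_type = "SYMBOL") {
  # Merge the datasets into one cell_data_set, keeping genes that are
  # present in only some of them.
  combined <- monocle3::combine_cds(cds_list, keep_all_genes = TRUE)

  # Garnett needs size factors; recomputing them on the merged object
  # is harmless if they already exist.
  combined <- monocle3::estimate_size_factors(combined)

  garnett::train_cell_classifier(
    cds = combined,
    marker_file = marker_file,
    db = db,
    cds_gene_id_type = cds_gene_id_type,
    marker_file_gene_id_type = marker_file_gene_id_type,
    num_unknown = num_unknown
  )
}
```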

One more recommendation - if you have an annotated dataset already, consider trying out the new marker-free functionality (https://cole-trapnell-lab.github.io/garnett/docs_m3/#1c-train-a-marker-free-classifier) which may train a better classifier, especially if you're trying to get something high resolution where the markers aren't that specific.

Hope this helps,

Hannah
