Hi Jordi,
It's tough to give you exact recommendations without knowing a bit more context, but a couple of thoughts:
- If by 'not working so well' you mean that new cell types are being assigned to a known type rather than labelled unknown (which does sometimes happen), then as a very first pass I recommend increasing the num_unknown parameter (maybe to 1000, depending on the size of your dataset), so that the unknown class has enough representation during training to prevent overfitting.
- Another thing to check in this case is that you don't accidentally have a promiscuous marker in your marker file, i.e. one that's actually expressed widely. You can check this using the marker checking functionality, or just by plotting the expression of each marker with plot_cells from monocle3 and confirming that expression is restricted to the cell type you expect.
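
In case it's useful, here is a rough sketch of both suggestions in R. It assumes you are using Garnett with a monocle3 cell_data_set, and that the "marker checking functionality" corresponds to Garnett's check_markers and plot_markers. The wrapper function names, the argument names `cds`, `marker_file` and `db`, and the "SYMBOL" gene ID type are placeholders I've made up for illustration, so adjust them to your own setup.

```r
# Sketch only: assumes the garnett and monocle3 packages are installed and
# that `cds` is a monocle3 cell_data_set (with a UMAP computed for plot_cells).
library(garnett)
library(monocle3)

# Score each marker in the marker file and plot the result.
# `db` is a Bioconductor annotation object, e.g. org.Hs.eg.db for human data.
check_marker_file <- function(cds, marker_file, db, gene_id_type = "SYMBOL") {
  marker_check <- check_markers(cds, marker_file,
                                db = db,
                                cds_gene_id_type = gene_id_type,
                                marker_file_gene_id_type = gene_id_type)
  plot_markers(marker_check)
}

# Plot marker expression on the embedding to see whether it is restricted
# to the expected cells or spread widely.
plot_marker_expression <- function(cds, genes) {
  plot_cells(cds, genes = genes)
}

# Retrain with a larger num_unknown (1000 is the value suggested above,
# not a tuned recommendation).
retrain_with_more_unknowns <- function(cds, marker_file, db,
                                       gene_id_type = "SYMBOL",
                                       num_unknown = 1000) {
  train_cell_classifier(cds = cds,
                        marker_file = marker_file,
                        db = db,
                        cds_gene_id_type = gene_id_type,
                        marker_file_gene_id_type = gene_id_type,
                        num_unknown = num_unknown)
}
```

For example, with a hypothetical marker file path and human data, you might call retrain_with_more_unknowns(cds, "markers.txt", org.Hs.eg.db) and then compare the classifications against your earlier run.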
If you know you're going to be testing on a dataset with a lot of new cell types, and you have access to that data, then it would definitely be a good idea to train on an integrated dataset.
Hope this helps,
Hannah