The general answer is that there are no definitive "best" training examples to use; rather, you should focus on examples that are representative of what you want the model to be good at recognising.
e.g. If you want the model to be good at recognising longer sentences with more filler words, then you need examples of longer sentences with filler words. If you want it to be good at recognising very short sentence fragments, you need examples of short sentence fragments, and so on.
If there are training examples that have words in common across multiple labels (e.g. "I want"), then the model is likely to learn that those words carry little significance.
When you test with something that you expect not to match any of the labels, I would hope that the model reflects this with a low confidence score. The label that is chosen will reflect something the example has in common with a pattern identified during training. Hundreds of features go into the classification, so even for an example that intuitively matches no label, some feature scorer is likely to trigger in a way that gives one label a slightly higher score than the others.
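To make this concrete, here is a minimal sketch using scikit-learn (an assumption on my part; your classifier service may work differently internally, but the behaviour is analogous). The training texts, labels, and out-of-domain test sentence are all hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: two labels sharing the filler phrase "I want",
# which the model should learn to treat as low-significance
texts = [
    "I want to book a flight",
    "I want to reserve a plane ticket",
    "I want to cancel my booking",
    "I want to cancel the reservation",
]
labels = ["book", "book", "cancel", "cancel"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# An out-of-domain input is still assigned the label with the highest score,
# even though neither label is really a good match
probs = clf.predict_proba(["the weather is nice today"])[0]
print(dict(zip(clf.classes_, probs.round(2))))
```

The probabilities always sum to 1 across the labels, so one label will always come out on top; it is the low, near-uniform spread of the scores (rather than the winning label itself) that signals "no real match".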
Kind regards
D