There's no one true way to handle feature extraction. If TF-IDF doesn't suit you, try something else. Maybe change the logarithm base in the IDF (the default is base 2). Or ditch the logarithm and use a square root instead. Or treat "special" words and named entities separately, outside the bag-of-words framework.
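To make the two weighting tweaks concrete, here's a minimal sketch of the IDF variants mentioned above (the function names `idf_log2` and `idf_sqrt` are just illustrative, not from any library):

```python
import math

def idf_log2(df, total_docs):
    # the usual IDF: log base 2 of (total docs / docs containing the term)
    return math.log(total_docs / df, 2)

def idf_sqrt(df, total_docs):
    # variant: replace the logarithm with a square root,
    # which dampens rare-term weights less aggressively
    return math.sqrt(total_docs / df)

# a term appearing in 10 out of 1000 documents
print(idf_log2(10, 1000))   # log2(100), about 6.64
print(idf_sqrt(10, 1000))   # sqrt(100) = 10.0
```

The square-root variant gives rare terms a much larger boost relative to common ones, so it changes which words dominate your feature vectors.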
One example would be using semantic models instead of TF-IDF. Models like LSA (Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation) construct condensed features from entire documents, so that each feature depends less on any particular word appearing in the input (London, Delhi).
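At its core, LSA is just a truncated SVD of the term-document matrix. A toy sketch with NumPy (the matrix, vocabulary, and choice of k = 2 are made up for illustration):

```python
import numpy as np

# toy term-document matrix (rows = terms, columns = documents);
# in practice this would be the TF-IDF-weighted matrix
X = np.array([
    [1., 0., 1., 0.],   # "london"
    [0., 1., 0., 1.],   # "delhi"
    [1., 1., 1., 1.],   # "city"
    [1., 1., 0., 0.],   # "travel"
])

# LSA: truncated SVD, keeping only k latent dimensions
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per document

print(doc_vectors.shape)   # (4, 2): 4 documents, 2 latent features
```

Each document is now a dense 2-dimensional vector; "london" and "delhi" no longer get their own axes, so the classifier can't latch onto either word alone.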
Finally, check your categorization model. Some supervised algorithms work better on text than others. SVM is a safe choice; Naive Bayes is a good start, simple and easy to debug. Unless your documents are super short, the categorization shouldn't depend so strongly on a single word; that's just a red flag.
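For the "easy to debug" point, here's a minimal multinomial Naive Bayes over a bag of words, with Laplace smoothing, in plain Python (whitespace tokenization and the tiny spam/ham corpus are assumptions for the sketch, not a production setup):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    # per-class word counts, class priors, and the overall vocabulary
    counts = defaultdict(Counter)
    priors = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc.split())
    vocab = {w for c in counts.values() for w in c}
    return counts, priors, vocab

def predict_nb(doc, counts, priors, vocab):
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for w in doc.split():
            # add-one (Laplace) smoothing so unseen words don't zero out a class
            lp += math.log((counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = ["cheap pills buy now", "meeting agenda attached",
        "buy cheap now", "agenda for meeting"]
labels = ["spam", "ham", "spam", "ham"]
model = train_nb(docs, labels)
print(predict_nb("cheap pills", *model))   # spam
```

Because every per-word log-probability is inspectable, you can see exactly which words pushed a document into a class, which is precisely how you'd catch a model leaning too hard on a single word.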
HTH,
Radim