Ale Garcia
Mar 16, 2026, 5:06:42 PM
to nltk-users
Hi NLTK community,
I'm building a market intelligence platform that classifies search query intent (informational, navigational, transactional, research, pre-purchase) and detects emerging market signals from Google Ads data.
Key challenges:
- Intent classification on short or ambiguous queries (≤3 tokens; 30% of the data)
- Limited labeled data (target: 10k labeled examples, F1 ≥ 0.75)
- Real-time processing requirements (p50 latency ≤ 200ms)
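To make the first and third constraints concrete, here is a minimal sketch of how they could be measured. It assumes whitespace tokenization and that the p50 target refers to per-query classification latency; `short_query_share`, `p50_latency_ms`, and the `classify` callable are hypothetical names used only for illustration.

```python
import statistics
import time


def short_query_share(queries, max_tokens=3):
    """Fraction of queries with at most max_tokens whitespace-separated tokens."""
    if not queries:
        return 0.0
    short = sum(1 for q in queries if len(q.split()) <= max_tokens)
    return short / len(queries)


def p50_latency_ms(classify, queries):
    """Median wall-clock latency of classify(query), in milliseconds.

    `classify` is a stand-in for whatever callable runs the model
    (assumption: one query per call). `queries` must be non-empty.
    """
    timings = []
    for q in queries:
        start = time.perf_counter()
        classify(q)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


sample = ["buy shoes", "best running shoes for flat feet", "nike", "how to tie a tie"]
share = short_query_share(sample)
latency = p50_latency_ms(lambda q: q.lower(), sample)
```

A different tokenizer (for example `nltk.word_tokenize`) would shift the short-query share slightly, so the tokenization should match whatever the 30% figure was computed with.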
Current approach:
- Distilled sentence transformers + MLP classifier
- Hybrid rule-based fallback for explicit signals
- Active learning loop for low-confidence predictions
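For readers who want something concrete to react to, the last two items can be sketched roughly as below. This is an illustration under stated assumptions, not the actual system: the regex patterns, the 0.6 threshold, and the ordering (model first, rules only when the model is unsure, labeling queue when no rule fires) are all hypothetical, as are the names `rule_intent` and `route`.

```python
import re

# Hypothetical explicit-signal patterns; the real rule set is not shown here.
RULES = [
    ("transactional", re.compile(r"\b(buy|order|coupon|price)\b")),
    ("navigational", re.compile(r"\b(login|sign in|official site)\b")),
    ("pre-purchase", re.compile(r"\b(best|vs|review|reviews|compare)\b")),
]

CONFIDENCE_THRESHOLD = 0.6  # assumed cut-off, to be tuned on held-out data


def rule_intent(query):
    """Return the first intent whose pattern matches, or None."""
    q = query.lower()
    for label, pattern in RULES:
        if pattern.search(q):
            return label
    return None


def route(query, model_probs, review_queue):
    """Pick a label and report where it came from.

    model_probs: dict mapping intent label -> probability from the classifier.
    review_queue: list collecting queries to send for human labeling.
    """
    best = max(model_probs, key=model_probs.get)
    if model_probs[best] >= CONFIDENCE_THRESHOLD:
        return best, "model"
    # Model is unsure: fall back to explicit-signal rules.
    label = rule_intent(query)
    if label is not None:
        return label, "rule"
    # No rule fired either: keep the model's guess and queue the query
    # for annotation (the active learning step).
    review_queue.append(query)
    return best, "model_low_confidence"


queue = []
result = route(
    "buy running shoes",
    {"informational": 0.40, "transactional": 0.35, "navigational": 0.25},
    queue,
)
```

With this ordering, rules only override the model when it is below the threshold; checking rules first would be the other obvious design and trades recall on explicit signals against the risk of brittle patterns overriding a confident model.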
Questions for the community:
- What are best practices for handling short-query ambiguity with limited labels?
- Which weak supervision techniques would you recommend for bootstrapping intent classifiers?
- What evaluation strategies beyond standard F1 work for measuring market-signal detection quality?
- Which NLTK tools complement modern transformer approaches for this use case?
Happy to share more details about the system architecture and data schema. Thanks in advance for any insights!