Andi,
I've been using sqlparse for a while to strip out comments and split queries; those are the only two features I use. Recently, while tinkering with sqlparse.format(), I noticed that a really large query (say ~100 KB) takes on the order of 30 seconds to format. Profiling shows that most of the time is spent in the various grouping functions, iterating over token lists and doing membership tests. A few questions:
- Would it be prudent in the future to use a dict of some kind for all the token index computations (get_prev, get_next, etc.)?
- If I only call group_comments (since that is my only use case), would the result still be correct? Preliminary tests seem to say yes, but I would love your opinion.
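To make the first question concrete, here is a rough sketch of the kind of dict-based lookup I have in mind. It is not sqlparse's actual code: IndexedTokens and its methods are hypothetical names I made up for illustration, and I am assuming tokens can be keyed by identity.

```python
class IndexedTokens:
    """Hypothetical stand-in for a token list; not part of sqlparse."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self._reindex()

    def _reindex(self):
        # Map each token's identity to its position, built once in O(n).
        # Keyed by id() so tokens that compare equal stay distinct.
        self._index = {id(tok): i for i, tok in enumerate(self.tokens)}

    def index_of(self, token):
        # O(1) dict lookup instead of an O(n) list.index() scan.
        return self._index[id(token)]

    def get_next(self, token):
        i = self.index_of(token) + 1
        return self.tokens[i] if i < len(self.tokens) else None

    def get_prev(self, token):
        i = self.index_of(token) - 1
        return self.tokens[i] if i >= 0 else None
```

The obvious catch is that the map has to be rebuilt (or patched) every time grouping replaces a run of tokens with a single group, so whether this actually wins probably depends on how often the list is mutated relative to how often it is searched.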
If you wish, I can share the profiler stats with you guys. Thanks a ton!
Ishaan