How are comments supposed to be tokenized?

Anders Qvist

Apr 25, 2012, 1:43:43 PM
to sqlp...@googlegroups.com
In the current master branch, comments are part of the following statement:

>>> t = list(stack.run("-- a comment\nINSERT INTO foo VALUES (1,2);"))[0]
>>> t.tokens
[<Single '-- a c...' at 0xb7081784>, <DML 'INSERT' at 0xb7081504>, <Whitespace ' ' at 0xb7081a2c>, <Keyword 'INTO' at 0xb7081c5c>, <Whitespace ' ' at 0xb70818ec>, <Name 'foo' at 0xb7081d24>, <Whitespace ' ' at 0xb70817ac>, <Keyword 'VALUES' at 0xb70819dc>, <Whitespace ' ' at 0xb708166c>, <Punctuation '(' at 0xb70819b4>, <Integer '1' at 0xb7081dc4>, <Punctuation ',' at 0xb70811e4>, <Integer '2' at 0xb7081b1c>, <Punctuation ')' at 0xb7081644>, <Punctuation ';' at 0xb70817d4>]

Is this intentional? My intuition was that the comment would be its own statement.
If comments are part of a statement, perhaps .token_first() et al. should skip past them, as they already do with whitespace?
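A toy model reproduces what the REPL output shows. If a splitter closes a statement only when it sees `;`, a comment sitting in front of a statement has nowhere to go except into the statement that follows it. The sketch below is plain Python and is not sqlparse's lexer or splitter; `tokenize`, `split_statements` and the token type names are hypothetical simplifications.

```python
import re

# Hypothetical toy lexer: just enough token types for the example in this thread.
TOKEN_RE = re.compile(
    r"(?P<Comment>--[^\n]*\n?)"    # single-line comment, trailing newline included
    r"|(?P<Whitespace>\s+)"
    r"|(?P<Punctuation>[;(),])"
    r"|(?P<Integer>\d+)"
    r"|(?P<Word>\w+)"              # keywords and names are not told apart here
)


def tokenize(sql):
    """Yield (ttype, value) pairs for the toy grammar above."""
    pos = 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        yield m.lastgroup, m.group()
        pos = m.end()


def split_statements(tokens):
    """Group tokens into statements, closing a statement only at ';'."""
    current = []
    for ttype, value in tokens:
        current.append((ttype, value))
        if ttype == "Punctuation" and value == ";":
            yield current
            current = []
    if current:  # trailing tokens without a terminating ';'
        yield current


sql = "SELECT 1;\n-- about the insert\nINSERT INTO foo VALUES (1,2);"
for stmt in split_statements(tokenize(sql)):
    print(stmt[:3])
```

The second statement starts with the newline and the comment, followed by `INSERT`: the comment is swallowed by the statement after it, just like the `Single` comment token heading the INSERT's token list above.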

pir...@gmail.com

Apr 26, 2012, 4:13:26 AM
to sqlp...@googlegroups.com
I think you are mixing up concepts: what you have there is not a list of
statements but a list of tokens, and of course a comment is a token
by itself. Maybe what you want is sqlparse.split(sql), which splits
the SQL into a list of strings, or sqlparse.parse(sql), which returns a list of
statements. You should use the stack only if you need to do low-level
processing of the SQL strings (as I need to do).

http://sqlparse.readthedocs.org/en/latest/api/
> --
> You received this message because you are subscribed to the Google Groups
> "sqlparse" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/sqlparse/-/9B7_f09QkfMJ.
> To post to this group, send email to sqlp...@googlegroups.com.
> To unsubscribe from this group, send email to
> sqlparse+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sqlparse?hl=en.



--
"If you want to travel around the world and be invited to speak at a
lot of different places, just write a Unix operating system."
– Linus Torvalds, creator of the Linux operating system

Anders Qvist

Apr 26, 2012, 5:03:15 AM
to sqlp...@googlegroups.com
Of course they are tokens: we're looking at the .tokens list of a particular statement. My point was that there are comments and DML (in this case) inside the same Statement().

Having pondered the issue some more, I conclude that either:
  1. comments should be treated as whitespace and thus skipped by token_first() and token_next(), or
  2. token_first() and token_next() need a separate parameter to skip past comments.

In the current regime, it gets unnecessarily tricky to get at the meat of the Statement().

I lean toward solution 1). What say thee?
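Option 2 can be sketched on a plain list of (type, value) pairs. The `token_first` and `token_next` below are hypothetical stand-ins, not sqlparse's methods; option 1 would amount to always passing `skip_cm=True` whenever whitespace is skipped. (Later sqlparse releases did add a `skip_cm` keyword to `TokenList.token_first()`.)

```python
# Hypothetical token stream: the start of the REPL output above, with token
# types reduced to plain strings (this is not sqlparse's Token class).
TOKENS = [
    ("Comment", "-- a comment\n"),
    ("DML", "INSERT"),
    ("Whitespace", " "),
    ("Keyword", "INTO"),
    ("Whitespace", " "),
    ("Name", "foo"),
]


def token_first(tokens, skip_ws=True, skip_cm=False):
    """Return the first (ttype, value) pair, optionally skipping whitespace and comments."""
    for ttype, value in tokens:
        if skip_ws and ttype == "Whitespace":
            continue
        if skip_cm and ttype == "Comment":
            continue
        return ttype, value
    return None  # nothing left after skipping


def token_next(tokens, idx, skip_ws=True, skip_cm=False):
    """Return the first matching pair after position idx, using the same skipping rules."""
    return token_first(tokens[idx + 1:], skip_ws=skip_ws, skip_cm=skip_cm)


print(token_first(TOKENS))                # the comment token, as today
print(token_first(TOKENS, skip_cm=True))  # the INSERT token
```

Keeping `skip_cm` off by default preserves the current behaviour for callers that do care about comments.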

--
Quest

Andi Albrecht

Apr 26, 2012, 5:21:34 AM
to sqlp...@googlegroups.com
On Thu, Apr 26, 2012 at 11:03 AM, Anders Qvist <bitt...@gmail.com> wrote:
> Of course they are tokens: we're looking at the .tokens list of a particular
> statement. My point was that there are comments and DML (in this case)
> inside the same Statement().
>
> Having pondered the issue some more, I conclude that either:
>
> comments should be treated as whitespace and thus skipped by token_first()
> and token_next(), or
> token_first() and token_next(), need a separate parameter to skip past
> comments.
>
> In the current regime, it gets unnecessarily tricky to get at the meat of
> the Statement().
>
> I lean toward solution 1). What say thee?

You can add a custom filter to your stack that removes leading
comments. The engine considers comments part of SQL statements (in
terms of a SQL script). IMO, removing them belongs in a custom
filter.
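Andi's suggestion could look roughly like this: a statement-level filter that drops comments (and the whitespace around them) in front of the first real token. It is written as a plain generator over lists of (type, value) pairs, which is an assumption for illustration; it does not implement sqlparse's own filter interface.

```python
from itertools import dropwhile

# Token types treated as "noise" before the real start of a statement.
LEADING_NOISE = {"Comment", "Whitespace"}


def strip_leading_comments(statements):
    """Hypothetical filter: remove comment/whitespace tokens before each statement's first real token."""
    for stmt in statements:
        rest = list(dropwhile(lambda tok: tok[0] in LEADING_NOISE, stmt))
        if rest:  # a statement made only of comments disappears entirely
            yield rest


statements = [
    [("Comment", "-- a comment\n"), ("DML", "INSERT"), ("Whitespace", " "),
     ("Keyword", "INTO"), ("Punctuation", ";")],
    [("Whitespace", "\n"), ("Comment", "-- trailing note\n")],
]
for stmt in strip_leading_comments(statements):
    print(stmt)
```

Only leading comments are touched; a comment that appears after the first real token stays where it is.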

--
Andi

pir...@gmail.com

Apr 26, 2012, 5:41:44 AM
to sqlp...@googlegroups.com
>> I lean toward solution 1). What say thee?
>
> You can add a custom filter to your stack that removes leading
> comments. The engine considers comments as part of SQL statements (in
> terms of a SQL script). IMO removing them should go in a custom
> filter.
>
There are two filters that do this, StripComments() and
StripCommentsFilter(). Maybe we should unify them...

By the way, I think it also makes sense for comments to be treated as
whitespace, or at least as independent statements (grouping several
comment tokens into a single statement).
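The second alternative, a run of leading comments becoming a statement of its own, can be sketched as a variant of a `;`-terminated splitter. Again this is a hypothetical toy over (type, value) pairs, not sqlparse's splitter; comments that appear after code has started stay inside their statement.

```python
def split_comments_apart(tokens):
    """Split at ';', but emit a run of leading comments as its own statement."""
    current, has_code, has_comment = [], False, False
    for ttype, value in tokens:
        if ttype == "Comment":
            has_comment = True
        elif ttype != "Whitespace":
            # First real token after a comment-only prefix: close that prefix.
            if not has_code and has_comment:
                yield current
                current, has_comment = [], False
            has_code = True
        current.append((ttype, value))
        if ttype == "Punctuation" and value == ";":
            yield current
            current, has_code, has_comment = [], False, False
    if current:  # leftovers, e.g. a comment at the very end of a script
        yield current


tokens = [
    ("Comment", "-- first\n"),
    ("Comment", "-- second\n"),
    ("DML", "INSERT"),
    ("Whitespace", " "),
    ("Keyword", "INTO"),
    ("Whitespace", " "),
    ("Name", "foo"),
    ("Punctuation", ";"),
]
for stmt in split_comments_apart(tokens):
    print(stmt)
```

Both comments end up together in one comment-only statement, and the INSERT statement then starts directly with `INSERT`, so a plain first-token lookup finds the DML.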

Anders Qvist

Apr 26, 2012, 4:05:34 PM
to sqlp...@googlegroups.com
Filters are fine for this. I'm not sure I understand how to insert them, though. Shouldn't .parse() take arguments that let us insert preprocess and postprocess filters?

pir...@gmail.com

Apr 26, 2012, 7:06:53 PM
to sqlp...@googlegroups.com

No, in that case you have to build the stack (or a pipeline) by hand.

Sent from my Android cell phone, please forgive the lack of format on the text, and my fat thumbs :-P

Andi Albrecht

Apr 26, 2012, 11:41:32 PM
to sqlp...@googlegroups.com
On Thu, Apr 26, 2012 at 10:05 PM, Anders Qvist <bitt...@gmail.com> wrote:
> Filters are fine for this. Not sure if I get how to insert them tho?
> Shouldn't .parse() take arguments that allow us to insert preprocess and
> postprocess filters?

Indeed, that hasn't been designed yet. I'd prefer to keep the parse()
function, like all top-level functions, as simple as possible. For now
the best thing is to build your own stack of filters, but keep in
mind that this part of sqlparse isn't documented yet and may
change. A good starting point is to look at sqlparse.format() or
split2().
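Building your own stack could look like the toy pipeline below: token-level filters run on the raw stream, a splitter groups tokens into statements at `;`, and statement-level filters run on the result. `ToyFilterStack`, its attribute names and the two filters are hypothetical; sqlparse's own `FilterStack` is, as noted above, undocumented and may change, so this only illustrates the shape of the idea.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Token = Tuple[str, str]
Statement = List[Token]


class ToyFilterStack:
    """Hypothetical hand-built pipeline: token filters -> split at ';' -> statement filters."""

    def __init__(self) -> None:
        self.token_filters: List[Callable[[Iterable[Token]], Iterator[Token]]] = []
        self.statement_filters: List[Callable[[Iterable[Statement]], Iterator[Statement]]] = []

    def run(self, tokens: Iterable[Token]) -> Iterator[Statement]:
        stream: Iterable[Token] = iter(tokens)
        for f in self.token_filters:  # e.g. drop comments before splitting
            stream = f(stream)
        statements: Iterable[Statement] = self._split(stream)
        for f in self.statement_filters:  # e.g. drop blank statements
            statements = f(statements)
        return iter(statements)

    @staticmethod
    def _split(tokens: Iterable[Token]) -> Iterator[Statement]:
        current: Statement = []
        for tok in tokens:
            current.append(tok)
            if tok == ("Punctuation", ";"):
                yield current
                current = []
        if current:
            yield current


def drop_comments(tokens: Iterable[Token]) -> Iterator[Token]:
    return (tok for tok in tokens if tok[0] != "Comment")


def drop_blank_statements(statements: Iterable[Statement]) -> Iterator[Statement]:
    return (s for s in statements if any(ttype != "Whitespace" for ttype, _ in s))


stack = ToyFilterStack()
stack.token_filters.append(drop_comments)
stack.statement_filters.append(drop_blank_statements)

tokens = [("Comment", "-- a comment\n"), ("DML", "INSERT"), ("Punctuation", ";"),
          ("Whitespace", "\n"), ("Comment", "-- end of script\n")]
for stmt in stack.run(tokens):
    print(stmt)
```

Because comments are removed before splitting, the trailing comment leaves only a whitespace-only statement behind, which the statement filter then discards. Order matters: swapping the two stages would give different results.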

--
Andi
