Modify Tokens and Output New Formatted SQL.


dsidlo

Dec 15, 2010, 6:45:43 PM
to sqlparse
I am looking at using sqlparse 0.1.2 to perform automated changes to
some SQL from a parsed and tokenized viewpoint. Once I've made the
changes to the tokenized data, I'd like to be able to format the
output using the formatting function.

Would it be possible to feed the list of Tokenized SQL into the
Formatting function resulting in Formatted SQL?
If so, how would I do that?

Thanks,
-dsidlo

Andi Albrecht

Dec 16, 2010, 4:56:14 AM
to sqlp...@googlegroups.com
I don't think it's possible with the current code base. But have a look at sqlparse.format() and sqlparse.engine.FilterStack(). The first builds a stack of filters that process the statement depending on the required formatting rules. The latter is the class which drives all formatting. The culprit that makes it currently impossible to feed in a token stream is the first part of sqlparse.engine.FilterStack.run (sqlparse/engine/__init__.py#40-50): There it is assumed that "sql" is a string and not already tokenized. Maybe you can omit some of the steps between line 40 and 50 to be able to pass in a token stream.
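For illustration, a minimal sketch of that entry point (only FilterStack and its run() method are taken from the actual code; the rest is an assumption about the 0.1.x behavior):

    from sqlparse.engine import FilterStack

    # FilterStack.run() takes the raw SQL string and tokenizes it
    # itself -- this is the step that prevents passing in an already
    # tokenized stream. A bare stack with no filters attached just
    # yields the parsed statements:
    stack = FilterStack()
    for stmt in stack.run("select a, b from foo"):
        print stmt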

-Andi
 



dsidlo

Dec 16, 2010, 12:31:18 PM
to sqlparse
Really appreciate the guidance.
I'll look into that.
Thanks.


dsidlo

Jan 4, 2011, 7:41:34 PM
to sqlparse
I found that it was possible to reformat a modified token tree using
the following...

def reformatTokens(tokens):
    # To reformat the tokenized SQL...
    sqlX = ""
    # Convert tokens into strings, and append into 1 string
    for t in tokens[:]:
        sqlX = sqlX + str(t)
    # Use the existing format method to perform reformat
    print sqlparse.format(sqlX, reindent=True, keyword_case='upper')
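
For example, with a parsed statement (the SQL here is made up):

    reformatTokens(sqlparse.parse("select a, b from some_table")[0].tokens)

This prints the reindented, upper-cased SQL.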

To determine the token type, it seemed easier to use _get_repr_name()
instead of match().

I needed to perform transformations on 'Identifier' tokens. The text
data of these tokens was not available via 'value'; I found it via
unicode(tokenVar) instead.
And to modify an existing token, I found that I had to use the
__init__() method, because the token's value is stored via __slots__,
which makes the values unchangeable in some cases.
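
Putting those together, roughly (the statement and names here are made
up, and this pokes at 0.1.x internals, so treat it as a sketch):

    import sqlparse

    stmt = sqlparse.parse("select foo::bar from t")[0]
    for tok in stmt.tokens:
        # type check via the repr name instead of match()
        if tok._get_repr_name() == 'Identifier':
            # 'value' was empty for these; unicode() returns the text
            text = unicode(tok)
            # re-initialize the token to change its (slot-stored) value
            tok.__init__(text.replace('bar', 'baz'))
    print sqlparse.format(''.join(str(t) for t in stmt.tokens))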

Finally, I am creating an expression that needs to be added to the
where clause of the existing tokenized SQL. I need to insert this
expression in its tokenized form into the existing tokenized SQL.
To generate the tokenized form of the machine-generated expression, I
just call sqlparse.parse(expStr) on the expression string. Then I'll
manipulate the token structures to perform the insertion, as sketched
below...
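
Roughly like this (the statement, expression, and names are all
hypothetical; this relies on the Where group's token list being a
plain Python list in 0.1.x):

    import sqlparse

    stmt = sqlparse.parse("select * from t where a = 1")[0]
    # find the Where group and append the machine-generated
    # expression's tokens to its token list
    where = [t for t in stmt.tokens if t._get_repr_name() == 'Where'][0]
    expStr = " and b = 2"
    where.tokens.extend(sqlparse.parse(expStr)[0].tokens)
    print ''.join(str(t) for t in stmt.tokens)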

Looks like this library is working out for me... Thanks again.

Andi Albrecht

Jan 6, 2011, 1:38:21 PM
to sqlp...@googlegroups.com
On Wed, Jan 5, 2011 at 1:41 AM, dsidlo <dsi...@gmail.com> wrote:
> I found that it was possible to reformat a modified token tree using
> the following...
>
> def reformatTokens(tokens):
>    # To reformat the tokenized SQL...
>    sqlX = ""
>    # Convert tokens into strings, and append into 1 string
>    for t in tokens[:]:
>        sqlX = sqlX + str(t)
>    # Use the existing format method to perform reformat
>    print sqlparse.format(sqlX, reindent=True, keyword_case='upper')

Glad to hear that!

Instead of appending each token to a string you can simply write

sqlparse.format(''.join(str(t) for t in tokens), ...)

>
> To determine the token type, it seemed easier to use _get_repr_name()
> instead of match().
>
> I needed to perform transformations on 'Identifier' tokens. The text
> data of these tokens was
> not available via 'value'; I found it via unicode(tokenVar) instead.

ah, ok... I've filed a bug report for this as a reminder (issue30).

> And, to modify an existing token, I found that I had to use the
> __init__() method because
> the tokens value is stored via __slots__, which make the values
> unchangeable in some cases.

Can you give an example? Did you try to set the value attribute of an
Identifier? The identifier token may contain a subtoken (Name) and
somehow changing "value" of identifier is ignored in that
case. However, the behavior is somewhat unexpected.

>
> Finally, I am creating an expression that needs to be added to the
> where clause of the existing tokenized SQL.
> I need to insert this expression in it's tokenized form into existing
> tokenized SQL.
> To generate the tokenized form of the machine generated expression I
> just call sqlparse.parse(expStr) on the
> expression string. Then I'll manipulate the token structures to
> perform the insertion...
>
> Looks like this library is working out for me... Thanks again.

Thanks for the feedback!

-Andi


dsidlo

Jan 6, 2011, 3:23:32 PM
to sqlparse


On Jan 6, 10:38 am, Andi Albrecht <albrecht.a...@googlemail.com>
wrote:
> On Wed, Jan 5, 2011 at 1:41 AM, dsidlo <dsi...@gmail.com> wrote:
> > I found that it was possible to reformat a modified token tree using
> > the following...
>
> > def reformatTokens(tokens):
> >    # To reformat the tokenized SQL...
> >    sqlX = ""
> >    # Convert tokens into strings, and append into 1 string
> >    for t in tokens[:]:
> >        sqlX = sqlX + str(t)
> >    # Use the existing format method to perform reformat
> >    print sqlparse.format(sqlX, reindent=True, keyword_case='upper')
>
> Glad to hear that!
>
> Instead of appending each token to a string you can simply write
>
> sqlparse.format(''.join(str(t) for t in tokens), ...)

Cool.

>
> > To determine the token type, it seemed easier to use _get_repr_name()
> > instead of match().
>
> > I needed to perform transformations on 'Identifier' tokens. The text
> > data of these tokens was
> > not available via 'value'; I found it via unicode(tokenVar) instead.
>
> ah, ok... I've filed a bug report for this as a reminder (issue30).
>
> > And, to modify an existing token, I found that I had to use the
> > __init__() method because
> > the tokens value is stored via __slots__, which make the values
> > unchangeable in some cases.
>
> Can you give an example? Did you try to set the value attribute of an
> Identifier? The identifier token may contain a subtoken (Name) and
> somehow changing "value" of identifier is ignored in that
> case. However, the behavior is somewhat unexpected.

You are right. <Identifier actually contains a list of subtokens.
In my problem, I am targeting variables that look like
"MetaTable::MetaFld" so that I can transform them
into more complex functions such as regexp_extract(...).
The variables are showing up as <Identifier tokens because sqlparse
treats '::' as a type cast.
So the fact that the <Identifier's value can't be changed is probably
not an issue.
But it is interesting and seemingly useful that I can at least change
the <Identifier as I need to via __init__().
When I change the <Identifier via __init__(), I lose the token list
held by that <Identifier;
.tokens on the <Identifier then just returns the string that I
init'ed it to.
(Which works for my current needs.)

It might be useful to add a method that allows one to change the
source string under a token list such as <Identifier, <Parenthesis,
etc., one that would change the string for the token list and
reparse/regenerate the tokens below it, thus keeping the token tree
correct. A sketch of this idea follows below.
On the other hand, one simply needs to perform:
sqlparse.parse(''.join(str(t) for t in tokens))
to bring all tokens up to date.
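
A sketch of what such a method might look like (hypothetical, not part
of sqlparse, and it leans on TokenList internals, so version caveats
apply):

    import sqlparse

    def set_token_source(group_token, new_text):
        # Reparse the new source text and re-initialize the group token
        # (an <Identifier, <Parenthesis, ...) with the resulting
        # subtree, so the token tree below it stays correct.
        subtree = sqlparse.parse(new_text)[0].tokens
        group_token.__init__(subtree)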

>
> > Finally, I am creating an expression that needs to be added to the
> > where clause of the existing tokenized SQL.
> > I need to insert this expression in it's tokenized form into existing
> > tokenized SQL.
> > To generate the tokenized form of the machine generated expression I
> > just call sqlparse.parse(expStr) on the
> > expression string. Then I'll manipulate the token structures to
> > perform the insertion...
>
> > Looks like this library is working out for me... Thanks again.
>
> Thanks for the feedback!
>
> -Andi
>


In [1]: import sqlparse

In [2]:

In [3]: sqlstr = " \
...: select \
...: dt, \
...: sv, \
...: userid, \
...: action \
...: from \
...: ( select \
...: _::dt, \
...: _::sv, \
...: trustmw::svuserid, \
...: wfimw::action \
...: from poc \
...: where linedate_p = '2010-11-08' \
...: and trustmw::svuserid > '1000' \
...: and wfimw::action <> 'turnover' \
...: ) a \
...: where linedate_p = '2010-11-08' \
...: and _::sv = 'INFO' \
...: and regexp_replace(_::dt, '([A-Za-z\s\*\%\&\?_]+|[\-\+]?[\d\.]+|\<\>|\</\>)', '')= '\:\:,\=\-\:\=\=\=' or \
...: group by ftype, sv, userid, action; \
...: "

In [4]: stmnts = sqlparse.parse(sqlstr)

In [5]: stmnt = stmnts[0]

In [6]: stmnt.to_unicode()
Out[6]: u" select \tdt, \tsv, \tuserid, \taction from ( select
\t_::dt, \t_::sv, \ttrustmw::svuserid, \twfimw::action from poc where
linedate_p = '2010-11-08' and trustmw::svuserid > '1000' and
wfimw::action <> 'turnover' ) a where linedate_p = '2010-11-08' and
_::sv = 'INFO' and regexp_replace(_::dt, '([A-Za-z\\s\\*\\%\\&\\?_]+|
[\\-\\+]?[\\d\\.]+|\\<\\>|\\</\\>)', '')= '\\:\\:,\\=\\-\\:\\=\\=\\='
or group by ftype, sv, userid, action; "

In [7]: stmnt.tokens
Out[7]:
[<Whitespace ' ' at 0x2af4f20f4908>,
<DML 'select' at 0x2af4f20f4950>,
<Whitespace ' ' at 0x2af4f20f4998>,
<IdentifierList 'dt, s...' at 0x2af4f2100a28>,
<Whitespace ' ' at 0x2af4f20f4cb0>,
<Keyword 'from' at 0x2af4f20f4cf8>,
<Whitespace ' ' at 0x2af4f20f4d40>,
<Parenthesis '( sele...' at 0x2af4f20c8a28>,
<Whitespace ' ' at 0x2af4f20fbd88>,
<Identifier 'a' at 0x2af4f2100398>,
<Whitespace ' ' at 0x2af4f20fbe18>,
<Where 'where ...' at 0x2af4f20e9050>,
<Keyword 'group' at 0x2af4f20fc950>,
<Whitespace ' ' at 0x2af4f20fc998>,
<Keyword 'by' at 0x2af4f20fc9e0>,
<Whitespace ' ' at 0x2af4f20fca28>,
<IdentifierList 'ftype,...' at 0x2af4f2100aa0>,
<Punctuation ';' at 0x2af4f20fcd40>,
<Whitespace ' ' at 0x2af4f20fcd88>]

In [8]: s = stmnt.tokens[11]

In [9]: s
Out[9]: <Where 'where ...' at 0x2af4f20e9050>

In [10]: s.tokens
Out[10]:
[<Keyword 'where' at 0x2af4f20fbe60>,
<Whitespace ' ' at 0x2af4f20fbea8>,
<Name 'lineda...' at 0x2af4f20fbef0>,
<Whitespace ' ' at 0x2af4f20fbf38>,
<Comparison '=' at 0x2af4f20fbf80>,
<Whitespace ' ' at 0x2af4f20fbfc8>,
<Single ''2010-...' at 0x2af4f20fc050>,
<Whitespace ' ' at 0x2af4f20fc098>,
<Keyword 'and' at 0x2af4f20fc0e0>,
<Whitespace ' ' at 0x2af4f20fc128>,
<Identifier '_::sv' at 0x2af4f20e98c0>,
<Whitespace ' ' at 0x2af4f20fc248>,
<Comparison '=' at 0x2af4f20fc290>,
<Whitespace ' ' at 0x2af4f20fc2d8>,
<Single ''INFO'' at 0x2af4f20fc320>,
<Whitespace ' ' at 0x2af4f20fc368>,
<Keyword 'and' at 0x2af4f20fc3b0>,
<Whitespace ' ' at 0x2af4f20fc3f8>,
<Function 'regexp...' at 0x2af4f20d86e0>,
<Comparison '=' at 0x2af4f20fc7a0>,
<Whitespace ' ' at 0x2af4f20fc7e8>,
<Single ''\:\:,...' at 0x2af4f20fc830>,
<Whitespace ' ' at 0x2af4f20fc878>,
<Keyword 'or' at 0x2af4f20fc8c0>,
<Whitespace ' ' at 0x2af4f20fc908>]

-----------------------------------------------------------------
--- The Identifier actually contains a token list...
-----------------------------------------------------------------

In [11]: stmnt.tokens[11].tokens[10]
Out[11]: <Identifier '_::sv' at 0x2af4f20e98c0>

In [12]: stmnt.tokens[11].tokens[10].tokens
Out[12]:
[<Name '_' at 0x2af4f20fc170>,
<Punctuation '::' at 0x2af4f20fc1b8>,
<Name 'sv' at 0x2af4f20fc200>]

-----------------------------------------------------------------
- Changing the <Identifier token...
-----------------------------------------------------------------

In [15]: stmnt.tokens[11].tokens[10].__init__('_::xxxx')

In [16]: stmnt.tokens[11].tokens[10]
Out[16]: <Identifier '_::xxxx' at 0x2af4f20e98c0>

-----------------------------------------------------------------
- Changing the token list held by the <Identifier...
-----------------------------------------------------------------

In [17]: stmnt.tokens[11].tokens[10].tokens
Out[17]: '_::xxxx'

-----------------------------------------------------------------
- Changes to <Identifiers result in expected output...
-----------------------------------------------------------------

In [34]: print sqlparse.format(''.join(str(t) for t in stmnt.tokens), reindent=True, keyword_case='upper')

SELECT dt,
       sv,
       userid,
       action
FROM
  (SELECT _::dt,
          _::sv,
          trustmw::svuserid,
          wfimw::action
   FROM poc
   WHERE linedate_p = '2010-11-08'
     AND trustmw::svuserid > '1000'
     AND wfimw::action <> 'turnover') a
WHERE linedate_p = '2010-11-08'
  AND _::xxxx = 'INFO' <------------------------- *** Changed as expected ***
  AND regexp_replace(_::dt, '([A-Za-z\s\*\%\&\?_]+|[\-\+]?[\d\.]+|\<\>|\</\>)', '')= '\:\:,\=\-\:\=\=\='
  OR
GROUP BY ftype,
         sv,
         userid,
         action;

-David

Andi Albrecht

Jan 7, 2011, 4:06:08 AM
to sqlp...@googlegroups.com

Maybe something like a replace_token() function would be handy (and as
you outlined above, easy to implement). Here's some dummy code to
illustrate:

>>> identifier = statement.tokens[2] # e.g. grab an identifier
>>> str(identifier)
'foo::bar'
>>> statement.replace_token(identifier, 'some_func(%s)' % str(identifier))

The advantage compared to changing the content of an existing token is
that the class of a token could be changed too. In the example above,
changing the value of the Identifier to "some_func(foo::bar)" wouldn't
help much, since Function would be the correct class for this token.
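
A first cut could reuse the reparse trick from earlier in the thread
(hypothetical code; it only handles top-level tokens of a statement):

    import sqlparse

    def replace_token(statement, token, new_text):
        # Splice the new source text in place of the old token, then
        # reparse so the replacement gets the correct class
        # (e.g. Function).
        sql = ''.join(new_text if t is token else str(t)
                      for t in statement.tokens)
        return sqlparse.parse(sql)[0]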
