Questions regarding filters


pir...@gmail.com

Aug 2, 2011, 4:00:32 AM
to sqlp...@googlegroups.com
Hi! :-) I have just found this project while looking for a Python parser for
SQL, and I think it's what I need to parse and compact the queries
of my filesystem (http://pirannafs.blogspot.com and
https://github.com/piranna/PirannaFS), except that I have to develop my
own filters.

What I need are two things:
* parse metadata inside some comments
* "compact" the query (no comments, no whitespace, no newlines...)

From what I have read, instead of using the sqlparse.format() function I should
write my own filters and build my own pipelines by hand, right?
Are there any filters for the things I want (I know there is one
for stripping comments and another for stripping whitespace, but I don't know
if they remove newlines too...)? How can I extract data
from the comments?

As a last point, how can I contribute the filters I create?
By cloning the project and sending a pull request, as on GitHub?

Greetings, Jesús Leganés Combarro.


--
"Si quieres viajar alrededor del mundo y ser invitado a hablar en un
monton de sitios diferentes, simplemente escribe un sistema operativo
Unix."
– Linus Tordvals, creador del sistema operativo Linux

Andi Albrecht

Aug 2, 2011, 6:31:25 AM
to sqlp...@googlegroups.com
"pir...@gmail.com" <pir...@gmail.com> writes:

> Hi! :-) I have just found this project while looking for a Python parser for
> SQL, and I think it's what I need to parse and compact the queries
> of my filesystem (http://pirannafs.blogspot.com and
> https://github.com/piranna/PirannaFS), except that I have to develop my
> own filters.
>
> What I need are two things:
> * parse metadata inside some comments
> * "compact" the query (no comments, no whitespace, no newlines...)
>
> From what I have read, instead of using the sqlparse.format() function I should
> write my own filters and build my own pipelines by hand, right?
> Are there any filters for the things I want (I know there is one
> for stripping comments and another for stripping whitespace, but I don't know
> if they remove newlines too...)? How can I extract data
> from the comments?

Yes, you're right. Both can be achieved by implementing your own
filters.

ATM there's no official API on how to build your own pipelines - so,
you've been warned, this may change :)

But have a look at sqlparse.formatter.build_filter_stack(). This
function builds a pipeline of filters depending on the given formatting
options. I think you'll need to build your own filter stack, using either
filters distributed with sqlparse or your custom filters, in a very
similar way. Note that there are three stages: token filters, filters
that work on grouped items, and serializers. Since you've already
encountered the strip-comments and strip-whitespace filters, you should
be aware of the different signatures.

>
> As a last point, how can i be able to upload the filters i create?
> Doing a clone of the project and sending a pull request as in GitHub?

If your filters are of general interest, I'd be happy to add
them. "Compacting" a query could be useful in some cases (when pushing a
SQL statement over the network for example). But I'm not sure if there's
general interest in a filter that parses metadata in comments.

The project is hosted in a Mercurial repository on Google Code. IIRC
they have a "request code review" feature somewhere on Google Code after
you've cloned the repository there. But you can send me a pull request
from anywhere your clone is publicly available. Good ol' patches
are fine too :)

-Andi

pir...@gmail.com

Aug 2, 2011, 6:55:13 AM
to sqlp...@googlegroups.com
> Yes, you're right. Both can be achieved by implementing your own
> filters.
>
Good to know :-) The only open question is the metadata parser; I'll have to
see how I can make the extracted metadata available later (maybe a static
attribute?)

> ATM there's no official API on how to build your own pipelines - so,
> you've been warned, this may change :)
>

It's not the first library I've used while it's still in alpha... :-P

> But have a look at sqlparse.formatter.build_filter_stack(). This
> function builds a pipeline of filters depending on the given formatting
> options.

In fact that's where I took the pipeline concept from; the parser
structure is similar to the pipeline structure of GStreamer (another toy I've
always wanted in my toy box ;-) )... :-)

> Note that there are three stages: token filters, filters
> that work on grouped items, and serializers. Since you've already
> encountered the strip-comments and strip-whitespace filters, you should
> be aware of the different signatures.
>

Thank you! :-D I really only skimmed the code to
understand how it works, so I didn't notice the different
signatures... :-P Yes, thinking about it again, it's true there are
three APIs, one for the tokenizer, another for the filters and another
for the output... very clever... :-)

>>
> If your filters are of general interest, I'd be happy to add
> them. "Compacting" a query could be useful in some cases (when pushing a
> SQL statement over the network for example).

I wanted to store the queries ready to be sent to the database,
but yes, that's the idea.


> But I'm not sure if there's
> general interest in a filter that parses metadata in comments.
>

Obviously... at least it's not such a common feature... :-P Another one
that I think I'd need is an INCLUDE statement (SQLite doesn't support
it), but I don't know whether it's better handled as metadata or as a statement,
apart from the fact that I need to solve other issues regarding the
rowindex first if I want INCLUDE to be useful...

> The project is hosted in a Mercurial repository on Google Code. IIRC
> they have a "request code review" feature somewhere on Google Code after
> you've cloned the repository there.

OK, I'll make my changes and later port them to the clone if I see
I need to modify core code.


> But you can send me a pull request
> from anywhere where your clone is publicity available. Good ol' patches
> are fine too :)
>

Lol :-P

Thank you so much, Andi! :-D I'll tell you later how I'm getting on with it :-)

Andi Albrecht

Aug 5, 2011, 2:36:03 PM
to sqlp...@googlegroups.com
"pir...@gmail.com" <pir...@gmail.com> writes:

>> Yes, you're right. Both can be achieved by implementing your own
>> filters.
>>
> Good to know :-) The only open question is the metadata parser; I'll have to
> see how I can make the extracted metadata available later (maybe a static
> attribute?)
>
>> ATM there's no official API on how to build your own pipelines - so,
>> you've been warned, this may change :)
>>
> It's not the first library I've used while it's still in alpha... :-P
>
>> But have a look at sqlparse.formatter.build_filter_stack(). This
>> function builds a pipeline of filters depending on the given formatting
>> options.
>
> In fact that's where I took the pipeline concept from; the parser
> structure is similar to the pipeline structure of GStreamer (another toy I've
> always wanted in my toy box ;-) )... :-)
>
>> Note that there are three stages: token filters, filters
>> that work on grouped items, and serializers. Since you've already
>> encountered the strip-comments and strip-whitespace filters, you should
>> be aware of the different signatures.
>>
> Thank you! :-D I really only skimmed the code to
> understand how it works, so I didn't notice the different
> signatures... :-P Yes, thinking about it again, it's true there are
> three APIs, one for the tokenizer, another for the filters and another
> for the output... very clever... :-)

The idea behind this is to do as little parsing work as possible. If you
only want to change the keywords or identifiers to upper case, for example,
then only filters on the tokenizer level are added. The more expensive
grouping doesn't happen then.
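
For example (from memory, so the exact output may differ slightly):

import sqlparse

# Only token-level filters are needed here; no grouping happens.
sqlparse.format('select foo from bar', keyword_case='upper')
# -> 'SELECT foo FROM bar'

# Stripping comments works on grouped items, so this one enables the
# more expensive grouping step internally.
sqlparse.format('select foo -- comment\nfrom bar', strip_comments=True)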

>
>>>
>> If your filters are of general interest, I'd be happy to add
>> them. "Compacting" a query could be useful in some cases (when pushing a
>> SQL statement over the network for example).
>
> I wanted to store the queries ready to be sent to the database,
> but yes, that's the idea.
>
>
>> But I'm not sure if there's
>> general interest in a filter that parses metadata in comments.
>>
> Obviously... at least it's not such a common feature... :-P Another one
> that I think I'd need is an INCLUDE statement (SQLite doesn't support
> it), but I don't know whether it's better handled as metadata or as a statement,
> apart from the fact that I need to solve other issues regarding the
> rowindex first if I want INCLUDE to be useful...
>
>> The project is hosted in a Mercurial repository on Google Code. IIRC
>> they have a "request code review" feature somewhere on Google Code after
>> you've cloned the repository there.
>
> OK, I'll make my changes and later port them to the clone if I see
> I need to modify core code.
>
>
>> But you can send me a pull request
>> from anywhere your clone is publicly available. Good ol' patches
>> are fine too :)
>>
> Lol :-P
>
> Thank you so much, Andi! :-D I'll tell you later how I'm getting on with it :-)

Cool! Let me know how it worked for you :)

-Andi

pir...@gmail.com

Aug 5, 2011, 7:10:43 PM
to sqlp...@googlegroups.com
Hi Andi! Sorry for not writing to you sooner, I was a little busy.

I have been working with your library and got some results, like a
filter to implement an INCLUDE statement (a filter that includes the code
from another file, just like the #include directive in C):


# Imports assumed (adjust to the actual module layout)
from os.path import abspath, join

from sqlparse import tokens
from sqlparse.engine import FilterStack
from sqlparse.filters import Filter

# Shortcuts (importing the names directly from sqlparse.tokens didn't work for me)
Name = tokens.Name
Whitespace = tokens.Whitespace


class IncludeStatement(Filter):
    """Filter that enables an INCLUDE statement to inline SQL from another file"""

    def __init__(self, dirpath="."):
        self.dirpath = abspath(dirpath)
        self.detected = False

    def process(self, stack, stream):
        # Run over all tokens in the stream
        for token_type, value in stream:
            # INCLUDE statement found, set detected mode
            if token_type in Name and value.upper() == 'INCLUDE':
                self.detected = True
                continue

            # INCLUDE statement was found, parse it
            elif self.detected:
                # Omit whitespaces
                if token_type in Whitespace:
                    pass

                # Get path of file to include
                path = None

                # if token_type in Name:
                #     path = join(self.dirpath, value)
                if token_type in tokens.String.Symbol:
                    path = join(self.dirpath, value[1:-1])

                # Include file if path was found
                if path:
                    try:
                        with open(path) as f:
                            sql = f.read()

                    except IOError:
                        print path, "file doesn't exist"

                    else:
                        # Create a new FilterStack to parse the included file
                        # and add all its tokens to the main stream recursively
                        # [ToDo] Add a maximum recursion depth
                        stack = FilterStack()
                        stack.preprocess.append(IncludeStatement(self.dirpath))

                        for tv in stack.run(sql):
                            yield tv

                    # Set normal mode
                    self.detected = False

                # Don't include any token while in detected mode
                continue

            # Normal token
            yield token_type, value
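
It would be used something like this (hypothetical example; the included
file name is double-quoted so that it is tokenized as tokens.String.Symbol):

from sqlparse import filters
from sqlparse.engine import FilterStack

stack = FilterStack()
stack.preprocess.append(IncludeStatement('sql'))
stack.postprocess.append(filters.SerializerUnicode())

# Inlines the contents of sql/init.sql into the statement stream
print ''.join(stack.run('INCLUDE "init.sql";'))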


But the fact is that it was difficult for different reasons; I hope
these notes are useful to you:

* I didn't know which classes to use to check the tokens. Looking at the
code I found some definitions in tokens.py, but they were done in a
strange way (I hadn't seen that style before) and I wasn't able to
find the classes' code, so it felt like magic to me. On top of
that, it didn't allow me to import just the classes I wanted because
they were not found; I had to import the whole module (which is
slower to use at runtime). Besides that, I needed to update the
KEYWORDS list, which was impossible... :-(

* Also, not all of them are defined, so looking at the parse
stack I found a Symbol class for a quoted string and had to
search all over the code for where it was defined (I found it as
tokens.String.Symbol).

* Another thing is that having three different kinds of filters with
different signatures and return values (or yields) gets confusing. At a
previous job we used a pipeline system with different stages
where the data was stored inside a dict that was modified at
every stage, and when a stage finished it was passed to the next one
by the pipeline.

* Another thing (I believe I have read something similar on the mailing
list) is that SELECT statement columns are not grouped, which I
think is really important (since they are usually used as a
whole)... and it's also something I need myself :-D On top of
that, the KeywordCaseFilter identified a column named 'size' as a
SQL keyword and capitalized it, which would break the query... :-/

* Finally, I found the example filters overcomplicated. I wrote a
StripComments class before knowing there was one inside filters.py, and
it works perfectly while being much smaller...


class StripComments(Filter):
    # Comment here is sqlparse.tokens.Comment
    def process(self, stack, stream):
        """Process the token stream."""
        for token_type, value in stream:
            if token_type not in Comment:
                yield token_type, value


# For comparison, the StripCommentsFilter from sqlparse/filters.py
# (sql is sqlparse.sql, T is sqlparse.tokens):
class StripCommentsFilter(Filter):

    def _process(self, tlist):
        clss = set([x.__class__ for x in tlist.tokens])
        while sql.Comment in clss:
            token = tlist.token_next_by_instance(0, sql.Comment)
            tidx = tlist.token_index(token)
            prev = tlist.token_prev(tidx, False)
            next_ = tlist.token_next(tidx, False)
            # Replace by whitespace if prev and next exist and if they're not
            # whitespaces. This doesn't apply if prev or next is a parenthesis.
            if (prev is not None and next_ is not None
                and not prev.is_whitespace() and not next_.is_whitespace()
                and not (prev.match(T.Punctuation, '(')
                         or next_.match(T.Punctuation, ')'))):
                tlist.tokens[tidx] = sql.Token(T.Whitespace, ' ')
            else:
                tlist.tokens.pop(tidx)
            clss = set([x.__class__ for x in tlist.tokens])

    def process(self, stack, stmt):
        [self.process(stack, sgroup) for sgroup in stmt.get_sublists()]
        self._process(stmt)


In any case, I know it's just at an alpha stage, so I know it can
change... :-P These are only my impressions; I think they may be
useful. By the way, I think the library is interesting and I
would like to help you with it if possible :-)

>>> Note that there are three stages: token filters, filters
>>> that work on grouped items, and serializers. Since you've already
>>> encountered the strip-comments and strip-whitespace filters, you should
>>> be aware of the different signatures.
>>>
>> Thank you! :-D I really only skimmed the code to
>> understand how it works, so I didn't notice the different
>> signatures... :-P Yes, thinking about it again, it's true there are
>> three APIs, one for the tokenizer, another for the filters and another
>> for the output... very clever... :-)
>
> The idea behind this is to do as little parsing work as possible. If you
> only want to change the keywords or identifiers to upper case, for example,
> then only filters on the tokenizer level are added. The more expensive
> grouping doesn't happen then.

So it's done to make the parser faster, isn't it?

Andi Albrecht

Aug 9, 2011, 10:01:40 AM
to sqlp...@googlegroups.com
"pir...@gmail.com" <pir...@gmail.com> writes:

Good points! Thanks for pointing them out!

>
> * Also, not all of them are defined, so looking at the parse
> stack I found a Symbol class for a quoted string and had to
> search all over the code for where it was defined (I found it as
> tokens.String.Symbol).

You're right. The definitions need to be cleaned up.
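
For some background: tokens.py is essentially borrowed from Pygments. The
token types are nested _TokenType instances (tuple subclasses), and the
"in" checks walk the hierarchy, e.g.:

from sqlparse import tokens

tokens.String.Symbol in tokens.String    # True, Symbol is a subtype of String
tokens.Comment.Single in tokens.Comment  # True
tokens.Keyword in tokens.Name            # False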

>
> * Another thing is that having three different kinds of filters with
> different signatures and return values (or yields) gets confusing. At a
> previous job we used a pipeline system with different stages
> where the data was stored inside a dict that was modified at
> every stage, and when a stage finished it was passed to the next one
> by the pipeline.
>
> * Another thing (I believe I have read something similar on the mailing
> list) is that SELECT statement columns are not grouped, which I
> think is really important (since they are usually used as a
> whole)... and it's also something I need myself :-D On top of
> that, the KeywordCaseFilter identified a column named 'size' as a
> SQL keyword and capitalized it, which would break the query... :-/

OK, that's a bug (size). The column grouping needs some work,
though.

Nice! I'll have a look at it :)

>
>
> In any case, I know it's just at an alpha stage, so I know it can
> change... :-P These are only my impressions; I think they may be
> useful. By the way, I think the library is interesting and I
> would like to help you with it if possible :-)

Sure!


Thanks very much for your feedback! I'll try to turn some of your points
into TODOs.

-Andi

pir...@gmail.com

Aug 9, 2011, 1:36:27 PM
to sqlp...@googlegroups.com
> Thanks very much for your feedback! I'll try to turn some of your points
> into TODOs.
>
You're welcome :-) If you need a beta tester, another point of view
on any aspect of the parser, or someone to help you,
just call me ;-)

pir...@gmail.com

Aug 11, 2011, 2:45:22 PM
to sqlp...@googlegroups.com
Hi Andi, I have finished my SELECT columns filter and added
it to a clone of the project so you can check and pull it. I have
seen two more things:

* StripWhitespaceFilter is not fully working; it leaves blank spaces
after commas (','), semicolons (';') and parentheses, and
sometimes duplicate spaces. I would modify it, but I don't quite
understand how it works, so I'll try to write my own filter to see if I
get better results. In any case I've attached my test file so
you can check it.
* Looking at the structure of the filters, and given that only the
process method is called from outside, I think it would be cleaner to
convert them to functors, using __call__() instead of process().

The commit is at
https://code.google.com/r/piranna-sqlparse/source/detail?r=514c37ec255c3250b5f37ab9182c0a2aec6bd34a
I don't know why the "Request code review" doesn't work, so you'll
have to do the pull by hand... :-(

[Attachment: sql.py]

Andi Albrecht

Aug 12, 2011, 8:41:17 AM
to sqlp...@googlegroups.com
"pir...@gmail.com" <pir...@gmail.com> writes:

> Hi Andi, I have finished my SELECT columns filter and added
> it to a clone of the project so you can check and pull it. I have
> seen two more things:
>
> * StripWhitespaceFilter is not fully working; it leaves blank spaces
> after commas (','), semicolons (';') and parentheses, and
> sometimes duplicate spaces. I would modify it, but I don't quite
> understand how it works, so I'll try to write my own filter to see if I
> get better results. In any case I've attached my test file so
> you can check it.
> * Looking at the structure of the filters, and given that only the
> process method is called from outside, I think it would be cleaner to
> convert them to functors, using __call__() instead of process().
>
> The commit is at
> https://code.google.com/r/piranna-sqlparse/source/detail?r=514c37ec255c3250b5f37ab9182c0a2aec6bd34a
> I don't know why the "Request code review" doesn't work, so you'll
> have to do the pull by hand... :-(

I've pulled your changes. Thanks!

-Andi

pir...@gmail.com

Aug 12, 2011, 11:25:03 AM
to sqlp...@googlegroups.com
Great! :-D I have been reviewing the clean-ups and they are really nice;
I'll update my code to reference them :-)


2011/8/12 Andi Albrecht <albrec...@googlemail.com>:


Andi Albrecht

Aug 12, 2011, 1:03:19 PM
to sqlp...@googlegroups.com
On Fri, Aug 12, 2011 at 5:25 PM, pir...@gmail.com <pir...@gmail.com> wrote:
> Great! :-D I have been reviewing the clean-ups and they are really nice;
> I'll update my code to reference them :-)

Have you seen the change regarding the with statement? That's
something the buildbot brought up. ATM sqlparse is compatible even
with Python 2.4. To be honest, I'm not sure how useful that is at all.
And I'm sure that at some point it'd make sense to stop supporting 2.4
(or even 2.5). But if it's just a with statement replaced by the more
"traditional" way it's ok to keep this compatibility up IMO. :)

But don't worry, I almost always leave it up to the buildbot to test
Python 2.4 compatibility.

-Andi

pir...@gmail.com

Aug 12, 2011, 4:03:24 PM
to sqlp...@googlegroups.com
Yeah, I saw the with statement commit. At the same job where we had the
pipelines they were still using Python 2.4 in production. I think it's
very outdated, but I agree with you: if it's just a matter of
replacing the with statement with the old-fashioned style, I don't see any
problem besides it being uglier... :-P


2011/8/12 Andi Albrecht <albrec...@googlemail.com>:

pir...@gmail.com

Aug 15, 2011, 11:24:51 PM
to sqlp...@googlegroups.com
Hi Andi, while using SQLParse in my project
(https://github.com/piranna/PirannaFS) I have seen that one of my
filters, Columns, needed some rework to return the names of the
columns instead of the tokens, since we are only interested in the
names and not in the types (all of them are in fact of type
tokens.Name) :-P I have uploaded it to my repo so you can get it.

Besides that, I have seen that my code has become much slower. SQLParse
is only used at initialization (once the SQL queries are parsed I
don't use it anymore), but since I'm doing a lot of checks on
characteristics of the SQL strings, my unit test time has increased
4x, and that's just from SQLParse :-/ I have been looking at it and it
seems to be because of how FilterStack is implemented. At the moment you
first build the stack and later execute it, but there's no way to do a
partial process and later use that as the basis for the other queries. I
think that just being able to reuse the same token tree, once it's
created, for all the different checks I'm doing would cut down most of the
time, so I was thinking of developing a pipeline system similar in
concept to how GStreamer works, where data goes in and data goes out,
making intermediate data available to later
filters, but maybe that way the API would be incompatible (I hope
not). Do you think this is the right way to do it? Is there another
way to get what I want?

2011/8/12 pir...@gmail.com <pir...@gmail.com>:

Andi Albrecht

Aug 17, 2011, 8:07:11 AM
to sqlp...@googlegroups.com
"pir...@gmail.com" <pir...@gmail.com> writes:

> Hi Andi, while using SQLParse in my project
> (https://github.com/piranna/PirannaFS) I have seen that one of my
> filters, Columns, needed some rework to return the names of the
> columns instead of the tokens, since we are only interested in the
> names and not in the types (all of them are in fact of type
> tokens.Name) :-P I have uploaded it to my repo so you can get it.

Filters at this point should return 2-tuples (type, value). Your change
breaks with this. Wouldn't it be easier to build a custom filter stack
with an additional filter that returns the columns then?
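
Something along these lines maybe (just a sketch, untested): a preprocess
filter that records the names on the side while passing the 2-tuples
through unchanged:

from sqlparse import tokens

class CollectNames(object):

    def __init__(self):
        self.names = []

    def process(self, stack, stream):
        for ttype, value in stream:
            if ttype in tokens.Name:
                self.names.append(value)
            yield ttype, value

Then append an instance to stack.preprocess and read its .names attribute
after running the stack.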

>
> Besides that, I have seen that my code has become much slower. SQLParse
> is only used at initialization (once the SQL queries are parsed I
> don't use it anymore), but since I'm doing a lot of checks on
> characteristics of the SQL strings, my unit test time has increased
> 4x, and that's just from SQLParse :-/ I have been looking at it and it

Regarding the performance loss, have you seen this issue?
http://code.google.com/p/python-sqlparse/issues/detail?id=41

> seems to be because of how FilterStack is implemented. At the moment you
> first build the stack and later execute it, but there's no way to do a
> partial process and later use that as the basis for the other queries. I
> think that just being able to reuse the same token tree, once it's
> created, for all the different checks I'm doing would cut down most of the
> time, so I was thinking of developing a pipeline system similar in
> concept to how GStreamer works, where data goes in and data goes out,
> making intermediate data available to later
> filters, but maybe that way the API would be incompatible (I hope
> not). Do you think this is the right way to do it? Is there another
> way to get what I want?

I don't know which concept GStreamer uses for its pipelines. IIRC
you've mentioned a dictionary storing state during all the
transformations. Is that what you're thinking of?

-Andi

pir...@gmail.com

Aug 17, 2011, 9:55:32 AM
to sqlp...@googlegroups.com
> Filters at this point should return 2-tuples (type, value). Your change
> breaks with this. Wouldn't it be easier to build a custom filter stack
> with an additional filter that returns the columns then?
>
It's true, maybe it's easier, or at least it's consistent with the
current code base. In any case I'm doing a proof of concept for the
pipeline system and it seems promising for solving this and some
performance problems.


> Regarding the performance loss, have you seen this issue?
> http://code.google.com/p/python-sqlparse/issues/detail?id=41
>

I'm trying to look at it, but the Google Code server is having problems at
the moment... :-/ I'll try again later.


> I don't know which concept GStreamer uses for its pipelines. IIRC
> you've mentioned a dictionary storing state during all the
> transformations. Is that what you're thinking of?
>

In GStreamer, data flows through plugins (filters) from one plugin's
source (output) to the next plugin's sink (input), just like a water
pipeline, but only if both source and sink understand the same caps
(data format). The other solution is like my previous job, where
every stage got a temporary dict as a parameter with its data, and it was
modified through every stage, just like a car assembly line.

My pipeline proof of concept is more similar to the GStreamer one, because
I think it's more elegant, and using generators it could also be more
efficient. First I have a pipeline class, which is basically a Python
list where I store the filters, and then I use the output of each
filter (the generator) as the stream source for the next one. More
than that, when executing the pipeline it has the same method signature
as the filters, so you can pass an input stream and it's given to the
first filter. Finally, the processed data is materialized and returned.
The major modification is that I'm using callable objects (standard
functions and objects with a __call__() method) instead of regular
filters, so they can be used in any way and the stream can
change its type dynamically (string, array... whatever), and it also
lets you use filters independently, because being just callable objects
they can be called outside a pipeline.
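
The pipeline class itself is roughly just this (a simplified sketch of the
idea, not the attached pipeline.py):

class Pipeline(list):
    """A list of callables; the output of each one feeds the next."""

    def __call__(self, stream):
        for filter_ in self:
            stream = filter_(stream)
        return stream

# usage: pipe = Pipeline([StripComments()]); result = pipe(stream)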

Here you have an example of a callable filter:

class StripComments(Filter):
    """Strip the comments from a stack"""

    def process(self, stack, stream):
        for token_type, value in stream:
            if token_type not in Comment:
                yield token_type, value

    def __call__(self, stream):
        return self.process(None, stream)

As you can see, this way it can be used outside a pipeline like this:

filter = StripComments()
filter(stream)

But also, since Python functions are objects, the filter could just be a regular function:

def StripComments(stream):
    """Strip the comments from a stack"""
    for token_type, value in stream:
        if token_type not in Comment:
            yield token_type, value

Pretty nice, eh? Now filters are just normal functions instead of
classes :-D Just an important question: up to now I haven't
needed the stack parameter in any filter; what is it
used for?

I've attached my current pipeline code with a trivial working example;
I hope to be able to finish integrating it into my project code today. It
uses the functor system, so it accepts functions or objects
with a __call__() method, but that's the only difference from the current
filters, and it's debatable whether to modify the pipeline to use filters as
they are at the moment (classes with a process method).

[Attachment: pipeline.py]

pir...@gmail.com

Aug 17, 2011, 11:35:47 AM
to sqlp...@googlegroups.com
OK, I finished converting my project code... and it worked :-) Unit test
time has dropped with my pipeline system from more than 4 seconds to
just 2.4 :-D That's still double the initial, pre-sqlparse 1.2 seconds, but it's
a great improvement :-D I'm going to do some clean-ups before
sending some patches :-)


2011/8/17 pir...@gmail.com <pir...@gmail.com>:

pir...@gmail.com

Aug 17, 2011, 7:31:08 PM
to sqlp...@googlegroups.com
I have just finished the clean-up and uploaded my code so you can
review it and give me your comments about it... The main benefits are
that the code is simpler and more modular, since the pipeline instance is
"dumb", and it also allows reusing some processed fragments of the
pipeline and using the pipeline itself as another filter, since it uses the same
interface: streamed input data in, output data out. It's not
directly usable as a current format filter, but that could easily be
adapted using callables...


2011/8/17 pir...@gmail.com <pir...@gmail.com>:

Andi Albrecht

Aug 24, 2011, 7:48:20 AM
to sqlp...@googlegroups.com
"pir...@gmail.com" <pir...@gmail.com> writes:

Adding a __call__ method to filters looks good.

>
> But also, since Python functions are objects, the filter could just be a regular function:
>
> def StripComments(stream):
>     """Strip the comments from a stack"""
>     for token_type, value in stream:
>         if token_type not in Comment:
>             yield token_type, value
>
> Pretty nice, eh? Now filters are just normal functions instead of
> classes :-D Just an important question: up to now I haven't
> needed the stack parameter in any filter; what is it
> used for?

Good catch! I'm not sure ATM. IIRC the intention of the stack parameter
was actually to carry some state around, just like the dictionary you've
mentioned above. But I never found a use for it later.

>
> I've attached my current pipeline code with a trivial working example;
> I hope to be able to finish integrating it into my project code today. It
> uses the functor system, so it accepts functions or objects
> with a __call__() method, but that's the only difference from the current
> filters, and it's debatable whether to modify the pipeline to use filters as
> they are at the moment (classes with a process method).

Your example looks nice. I like how the combination of functions/filters
is simplified!

-Andi

pir...@gmail.com

Aug 24, 2011, 9:09:13 AM
to sqlp...@googlegroups.com
> Adding a __call__ method to filters looks good.
>
I'm glad you liked it :-)


> Good catch! I'm not sure ATM. IIRC the intention of the stack parameter
> was actually to carry some state around, just like the dictionary you've
> mentioned above. But I never found a use for it later.
>

OK, one solution could be for the filters that modify the state to
return a tuple with the filtered stream and the new state,
since the returned type doesn't need to be the same as
the input one, and a modified stream is a kind of state by itself. Another
option would be to return it from all filters, or to access and modify it as
a parameter so it's modified in place. I think this is an important
point, because it would mean a modification of the API just like
switching to callable objects would, but also a stabilization and a
more future-proof concept.

Another option is not to try to solve problems that don't exist at
the moment, and wait until someone needs to pass a state value for which the
modified stream is not enough (which I don't think will happen).


>> I've attached my current pipeline code with a trivial working example;
>> I hope to be able to finish integrating it into my project code today. It
>> uses the functor system, so it accepts functions or objects
>> with a __call__() method, but that's the only difference from the current
>> filters, and it's debatable whether to modify the pipeline to use filters as
>> they are at the moment (classes with a process method).
>
> Your example looks nice. I like how the combination of functions/filters
> is simplified!
>

Thanks! :-D It's just a little Python magic, since functions are
first-class objects... :-D That's the reason I proposed using the
__call__() method: for the Python interpreter a function is a
callable object, that is, an object with just one method called
__call__(), and since so many filters only had a process() method, that
gave me the idea of using functions directly :-)

I have just uploaded my new code; I was waiting for your opinion about
the process() vs. __call__() style and also to have it working in my
projects in a stable way. You can check it, but don't worry
about things like split2 and so on; they are just there to work with the
__call__() style without touching the original ones until something is
decided about it :-P

In any case, I think we have to decide on two things:

1. process() method vs. callable objects (__call__() and functions).
Both can be used at the same time for backwards compatibility, as I
showed you in the examples, but I think callable objects should be
preferred for new filters, with process() deprecated in
the future.

2. The pipeline API. At the moment it's the same as a filter's, and the
pipeline could be used as one (it's a sort of meta-filter), but we should
discuss the state passing and so on.

What do you think?

Andi Albrecht

Sep 1, 2011, 5:06:37 AM
to sqlp...@googlegroups.com
"pir...@gmail.com" <pir...@gmail.com> writes:

> I have just finished the clean-up and uploaded my code so you can
> review it and give me your comments about it... The main benefits are
> that the code is simpler and more modular, since the pipeline instance is
> "dumb", and it also allows reusing some processed fragments of the
> pipeline and using the pipeline itself as another filter, since it uses the same
> interface: streamed input data in, output data out. It's not
> directly usable as a current format filter, but that could easily be
> adapted using callables...

Sorry for the late reply. Would you mind uploading your changes to
http://codereview.appspot.com as described here:
http://sqlparse.readthedocs.org/en/latest/intro/#development-contributing
?

-Andi

pir...@gmail.com

Sep 1, 2011, 5:08:49 AM
to sqlp...@googlegroups.com
I have it in my clone, but I have no problem uploading it there :-)

2011/9/1 Andi Albrecht <albrec...@googlemail.com>:


Andi Albrecht

Sep 1, 2011, 5:10:13 AM
to sqlp...@googlegroups.com
On Thu, Sep 1, 2011 at 11:08 AM, pir...@gmail.com <pir...@gmail.com> wrote:
> I have it in my clone, but I have no problem uploading it there :-)

Cool, thanks! It's easier to add comments there :)

-Andi

pir...@gmail.com

Sep 1, 2011, 5:13:14 AM
to sqlp...@googlegroups.com
First I have to learn how to use it... :-/ In any case the patch is at
http://code.google.com/r/piranna-sqlparse/source/detail?r=8f1e9991d07794a9f4319f15b660235d6b84d481,
and it seems it's possible to leave comments there :-)

2011/9/1 Andi Albrecht <albrec...@googlemail.com>:

Andi Albrecht

Sep 1, 2011, 5:20:08 AM
to sqlp...@googlegroups.com
On Thu, Sep 1, 2011 at 11:13 AM, pir...@gmail.com <pir...@gmail.com> wrote:
> First I have to learn how to use it... :-/ In any case the patch is at
> http://code.google.com/r/piranna-sqlparse/source/detail?r=8f1e9991d07794a9f4319f15b660235d6b84d481,
> and it seems it's possible to leave comments there :-)

I can't add comments there. Using codereview is simple. You just need
to download this script:
http://codereview.appspot.com/static/upload.py

And then from within your working copy (your clone):

upload.py -y -m "SOME DESCRIPTIVE MESSAGE" -r albrec...@gmail.com
--send_mail --rev 8f1e9991d077

pir...@gmail.com

Sep 1, 2011, 5:32:03 AM
to sqlp...@googlegroups.com
I think it worked, tell me if you have received it.

pir...@gmail.com

Sep 1, 2011, 5:34:43 AM
to sqlp...@googlegroups.com