Hi! :-) I have just found this project looking for a python parser for SQL and i think that's what i needed to parse and compact the queries of my filesystem (http://pirannafs.blogspot.com and https://github.com/piranna/PirannaFS), only that i have to develop my own filters.
What i need are two things: * parse metadata inside some comments * "compact" the query (no comments, no whitespaces, no return lines...)
As i have read, instead of use the sqlparse.format() function i should make my own filters and build by hand my own pipelines, isn't it? There is any filters for the things that i want (i know there is one for stripcomments and another for strip whitespaces, but i don't know if it removes return lines also...)? How i can be able to stract data from the comments?
As a last point, how can i be able to upload the filters i create? Doing a clone of the project and sending a pull request as in GitHub?
Greetings, Jesús Leganés Combarro.
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
"pira...@gmail.com" <pira...@gmail.com> writes: > Hi! :-) I have just found this project looking for a python parser for > SQL and i think that's what i needed to parse and compact the queries > of my filesystem (http://pirannafs.blogspot.com and > https://github.com/piranna/PirannaFS), only that i have to develop my > own filters.
> What i need are two things: > * parse metadata inside some comments > * "compact" the query (no comments, no whitespaces, no return lines...)
> As i have read, instead of use the sqlparse.format() function i should > make my own filters and build by hand my own pipelines, isn't it? > There is any filters for the things that i want (i know there is one > for stripcomments and another for strip whitespaces, but i don't know > if it removes return lines also...)? How i can be able to stract data > from the comments?
Yes, you're right. Both can be achieved by implementing your own filters.
ATM there's no official API on how to build your own pipelines - so, you've been warned, this may change :)
But have a look at sqlparse.formatter.build_filter_stack(). This functions builds a pipeline of filters depending on give formatting options. I think you'll need to build your own filter stack using either filters distributed with sqlparse or your custom filters in a very similar way. Note that there are three stages: token filters, filters that work on grouped items and serializers. Since you've already encountered the stripcomments and strip whitespaces filters, you should be aware of the different signatures.
> As a last point, how can i be able to upload the filters i create? > Doing a clone of the project and sending a pull request as in GitHub?
If your filters are of general interest, I'd be happy to add them. "Compacting" a query could be useful in some cases (when pushing a SQL statement over the network for example). But I'm not sure if there's general interest in a filter that parses metadata in comments.
The project is hosted in a Mercurial repository on Google Code. IIRC they have a "request code review" feature somewhere on Google Code after you've cloned the repository there. But you can send me a pull request from anywhere where your clone is publicity available. Good ol' patches are fine too :)
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
> Yes, you're right. Both can be achieved by implementing your own > filters.
Good to know :-) The only thing is the metadata parser, i'll have to look if i get a solution to be able to get them later (maybe a static attribute?)
> ATM there's no official API on how to build your own pipelines - so, > you've been warned, this may change :)
It's not the first library that i use in alpha stage... :-P
> But have a look at sqlparse.formatter.build_filter_stack(). This > functions builds a pipeline of filters depending on give formatting > options.
In fact it's from where i take the concept of pipeline, the parser structure is similar to the GStreamer (another toy i always want to have in my toys box ;-) ) pipelines structure... :-)
> Note that there are three stages: token filters, filters > that work on grouped items and serializers. Since you've already > encountered the stripcomments and strip whitespaces filters, you should > be aware of the different signatures.
Thank you! :-D Really i only take a fast read of the code to understand how it works, really i didn't notice the different signatures... :-P Yes, now re-thinking about it it's true there are three APIs, one for the tokenizer, another for the filters and another for the output... really intelligent... :-)
> If your filters are of general interest, I'd be happy to add > them. "Compacting" a query could be useful in some cases (when pushing a > SQL statement over the network for example).
I wanted to store the queries ready to be launched to the database, but yes, that's the idea.
> But I'm not sure if there's > general interest in a filter that parses metadata in comments.
Obviusly... at least it's a no so common feature... :-P Another one that i think i would need it's INCLUDE one (SQLite doesn't support it), but i don't know if it's better as a metadata or as a statement, appart that i need to solve other issues before regarding to the rowindex if i want the INCLUDE be useful...
> The project is hosted in a Mercurial repository on Google Code. IIRC > they have a "request code review" feature somewhere on Google Code after > you've cloned the repository there.
Ok, i'll make my changes and later i'll port it to the clone if i see i need to modify core code.
> But you can send me a pull request > from anywhere where your clone is publicity available. Good ol' patches > are fine too :)
Lol :-P
Thank you so much Andi! :-D I'll told later how i'm going with it :-)
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
"pira...@gmail.com" <pira...@gmail.com> writes: >> Yes, you're right. Both can be achieved by implementing your own >> filters.
> Good to know :-) The only thing is the metadata parser, i'll have to > look if i get a solution to be able to get them later (maybe a static > attribute?)
>> ATM there's no official API on how to build your own pipelines - so, >> you've been warned, this may change :)
> It's not the first library that i use in alpha stage... :-P
>> But have a look at sqlparse.formatter.build_filter_stack(). This >> functions builds a pipeline of filters depending on give formatting >> options.
> In fact it's from where i take the concept of pipeline, the parser > structure is similar to the GStreamer (another toy i always want to > have in my toys box ;-) ) pipelines structure... :-)
>> Note that there are three stages: token filters, filters >> that work on grouped items and serializers. Since you've already >> encountered the stripcomments and strip whitespaces filters, you should >> be aware of the different signatures.
> Thank you! :-D Really i only take a fast read of the code to > understand how it works, really i didn't notice the different > signatures... :-P Yes, now re-thinking about it it's true there are > three APIs, one for the tokenizer, another for the filters and another > for the output... really intelligent... :-)
The idea behind this is to do as little parsing work as needed. If you only want to change the keywords or identifiers to upper case then only filters on tokenizer level are added for example. The more expensive grouping didn't happen then.
>> If your filters are of general interest, I'd be happy to add >> them. "Compacting" a query could be useful in some cases (when pushing a >> SQL statement over the network for example).
> I wanted to store the queries ready to be launched to the database, > but yes, that's the idea.
>> But I'm not sure if there's >> general interest in a filter that parses metadata in comments.
> Obviusly... at least it's a no so common feature... :-P Another one > that i think i would need it's INCLUDE one (SQLite doesn't support > it), but i don't know if it's better as a metadata or as a statement, > appart that i need to solve other issues before regarding to the > rowindex if i want the INCLUDE be useful...
>> The project is hosted in a Mercurial repository on Google Code. IIRC >> they have a "request code review" feature somewhere on Google Code after >> you've cloned the repository there.
> Ok, i'll make my changes and later i'll port it to the clone if i see > i need to modify core code.
>> But you can send me a pull request >> from anywhere where your clone is publicity available. Good ol' patches >> are fine too :)
> Lol :-P
> Thank you so much Andi! :-D I'll told later how i'm going with it :-)
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
Hi Andi! Sorry for not writting to you before, i was a little bit busy.
I have been working with your library and got some results like a filter to implement a INCLUDE statement (a filter to include the code from inside another file, just like include pragma at C code):
class IncludeStatement(Filter): """Filter that enable a INCLUDE statement to """
def process(self, stack, stream): # Run over all tokens in the stream for token_type, value in stream: # INCLUDE statement found, set detected mode if token_type in Name and value.upper() == 'INCLUDE': self.detected = True continue
# INCLUDE statement was found, parse it elif self.detected: # Omit whitespaces if token_type in Whitespace: pass
# Get path of file to include path = None
# if token_type in Name: # path = join(self.dirpath, value) if token_type in tokens.String.Symbol: path = join(self.dirpath, value[1:-1])
# Include file if path was found if path: try: with open(path) as f: sql = f.read()
except IOError: print path, "file doesn't exists"
else: # Create new FilterStack to parse readed file # and add all its tokens to the main stack recursively # [ToDo] Add maximum recursive iteration value stack = FilterStack() stack.preprocess.append(IncludeStatement(self.dirpath))
for tv in stack.run(sql): yield tv
# Set normal mode self.detected = False
# Don't include any token while in detected mode continue
# Normal token yield token_type, value
But the fact is that it was difficult for diferent reasons, i hope they would be useful to you:
* i didn't know what clases to use to check the tokens. Looking at the code i found some definitions at tokens.py, but they were done in a strange way (i didn't see this before) and also i wasn't be able to found the classes code, being something like magin for me. More that this, this didn't allow me to import just the clases i wanted because they were not found, i needed to import all the full module (it's slower at runtime to use). More that this, i needed to update the KEYWORD list, what was impossible... :-(
* Also, they are not all of them defined, so looking at the parse stack i found a Symbol class one for a quoted string and needed to search over all the code where it was defined (i found it inside tokens.String.Symbol)
* another thing is that having three diferent kind of filters with different signs and return values (or yield) gets confused. At a previous work they were using a pipeline system with diferent stages were the data was stored inside a dict that was being modified at every stage and when the stage finished it was pased to the next one by the pipeline.
* another thing (i believe i have read something similar at the mail list) it's about that SELECT stament columns are not grouped, what i think is something really important (since they are used usually as a full)... and also it's something i would need by myself :-D More that this, also the KeywordCaseFilter identified a column named 'size' as a SQL keyword so it capitalized it, what would break the query... :-/
* finally, i found example filters overcomplicated. I did a StripComments class before knowing there was one inside filters.py and it's working perfectly being so much smaller...
class StripComments(Filter): def process(self, stack, stream): """Process the token stream.""" for token_type, value in stream: if token_type not in Comment: yield token_type, value
class StripCommentsFilter(Filter):
def _process(self, tlist): clss = set([x.__class__ for x in tlist.tokens]) while sql.Comment in clss: token = tlist.token_next_by_instance(0, sql.Comment) tidx = tlist.token_index(token) prev = tlist.token_prev(tidx, False) next_ = tlist.token_next(tidx, False) # Replace by whitespace if prev and next exist and if they're not # whitespaces. This doesn't apply if prev or next is a paranthesis. if (prev is not None and next_ is not None and not prev.is_whitespace() and not next_.is_whitespace() and not (prev.match(T.Punctuation, '(') or next_.match(T.Punctuation, ')'))): tlist.tokens[tidx] = sql.Token(T.Whitespace, ' ') else: tlist.tokens.pop(tidx) clss = set([x.__class__ for x in tlist.tokens])
def process(self, stack, stmt): [self.process(stack, sgroup) for sgroup in stmt.get_sublists()] self._process(stmt)
In any case, i know it's just an alpha stage, so i know it can change... :-P These are only my perceptions, i think they would be useful. By the way, i think that the library is interesting and i would like to help you with it if possible :-)
>>> Note that there are three stages: token filters, filters >>> that work on grouped items and serializers. Since you've already >>> encountered the stripcomments and strip whitespaces filters, you should >>> be aware of the different signatures.
>> Thank you! :-D Really i only take a fast read of the code to >> understand how it works, really i didn't notice the different >> signatures... :-P Yes, now re-thinking about it it's true there are >> three APIs, one for the tokenizer, another for the filters and another >> for the output... really intelligent... :-)
> The idea behind this is to do as little parsing work as needed. If you > only want to change the keywords or identifiers to upper case then only > filters on tokenizer level are added for example. The more expensive > grouping didn't happen then.
So it's done to be the parser faster, isn't it?
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
"pira...@gmail.com" <pira...@gmail.com> writes: > Hi Andi! Sorry for not writting to you before, i was a little bit busy.
> I have been working with your library and got some results like a > filter to implement a INCLUDE statement (a filter to include the code > from inside another file, just like include pragma at C code):
> class IncludeStatement(Filter): > """Filter that enable a INCLUDE statement to """
> def process(self, stack, stream): > # Run over all tokens in the stream > for token_type, value in stream: > # INCLUDE statement found, set detected mode > if token_type in Name and value.upper() == 'INCLUDE': > self.detected = True > continue
> # INCLUDE statement was found, parse it > elif self.detected: > # Omit whitespaces > if token_type in Whitespace: > pass
> # Get path of file to include > path = None
> # if token_type in Name: > # path = join(self.dirpath, value) > if token_type in tokens.String.Symbol: > path = join(self.dirpath, value[1:-1])
> # Include file if path was found > if path: > try: > with open(path) as f: > sql = f.read()
> else: > # Create new FilterStack to parse readed file > # and add all its tokens to the main stack recursively > # [ToDo] Add maximum recursive iteration value > stack = FilterStack() > stack.preprocess.append(IncludeStatement(self.dirpath))
> for tv in stack.run(sql): > yield tv
> # Set normal mode > self.detected = False
> # Don't include any token while in detected mode > continue
> # Normal token > yield token_type, value
> But the fact is that it was difficult for diferent reasons, i hope > they would be useful to you:
> * i didn't know what clases to use to check the tokens. Looking at the > code i found some definitions at tokens.py, but they were done in a > strange way (i didn't see this before) and also i wasn't be able to > found the classes code, being something like magin for me. More that > this, this didn't allow me to import just the clases i wanted because > they were not found, i needed to import all the full module (it's > slower at runtime to use). More that this, i needed to update the > KEYWORD list, what was impossible... :-(
Good points! Thanks for pointing this out!
> * Also, they are not all of them defined, so looking at the parse > stack i found a Symbol class one for a quoted string and needed to > search over all the code where it was defined (i found it inside > tokens.String.Symbol)
You're right. The definitions need to be cleaned up.
> * another thing is that having three diferent kind of filters with > different signs and return values (or yield) gets confused. At a > previous work they were using a pipeline system with diferent stages > were the data was stored inside a dict that was being modified at > every stage and when the stage finished it was pased to the next one > by the pipeline.
> * another thing (i believe i have read something similar at the mail > list) it's about that SELECT stament columns are not grouped, what i > think is something really important (since they are used usually as a > full)... and also it's something i would need by myself :-D More that > this, also the KeywordCaseFilter identified a column named 'size' as a > SQL keyword so it capitalized it, what would break the query... :-/
ok, that's a bug (size). However, the column grouping needs some work though.
> * finally, i found example filters overcomplicated. I did a > StripComments class before knowing there was one inside filters.py and > it's working perfectly being so much smaller...
> class StripComments(Filter): > def process(self, stack, stream): > """Process the token stream.""" > for token_type, value in stream: > if token_type not in Comment: > yield token_type, value
> class StripCommentsFilter(Filter):
> def _process(self, tlist): > clss = set([x.__class__ for x in tlist.tokens]) > while sql.Comment in clss: > token = tlist.token_next_by_instance(0, sql.Comment) > tidx = tlist.token_index(token) > prev = tlist.token_prev(tidx, False) > next_ = tlist.token_next(tidx, False) > # Replace by whitespace if prev and next exist and if they're not > # whitespaces. This doesn't apply if prev or next is a paranthesis. > if (prev is not None and next_ is not None > and not prev.is_whitespace() and not next_.is_whitespace() > and not (prev.match(T.Punctuation, '(') > or next_.match(T.Punctuation, ')'))): > tlist.tokens[tidx] = sql.Token(T.Whitespace, ' ') > else: > tlist.tokens.pop(tidx) > clss = set([x.__class__ for x in tlist.tokens])
> def process(self, stack, stmt): > [self.process(stack, sgroup) for sgroup in stmt.get_sublists()] > self._process(stmt)
Nice! I'll have a look at it :)
> In any case, i know it's just an alpha stage, so i know it can > change... :-P These are only my perceptions, i think they would be > useful. By the way, i think that the library is interesting and i > would like to help you with it if possible :-)
Sure!
Thanks very much for your feedback! I'll try to turn some of your points into TODOs.
>>>> Note that there are three stages: token filters, filters >>>> that work on grouped items and serializers. Since you've already >>>> encountered the stripcomments and strip whitespaces filters, you should >>>> be aware of the different signatures.
>>> Thank you! :-D Really i only take a fast read of the code to >>> understand how it works, really i didn't notice the different >>> signatures... :-P Yes, now re-thinking about it it's true there are >>> three APIs, one for the tokenizer, another for the filters and another >>> for the output... really intelligent... :-)
>> The idea behind this is to do as little parsing work as needed. If you >> only want to change the keywords or identifiers to upper case then only >> filters on tokenizer level are added for example. The more expensive >> grouping didn't happen then.
> So it's done to be the parser faster, isn't it?
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
> Thanks very much for your feedback! I'll try to turn some of your points > into TODOs.
Welcome :-) If you need a betatester or another point of view regarding to any aspect of the parser or someone that can help you call me ;-)
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
Hi Andi, i have been finishing my SELECT columns filter and adding them to a clone of the project so you can check and pull them. I have seen two more things:
* StripWhiteSpaceFilter is not fully working, leaving blank spaces afters commas (','), dot-and-commas (';') and parenthesis, and sometimes duplicate spaces. I would modify it but i don't end to understand how it works, so i'll try to do my own filter to see if i get better results. In any case i've attached to you my test file so you can be able to check them * Viewing the structure of the filters and being only called from outside only the process method, i think it would be cleaner to convert them to functors, using __call__() instead of process()
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
"pira...@gmail.com" <pira...@gmail.com> writes: > Hi Andi, i have been finishing my SELECT columns filter and adding > them to a clone of the project so you can check and pull them. I have > seen two more things:
> * StripWhiteSpaceFilter is not fully working, leaving blank spaces > afters commas (','), dot-and-commas (';') and parenthesis, and > sometimes duplicate spaces. I would modify it but i don't end to > understand how it works, so i'll try to do my own filter to see if i > get better results. In any case i've attached to you my test file so > you can be able to check them > * Viewing the structure of the filters and being only called from > outside only the process method, i think it would be cleaner to > convert them to functors, using __call__() instead of process()
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
>> Hi Andi, i have been finishing my SELECT columns filter and adding >> them to a clone of the project so you can check and pull them. I have >> seen two more things:
>> * StripWhiteSpaceFilter is not fully working, leaving blank spaces >> afters commas (','), dot-and-commas (';') and parenthesis, and >> sometimes duplicate spaces. I would modify it but i don't end to >> understand how it works, so i'll try to do my own filter to see if i >> get better results. In any case i've attached to you my test file so >> you can be able to check them >> * Viewing the structure of the filters and being only called from >> outside only the process method, i think it would be cleaner to >> convert them to functors, using __call__() instead of process()
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
On Fri, Aug 12, 2011 at 5:25 PM, pira...@gmail.com <pira...@gmail.com> wrote: > Great! :-D I have been reviewing the clean ups and are really nice, > i'll update my code to reference to it :-)
Have you seen the change regarding the with statement? That's something the buildbot brought up. ATM sqlparse is compatible even with Python 2.4. To be honest, I'm not sure how useful that is at all. And I'm sure that at some point it'd make sense to stop supporting 2.4 (or even 2.5). But if it's just a with statement replaced by the more "traditional" way it's ok to keep this compatibility up IMO. :)
But don't worry, almost always I leave it up to the buildbot to test Python 2.4 compatibility.
> 2011/8/12 Andi Albrecht <albrecht.a...@googlemail.com>: >> "pira...@gmail.com" <pira...@gmail.com> writes:
>>> Hi Andi, i have been finishing my SELECT columns filter and adding >>> them to a clone of the project so you can check and pull them. I have >>> seen two more things:
>>> * StripWhiteSpaceFilter is not fully working, leaving blank spaces >>> afters commas (','), dot-and-commas (';') and parenthesis, and >>> sometimes duplicate spaces. I would modify it but i don't end to >>> understand how it works, so i'll try to do my own filter to see if i >>> get better results. In any case i've attached to you my test file so >>> you can be able to check them >>> * Viewing the structure of the filters and being only called from >>> outside only the process method, i think it would be cleaner to >>> convert them to functors, using __call__() instead of process()
>>> -- >>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>> monton de sitios diferentes, simplemente escribe un sistema operativo >>> Unix." >>> – Linus Tordvals, creador del sistema operativo Linux
>> -- >> You received this message because you are subscribed to the Google Groups "sqlparse" group. >> To post to this group, send email to sqlparse@googlegroups.com. >> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
Yeah, i saw the with statement commit. At the same work of the pipelines they were still using Python 2.4 in production. I think it's so much outdated, but i agree with you: if it's just a matter of change the with statement with the old-fashion style i don't see any problem besides it's uglier... :-P
2011/8/12 Andi Albrecht <albrecht.a...@googlemail.com>:
> On Fri, Aug 12, 2011 at 5:25 PM, pira...@gmail.com <pira...@gmail.com> wrote: >> Great! :-D I have been reviewing the clean ups and are really nice, >> i'll update my code to reference to it :-)
> Have you seen the change regarding the with statement? That's > something the buildbot brought up. ATM sqlparse is compatible even > with Python 2.4. To be honest, I'm not sure how useful that is at all. > And I'm sure that at some point it'd make sense to stop supporting 2.4 > (or even 2.5). But if it's just a with statement replaced by the more > "traditional" way it's ok to keep this compatibility up IMO. :)
> But don't worry, almost always I leave it up to the buildbot to test > Python 2.4 compatibility.
> -Andi
>> 2011/8/12 Andi Albrecht <albrecht.a...@googlemail.com>: >>> "pira...@gmail.com" <pira...@gmail.com> writes:
>>>> Hi Andi, i have been finishing my SELECT columns filter and adding >>>> them to a clone of the project so you can check and pull them. I have >>>> seen two more things:
>>>> * StripWhiteSpaceFilter is not fully working, leaving blank spaces >>>> afters commas (','), dot-and-commas (';') and parenthesis, and >>>> sometimes duplicate spaces. I would modify it but i don't end to >>>> understand how it works, so i'll try to do my own filter to see if i >>>> get better results. In any case i've attached to you my test file so >>>> you can be able to check them >>>> * Viewing the structure of the filters and being only called from >>>> outside only the process method, i think it would be cleaner to >>>> convert them to functors, using __call__() instead of process()
>>>> -- >>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>> Unix." >>>> – Linus Tordvals, creador del sistema operativo Linux
>>> -- >>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>> To post to this group, send email to sqlparse@googlegroups.com. >>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
>> -- >> You received this message because you are subscribed to the Google Groups "sqlparse" group. >> To post to this group, send email to sqlparse@googlegroups.com. >> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
Hi Andi, while using SQLParse at my project (https://github.com/piranna/PirannaFS) i have seen that one of my filters, Columns, needed some review to return the names of the columns instead of the tokens, since we are only interested on the names and not on the types since all of them are in fact of type tokens.Name :-P I have uploaded it to my repo so you can get it.
Besides that, i have seen that my code have get very slower. SQLParse is only used at initialization (once the SQL queries are parsed, i don't use it anymore) but since i'm doing a lot of check of characteristics of the SQL strings my unit test time have increased x4, being this just for SQLParse :-/ I have been looking at it and it seems is because how FilterStack is implemented. At this moment, you first the stack and later you exec it, but there's no way to make a partial process and later use this as basis for the other queries. I think that just being able to use the same tokens tree once it's created for all the different checks i'm doing would get almost of the time down, so i was thinking to develop a pipeline system similar in concept to how GStreamer works, where data goes in and data goes out, being able this way to have intermediate data available to later filters, but maybe this way the API would be incompatible (i hope don't). Do you think this is the right way to do it? There's another way to get what i want to do?
> Yeah, i saw the with statement commit. At the same work of the > pipelines they were still using Python 2.4 in production. I think it's > so much outdated, but i agree with you: if it's just a matter of > change the with statement with the old-fashion style i don't see any > problem besides it's uglier... :-P
> 2011/8/12 Andi Albrecht <albrecht.a...@googlemail.com>: >> On Fri, Aug 12, 2011 at 5:25 PM, pira...@gmail.com <pira...@gmail.com> wrote: >>> Great! :-D I have been reviewing the clean ups and are really nice, >>> i'll update my code to reference to it :-)
>> Have you seen the change regarding the with statement? That's >> something the buildbot brought up. ATM sqlparse is compatible even >> with Python 2.4. To be honest, I'm not sure how useful that is at all. >> And I'm sure that at some point it'd make sense to stop supporting 2.4 >> (or even 2.5). But if it's just a with statement replaced by the more >> "traditional" way it's ok to keep this compatibility up IMO. :)
>> But don't worry, almost always I leave it up to the buildbot to test >> Python 2.4 compatibility.
>> -Andi
>>> 2011/8/12 Andi Albrecht <albrecht.a...@googlemail.com>: >>>> "pira...@gmail.com" <pira...@gmail.com> writes:
>>>>> Hi Andi, i have been finishing my SELECT columns filter and adding >>>>> them to a clone of the project so you can check and pull them. I have >>>>> seen two more things:
>>>>> * StripWhiteSpaceFilter is not fully working, leaving blank spaces >>>>> afters commas (','), dot-and-commas (';') and parenthesis, and >>>>> sometimes duplicate spaces. I would modify it but i don't end to >>>>> understand how it works, so i'll try to do my own filter to see if i >>>>> get better results. In any case i've attached to you my test file so >>>>> you can be able to check them >>>>> * Viewing the structure of the filters and being only called from >>>>> outside only the process method, i think it would be cleaner to >>>>> convert them to functors, using __call__() instead of process()
>>>>> -- >>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>> Unix." >>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>> -- >>>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>>> To post to this group, send email to sqlparse@googlegroups.com. >>>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>>> -- >>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>> monton de sitios diferentes, simplemente escribe un sistema operativo >>> Unix." >>> – Linus Tordvals, creador del sistema operativo Linux
>>> -- >>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>> To post to this group, send email to sqlparse@googlegroups.com. >>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>> -- >> You received this message because you are subscribed to the Google Groups "sqlparse" group. >> To post to this group, send email to sqlparse@googlegroups.com. >> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
"pira...@gmail.com" <pira...@gmail.com> writes: > Hi Andi, while using SQLParse at my project > (https://github.com/piranna/PirannaFS) i have seen that one of my > filters, Columns, needed some review to return the names of the > columns instead of the tokens, since we are only interested on the > names and not on the types since all of them are in fact of type > tokens.Name :-P I have uploaded it to my repo so you can get it.
Filters at this point should return 2-tuples (type, value). Your change breaks with this. Wouldn't it be easier to build a custom filter stack with an additional filter that returns the columns then?
> Besides that, i have seen that my code have get very slower. SQLParse > is only used at initialization (once the SQL queries are parsed, i > don't use it anymore) but since i'm doing a lot of check of > characteristics of the SQL strings my unit test time have increased > x4, being this just for SQLParse :-/ I have been looking at it and it
> seems is because how FilterStack is implemented. At this moment, you > first the stack and later you exec it, but there's no way to make a > partial process and later use this as basis for the other queries. I > think that just being able to use the same tokens tree once it's > created for all the different checks i'm doing would get almost of the > time down, so i was thinking to develop a pipeline system similar in > concept to how GStreamer works, where data goes in and data goes out, > being able this way to have intermediate data available to later > filters, but maybe this way the API would be incompatible (i hope > don't). Do you think this is the right way to do it? There's another > way to get what i want to do?
I don't know which concept GStreamer uses for there pipelines. IIRC you've mentioned a dictionary store a state during all transformations. Is it that what you're thinking of?
> 2011/8/12 pira...@gmail.com <pira...@gmail.com>: >> Yeah, i saw the with statement commit. At the same work of the >> pipelines they were still using Python 2.4 in production. I think it's >> so much outdated, but i agree with you: if it's just a matter of >> change the with statement with the old-fashion style i don't see any >> problem besides it's uglier... :-P
>> 2011/8/12 Andi Albrecht <albrecht.a...@googlemail.com>: >>> On Fri, Aug 12, 2011 at 5:25 PM, pira...@gmail.com <pira...@gmail.com> wrote: >>>> Great! :-D I have been reviewing the clean ups and are really nice, >>>> i'll update my code to reference to it :-)
>>> Have you seen the change regarding the with statement? That's >>> something the buildbot brought up. ATM sqlparse is compatible even >>> with Python 2.4. To be honest, I'm not sure how useful that is at all. >>> And I'm sure that at some point it'd make sense to stop supporting 2.4 >>> (or even 2.5). But if it's just a with statement replaced by the more >>> "traditional" way it's ok to keep this compatibility up IMO. :)
>>> But don't worry, almost always I leave it up to the buildbot to test >>> Python 2.4 compatibility.
>>> -Andi
>>>> 2011/8/12 Andi Albrecht <albrecht.a...@googlemail.com>: >>>>> "pira...@gmail.com" <pira...@gmail.com> writes:
>>>>>> Hi Andi, i have been finishing my SELECT columns filter and adding >>>>>> them to a clone of the project so you can check and pull them. I have >>>>>> seen two more things:
>>>>>> * StripWhiteSpaceFilter is not fully working, leaving blank spaces >>>>>> afters commas (','), dot-and-commas (';') and parenthesis, and >>>>>> sometimes duplicate spaces. I would modify it but i don't end to >>>>>> understand how it works, so i'll try to do my own filter to see if i >>>>>> get better results. In any case i've attached to you my test file so >>>>>> you can be able to check them >>>>>> * Viewing the structure of the filters and being only called from >>>>>> outside only the process method, i think it would be cleaner to >>>>>> convert them to functors, using __call__() instead of process()
>>>>>> -- >>>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>>> Unix." >>>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>>> -- >>>>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>>>> To post to this group, send email to sqlparse@googlegroups.com. >>>>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>>>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>>>> -- >>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>> Unix." >>>> – Linus Tordvals, creador del sistema operativo Linux
>>>> -- >>>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>>> To post to this group, send email to sqlparse@googlegroups.com. >>>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>>> -- >>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>> To post to this group, send email to sqlparse@googlegroups.com. >>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
> Filters at this point should return 2-tuples (type, value). Your change > breaks with this. Wouldn't it be easier to build a custom filter stack > with an additional filter that returns the columns then?
It's true, maybe it's easier, or at least it's coherent with the current code basis. In any case i'm doing a proof of concept for the pipeline system and it seems promising to solve this and some performance problems.
I'm triying to see it, but Google Code server is having problems at this moment... :-/ I'll try it again later.
> I don't know which concept GStreamer uses for there pipelines. IIRC > you've mentioned a dictionary store a state during all > transformations. Is it that what you're thinking of?
At GStreamer, data flow thought plugins (filters) from one plugin source (output) to next plugin sink (input), just like a water pipeline, but only if both source and sink understand the same caps (data format). Another solution is just like my previous work, where every stage got a temporal dict as parameter with it's data and it was being modified thought every stage, just like a car asembly line.
My pipeline proof of concept is more similar to GStreamer one, because i think is more elegant and also using generators it could be more efficient. First i have a pipeline class, that's basically a python list where i store the filters, and then i use the output of the filter (the generator) as the stream source for the next one. More than this, when executing the pipeline it has the same method sign that filters, so you can pass an input stream and it's given to the first filter. Finally, the processed data is stabilized and returned. The mayor modification is that i'm using callable objects (standar functions and objects with __call__() method) instead of regular filters, so they can be used in any fashion way and the stream can change it's type dinamically (string, array... whatever), and also it lets to use filters indepently, because just using callable objects they can be able to be called outside a pipeline.
Here you have an example of a callable filter:
class StripComments(Filter): """Strip the comments from a stack""" def process(self, stack, stream): for token_type, value in stream: if token_type not in Comment: yield token_type, value
As you can see, this way it could be used outside a pipeline this way:
filter = StripComments() filter(stream)
But also, since python functions are objects, the filter would be a regular one:
def StripComments(stream): """Strip the comments from a stack""" for token_type, value in stream: if token_type not in Comment: yield token_type, value
Pretty nice, eh? Now filters are just normal functions instead of classes :-D Just an important question: until this moment i didn't need to use the stack parameter in any filter, why is it being used for?
I've attached my current pipeline code with a trivial working example, i hope i could be able to finish it at my project code today. It's using the functors system so it's able to use functions or objects with the __call__() method, but it's the only difference with current filters and it's debatable to modify the pipeline to use filters as they are at this moment (classes with process method).
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
Ok, finished conversion of my project code... and worked :-) Unit test time have lowered with my pipeline system from more than 4 seconds to just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's a great improvement :-D I'm going to do some clean-ups previously to send some pathches :-)
>> Filters at this point should return 2-tuples (type, value). Your change >> breaks with this. Wouldn't it be easier to build a custom filter stack >> with an additional filter that returns the columns then?
> It's true, maybe it's easier, or at least it's coherent with the > current code basis. In any case i'm doing a proof of concept for the > pipeline system and it seems promising to solve this and some > performance problems.
> I'm triying to see it, but Google Code server is having problems at > this moment... :-/ I'll try it again later.
>> I don't know which concept GStreamer uses for there pipelines. IIRC >> you've mentioned a dictionary store a state during all >> transformations. Is it that what you're thinking of?
> At GStreamer, data flow thought plugins (filters) from one plugin > source (output) to next plugin sink (input), just like a water > pipeline, but only if both source and sink understand the same caps > (data format). Another solution is just like my previous work, where > every stage got a temporal dict as parameter with it's data and it was > being modified thought every stage, just like a car asembly line.
> My pipeline proof of concept is more similar to GStreamer one, because > i think is more elegant and also using generators it could be more > efficient. First i have a pipeline class, that's basically a python > list where i store the filters, and then i use the output of the > filter (the generator) as the stream source for the next one. More > than this, when executing the pipeline it has the same method sign > that filters, so you can pass an input stream and it's given to the > first filter. Finally, the processed data is stabilized and returned. > The mayor modification is that i'm using callable objects (standar > functions and objects with __call__() method) instead of regular > filters, so they can be used in any fashion way and the stream can > change it's type dinamically (string, array... whatever), and also it > lets to use filters indepently, because just using callable objects > they can be able to be called outside a pipeline.
> Here you have an example of a callable filter:
> class StripComments(Filter): > """Strip the comments from a stack""" > def process(self, stack, stream): > for token_type, value in stream: > if token_type not in Comment: > yield token_type, value
> As you can see, this way it could be used outside a pipeline this way:
> filter = StripComments() > filter(stream)
> But also, since python functions are objects, the filter would be a regular one:
> def StripComments(stream): > """Strip the comments from a stack""" > for token_type, value in stream: > if token_type not in Comment: > yield token_type, value
> Pretty nice, eh? Now filters are just normal functions instead of > classes :-D Just an important question: until this moment i didn't > need to use the stack parameter in any filter, why is it being used > for?
> I've attached my current pipeline code with a trivial working example, > i hope i could be able to finish it at my project code today. It's > using the functors system so it's able to use functions or objects > with the __call__() method, but it's the only difference with current > filters and it's debatable to modify the pipeline to use filters as > they are at this moment (classes with process method).
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
I have just finished the clean-up and uploaded my code so you can review it and tell me your comments about it... Main beneficts are that code is simple and more modular since the pipeline instance is "dumb", and also it let to reuse some procesed fragments of the pipeline and also use it as another filter since it use the same interface: get streamed input data and output data, althought is not directly usable as a current format filter, but this would be easily addapted using callables...
> Ok, finished conversion of my project code... and worked :-) Unit test > time have lowered with my pipeline system from more than 4 seconds to > just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's > a great improvement :-D I'm going to do some clean-ups previously to > send some pathches :-)
> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>> Filters at this point should return 2-tuples (type, value). Your change >>> breaks with this. Wouldn't it be easier to build a custom filter stack >>> with an additional filter that returns the columns then?
>> It's true, maybe it's easier, or at least it's coherent with the >> current code basis. In any case i'm doing a proof of concept for the >> pipeline system and it seems promising to solve this and some >> performance problems.
>> I'm triying to see it, but Google Code server is having problems at >> this moment... :-/ I'll try it again later.
>>> I don't know which concept GStreamer uses for there pipelines. IIRC >>> you've mentioned a dictionary store a state during all >>> transformations. Is it that what you're thinking of?
>> At GStreamer, data flow thought plugins (filters) from one plugin >> source (output) to next plugin sink (input), just like a water >> pipeline, but only if both source and sink understand the same caps >> (data format). Another solution is just like my previous work, where >> every stage got a temporal dict as parameter with it's data and it was >> being modified thought every stage, just like a car asembly line.
>> My pipeline proof of concept is more similar to GStreamer one, because >> i think is more elegant and also using generators it could be more >> efficient. First i have a pipeline class, that's basically a python >> list where i store the filters, and then i use the output of the >> filter (the generator) as the stream source for the next one. More >> than this, when executing the pipeline it has the same method sign >> that filters, so you can pass an input stream and it's given to the >> first filter. Finally, the processed data is stabilized and returned. >> The mayor modification is that i'm using callable objects (standar >> functions and objects with __call__() method) instead of regular >> filters, so they can be used in any fashion way and the stream can >> change it's type dinamically (string, array... whatever), and also it >> lets to use filters indepently, because just using callable objects >> they can be able to be called outside a pipeline.
>> Here you have an example of a callable filter:
>> class StripComments(Filter): >> """Strip the comments from a stack""" >> def process(self, stack, stream): >> for token_type, value in stream: >> if token_type not in Comment: >> yield token_type, value
>> As you can see, this way it could be used outside a pipeline this way:
>> filter = StripComments() >> filter(stream)
>> But also, since python functions are objects, the filter would be a regular one:
>> def StripComments(stream): >> """Strip the comments from a stack""" >> for token_type, value in stream: >> if token_type not in Comment: >> yield token_type, value
>> Pretty nice, eh? Now filters are just normal functions instead of >> classes :-D Just an important question: until this moment i didn't >> need to use the stack parameter in any filter, why is it being used >> for?
>> I've attached my current pipeline code with a trivial working example, >> i hope i could be able to finish it at my project code today. It's >> using the functors system so it's able to use functions or objects >> with the __call__() method, but it's the only difference with current >> filters and it's debatable to modify the pipeline to use filters as >> they are at this moment (classes with process method).
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
"pira...@gmail.com" <pira...@gmail.com> writes: >> Filters at this point should return 2-tuples (type, value). Your change >> breaks with this. Wouldn't it be easier to build a custom filter stack >> with an additional filter that returns the columns then?
> It's true, maybe it's easier, or at least it's coherent with the > current code basis. In any case i'm doing a proof of concept for the > pipeline system and it seems promising to solve this and some > performance problems.
> I'm triying to see it, but Google Code server is having problems at > this moment... :-/ I'll try it again later.
>> I don't know which concept GStreamer uses for there pipelines. IIRC >> you've mentioned a dictionary store a state during all >> transformations. Is it that what you're thinking of?
> At GStreamer, data flow thought plugins (filters) from one plugin > source (output) to next plugin sink (input), just like a water > pipeline, but only if both source and sink understand the same caps > (data format). Another solution is just like my previous work, where > every stage got a temporal dict as parameter with it's data and it was > being modified thought every stage, just like a car asembly line.
> My pipeline proof of concept is more similar to GStreamer one, because > i think is more elegant and also using generators it could be more > efficient. First i have a pipeline class, that's basically a python > list where i store the filters, and then i use the output of the > filter (the generator) as the stream source for the next one. More > than this, when executing the pipeline it has the same method sign > that filters, so you can pass an input stream and it's given to the > first filter. Finally, the processed data is stabilized and returned. > The mayor modification is that i'm using callable objects (standar > functions and objects with __call__() method) instead of regular > filters, so they can be used in any fashion way and the stream can > change it's type dinamically (string, array... whatever), and also it > lets to use filters indepently, because just using callable objects > they can be able to be called outside a pipeline.
> Here you have an example of a callable filter:
> class StripComments(Filter): > """Strip the comments from a stack""" > def process(self, stack, stream): > for token_type, value in stream: > if token_type not in Comment: > yield token_type, value
> As you can see, this way it could be used outside a pipeline this way:
> filter = StripComments() > filter(stream)
Adding a __call__ method to filters looks good.
> But also, since python functions are objects, the filter would be a regular one:
> def StripComments(stream): > """Strip the comments from a stack""" > for token_type, value in stream: > if token_type not in Comment: > yield token_type, value
> Pretty nice, eh? Now filters are just normal functions instead of > classes :-D Just an important question: until this moment i didn't > need to use the stack parameter in any filter, why is it being used > for?
Good catch! I'm not sure ATM. IIRC the intention of the stack parameter was actually to carry a state around, just like the dictionary you've mentioned above. But I didn't found a use for it later.
> I've attached my current pipeline code with a trivial working example, > i hope i could be able to finish it at my project code today. It's > using the functors system so it's able to use functions or objects > with the __call__() method, but it's the only difference with current > filters and it's debatable to modify the pipeline to use filters as > they are at this moment (classes with process method).
Your example looks nice. I like how the combination of functions/filters is simplified!
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
> Good catch! I'm not sure ATM. IIRC the intention of the stack parameter > was actually to carry a state around, just like the dictionary you've > mentioned above. But I didn't found a use for it later.
Ok, this could have a solution maybe returning a tuple with the filtered stream and the new state on the filters that modified it, since in fact the returned type doesn't need to be the same one that the input one and also a modified stream is a state by itself. Another option would be to return it on all filters, or acess and modify it as a parameter so it's directly modified. I think this is an important point because it would suppose a modification on the API in the same way to use callable objects, but also a stabilization and a future-proof concept.
Another option is don't try to solve problems that don't exists at this moment and wait until someone needs to pass a state value and the modified stream is not enought for him (that i don't think so).
>> I've attached my current pipeline code with a trivial working example, >> i hope i could be able to finish it at my project code today. It's >> using the functors system so it's able to use functions or objects >> with the __call__() method, but it's the only difference with current >> filters and it's debatable to modify the pipeline to use filters as >> they are at this moment (classes with process method).
> Your example looks nice. I like how the combination of functions/filters > is simplified!
Thanks! :-D It's just a little of Python magic since functions are first-class objects... :-D That's the reason because i proposed to use __call__() method, because for Python interpreter a function it's a callable object, it's say, a class with just one method called __call__(), and since so much filters only had a process() method that got me the idea to use functions directly :-)
I have just uploaded my new code, i was waiting for your opinion about process() vs. __call__() style and also to have it working in my projects in a stabilized way. You can check for it but don't worry about things like split2 and so, they are just to work with the __call__() style and don't touch the original ones until something is defined about it :-P
In any case, i think we have to decide about two things:
1ş process() method vs. callable objects (__call__() and functions) style. Both can be used at the same time for retrocompatibility as i show you in the examples, but i think callable objects should be preferable for new filters and set process() as deprecated at the future.
2ş pipeline API. At this moment it's the same that a filter, and could be used as another one (it's sort of meta-filter), but we should discuss about the state passing and so.
What do you think?
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
"pira...@gmail.com" <pira...@gmail.com> writes: > I have just finished the clean-up and uploaded my code so you can > review it and tell me your comments about it... Main beneficts are > that code is simple and more modular since the pipeline instance is > "dumb", and also it let to reuse some procesed fragments of the > pipeline and also use it as another filter since it use the same > interface: get streamed input data and output data, althought is not > directly usable as a current format filter, but this would be easily > addapted using callables...
> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >> Ok, finished conversion of my project code... and worked :-) Unit test >> time have lowered with my pipeline system from more than 4 seconds to >> just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's >> a great improvement :-D I'm going to do some clean-ups previously to >> send some pathches :-)
>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>> Filters at this point should return 2-tuples (type, value). Your change >>>> breaks with this. Wouldn't it be easier to build a custom filter stack >>>> with an additional filter that returns the columns then?
>>> It's true, maybe it's easier, or at least it's coherent with the >>> current code basis. In any case i'm doing a proof of concept for the >>> pipeline system and it seems promising to solve this and some >>> performance problems.
>>> I'm triying to see it, but Google Code server is having problems at >>> this moment... :-/ I'll try it again later.
>>>> I don't know which concept GStreamer uses for there pipelines. IIRC >>>> you've mentioned a dictionary store a state during all >>>> transformations. Is it that what you're thinking of?
>>> At GStreamer, data flow thought plugins (filters) from one plugin >>> source (output) to next plugin sink (input), just like a water >>> pipeline, but only if both source and sink understand the same caps >>> (data format). Another solution is just like my previous work, where >>> every stage got a temporal dict as parameter with it's data and it was >>> being modified thought every stage, just like a car asembly line.
>>> My pipeline proof of concept is more similar to GStreamer one, because >>> i think is more elegant and also using generators it could be more >>> efficient. First i have a pipeline class, that's basically a python >>> list where i store the filters, and then i use the output of the >>> filter (the generator) as the stream source for the next one. More >>> than this, when executing the pipeline it has the same method sign >>> that filters, so you can pass an input stream and it's given to the >>> first filter. Finally, the processed data is stabilized and returned. >>> The mayor modification is that i'm using callable objects (standar >>> functions and objects with __call__() method) instead of regular >>> filters, so they can be used in any fashion way and the stream can >>> change it's type dinamically (string, array... whatever), and also it >>> lets to use filters indepently, because just using callable objects >>> they can be able to be called outside a pipeline.
>>> Here you have an example of a callable filter:
>>> class StripComments(Filter): >>> """Strip the comments from a stack""" >>> def process(self, stack, stream): >>> for token_type, value in stream: >>> if token_type not in Comment: >>> yield token_type, value
>>> As you can see, this way it could be used outside a pipeline this way:
>>> filter = StripComments() >>> filter(stream)
>>> But also, since python functions are objects, the filter would be a regular one:
>>> def StripComments(stream): >>> """Strip the comments from a stack""" >>> for token_type, value in stream: >>> if token_type not in Comment: >>> yield token_type, value
>>> Pretty nice, eh? Now filters are just normal functions instead of >>> classes :-D Just an important question: until this moment i didn't >>> need to use the stack parameter in any filter, why is it being used >>> for?
>>> I've attached my current pipeline code with a trivial working example, >>> i hope i could be able to finish it at my project code today. It's >>> using the functors system so it's able to use functions or objects >>> with the __call__() method, but it's the only difference with current >>> filters and it's debatable to modify the pipeline to use filters as >>> they are at this moment (classes with process method).
>>> -- >>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>> monton de sitios diferentes, simplemente escribe un sistema operativo >>> Unix." >>> – Linus Tordvals, creador del sistema operativo Linux
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
>> I have just finished the clean-up and uploaded my code so you can >> review it and tell me your comments about it... Main beneficts are >> that code is simple and more modular since the pipeline instance is >> "dumb", and also it let to reuse some procesed fragments of the >> pipeline and also use it as another filter since it use the same >> interface: get streamed input data and output data, althought is not >> directly usable as a current format filter, but this would be easily >> addapted using callables...
>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>> Ok, finished conversion of my project code... and worked :-) Unit test >>> time have lowered with my pipeline system from more than 4 seconds to >>> just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's >>> a great improvement :-D I'm going to do some clean-ups previously to >>> send some pathches :-)
>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>> Filters at this point should return 2-tuples (type, value). Your change >>>>> breaks with this. Wouldn't it be easier to build a custom filter stack >>>>> with an additional filter that returns the columns then?
>>>> It's true, maybe it's easier, or at least it's coherent with the >>>> current code basis. In any case i'm doing a proof of concept for the >>>> pipeline system and it seems promising to solve this and some >>>> performance problems.
>>>> I'm triying to see it, but Google Code server is having problems at >>>> this moment... :-/ I'll try it again later.
>>>>> I don't know which concept GStreamer uses for there pipelines. IIRC >>>>> you've mentioned a dictionary store a state during all >>>>> transformations. Is it that what you're thinking of?
>>>> At GStreamer, data flow thought plugins (filters) from one plugin >>>> source (output) to next plugin sink (input), just like a water >>>> pipeline, but only if both source and sink understand the same caps >>>> (data format). Another solution is just like my previous work, where >>>> every stage got a temporal dict as parameter with it's data and it was >>>> being modified thought every stage, just like a car asembly line.
>>>> My pipeline proof of concept is more similar to GStreamer one, because >>>> i think is more elegant and also using generators it could be more >>>> efficient. First i have a pipeline class, that's basically a python >>>> list where i store the filters, and then i use the output of the >>>> filter (the generator) as the stream source for the next one. More >>>> than this, when executing the pipeline it has the same method sign >>>> that filters, so you can pass an input stream and it's given to the >>>> first filter. Finally, the processed data is stabilized and returned. >>>> The mayor modification is that i'm using callable objects (standar >>>> functions and objects with __call__() method) instead of regular >>>> filters, so they can be used in any fashion way and the stream can >>>> change it's type dinamically (string, array... whatever), and also it >>>> lets to use filters indepently, because just using callable objects >>>> they can be able to be called outside a pipeline.
>>>> Here you have an example of a callable filter:
>>>> class StripComments(Filter): >>>> """Strip the comments from a stack""" >>>> def process(self, stack, stream): >>>> for token_type, value in stream: >>>> if token_type not in Comment: >>>> yield token_type, value
>>>> As you can see, this way it could be used outside a pipeline this way:
>>>> filter = StripComments() >>>> filter(stream)
>>>> But also, since python functions are objects, the filter would be a regular one:
>>>> def StripComments(stream): >>>> """Strip the comments from a stack""" >>>> for token_type, value in stream: >>>> if token_type not in Comment: >>>> yield token_type, value
>>>> Pretty nice, eh? Now filters are just normal functions instead of >>>> classes :-D Just an important question: until this moment i didn't >>>> need to use the stack parameter in any filter, why is it being used >>>> for?
>>>> I've attached my current pipeline code with a trivial working example, >>>> i hope i could be able to finish it at my project code today. It's >>>> using the functors system so it's able to use functions or objects >>>> with the __call__() method, but it's the only difference with current >>>> filters and it's debatable to modify the pipeline to use filters as >>>> they are at this moment (classes with process method).
>>>> -- >>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>> Unix." >>>> – Linus Tordvals, creador del sistema operativo Linux
>>> -- >>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>> monton de sitios diferentes, simplemente escribe un sistema operativo >>> Unix." >>> – Linus Tordvals, creador del sistema operativo Linux
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
> 2011/9/1 Andi Albrecht <albrecht.a...@googlemail.com>: >> "pira...@gmail.com" <pira...@gmail.com> writes:
>>> I have just finished the clean-up and uploaded my code so you can >>> review it and tell me your comments about it... Main beneficts are >>> that code is simple and more modular since the pipeline instance is >>> "dumb", and also it let to reuse some procesed fragments of the >>> pipeline and also use it as another filter since it use the same >>> interface: get streamed input data and output data, althought is not >>> directly usable as a current format filter, but this would be easily >>> addapted using callables...
>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>> Ok, finished conversion of my project code... and worked :-) Unit test >>>> time have lowered with my pipeline system from more than 4 seconds to >>>> just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's >>>> a great improvement :-D I'm going to do some clean-ups previously to >>>> send some pathches :-)
>>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>>> Filters at this point should return 2-tuples (type, value). Your change >>>>>> breaks with this. Wouldn't it be easier to build a custom filter stack >>>>>> with an additional filter that returns the columns then?
>>>>> It's true, maybe it's easier, or at least it's coherent with the >>>>> current code basis. In any case i'm doing a proof of concept for the >>>>> pipeline system and it seems promising to solve this and some >>>>> performance problems.
>>>>> I'm triying to see it, but Google Code server is having problems at >>>>> this moment... :-/ I'll try it again later.
>>>>>> I don't know which concept GStreamer uses for there pipelines. IIRC >>>>>> you've mentioned a dictionary store a state during all >>>>>> transformations. Is it that what you're thinking of?
>>>>> At GStreamer, data flow thought plugins (filters) from one plugin >>>>> source (output) to next plugin sink (input), just like a water >>>>> pipeline, but only if both source and sink understand the same caps >>>>> (data format). Another solution is just like my previous work, where >>>>> every stage got a temporal dict as parameter with it's data and it was >>>>> being modified thought every stage, just like a car asembly line.
>>>>> My pipeline proof of concept is more similar to GStreamer one, because >>>>> i think is more elegant and also using generators it could be more >>>>> efficient. First i have a pipeline class, that's basically a python >>>>> list where i store the filters, and then i use the output of the >>>>> filter (the generator) as the stream source for the next one. More >>>>> than this, when executing the pipeline it has the same method sign >>>>> that filters, so you can pass an input stream and it's given to the >>>>> first filter. Finally, the processed data is stabilized and returned. >>>>> The mayor modification is that i'm using callable objects (standar >>>>> functions and objects with __call__() method) instead of regular >>>>> filters, so they can be used in any fashion way and the stream can >>>>> change it's type dinamically (string, array... whatever), and also it >>>>> lets to use filters indepently, because just using callable objects >>>>> they can be able to be called outside a pipeline.
>>>>> Here you have an example of a callable filter:
>>>>> class StripComments(Filter): >>>>> """Strip the comments from a stack""" >>>>> def process(self, stack, stream): >>>>> for token_type, value in stream: >>>>> if token_type not in Comment: >>>>> yield token_type, value
>>>>> But also, since python functions are objects, the filter would be a regular one:
>>>>> def StripComments(stream): >>>>> """Strip the comments from a stack""" >>>>> for token_type, value in stream: >>>>> if token_type not in Comment: >>>>> yield token_type, value
>>>>> Pretty nice, eh? Now filters are just normal functions instead of >>>>> classes :-D Just an important question: until this moment i didn't >>>>> need to use the stack parameter in any filter, why is it being used >>>>> for?
>>>>> I've attached my current pipeline code with a trivial working example, >>>>> i hope i could be able to finish it at my project code today. It's >>>>> using the functors system so it's able to use functions or objects >>>>> with the __call__() method, but it's the only difference with current >>>>> filters and it's debatable to modify the pipeline to use filters as >>>>> they are at this moment (classes with process method).
>>>>> -- >>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>> Unix." >>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>> -- >>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>> Unix." >>>> – Linus Tordvals, creador del sistema operativo Linux
>>> -- >>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>> monton de sitios diferentes, simplemente escribe un sistema operativo >>> Unix." >>> – Linus Tordvals, creador del sistema operativo Linux
>> -- >> You received this message because you are subscribed to the Google Groups "sqlparse" group. >> To post to this group, send email to sqlparse@googlegroups.com. >> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> On Thu, Sep 1, 2011 at 11:08 AM, pira...@gmail.com <pira...@gmail.com> wrote: >> I have it at my clone, but i don't have problems to upload there :-)
> Cool, thanks! It's easier to add comments there :)
> -Andi
>> 2011/9/1 Andi Albrecht <albrecht.a...@googlemail.com>: >>> "pira...@gmail.com" <pira...@gmail.com> writes:
>>>> I have just finished the clean-up and uploaded my code so you can >>>> review it and tell me your comments about it... Main beneficts are >>>> that code is simple and more modular since the pipeline instance is >>>> "dumb", and also it let to reuse some procesed fragments of the >>>> pipeline and also use it as another filter since it use the same >>>> interface: get streamed input data and output data, althought is not >>>> directly usable as a current format filter, but this would be easily >>>> addapted using callables...
>>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>> Ok, finished conversion of my project code... and worked :-) Unit test >>>>> time have lowered with my pipeline system from more than 4 seconds to >>>>> just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's >>>>> a great improvement :-D I'm going to do some clean-ups previously to >>>>> send some pathches :-)
>>>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>>>> Filters at this point should return 2-tuples (type, value). Your change >>>>>>> breaks with this. Wouldn't it be easier to build a custom filter stack >>>>>>> with an additional filter that returns the columns then?
>>>>>> It's true, maybe it's easier, or at least it's coherent with the >>>>>> current code basis. In any case i'm doing a proof of concept for the >>>>>> pipeline system and it seems promising to solve this and some >>>>>> performance problems.
>>>>>> I'm triying to see it, but Google Code server is having problems at >>>>>> this moment... :-/ I'll try it again later.
>>>>>>> I don't know which concept GStreamer uses for there pipelines. IIRC >>>>>>> you've mentioned a dictionary store a state during all >>>>>>> transformations. Is it that what you're thinking of?
>>>>>> At GStreamer, data flow thought plugins (filters) from one plugin >>>>>> source (output) to next plugin sink (input), just like a water >>>>>> pipeline, but only if both source and sink understand the same caps >>>>>> (data format). Another solution is just like my previous work, where >>>>>> every stage got a temporal dict as parameter with it's data and it was >>>>>> being modified thought every stage, just like a car asembly line.
>>>>>> My pipeline proof of concept is more similar to GStreamer one, because >>>>>> i think is more elegant and also using generators it could be more >>>>>> efficient. First i have a pipeline class, that's basically a python >>>>>> list where i store the filters, and then i use the output of the >>>>>> filter (the generator) as the stream source for the next one. More >>>>>> than this, when executing the pipeline it has the same method sign >>>>>> that filters, so you can pass an input stream and it's given to the >>>>>> first filter. Finally, the processed data is stabilized and returned. >>>>>> The mayor modification is that i'm using callable objects (standar >>>>>> functions and objects with __call__() method) instead of regular >>>>>> filters, so they can be used in any fashion way and the stream can >>>>>> change it's type dinamically (string, array... whatever), and also it >>>>>> lets to use filters indepently, because just using callable objects >>>>>> they can be able to be called outside a pipeline.
>>>>>> Here you have an example of a callable filter:
>>>>>> class StripComments(Filter): >>>>>> """Strip the comments from a stack""" >>>>>> def process(self, stack, stream): >>>>>> for token_type, value in stream: >>>>>> if token_type not in Comment: >>>>>> yield token_type, value
>>>>>> But also, since python functions are objects, the filter would be a regular one:
>>>>>> def StripComments(stream): >>>>>> """Strip the comments from a stack""" >>>>>> for token_type, value in stream: >>>>>> if token_type not in Comment: >>>>>> yield token_type, value
>>>>>> Pretty nice, eh? Now filters are just normal functions instead of >>>>>> classes :-D Just an important question: until this moment i didn't >>>>>> need to use the stack parameter in any filter, why is it being used >>>>>> for?
>>>>>> I've attached my current pipeline code with a trivial working example, >>>>>> i hope i could be able to finish it at my project code today. It's >>>>>> using the functors system so it's able to use functions or objects >>>>>> with the __call__() method, but it's the only difference with current >>>>>> filters and it's debatable to modify the pipeline to use filters as >>>>>> they are at this moment (classes with process method).
>>>>>> -- >>>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>>> Unix." >>>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>>> -- >>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>> Unix." >>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>> -- >>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>> Unix." >>>> – Linus Tordvals, creador del sistema operativo Linux
>>> -- >>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>> To post to this group, send email to sqlparse@googlegroups.com. >>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
>> -- >> You received this message because you are subscribed to the Google Groups "sqlparse" group. >> To post to this group, send email to sqlparse@googlegroups.com. >> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema operativo Linux
> 2011/9/1 Andi Albrecht <albrecht.a...@googlemail.com>: >> On Thu, Sep 1, 2011 at 11:08 AM, pira...@gmail.com <pira...@gmail.com> wrote: >>> I have it at my clone, but i don't have problems to upload there :-)
>> Cool, thanks! It's easier to add comments there :)
>> -Andi
>>> 2011/9/1 Andi Albrecht <albrecht.a...@googlemail.com>: >>>> "pira...@gmail.com" <pira...@gmail.com> writes:
>>>>> I have just finished the clean-up and uploaded my code so you can >>>>> review it and tell me your comments about it... Main beneficts are >>>>> that code is simple and more modular since the pipeline instance is >>>>> "dumb", and also it let to reuse some procesed fragments of the >>>>> pipeline and also use it as another filter since it use the same >>>>> interface: get streamed input data and output data, althought is not >>>>> directly usable as a current format filter, but this would be easily >>>>> addapted using callables...
>>>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>>> Ok, finished conversion of my project code... and worked :-) Unit test >>>>>> time have lowered with my pipeline system from more than 4 seconds to >>>>>> just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's >>>>>> a great improvement :-D I'm going to do some clean-ups previously to >>>>>> send some pathches :-)
>>>>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>>>>> Filters at this point should return 2-tuples (type, value). Your change >>>>>>>> breaks with this. Wouldn't it be easier to build a custom filter stack >>>>>>>> with an additional filter that returns the columns then?
>>>>>>> It's true, maybe it's easier, or at least it's coherent with the >>>>>>> current code basis. In any case i'm doing a proof of concept for the >>>>>>> pipeline system and it seems promising to solve this and some >>>>>>> performance problems.
>>>>>>> I'm triying to see it, but Google Code server is having problems at >>>>>>> this moment... :-/ I'll try it again later.
>>>>>>>> I don't know which concept GStreamer uses for there pipelines. IIRC >>>>>>>> you've mentioned a dictionary store a state during all >>>>>>>> transformations. Is it that what you're thinking of?
>>>>>>> At GStreamer, data flow thought plugins (filters) from one plugin >>>>>>> source (output) to next plugin sink (input), just like a water >>>>>>> pipeline, but only if both source and sink understand the same caps >>>>>>> (data format). Another solution is just like my previous work, where >>>>>>> every stage got a temporal dict as parameter with it's data and it was >>>>>>> being modified thought every stage, just like a car asembly line.
>>>>>>> My pipeline proof of concept is more similar to GStreamer one, because >>>>>>> i think is more elegant and also using generators it could be more >>>>>>> efficient. First i have a pipeline class, that's basically a python >>>>>>> list where i store the filters, and then i use the output of the >>>>>>> filter (the generator) as the stream source for the next one. More >>>>>>> than this, when executing the pipeline it has the same method sign >>>>>>> that filters, so you can pass an input stream and it's given to the >>>>>>> first filter. Finally, the processed data is stabilized and returned. >>>>>>> The mayor modification is that i'm using callable objects (standar >>>>>>> functions and objects with __call__() method) instead of regular >>>>>>> filters, so they can be used in any fashion way and the stream can >>>>>>> change it's type dinamically (string, array... whatever), and also it >>>>>>> lets to use filters indepently, because just using callable objects >>>>>>> they can be able to be called outside a pipeline.
>>>>>>> Here you have an example of a callable filter:
>>>>>>> class StripComments(Filter): >>>>>>> """Strip the comments from a stack""" >>>>>>> def process(self, stack, stream): >>>>>>> for token_type, value in stream: >>>>>>> if token_type not in Comment: >>>>>>> yield token_type, value
>>>>>>> But also, since python functions are objects, the filter would be a regular one:
>>>>>>> def StripComments(stream): >>>>>>> """Strip the comments from a stack""" >>>>>>> for token_type, value in stream: >>>>>>> if token_type not in Comment: >>>>>>> yield token_type, value
>>>>>>> Pretty nice, eh? Now filters are just normal functions instead of >>>>>>> classes :-D Just an important question: until this moment i didn't >>>>>>> need to use the stack parameter in any filter, why is it being used >>>>>>> for?
>>>>>>> I've attached my current pipeline code with a trivial working example, >>>>>>> i hope i could be able to finish it at my project code today. It's >>>>>>> using the functors system so it's able to use functions or objects >>>>>>> with the __call__() method, but it's the only difference with current >>>>>>> filters and it's debatable to modify the pipeline to use filters as >>>>>>> they are at this moment (classes with process method).
>>>>>>> -- >>>>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>>>> Unix." >>>>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>>>> -- >>>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>>> Unix." >>>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>>> -- >>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>> Unix." >>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>> -- >>>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>>> To post to this group, send email to sqlparse@googlegroups.com. >>>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>>> -- >>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>> monton de sitios diferentes, simplemente escribe un sistema operativo >>> Unix." >>> – Linus Tordvals, creador del sistema operativo Linux
>>> -- >>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>> To post to this group, send email to sqlparse@googlegroups.com. >>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>> -- >> You received this message because you are subscribed to the Google Groups "sqlparse" group. >> To post to this group, send email to sqlparse@googlegroups.com. >> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> -- > "Si quieres viajar alrededor del mundo y ser invitado a hablar en un > monton de sitios diferentes, simplemente escribe un sistema operativo > Unix." > – Linus Tordvals, creador del sistema operativo Linux
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>> 2011/9/1 Andi Albrecht <albrecht.a...@googlemail.com>: >>> On Thu, Sep 1, 2011 at 11:08 AM, pira...@gmail.com <pira...@gmail.com> wrote: >>>> I have it at my clone, but i don't have problems to upload there :-)
>>> Cool, thanks! It's easier to add comments there :)
>>> -Andi
>>>> 2011/9/1 Andi Albrecht <albrecht.a...@googlemail.com>: >>>>> "pira...@gmail.com" <pira...@gmail.com> writes:
>>>>>> I have just finished the clean-up and uploaded my code so you can >>>>>> review it and tell me your comments about it... Main beneficts are >>>>>> that code is simple and more modular since the pipeline instance is >>>>>> "dumb", and also it let to reuse some procesed fragments of the >>>>>> pipeline and also use it as another filter since it use the same >>>>>> interface: get streamed input data and output data, althought is not >>>>>> directly usable as a current format filter, but this would be easily >>>>>> addapted using callables...
>>>>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>>>> Ok, finished conversion of my project code... and worked :-) Unit test >>>>>>> time have lowered with my pipeline system from more than 4 seconds to >>>>>>> just 2.4 :-D It's double from the initial, pre-sqlparse 1.2, but it's >>>>>>> a great improvement :-D I'm going to do some clean-ups previously to >>>>>>> send some pathches :-)
>>>>>>> 2011/8/17 pira...@gmail.com <pira...@gmail.com>: >>>>>>>>> Filters at this point should return 2-tuples (type, value). Your change >>>>>>>>> breaks with this. Wouldn't it be easier to build a custom filter stack >>>>>>>>> with an additional filter that returns the columns then?
>>>>>>>> It's true, maybe it's easier, or at least it's coherent with the >>>>>>>> current code basis. In any case i'm doing a proof of concept for the >>>>>>>> pipeline system and it seems promising to solve this and some >>>>>>>> performance problems.
>>>>>>>> I'm triying to see it, but Google Code server is having problems at >>>>>>>> this moment... :-/ I'll try it again later.
>>>>>>>>> I don't know which concept GStreamer uses for there pipelines. IIRC >>>>>>>>> you've mentioned a dictionary store a state during all >>>>>>>>> transformations. Is it that what you're thinking of?
>>>>>>>> At GStreamer, data flow thought plugins (filters) from one plugin >>>>>>>> source (output) to next plugin sink (input), just like a water >>>>>>>> pipeline, but only if both source and sink understand the same caps >>>>>>>> (data format). Another solution is just like my previous work, where >>>>>>>> every stage got a temporal dict as parameter with it's data and it was >>>>>>>> being modified thought every stage, just like a car asembly line.
>>>>>>>> My pipeline proof of concept is more similar to GStreamer one, because >>>>>>>> i think is more elegant and also using generators it could be more >>>>>>>> efficient. First i have a pipeline class, that's basically a python >>>>>>>> list where i store the filters, and then i use the output of the >>>>>>>> filter (the generator) as the stream source for the next one. More >>>>>>>> than this, when executing the pipeline it has the same method sign >>>>>>>> that filters, so you can pass an input stream and it's given to the >>>>>>>> first filter. Finally, the processed data is stabilized and returned. >>>>>>>> The mayor modification is that i'm using callable objects (standar >>>>>>>> functions and objects with __call__() method) instead of regular >>>>>>>> filters, so they can be used in any fashion way and the stream can >>>>>>>> change it's type dinamically (string, array... whatever), and also it >>>>>>>> lets to use filters indepently, because just using callable objects >>>>>>>> they can be able to be called outside a pipeline.
>>>>>>>> Here you have an example of a callable filter:
>>>>>>>> class StripComments(Filter): >>>>>>>> """Strip the comments from a stack""" >>>>>>>> def process(self, stack, stream): >>>>>>>> for token_type, value in stream: >>>>>>>> if token_type not in Comment: >>>>>>>> yield token_type, value
>>>>>>>> But also, since python functions are objects, the filter would be a regular one:
>>>>>>>> def StripComments(stream): >>>>>>>> """Strip the comments from a stack""" >>>>>>>> for token_type, value in stream: >>>>>>>> if token_type not in Comment: >>>>>>>> yield token_type, value
>>>>>>>> Pretty nice, eh? Now filters are just normal functions instead of >>>>>>>> classes :-D Just an important question: until this moment i didn't >>>>>>>> need to use the stack parameter in any filter, why is it being used >>>>>>>> for?
>>>>>>>> I've attached my current pipeline code with a trivial working example, >>>>>>>> i hope i could be able to finish it at my project code today. It's >>>>>>>> using the functors system so it's able to use functions or objects >>>>>>>> with the __call__() method, but it's the only difference with current >>>>>>>> filters and it's debatable to modify the pipeline to use filters as >>>>>>>> they are at this moment (classes with process method).
>>>>>>>> -- >>>>>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>>>>> Unix." >>>>>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>>>>> -- >>>>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>>>> Unix." >>>>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>>>> -- >>>>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>>>> Unix." >>>>>> – Linus Tordvals, creador del sistema operativo Linux
>>>>> -- >>>>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>>>> To post to this group, send email to sqlparse@googlegroups.com. >>>>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>>>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>>>> -- >>>> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >>>> monton de sitios diferentes, simplemente escribe un sistema operativo >>>> Unix." >>>> – Linus Tordvals, creador del sistema operativo Linux
>>>> -- >>>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>>> To post to this group, send email to sqlparse@googlegroups.com. >>>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>>> -- >>> You received this message because you are subscribed to the Google Groups "sqlparse" group. >>> To post to this group, send email to sqlparse@googlegroups.com. >>> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >>> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
>> -- >> "Si quieres viajar alrededor del mundo y ser invitado a hablar en un >> monton de sitios diferentes, simplemente escribe un sistema operativo >> Unix." >> – Linus Tordvals, creador del sistema operativo Linux
>> -- >> You received this message because you are subscribed to the Google Groups "sqlparse" group. >> To post to this group, send email to sqlparse@googlegroups.com. >> To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
> -- > You received this message because you are subscribed to the Google Groups "sqlparse" group. > To post to this group, send email to sqlparse@googlegroups.com. > To unsubscribe from this group, send email to sqlparse+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/sqlparse?hl=en.
-- "Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton de sitios diferentes, simplemente escribe un sistema operativo Unix." – Linus Tordvals, creador del sistema
...