> I have enjoyed using Ka-Ping Yee’s tokenizer.py. I would like to
> replace the readline parameter input with my own and pass a list of
> strings to the tokenizer. I understand it must be a callable object and
> iterable, but it is obvious from the errors I am getting that these are
> not the only requirements.
not sure I can decipher your detailed requirements, but to use Python's
standard "tokenize" module (written by ping) on a list, you can simply
do as follows:
import tokenize
program = [ ... program given as list ... ]
for token in tokenize.generate_tokens(iter(program).next):
    print token
another approach is to turn the list back into a string, and wrap that
in a StringIO object:
import tokenize
import StringIO
program = [ ... program given as list ... ]
program_buffer = StringIO.StringIO("".join(program))
for token in tokenize.generate_tokens(program_buffer.readline):
    print token
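The snippets above use Python 2 names (`iter(...).next`, the `StringIO` module, the `print` statement). As a rough sketch only, the same two approaches might look like this under Python 3. The sample `program` list is made up for illustration, and it assumes each string already ends with a newline:

```python
import io
import tokenize
from functools import partial

# Hypothetical sample input: one source line per string, newline included.
program = ["x = 1\n", "y = x + 2\n"]

def tokens_from_list(lines):
    """Tokenize by handing out one list item per readline call."""
    it = iter(lines)
    # next(it, "") returns "" once the list is exhausted, which tokenize
    # treats as end of input.
    return list(tokenize.generate_tokens(partial(next, it, "")))

def tokens_from_buffer(lines):
    """Tokenize by joining the lines and reading them back via StringIO."""
    buf = io.StringIO("".join(lines))
    return list(tokenize.generate_tokens(buf.readline))

for tok in tokens_from_list(program):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

Both helpers should produce the same token stream; the list version simply avoids building one large string first.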
</F>
This is exactly what I need. Thank you.
I would like to add one more capability. I am not using the tokenizer to
parse Python code; it just happens to work very well for my application.
However, I would like either or both of the following variations:
1) add 2 other characters as comment designators
2) write a module that can read a line, modify it as required, and
then be used as the argument to the tokenizer.
def modifyLine( fileHandle ):
    # readline and modify this string if required
    ...
for token in tokenize.generate_tokens( modifyLine( myFileHandle ) ):
    print token
Anxiously looking forward to your thoughts.
karl
Karl Kobata wrote:
> import tokenize
> import StringIO
> program_buffer = StringIO.StringIO("".join(program))
> </F>
This is an interesting construction:
>>> a= [ 'a', 'b', 'c' ]
>>> def moditer( mod, nextfun ):
...     while 1:
...         yield mod( nextfun( ) )
...
>>> list( moditer( ord, iter( a ).next ) )
[97, 98, 99]
Here's my point:
>>> a= [ 'print a', 'print b', 'print c' ]
>>> tokenize.generate_tokens( iter( a ).next )
<generator object at 0x009FF440>
>>> tokenize.generate_tokens( moditer( lambda s: s+ '#', iter( a ).next ).next )
It adds a '#' to the end of every line, then tokenizes.
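For what it's worth, here is a rough sketch of the same idea as a plain readline wrapper, written for Python 3. A closure is used instead of a generator because, under Python 3.7 and later (PEP 479), a StopIteration escaping from a generator body is turned into a RuntimeError. The names `modified_readline`, `normalize_comments`, and the two extra comment markers are made up for illustration; the rewrite is naive and does not check whether a marker sits inside a string literal.

```python
import io
import tokenize

def modified_readline(readline, modify):
    """Return a readline-style callable that applies modify() to each line."""
    def wrapper():
        line = readline()
        # An empty string signals end of input; pass it through untouched.
        return modify(line) if line else line
    return wrapper

# Hypothetical alternative comment markers, chosen only for illustration.
EXTRA_COMMENT_CHARS = ("!", "%")

def normalize_comments(line):
    """Naively rewrite the earliest extra comment marker on the line to '#'."""
    positions = [p for p in (line.find(c) for c in EXTRA_COMMENT_CHARS) if p != -1]
    if not positions:
        return line
    first = min(positions)
    return line[:first] + "#" + line[first + 1:]

# A StringIO stands in for a real file handle here.
source = io.StringIO("a = 1 ! set a\nb = 2 % set b\n")
readline = modified_readline(source.readline, normalize_comments)
tokens = list(tokenize.generate_tokens(readline))
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

Passing `myFileHandle.readline` in place of `source.readline` should work the same way, since the wrapper only needs a callable that returns one line per call.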