Re: ka-ping yee tokenizer.py

Fredrik Lundh


Sep 15, 2008, 5:03:50 PM

to pytho...@python.org

Karl Kobata wrote:

> I have enjoyed using Ka-Ping Yee's tokenizer.py. I would like to
> replace the readline parameter input with my own and pass a list of
> strings to the tokenizer. I understand it must be a callable object and
> iterable, but it is obvious from the errors I am getting that these are
> not the only requirements.

not sure I can decipher your detailed requirements, but to use Python's
standard "tokenize" module (written by ping) on a list, you can simply
do as follows:

import tokenize

program = [ ... program given as list ... ]  # one string per line, each ending in "\n"

for token in tokenize.generate_tokens(iter(program).next):
    print token
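For anyone reading this on Python 3, here is a minimal sketch of the same idea. `.next` is spelled `__next__` there and `print` is a function. This version wraps the iterator as `functools.partial(next, it, "")`, so the callable returns an empty string once the list is used up, which is how a real `readline` signals end of input. The sample program is made up for illustration.

```python
import tokenize
from functools import partial

# Illustrative input: one string per line, each ending in "\n".
program = ["x = 1\n", "y = x + 2  # add two\n"]

# next(it, "") returns "" once the list is exhausted, mimicking readline at EOF.
readline = partial(next, iter(program), "")
tokens = list(tokenize.generate_tokens(readline))

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```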

another approach is to turn the list back into a string, and wrap that
in a StringIO object:

import tokenize
import StringIO

program = [ ... program given as list ... ]

program_buffer = StringIO.StringIO("".join(program))

for token in tokenize.generate_tokens(program_buffer.readline):
    print token
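On Python 3 the StringIO route uses `io.StringIO` instead of the old `StringIO` module. The small sketch below uses an invented sample program. It tokenizes the same lines through a buffer and through the list iterator, then checks that the two token streams agree.

```python
import io
import tokenize
from functools import partial

# Illustrative program, one line per list item.
program = ["def f(n):\n", "    return n * 2\n"]

def token_pairs(readline):
    """Tokenize via readline and keep only (type, string) for easy comparison."""
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(readline)]

# Python 3 spelling of the StringIO approach: join the lines, wrap in a buffer.
via_buffer = token_pairs(io.StringIO("".join(program)).readline)

# The list-based approach, for comparison.
via_list = token_pairs(partial(next, iter(program), ""))

print(via_buffer == via_list)
```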

</F>

Karl Kobata


Sep 16, 2008, 3:48:51 PM

to Fredrik Lundh, pytho...@python.org

Hi Fredrik,

This is exactly what I need. Thank you.
I would like to add one more feature. I am not using the tokenizer to
parse Python code, but it happens to work very well for my application.
However, I would like either or both of the following variations:
1) add two other characters as comment markers
2) write a function that reads a line, modifies it as required, and
can then be passed as the argument to the tokenizer.

def modifyLine( fileHandle ):
    # readline and modify this string if required
    ...

for token in tokenize.generate_tokens( modifyLine( myFileHandle ) ):
    print token
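One point about the pseudocode above. `generate_tokens` calls its argument once per line, so it needs a zero-argument callable rather than the single string that `modifyLine( myFileHandle )` would return. The minimal Python 3 sketch below covers both wishes and assumes that a simple line-level rewrite is acceptable:

* `make_modifying_readline` is a hypothetical name standing in for `modifyLine`. It returns a closure that reads a line and passes it through a modifier.
* `rewrite_comment_markers`, also hypothetical, turns `;` or `!` into `#` when either appears outside a quoted string.

The marker characters are assumptions, so pick ones your input does not otherwise use. The quote tracking also ignores escapes and triple quotes.

```python
import io
import tokenize

def rewrite_comment_markers(line, markers=(";", "!")):
    """Hypothetical helper: replace the first marker outside quotes with '#'.

    Tracks plain single/double quotes only (no escapes, no triple quotes).
    """
    quote = None
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:          # closing quote
                quote = None
        elif ch in "'\"":
            quote = ch               # opening quote
        elif ch == "#":
            return line              # already a comment; leave the line alone
        elif ch in markers:
            return line[:i] + "#" + line[i + 1:]
    return line

def make_modifying_readline(file_handle, modify):
    """Return a zero-argument callable that generate_tokens can call per line."""
    def readline():
        line = file_handle.readline()
        if not line:                 # "" means end of input; pass it through
            return line
        return modify(line)
    return readline

# io.StringIO stands in for an open file here.
my_file_handle = io.StringIO("x = 1 ; set x\n! whole-line note\ns = 'a;b'\n")
readline = make_modifying_readline(my_file_handle, rewrite_comment_markers)

tokens = list(tokenize.generate_tokens(readline))
print([t.string for t in tokens if t.type == tokenize.COMMENT])
```

Because the modification happens inside `readline`, the tokenizer only ever sees standard `#` comments, and the `;` inside the string literal is left untouched.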

Anxiously looking forward to your thoughts.
karl


--
http://mail.python.org/mailman/listinfo/python-list

Aaron "Castironpi" Brady


Sep 16, 2008, 9:39:01 PM


On Sep 16, 2:48 pm, "Karl Kobata" <karl.kob...@syncira.com> wrote:
> 2) write a module that can readline, modify the line as required, and
> finally, this module can be used as the argument for the tokenizer.
>
> def modifyLine( fileHandle ):
>     # readline and modify this string if required
>     ...
>
> for token in tokenize.generate_tokens( modifyLine( myFileHandle ) ):
>     print token

This is an interesting construction:

>>> a= [ 'a', 'b', 'c' ]
>>> def moditer( mod, nextfun ):
...     while 1:
...         yield mod( nextfun( ) )
...
>>> list( moditer( ord, iter( a ).next ) )
[97, 98, 99]
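On Python 3.7 and later this construction needs one adjustment. Under PEP 479, a StopIteration that escapes a generator body is turned into a RuntimeError, so the `while 1` loop would fail when the list runs out instead of ending quietly. The sketch below catches that exception explicitly and uses `__next__`, Python 3's name for `.next`. The built-in `map` gives the same result.

```python
def moditer(mod, nextfun):
    """Yield mod(item) for each item returned by calling nextfun()."""
    while True:
        try:
            item = nextfun()
        except StopIteration:
            return               # end the generator cleanly (PEP 479)
        yield mod(item)

a = ["a", "b", "c"]
print(list(moditer(ord, iter(a).__next__)))  # [97, 98, 99]
print(list(map(ord, a)))                     # [97, 98, 99]
```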

Here's my point:

>>> a= [ 'print a', 'print b', 'print c' ]
>>> tokenize.generate_tokens( iter( a ).next )
<generator object at 0x009FF440>
>>> tokenize.generate_tokens( moditer( lambda s: s+ '#', iter( a ).next ).next )

It appends a '#' to each line as the tokenizer asks for it, so every line is modified before it is tokenized.
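Here is a Python 3 sketch of the same point, resting on two assumptions. First, the lines carry their trailing "\n" as `readline` would return them, so the '#' goes in before the newline rather than after it. Second, the built-in `map` plays the role of `moditer( mod, iter( a ).next )`. Wrapping it as `functools.partial(next, ..., "")` makes the callable return "" at the end, just like `readline`.

```python
import tokenize
from functools import partial

a = ["print a\n", "print b\n", "print c\n"]

def add_marker(line):
    """Append '#' to the line, keeping the newline at the very end."""
    return line.rstrip("\n") + "#\n"

# map() modifies each line lazily, just before the tokenizer reads it.
readline = partial(next, map(add_marker, a), "")
tokens = list(tokenize.generate_tokens(readline))

print([t.string for t in tokens if t.type == tokenize.COMMENT])  # ['#', '#', '#']
```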