
Re: Processing text using python

Message has been deleted

Alex Martelli

Feb 20, 2006, 11:24:04 AM
nuttydevil <sj...@sussex.ac.uk> wrote:

> Hey everyone! I'm hoping someone will be able to help me, cause I
> haven't had success searching on the web so far... I have large chunks
> of text ( all in a long string) that are currently all in separate
> notebook files. I want to use python to read these strings of text,
> THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
> I need to read and analyse each sequence one codon at a time
> effectively.) Does anyone have any idea of how to do this using python?

Open each file and call thefile.read(3) in a loop; move to the next file
when the current one is exhausted. What part of this is giving you
problems?


Alex
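
The loop described above could be sketched roughly as follows in Python 3. This is only an illustration: the name `codons_from_files` and the `size` parameter are assumptions made for the example and do not come from the original post.

```python
def codons_from_files(paths, size=3):
    """Yield successive `size`-character chunks from each file in turn."""
    for path in paths:
        with open(path) as handle:
            while True:
                chunk = handle.read(size)
                if not chunk:
                    # read() returns an empty string once the file is
                    # exhausted, so move on to the next file
                    break
                yield chunk
```

Chunks never span two files here; if a file's length is not a multiple of three, its last chunk is simply shorter.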

Xavier Morel

Feb 20, 2006, 11:34:53 AM
nuttydevil wrote:
> Hey everyone! I'm hoping someone will be able to help me, cause I
> haven't had success searching on the web so far... I have large chunks
> of text ( all in a long string) that are currently all in separate
> notebook files. I want to use python to read these strings of text,
> THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
> I need to read and analyse each sequence one codon at a time
> effectively.) Does anyone have any idea of how to do this using python?
>
>
> I'm going to be optimistic and thank you for your help in advance!
> Samantha.
>
Since you're reading from files, the "read" operation of file-like
objects takes an argument specifying the number of characters to read
from the stream, e.g.

>>> f = file("stuff.txt")
>>> f.read(3)
'car'
>>> f.read(3)
'act'
>>> f.read()
'erization'

Would that be enough for what you need?

Roy Smith

Feb 20, 2006, 11:33:19 AM
In article <1140451760....@g43g2000cwa.googlegroups.com>,
"nuttydevil" <sj...@sussex.ac.uk> wrote:

> Hey everyone! I'm hoping someone will be able to help me, cause I
> haven't had success searching on the web so far... I have large chunks
> of text ( all in a long string) that are currently all in separate
> notebook files. I want to use python to read these strings of text,
> THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
> I need to read and analyse each sequence one codon at a time
> effectively.) Does anyone have any idea of how to do this using python?

Don't reinvent the wheel. Take a look at http://www.biopython.org/.

danmc...@yahoo.com

Feb 20, 2006, 11:36:19 AM
I think this is what you want:

file = open(r'c:/test.txt','r')

c = file.read(3)
while c:
    print c
    c = file.read(3)

file.close()

Steven Bethard

Feb 20, 2006, 11:44:58 AM


Or:

def read3():
    return file.read(3)
for chars in iter(read3, ''):
    ... do something with chars ...

STeVe
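
The two-argument form of `iter` used above calls a function repeatedly until it returns the sentinel value. A possible Python 3 variant, with `functools.partial` replacing the helper function, might look like this; the name `iter_chunks` is made up for the example.

```python
import io
from functools import partial


def iter_chunks(stream, size=3):
    """Iterate over `stream` in `size`-character chunks.

    iter(callable, sentinel) keeps calling stream.read(size) and stops
    as soon as the call returns the sentinel, here the empty string.
    """
    return iter(partial(stream.read, size), "")


# io.StringIO stands in for an open text file in this sketch
chunks = list(iter_chunks(io.StringIO("ATGGCCTAAG")))
```

For a file opened in binary mode the sentinel would need to be `b""` instead of `""`.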

Fredrik Lundh

Feb 20, 2006, 11:46:02 AM
to pytho...@python.org
"nuttydevil" <sj...@sussex.ac.uk> wrote:

> Hey everyone! I'm hoping someone will be able to help me, cause I
> haven't had success searching on the web so far... I have large chunks
> of text ( all in a long string) that are currently all in separate
> notebook files. I want to use python to read these strings of text,
> THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
> I need to read and analyse each sequence one codon at a time
> effectively.) Does anyone have any idea of how to do this using python?

did you read the string chapter in the tutorial ?

http://docs.python.org/tut/node5.html#SECTION005120000000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"

</F>

John Zenger

Feb 20, 2006, 12:11:09 PM
If you have already read the string into memory and want a convenient
way to loop through it 3 characters at a time, check out the "batch" recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303279

It uses itertools to make an iterator over the string, returning 3
characters at a time. Cool stuff.
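
As a rough idea of what an itertools-based batching helper can look like, here is a small Python 3 sketch. It is not the code from the linked recipe; the name `batches` and the use of `itertools.islice` are choices made for this illustration.

```python
from itertools import islice


def batches(text, size=3):
    """Yield `size`-character strings from `text`, one batch at a time."""
    iterator = iter(text)
    while True:
        # islice pulls at most `size` characters from the shared iterator
        batch = "".join(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Because it only relies on iteration, this also works when `text` is any iterable of single-character strings rather than a string already held in memory.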


nuttydevil wrote:
> Hey everyone! I'm hoping someone will be able to help me, cause I
> haven't had success searching on the web so far... I have large chunks
> of text ( all in a long string) that are currently all in separate
> notebook files. I want to use python to read these strings of text,
> THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
> I need to read and analyse each sequence one codon at a time
> effectively.) Does anyone have any idea of how to do this using python?
>
>

Gerard Flanagan

Feb 20, 2006, 12:24:47 PM
nuttydevil wrote:
> Hey everyone! I'm hoping someone will be able to help me, cause I
> haven't had success searching on the web so far... I have large chunks
> of text ( all in a long string) that are currently all in separate
> notebook files. I want to use python to read these strings of text,
> THREE CHARACTERS AT A TIME. (I'm studying the genetic code you see, so
> I need to read and analyse each sequence one codon at a time
> effectively.) Does anyone have any idea of how to do this using python?
>
>
> I'm going to be optimistic and thank you for your help in advance!
> Samantha.


data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

num_codons = len(data1) // 3

codons = [ data1[3*i:3*(i+1)] for i in range( num_codons ) ]

print codons

class Codon(object):
    #__slots__ = ['alpha', 'beta', 'gamma']
    def __init__(self, a, b, c):
        self.alpha = a
        self.beta = b
        self.gamma = c

codons = [ Codon(*codon) for codon in codons ]

print codons[0].alpha, codons[0].beta, codons[0].gamma

###output####

['FOO', 'TFA', 'LLS', 'ECH', 'OIN', 'THE', 'MEM', 'ORY', '\nDO', 'WNT',
'HEP', 'ASS', 'AGE', 'WHI', 'CHW', 'EDI', 'DNO', 'TTA', 'KE\n', 'TOW',
'ARD', 'STH', 'EDO', 'ORW', 'ENE', 'VER', 'OPE', 'NED']
F O O


Gerard
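
Note that the output above contains chunks such as '\nDO' and 'KE\n', because the line breaks in the triple-quoted string are counted as characters. If those line breaks are only layout and not part of the sequence (an assumption about the data), one option is to strip all whitespace before slicing, as in this Python 3 sketch:

```python
data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

# Assumption: line breaks are layout, not sequence data, so drop all whitespace
sequence = "".join(data1.split())

# Step through the cleaned string three characters at a time; unlike the
# len(data1) // 3 approach, a trailing incomplete chunk is kept
codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
```

Whether a trailing incomplete chunk should be kept or dropped depends on the analysis being done.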

danmc...@yahoo.com

Feb 20, 2006, 3:41:13 PM
Sure. There's probably a thousand ways to do this.

pla...@alumni.caltech.edu

Feb 20, 2006, 4:12:23 PM
Hi,

you have plenty of good responses. I thought I would add one more:

def data_iter(file_name):
    data = file(file_name)
    while True:
        value = data.read(3)
        if not value:
            break
        yield value
    data.close()

With the above, you can grab the entire data set (3 characters at a
time) like so:

data_set = [ d for d in data_iter('data') ]

Or iterate over it:

for d in data_iter('data'):
    pass  # do stuff

Enjoy!
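
In the generator above, `data.close()` only runs if the caller consumes the generator to the end. A possible Python 3 variant uses a `with` block so the file is also closed when iteration stops early and the generator is closed or garbage-collected. The `size` parameter is an addition for this sketch, and `open` replaces `file`, which Python 3 no longer provides.

```python
def data_iter(file_name, size=3):
    """Yield `size`-character chunks from the named file."""
    # The with block closes the file when the generator finishes or is closed
    with open(file_name) as data:
        while True:
            value = data.read(size)
            if not value:
                break
            yield value
```

Usage is the same as before, for example `list(data_iter('data'))`, where 'data' is whatever file name applies.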

Xavier Morel

Feb 20, 2006, 5:43:21 PM
Fredrik Lundh wrote:
> did you read the string chapter in the tutorial ?
>
> http://docs.python.org/tut/node5.html#SECTION005120000000000000000
>
> around the middle of that chapter, there's a section on slicing:
>
> "substrings can be specified with the slice notation: two indices
> separated by a colon"
>
Fredrik, how would you use slices to split a string by groups of 3
characters?

Alex Martelli

Feb 20, 2006, 5:43:05 PM
Xavier Morel <xavier...@masklinn.net> wrote:

I can't answer for him, but maybe:

[s[i:i+3] for i in xrange(0, len(s), 3)]

...?


Alex
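
The slicing one-liner above carries over to Python 3 with `range` in place of `xrange`. Wrapped in a function (the name `split_codons` is only for illustration), it might read:

```python
def split_codons(s, size=3):
    """Split `s` into consecutive slices of `size` characters."""
    # range(0, len(s), size) gives the start index of each slice;
    # slicing past the end is safe, so the last piece may be shorter
    return [s[i:i + size] for i in range(0, len(s), size)]


pieces = split_codons("ATGGCCTAAG")
```

This needs the whole string in memory, which matches the case where the text has already been read from the file.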
