Text Manipulation in Python


Edward Hasted

Nov 10, 1999, 3:00:00 AM
I am completely new to Python.

We want to use it to alter specific lines in template files, typically
something like changing:-

Variable = 1234

to

Variable = Company Name

The text manipulation strings within Python seem to work sequentially
rather than on a line basis.

1. Is this correct?
2. What is the best way to do text searching and manipulation within
Python.

Many thanks,

Edward


William Park

Nov 10, 1999, 3:00:00 AM
to Edward Hasted
On Wed, Nov 10, 1999 at 08:46:42AM +0000, Edward Hasted wrote:
> I am completely new to Python.
>
> We want to use it to alter specific lines in template files, typically
> something like changing:-
>
> Variable = 1234
>
> to
>
> Variable = Company Name
>
> The text manipulation strings within Python seem to work sequentially
> rather than on a line basis.
>
> 1. Is this correct?

Well, you have to read the file line by line. Try something like

    f = open(..., "r")
    for s in f.readlines():
        s = ...do string substitution using 're' or 'string' module...
        print(s, end="")    # s already ends with a newline

But, if you're running Unix, you can also use 'sed'.
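
To make the substitution step of that loop concrete, here is a minimal sketch in present-day Python 3. It assumes the template lines look like "Variable = 1234"; the names ASSIGNMENT and substitute, and the sample lines, are made up for illustration and are not from the original post.

```python
import re

# Hypothetical pattern: a line that starts with "Variable =" followed by anything.
# [ \t]* is used instead of \s* so the trailing newline is never swallowed.
ASSIGNMENT = re.compile(r"^(Variable[ \t]*=[ \t]*).*$")

def substitute(lines, new_value):
    """Return the lines with the value assigned to 'Variable' replaced."""
    result = []
    for line in lines:
        # Keep the "Variable = " prefix (group 1) and swap in the new value.
        # A function is used as the replacement so that any backslashes in
        # new_value are taken literally.
        result.append(ASSIGNMENT.sub(lambda m: m.group(1) + new_value, line))
    return result

# Invented sample data standing in for the lines of a template file.
template = ["Title = Demo\n", "Variable = 1234\n", "Other = 5\n"]
for line in substitute(template, "Company Name"):
    print(line, end="")
```

Lines that do not match the pattern pass through unchanged, so the same loop can be run over a whole file.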

> 2. What is the best way to do text searching and manipulation within
> Python.
>
> Many thanks,
>
> Edward
>
>


Thomas Fuchs

Nov 10, 1999, 3:00:00 AM
In article <VA.0000001...@ops.corpex.com>,

edw...@corpex.com wrote:
> I am completely new to Python.
>
> We want to use it to alter specific lines in template files, typically
> something like changing:-
>
> Variable = 1234
>
> to
>
> Variable = Company Name
>
If you can specify the format of the template file, use Python syntax.
For example:

# file: mytemplate.py
t = '''\
-----------------------
Variable = %(company)s
-----------------------
'''

# file: mymain.py
import mytemplate
company = 'CoCo'
f = open('myoutput', 'w')
f.write(mytemplate.t % locals())
f.close()
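
A variation on the same idea, as a minimal sketch: the template is filled from an explicit dictionary rather than from locals(), which makes it easier to see which values the template needs. The names TEMPLATE and render are hypothetical and chosen only for this illustration.

```python
# The template text; %(company)s marks where the value goes.
TEMPLATE = """\
-----------------------
Variable = %(company)s
-----------------------
"""

def render(values):
    """Fill the template from a dictionary of values."""
    return TEMPLATE % values

print(render({"company": "CoCo"}), end="")
```

A key that is missing from the dictionary raises KeyError, so a typo in the template shows up immediately instead of producing a half-filled file.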


Aahz Maruch

Nov 10, 1999, 3:00:00 AM
In article <1999111006...@better.net>,

William Park <pa...@better.net> wrote:
>
>But, if you're running Unix, you can also use 'sed'.

Even if you're not running Unix, you can also use sed.
--
--- Aahz (@netcom.com)

Androgynous poly kinky vanilla queer het <*> http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6 (if you want to know, do some research)

Donn Cave

Nov 10, 1999, 3:00:00 AM
Quoth Edward Hasted <edw...@corpex.com>:

| We want to use it to alter specific lines in template files, typically
| something like changing:-
|
| Variable = 1234
|
| to
|
| Variable = Company Name
|
| The text manipulation strings within Python seem to work sequentially
| rather than on a line basis.
|
| 1. Is this correct?

Hard to say, that analysis may be a little too terse.

| 2. What is the best way to do text searching and manipulation within
| Python.

Depends! I thought the system Thomas Fuchs proposed in his followup
was interesting, and if it works for you, perhaps that's close enough
to the best - but there isn't enough detail here to know whether it's
suited to your needs at all.

Another followup mentioned that perhaps the best way is to not do it
in Python, and there's something to that too. Certainly on UNIX, or
platforms like BeOS that borrow from its repertoire, there is an
embarrassment of riches in text manipulation languages, and it's a
point of tradition that the language or tool to solve each little
problem be considered separately (I don't mean that to sound silly,
either, but will not expound further here.)

In a project of mine, on BeOS, I've been using "m4", a macro language,
to do what you seem to have in mind. Most of the heavy lifting is done
in Python, but the result goes into a template processed by m4. It's
kind of like the C preprocessor but much more capable, and despite its
relative obscurity, usually it's already there on a normal UNIX platform,
in my experience anyway.

Donn Cave, University Computing Services, University of Washington
do...@u.washington.edu

Bernhard Reiter

Nov 10, 1999, 3:00:00 AM
On Wed, 10 Nov 1999 08:46:42 GMT, Edward Hasted <edw...@corpex.com> wrote:
>I am completely new to Python.
Welcome to a land of programming pleasure.

>We want to use it to alter specific lines in template files, typically
>something like changing:-
>
>Variable = 1234
>
>to
>
>Variable = Company Name
>
>The text manipulation strings within Python seem to work sequentially
>rather than on a line basis.
>
>1. Is this correct?

If I understand you correctly, yes.
But (of course there is a but) you can quite easily make Python
work on a per-line basis.

>2. What is the best way to do text searching and manipulation within
>Python.

Some other posters suggested looking into other tools, but I would refrain
from that. For one thing, Python is a unified language and a lot easier to
learn. The syntax of awk and sed can be tricky at times. And you might
be able to pull it off with bash scripting or vim replacement rules. Don't.

Use the nice fileinput module to get lines:

-----------
#!/usr/bin/env python
"""Per-line processing example."""

import sys
import fileinput

def process(line):
    """Process one input line and write out what we want."""
    # get rid of the trailing newline and any surrounding whitespace
    line = line.strip()
    # put the newline back so the output lines stay separate
    sys.stdout.write(line + "\n")

for line in fileinput.input():
    process(line)
-----------

Now you can use string methods (or the string module functions) to deal
with one line, play with Python's slicing capabilities on the strings, or even
pull out the big guns and play the regular expression card with the re module.
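
As a rough illustration of those three options on a single line, here is a small sketch in present-day Python 3. The sample line and all variable names are assumptions made for this example only.

```python
import re

line = "Variable = 1234\n"

# 1. String methods: split on the first "=" and strip the whitespace.
name, value = [part.strip() for part in line.split("=", 1)]

# 2. Slicing: take everything after the first "=".
after_equals = line[line.index("=") + 1:].strip()

# 3. Regular expression: capture name and value in one go.
#    "." does not match the newline, so the value stops before it.
match = re.match(r"\s*(\w+)\s*=\s*(.*)", line)
re_name, re_value = match.group(1), match.group(2)

print(name, value, after_equals, re_name, re_value)
```

For a fixed "name = value" layout the string methods are usually enough; the regular expression pays off once the layout varies.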

The fileinput module can even do in-place editing of files for you.
Oh, and you call that little program with just the filename(s).
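
A minimal sketch of that in-place mode in present-day Python 3, assuming a simple literal replacement is all that is needed. The function name replace_in_place, the ".bak" suffix and the throwaway demo file are choices made for this illustration.

```python
import fileinput
import os
import tempfile

def replace_in_place(path, old, new):
    """Rewrite the file at path, replacing old with new on every line."""
    # With inplace=True, whatever is printed inside the loop is written
    # into the file; the original is kept as path + ".bak".
    with fileinput.input(files=[path], inplace=True, backup=".bak") as f:
        for line in f:
            print(line.replace(old, new), end="")

# Demonstration on a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "template.txt")
with open(path, "w") as out:
    out.write("Variable = 1234\nOther = 5\n")

replace_in_place(path, "1234", "Company Name")

with open(path) as result:
    print(result.read(), end="")
```

Because standard output is redirected while the loop runs, diagnostic messages inside the loop should go to sys.stderr, otherwise they end up in the file.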

In time you will reach the point where you want to pull some
more complicated tricks. Here is a rough example in which I tried
to imitate the way awk unfolds its power on all sorts of text-processing tasks.
Don't read it right now, but look into it later... if you feel like it.
It contains a bunch of Python-related conventions and some variables not
used in this example, but ready for other text-processing tasks.


Input files look like:

### Opening logfile (channel #whatever), [Mon Nov 2 09:22:49 1999]
[Mon Nov 2 09:23:04 1999]<willi 7,0;> Gut.
[Mon Nov 2 09:23:11 1999]<willi 7,0;> Wer leitet ?
[Mon Nov 2 09:23:37 1999]<Bob 7,0;> immer der Protokolleur
[Mon Nov 2 09:23:37 1999]#intevation> Lasst mich doch leiten, dann kann jemand anders Protokoll schreiben. :)
[Mon Nov 2 09:24:19 1999]<willi 7,0;> Scheint ja geklärt.

Output files look like:

### Opening logfile (channel #whatever), [Mon Nov 2 09:22:49 1999]
[09:23:04] <willi> Gut.
[09:23:11] <willi> Wer leitet ?
[09:23:37] <Bob> immer der Protokolleur
[09:23:37] <Cliff> Lasst mich doch leiten, dann kann jemand anders Protokoll schreiben. :)
[09:24:19] <willi> Scheint ja geklärt.


-----------
#!/usr/bin/env python
"""Format and clean tirc normal logfile for IRC session logs. v%(version)s

USAGE:
    %(progname)s [inputfilename(s)]

Input files will be replaced if filenames are given.

The default is to use <Cliff> as user.
The time entries on each line will be cut to include only the time
and not the day or year, which can easily be seen from the beginning
and end of a consecutive session.

Of course this is dependent on the time format tirc uses.
"""
__version__ = "0.1"
# initial 2.6.1999 Bernhard Reiter


import fileinput
import sys
import re


nick = "Cliff"

# shared between subsequent calls of process()
channel = None
nickre = None

def process(line, fileinputobject, write):
    """Process one input line and spit out what is wanted."""

    # share some things between subsequent calls
    global channel
    global nickre

    #f = line.split(";")
    #f = [s.strip() for s in f]

    # imitate gawk a bit ;-)
    #NF = len(f)
    FILENAME = fileinputobject.filename()
    NR = fileinputobject.lineno()
    FNR = fileinputobject.filelineno()

    if line[0:3] == "###":
        # opening or closing of logfile

        # grab channel name (skip "###" lines that do not name a channel)
        matchobj = re.search(r'\(channel (#[^)]+)\)', line)
        if matchobj:
            channel = matchobj.group(1)
            sys.stderr.write("Found channel: " + channel + "\n")

            # prepare nick replacement procedure
            nickre = re.compile(re.escape("]" + channel + ">"))

    if line[0:1] == "[":
        # normal line; the offsets below assume a fixed-width timestamp
        # such as "[Mon Nov  2 09:23:04 1999]"

        # replace funny control string
        line = line.replace(chr(3) + "7,0;", "", 1)
        if nickre:
            line = nickre.sub("]<" + nick + ">", line, count=1)

        # distance between time and normal text
        if line[25:27] == "]<":
            line = line[:26] + " " + line[26:]

        # cut date and year out
        line = "[" + line[12:20] + "]" + line[26:]

    write(line)


def main():

    if len(sys.argv) == 1:
        sys.stderr.write(__doc__ %
            {"progname": sys.argv[0], "version": __version__})

        if sys.platform == "win32":
            import msvcrt
            # show the prompt first, then wait for the key press
            sys.stderr.write("\nPress any key to start reading from stdin.\n")
            msvcrt.getch()
        else:
            sys.stderr.write("Now reading from stdin.\n")

    # edit the files in place, keeping the originals with a ".org" suffix
    fileinputobject = fileinput.input(sys.argv[1:], inplace=1, backup=".org")

    try:
        for line in fileinputobject:
            # look up sys.stdout.write on every call: fileinput redirects
            # sys.stdout into the file currently being rewritten
            process(line, fileinputobject, sys.stdout.write)
    finally:
        fileinputobject.close()

if __name__ == "__main__":
    main()
-----------
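
The fixed slice offsets in the script only work while every timestamp has exactly the same width. As a hedged alternative, here is a sketch in present-day Python 3 that trims the timestamp with a regular expression instead. It assumes stamps of the form "[Day Mon DD HH:MM:SS YYYY]"; the names STAMP and shorten_timestamp are hypothetical and not part of the script above.

```python
import re

# Assumed layout: "[Day Mon DD HH:MM:SS YYYY]" at the start of the line,
# with one or more spaces before the day of the month.
STAMP = re.compile(r"^\[\w{3} \w{3} +\d{1,2} (\d{2}:\d{2}:\d{2}) \d{4}\][ \t]*")

def shorten_timestamp(line):
    """Reduce a leading full timestamp to "[HH:MM:SS] "; leave other lines alone."""
    return STAMP.sub(lambda m: "[" + m.group(1) + "] ", line, count=1)

print(shorten_timestamp("[Mon Nov  2 09:23:04 1999]<willi> Gut.\n"), end="")
print(shorten_timestamp("### Opening logfile (channel #whatever)\n"), end="")
```

Lines without a leading timestamp, such as the "###" header, are returned unchanged, so the function can be applied to every line of a log.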

<cranking-out-python-with-hopes-not-to-confuse-you-too-much>ly,
Bernhard

--
Research Assistant, Geog Dept UM-Milwaukee, USA. (www.uwm.edu/~bernhard)
Free Software Projects and Consulting (intevation.net)
Association for a Free Informational Infrastructure (ffii.org)
