
collecting data from file


Mustafa Celik

Apr 11, 2003, 3:41:23 AM
Hi,
I want to scroll through a file and:
* find lines that match a string (e.g. HELLO) in the 2nd column
* add up the 4th column (an integer) on each matching line, say in a
variable TOTAL
* subtract the 4th column from TOTAL if another string (e.g. BYE)
is found in the 2nd column of a line

Any tips?


Jeremy Yallop

Apr 11, 2003, 5:36:58 AM

I'd use awk for this.

{
if ($2 == "HELLO") { total += $4 }
else if ($2 == "BYE") { total -= $4 }
}
END { print total }

Jeremy.

Peter Hansen

Apr 11, 2003, 8:36:12 AM

Yes, post an example showing what you intend. The instructions are
decent, but implementable only by making a few assumptions about what
you mean. Also, are there any error conditions to be handled, or do
you guarantee the input is always perfect (e.g., always at least four
columns, and so on)?

-Peter

Terry Reedy

Apr 11, 2003, 9:33:15 AM

"Peter Hansen" <pe...@engcorp.com> wrote in message
news:3E96B6BC...@engcorp.com...

> Mustafa Celik wrote:
> >
> > I want to scroll thru a file;
> > * find lines that match a string (e.g. HELLO) on the 2nd column
> > * add up the 4th column (an integer) on each matching line, say the
> > variable is TOTAL
> > * subtract the 4th column from TOTAL if another is string (e.g. BYE)
> > is hit on 2nd column of a line
> >
> > Any tips?
>
> Yes, post an example showing what you intend. The instructions are
> decent, but implementable only by making a few assumptions about what
> you mean.

In particular, how are the 'columns' specified? Your examples imply
that you mean variable-length data fields rather than character
columns. If so, are they separated by <space>, <tab>, a mixture of
the two, <comma>, or something else?
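
To illustrate why the separator question matters, here is a small sketch in present-day Python 3 (not the 2003-era syntax used elsewhere in this thread). The sample lines are made up for illustration; the real file format had not been posted at this point.

```python
# Hypothetical sample lines, one per candidate separator.
spaces = "0 HELLO stuff 5"
mixed = "0\tHELLO   stuff\t5"
commas = "0,HELLO,stuff,5"

# split() with no argument splits on any run of whitespace,
# so single spaces, tabs, and mixtures all give the same fields.
fields_spaces = spaces.split()
fields_mixed = mixed.split()

# Any other separator has to be passed explicitly.
fields_commas = commas.split(",")

# Passing " " explicitly is stricter: repeated spaces yield empty fields.
strict = "a  b".split(" ")

print(fields_spaces)  # ['0', 'HELLO', 'stuff', '5']
print(fields_mixed)   # ['0', 'HELLO', 'stuff', '5']
print(fields_commas)  # ['0', 'HELLO', 'stuff', '5']
print(strict)         # ['a', '', 'b']
```

For whitespace-separated data the no-argument form is usually the forgiving choice, since column positions do not shift when the padding varies.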

> Also, are there any error conditions to be handled, or do
> you guarantee the input is always perfect (e.g., always at least four
> columns, and so on)?

Missing data is the great bugaboo of data analysis. Ignoring that
possibility

sum = 0
for line in file('mydata'):
    fields = line.split() # assume whitespace
    if field[1] == 'HELLO':
        sum += field[3]
    elif field[1] == 'BYE'
        sum += field[3]

Terry J. Reedy


Don Arnold

Apr 11, 2003, 9:39:19 AM
Jeremy Yallop <jer...@jdyallop.freeserve.co.uk> wrote in message news:<slrnb9d2sh...@saturn.cps.co.uk>...

Last time I checked, this was the Python newsgroup, so I think a
solution _in_ Python is more appropriate:

total = 0
infile = open('c:/temp2/input.txt')

for line in infile.readlines():
    line_items = line.split()
    if len(line_items) >= 4:
        print '[%s]' % line_items
        tag = line_items[1]
        amt = float(line_items[3])

        if tag == 'HELLO':
            print 'adding', amt, '...'
            total += amt
        elif tag == 'BYE':
            print 'subtracting', amt, '...'
            total -= amt
        print 'total:', total

---- output: ----

[['0', 'HELLO', 'stuff', '5']]
adding 5.0 ...
total: 5.0
[['1', 'HELLO', 'stuff', '3']]
adding 3.0 ...
total: 8.0
[['2', 'BYE', 'stuff', '2']]
subtracting 2.0 ...
total: 6.0
[['3', 'HELLO', 'stuff', '4']]
adding 4.0 ...
total: 10.0
[['4', 'BYE', 'stuff', '5']]
subtracting 5.0 ...
total: 5.0
[['5', 'HELLO', 'stuff', '2']]
adding 2.0 ...
total: 7.0
[['6', 'BYE', 'stuff', '1']]
subtracting 1.0 ...
total: 6.0
[['7', 'HELLO', 'stuff', '5']]
adding 5.0 ...
total: 11.0
[['8', 'BYE', 'stuff', '10']]
subtracting 10.0 ...
total: 1.0

HTH,
Don

Skip Montanaro

Apr 11, 2003, 9:58:39 AM
Mustafa> I want to scroll thru a file;
[ and do stuff with it ]

Mustafa> Any tips?

As others have mentioned, your problem was underspecified. If your data is
tabular with a unique separator between fields, the new csv package
(available from Python CVS) should fill the bill. The current development
version of the docs is here:

http://www.python.org/dev/doc/devel/lib/module-csv.html
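
A minimal sketch of the csv approach, written in present-day Python 3, where the csv module ships with the standard library. The sample data, the single-space delimiter, and the helper name `total_from_rows` are assumptions made for illustration, not anything taken from the original poster's file.

```python
import csv
import io

# Hypothetical sample data with exactly one space between fields (an assumption).
SAMPLE = "0 HELLO stuff 5\n1 HELLO stuff 3\n2 BYE stuff 2\n"


def total_from_rows(stream, delimiter=" "):
    """Add column 4 on HELLO rows and subtract it on BYE rows."""
    total = 0
    for row in csv.reader(stream, delimiter=delimiter):
        if len(row) < 4:
            continue  # skip short or blank rows instead of raising IndexError
        if row[1] == "HELLO":
            total += int(row[3])
        elif row[1] == "BYE":
            total -= int(row[3])
    return total


print(total_from_rows(io.StringIO(SAMPLE)))  # 6
```

The same function should handle comma-separated input by passing `delimiter=","`. If the fields may be padded with runs of spaces, plain `line.split()` is likely the simpler tool.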

Skip

Peter Hansen

Apr 11, 2003, 10:31:32 AM
Terry Reedy wrote:
>
> > Also, are there any error conditions to be handled, or do
> > you guarantee the input is always perfect (e.g., always at least four
> > columns, and so on)?
>
> Missing data is the great bugaboo of data analysis. Ignoring that
> possibility
>
> sum = 0
> for line in file('mydata'):
>     fields = line.split() # assume whitespace
>     if field[1] == 'HELLO':
>         sum += field[3]
>     elif field[1] == 'BYE'
>         sum += field[3]

I believe the last line was intended to read

sum -= field[3]

Terry, I like how your solution *exactly* matches the requirements
as specified, by not even printing the resulting sum. Very
efficient of you. :-)
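
With that last-line correction applied, a runnable version of the quoted loop might look like the sketch below. It is written for present-day Python 3 and goes a little beyond the quoted code: it uses `fields` consistently, converts the 4th column with `int()`, skips short lines, and returns the result. The function name `net_total` and the sample lines are assumptions for illustration.

```python
def net_total(lines):
    """Sum column 4 over HELLO lines and subtract it over BYE lines."""
    total = 0  # named 'total' to avoid shadowing the built-in sum()
    for line in lines:
        fields = line.split()  # assume whitespace-separated fields
        if len(fields) < 4:
            continue  # not enough columns; ignore the line
        if fields[1] == "HELLO":
            total += int(fields[3])
        elif fields[1] == "BYE":
            total -= int(fields[3])
    return total


# Hypothetical sample input; for a real file, something like
#     with open("mydata") as f: print(net_total(f))
# should work, since a file object iterates over its lines.
sample = ["0 HELLO stuff 5", "1 BYE stuff 2"]
print(net_total(sample))  # 3
```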

-Peter

Mustafa Celik

Apr 11, 2003, 11:07:15 AM
The file that I'm analyzing is guaranteed to have more than 4 columns.
Lines contain strings, integers, floats, ..., and the fields are
separated by spaces.

An example line is as below:

1850099.32 HELLO 0xfce 6 OTTAWA stree_number_200 exit_Metcalfe take_hw_417 arrive_at_Ottawa_Airport

The 2nd column tells me if someone has arrived, the 3rd column tells me
how much $ they'll bring with them, and the 4th column tells me who he is
(a hex guy).

I want to find out:
* how many people arrived and did not leave,
* how many have arrived and left,
* how much we have in Ottawa at the end,
* how much transient money flow has occurred.

My file may contain some lines for people who only leave Ottawa, so I
should ignore them (they probably arrived earlier, and should not
decrement my dollar gain). This condition is not very important; if it
complicates things, I have other ways to prevent it.

I found a good awk document last night (can't remember the web site) and
was planning to dig into it today, but you guys have supported my
initial plan to use Python. I'm not a beginner in Python, this is probably
not the best project to start with, but I need to have this done in Awk
or Python, ...

Thanks,
Mustafa

Mustafa Celik

Apr 11, 2003, 11:19:46 AM

Correction on what columns really mean:

2nd column : if someone has arrived/left (HELLO/BYE)
3rd column : who he is (0x... ; hex)
4th column : how much $ they brought/took
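
Using the corrected column meanings, the four questions asked earlier could be approached roughly as in the sketch below (present-day Python 3). Several things here are assumptions rather than anything stated in the thread: a person is identified by the hex value in the 3rd column, "transient money flow" is taken to mean money carried out by people who had previously arrived, a BYE with no earlier HELLO is ignored, and the name `summarize` and the sample lines are invented for illustration.

```python
def summarize(lines):
    """Tally arrivals, departures, and money from HELLO/BYE lines."""
    present = set()  # ids that have arrived and not yet left
    left = 0         # people who arrived and later left
    balance = 0      # money remaining at the end
    transient = 0    # money taken out again (assumed meaning of "transient")
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore lines without the expected columns
        event, who, amount = fields[1], fields[2], int(fields[3])
        if event == "HELLO":
            present.add(who)
            balance += amount
        elif event == "BYE":
            if who not in present:
                continue  # left without a recorded arrival: ignore
            present.remove(who)
            left += 1
            balance -= amount
            transient += amount
    return {"stayed": len(present), "left": left,
            "balance": balance, "transient": transient}


# Hypothetical sample lines in the posted layout (trailing columns omitted).
SAMPLE = [
    "1850099.32 HELLO 0xfce 6 OTTAWA",
    "1850100.10 HELLO 0xabc 4 OTTAWA",
    "1850101.55 BYE 0xfce 2 OTTAWA",
    "1850102.00 BYE 0xdef 9 OTTAWA",  # never arrived, so it is skipped
]
print(summarize(SAMPLE))
# {'stayed': 1, 'left': 1, 'balance': 8, 'transient': 2}
```

Tracking ids in a set keeps the "left without arriving" check to a single membership test. If the same id can arrive more than once before leaving, the counting rules would need to be decided first.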
