Simple code and suggestion

g thakuri

Nov 30, 2016, 8:26:35 AM

Dear Python friends,

I have a simple question and would like your suggestions.

I would like to avoid chaining multiple split calls in the code below. What
options do we have before tokenising the line? Maybe validate the first line
first? Any other ideas?

cmd = 'utility %s' % (file)
out, err, exitcode = command_runner(cmd)
data = stdout.strip().split('\n')[0].split()[5][:-2]

Love,
PT

Jussi Piitulainen

Nov 30, 2016, 9:01:50 AM

g thakuri writes:

> I would want to avoid using multiple split in the below code , what
> options do we have before tokenising the line?, may be validate the
> first line any other ideas
>
> cmd = 'utility %s' % (file)
> out, err, exitcode = command_runner(cmd)
> data = stdout.strip().split('\n')[0].split()[5][:-2]

That .strip() looks suspicious to me, but perhaps you know better.

Also, stdout should be out, right?

You can use io.StringIO to turn a string into an object that you can
read line by line just like a file object. This reads just the first
line and picks the part that you want:

import io
data = next(io.StringIO(out)).split()[5][:-2]

I don't know how much this affects performance, but it's kind of neat.

A thing I like to do is name all the fields even if I don't use them all. The
assignment will fail with an exception if there's an unexpected number
of fields, and that's usually what I want when the input is bad:

line = next(io.StringIO(out))
ID, FORM, LEMMA, POS, TAGS, WEV, ETC = line.split()
data = WEV[:-2]

(Those are probably not appropriate names for your fields :)

Just a couple of ideas that you may like to consider.
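
As a rough sketch of the named-fields idea with explicit error handling: the sample output line and the field names below are made up for illustration, since the thread never shows what the utility actually prints, and the sketch assumes exactly seven fields on the first line.

```python
import io

def sixth_field(out):
    """Return the sixth field of the first line of out, minus its last two characters."""
    # next() with a default avoids StopIteration when out is empty.
    line = next(io.StringIO(out), "")
    try:
        # Hypothetical field names; unpacking raises ValueError
        # unless the line has exactly seven whitespace-separated fields.
        f1, f2, f3, f4, f5, wanted, f7 = line.split()
    except ValueError:
        raise ValueError("unexpected first line: %r" % line)
    return wanted[:-2]

sample = "a b c d e 1234kB g\nsecond line is ignored\n"  # made-up output
print(sixth_field(sample))  # 1234
```

The caller then gets one clear ValueError for bad input instead of an IndexError from somewhere inside a chain of calls.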

Ganesh Pal

Nov 30, 2016, 9:22:49 AM

On Wed, Nov 30, 2016 at 7:33 PM, Dennis Lee Bieber <wlf...@ix.netcom.com>
wrote:

> On Wed, 30 Nov 2016 18:56:21 +0530, g thakuri <gbp...@gmail.com>
> declaimed the following:
>
> >Dear Python friends,
> >
> >I have a simple question , need your suggestion the same
> >
> >I would want to avoid using multiple split in the below code , what
> >options do we have before tokenising the line?, may be validate the
> >first line any other ideas
> >
> > cmd = 'utility %s' % (file)
> > out, err, exitcode = command_runner(cmd)
> > data = stdout.strip().split('\n')[0].split()[5][:-2]
> >

> 1) Where did "stdout" come from? (I suspect you meant just "out")
>

My bad, it should have been out. Here is the updated code:

cmd = 'utility %s' % (file)
out, err, exitcode = command_runner(cmd)
data = out.strip().split('\n')[0].split()[5][:-2]


> 2) The [0] indicates you are only interested in the FIRST LINE; if so,
> just remove the entire ".split('\n')[0]" since the sixth white space
> element on the first line is also the sixth white space element of the
> entire returned data.
>
Yes, I am interested only in the first line. Maybe we can test whether we
have a first line (line[0]) before tokenising it?
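
One possible way to do that check, as a minimal sketch: the function name and the choice to return None on bad input are my own assumptions, and the sample string stands in for whatever the utility really prints.

```python
def first_line_field(out, index=5):
    """Return field number index of the first line, minus its last two characters.

    Returns None when there is no first line or it has too few fields.
    """
    # strip() mirrors the original code: leading blank lines are ignored.
    lines = out.strip().splitlines()
    if not lines:
        return None  # nothing to tokenise
    fields = lines[0].split()
    if len(fields) <= index:
        return None  # first line is shorter than expected
    return fields[index][:-2]

sample = "a b c d e 1234kB g\nx y\n"  # made-up output
print(first_line_field(sample))  # 1234
print(first_line_field(""))      # None
```

Dropping the line split altogether, as suggested in point 2, gives the same answer only while the first line has at least six fields; the length check above is what catches a short first line.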

Veek M

Dec 8, 2016, 11:29:25 PM

How about import re, the regex/pattern-matching module?
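
A minimal sketch of what that could look like, assuming the fields on the first line are separated by spaces or tabs; the pattern, the names, and the sample string are mine, not anything from the utility.

```python
import re

# Skip five fields at the start of the text, then capture the sixth.
# [ \t]+ rather than \s+ keeps the match from running on to the next line.
SIXTH_FIELD = re.compile(r"\s*(?:\S+[ \t]+){5}(\S+)")

def sixth_field_re(out):
    """Return the sixth field of the first line minus its last two characters, or None."""
    match = SIXTH_FIELD.match(out)
    if match is None:
        return None  # empty output, or fewer than six fields on the first line
    return match.group(1)[:-2]

sample = "a b c d e 1234kB g\nsecond line\n"  # made-up output
print(sixth_field_re(sample))  # 1234
```

One pattern both validates the line and picks out the field, so there is no chain of split calls left to fail halfway through.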
