Making a copy (not reference) of a file handle, or starting stdin over at line 0

0 views
Skip to first unread message

Shawn Milochik

unread,
Aug 17, 2007, 1:47:16 PM8/17/07
to pytho...@python.org
I wrote a script which will convert a tab-delimited file to a
fixed-width file, or a fixed-width file into a tab-delimited. It reads
a config file which defines the field lengths, and uses it to convert
either way.

Here's an example of the config file:

1:6,7:1,8:9,17:15,32:10

This converts a fixed-width file to a tab-delimited where the first
field is the first six characters of the file, the second is the
seventh, etc. Conversely, it converts a tab-delimited file to a file
where the first six characters are the first tab field, right-padded
with spaces, and so on.

What I want to do is look at the file and decide whether to run the
function to convert the file to tab or FW. Here is what works
(mostly):

x = inputFile.readline().split("\t")
inputFile.seek(0)

if len(x) > 1:
toFW(inputFile)
else:
toTab(inputFile)


The problem is that my file accepts the input file via stdin (pipe) or
as an argument to the script. If I send the filename as an argument,
everything works perfectly.

If I pipe the input file into the script, it is unable to seek() it. I
tried making a copy of inputFile and doing a readline() from it, but
being a reference, it makes no difference.

How can I check a line (or two) from my input file (or stdin stream)
and still be able to process all the records with my function?

Thanks,
Shawn

Peter Otten

unread,
Aug 17, 2007, 2:23:32 PM8/17/07
to
Shawn Milochik wrote:

> How can I check a line (or two) from my input file (or stdin stream)
> and still be able to process all the records with my function?

One way:

from itertools import chain
firstline = instream.next()
head = [firstline]

# loop over entire file
for line in chain(head, instream):
process(line)


You can of course read more than one line as long as you append it to the
head list. Here's an alternative:

from itertools import tee
a, b = tee(instream)

for line in a:
# determine file format,
# break when done

# this is crucial for memory efficiency
# but may have no effect in implementations
# other than CPython
del a

# loop over entire file
for line in b:
# process line


Peter

Reply all
Reply to author
Forward
0 new messages