Here's an example of the config file:
1:6,7:1,8:9,17:15,32:10
This converts a fixed-width file to a tab-delimited file in which the
first field is the first six characters of each line, the second is the
seventh character, and so on. Conversely, it converts a tab-delimited
file to a fixed-width file in which the first six characters are the
first tab field, right-padded with spaces, and so on.
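To make the layout concrete, here is a minimal sketch of how such a config line could be applied in both directions. It assumes each entry is start:length with a 1-based start, and that the fields are contiguous as in the example above. The names parse_spec, fw_to_tab and tab_to_fw are hypothetical and not part of my script, and truncating over-long values is an assumption as well.

```python
def parse_spec(spec):
    """Turn '1:6,7:1' into [(0, 6), (6, 1)]: zero-based start, length."""
    fields = []
    for item in spec.split(","):
        start, length = item.split(":")
        fields.append((int(start) - 1, int(length)))
    return fields


def fw_to_tab(line, fields):
    """Slice one fixed-width line into its fields and join them with tabs."""
    line = line.rstrip("\n")
    return "\t".join(line[start:start + length] for start, length in fields)


def tab_to_fw(line, fields):
    """Right-pad each tab field with spaces to its configured width.

    Assumption: values longer than the field width are truncated.
    """
    values = line.rstrip("\n").split("\t")
    return "".join(value.ljust(length)[:length]
                   for value, (_, length) in zip(values, fields))


fields = parse_spec("1:6,7:1,8:9,17:15,32:10")
```

Note that fw_to_tab keeps the padding spaces inside each field; whether to strip them is a separate decision.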
What I want to do is look at the file and decide whether to run the
function to convert the file to tab or FW. Here is what works
(mostly):
x = inputFile.readline().split("\t")
inputFile.seek(0)
if len(x) > 1:
    toFW(inputFile)
else:
    toTab(inputFile)
The problem is that my script accepts the input file via stdin (pipe) or
as an argument to the script. If I send the filename as an argument,
everything works perfectly.
If I pipe the input file into the script, the script cannot seek() on
it. I tried making a copy of inputFile and doing a readline() from the
copy, but since the copy is only a reference to the same object, it
makes no difference.
How can I check a line (or two) from my input file (or stdin stream)
and still be able to process all the records with my function?
Thanks,
Shawn
> How can I check a line (or two) from my input file (or stdin stream)
> and still be able to process all the records with my function?
One way:
from itertools import chain
# next() consumes the first line without needing seek()
firstline = next(instream)
head = [firstline]
# loop over entire file
for line in chain(head, instream):
    process(line)
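Applied to the format check from the question, the same idea might look like the sketch below. The function name peek_format and the "tab"/"fw" labels are made up for illustration, and io.StringIO merely stands in for a pipe; sys.stdin can be iterated the same way.

```python
import io
from itertools import chain


def peek_format(instream):
    """Return (kind, lines): kind is 'tab' or 'fw', lines yields every line."""
    lines = iter(instream)
    try:
        firstline = next(lines)
    except StopIteration:
        # Empty input: nothing to convert either way.
        return "fw", iter(())
    head = [firstline]
    kind = "tab" if "\t" in firstline else "fw"
    # Put the consumed line back in front of the remaining ones.
    return kind, chain(head, lines)


# io.StringIO is a stand-in for piped input in this example.
kind, lines = peek_format(io.StringIO("a\tb\nc\td\n"))
all_lines = list(lines)
```

For this to work, toFW and toTab would have to accept any iterable of lines instead of calling seek() on a file object.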
You can of course read more than one line as long as you append it to the
head list. Here's an alternative:
from itertools import tee
a, b = tee(instream)
for line in a:
    # determine file format,
    # break when done
    break
# this is crucial for memory efficiency
# but may have no effect in implementations
# other than CPython
del a
# loop over entire file
for line in b:
    process(line)
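A slightly fuller sketch of the tee() variant, in case you want to sample more than one line. The name sniff_with_tee and the sample_size parameter are hypothetical, and treating any tab in the sample as "tab-delimited" is an assumption about what your check should be.

```python
import io
from itertools import islice, tee


def sniff_with_tee(instream, sample_size=2):
    """Inspect up to sample_size lines, then return (kind, all_lines)."""
    a, b = tee(instream)
    # Reading from a buffers the same lines for b.
    sample = list(islice(a, sample_size))
    # Drop a so that tee() can stop buffering once b has caught up.
    del a
    kind = "tab" if any("\t" in line for line in sample) else "fw"
    return kind, b


# io.StringIO is a stand-in for piped input in this example.
kind, rest = sniff_with_tee(io.StringIO("abc def\nghi jkl\nmno\n"))
lines_seen = list(rest)
```

Only the sampled lines are ever buffered, so this stays cheap even for large inputs.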
Peter