Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA
Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4
Since you have to know, a priori, how to break the input string I
assume that these fields are of fixed length. You can use the following
to do what you want:
>>> a="0139725635999992000010100534+42050-102800FM-15+1198KAIA"
>>> print "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s" %
(a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])
which results in the following output:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
Mark
Or try this:
import struct
test = '0139725635999992000010100534+42050-102800FM-15+1198KAIA'
template = '4s6s5s4s2s2s2s2s1s6s7s5s5s4s'
the_line = struct.unpack(template,test)
print the_line
print ','.join(the_line)
## ('0139', '725635', '99999', '2000', '01', '01', '00', '53', '4',
'+42050', '-102800', 'FM-15', '+1198', 'KAIA')
##
##
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
I don't think you are going to find anything that will just do this
for you. You are going to have read the file, figure out where to
split the string, and reprint it delimited with commas. As for
suggesting code... I can't tell how you actually want to delimit the
stuff from the above example? Are the fields always a fixed number of
characters? If they aren't then is there some other method for
determining how many characters to group into a field? From the looks
of it you could split that string any way you want and get something
that looks right, but isn't.
Matt
Is each data element a fixed size?
It looks like a fixed format data file, also called flat-record
files, in which fields are of fixed lengths, and records are
separated by newlines.
> What is the best way to go about this?
Check out chapter 2 of _Text Processing in Python_ for one
solution.
http://gnosis.cx/TPiP/chap2.txt
--
Neil Cerutti
Weight Watchers will meet at 7 p.m. Please use large double door at the side
entrance. --Church Bulletin Blooper
I'm guessing that these lines *aren't* fixed-length, especially those
signed integer fields. I used the patented Paul McGuire CrystalBall
module to come up with this pyparsing rendition. (OP may adjust to
suit.)
-- Paul
data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
"""to be parsed as:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""
from pyparsing import *
import time
def convertTimeStamp(t):
t["date"] = map(int,t.date)
t["time"] = map(int,t.time)
return time.strftime("%Y-%m-%dT%H:%M",
tuple(t.date)+tuple(t.time)+(0,0,0,0))
yearMonthDay = Word(nums,exact=4) + Word(nums,exact=2) +
Word(nums,exact=2)
hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
timestamp.setParseAction(convertTimeStamp)
signedInteger = Word("+-",nums)
fieldA = Word(nums,exact=4)("A")
fieldB = Word(nums,exact=6)("B")
fieldC = Word(nums,exact=5)("C")
fieldD = timestamp("timestamp")
fieldE = Word(nums)("E")
fieldF = signedInteger("latitude").setParseAction(lambda t : int(t[0])/
1000.0)
fieldG = signedInteger("longitude").setParseAction(lambda t :
int(t[0])/1000.0)
fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
fieldI = signedInteger("I")
fieldJ = Word(alphas)("J")
dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
fieldF + fieldG + fieldH + fieldI + fieldJ
res = dataFields.parseString(data)
print res.dump()
prints:
['0139', '725635', '99999', '2000-01-01T00:53', '4',
42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
- A: 0139
- B: 725635
- C: 99999
- E: 4
- H: FM-15
- I: +1198
- J: KAIA
- latitude: 42.05
- longitude: -102.8
- timestamp: 2000-01-01T00:53