convert non-delimited to delimited

RyanL

unread,

Aug 27, 2007, 1:59:23 PM8/27/07

to

I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4

Mark Elston

unread,

Aug 27, 2007, 2:35:17 PM8/27/07

to

Since you have to know, a priori, how to break the input string I
assume that these fields are of fixed length. You can use the following
to do what you want:

>>> a="0139725635999992000010100534+42050-102800FM-15+1198KAIA"
>>> print "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s" %
(a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])

which results in the following output:

0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

Mark

mensa...@aol.com

unread,

Aug 27, 2007, 2:38:24 PM8/27/07

to

Or try this:

import struct
test = '0139725635999992000010100534+42050-102800FM-15+1198KAIA'
template = '4s6s5s4s2s2s2s2s1s6s7s5s5s4s'
the_line = struct.unpack(template,test)
print the_line
print ','.join(the_line)

## ('0139', '725635', '99999', '2000', '01', '01', '00', '53', '4',
'+42050', '-102800', 'FM-15', '+1198', 'KAIA')
##
##
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

Matimus

unread,

Aug 27, 2007, 2:40:06 PM8/27/07

to

I don't think you are going to find anything that will just do this
for you. You are going to have read the file, figure out where to
split the string, and reprint it delimited with commas. As for
suggesting code... I can't tell how you actually want to delimit the
stuff from the above example? Are the fields always a fixed number of
characters? If they aren't then is there some other method for
determining how many characters to group into a field? From the looks
of it you could split that string any way you want and get something
that looks right, but isn't.

Matt

Michael Bentley

unread,

Aug 27, 2007, 2:44:11 PM8/27/07

to python, RyanL

Is each data element a fixed size?

Neil Cerutti

unread,

Aug 27, 2007, 2:44:18 PM8/27/07

to

On 2007-08-27, RyanL <ryanla...@gmail.com> wrote:
> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

It looks like a fixed format data file, also called flat-record
files, in which fields are of fixed lengths, and records are
separated by newlines.

> What is the best way to go about this?

Check out chapter 2 of _Text Processing in Python_ for one
solution.

http://gnosis.cx/TPiP/chap2.txt

--
Neil Cerutti
Weight Watchers will meet at 7 p.m. Please use large double door at the side
entrance. --Church Bulletin Blooper

Paul McGuire

unread,

Aug 27, 2007, 8:06:31 PM8/27/07

to

I'm guessing that these lines *aren't* fixed-length, especially those
signed integer fields. I used the patented Paul McGuire CrystalBall
module to come up with this pyparsing rendition. (OP may adjust to
suit.)

-- Paul

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
"""to be parsed as:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""

from pyparsing import *
import time
def convertTimeStamp(t):
t["date"] = map(int,t.date)
t["time"] = map(int,t.time)
return time.strftime("%Y-%m-%dT%H:%M",
tuple(t.date)+tuple(t.time)+(0,0,0,0))

yearMonthDay = Word(nums,exact=4) + Word(nums,exact=2) +
Word(nums,exact=2)
hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
timestamp.setParseAction(convertTimeStamp)
signedInteger = Word("+-",nums)

fieldA = Word(nums,exact=4)("A")
fieldB = Word(nums,exact=6)("B")
fieldC = Word(nums,exact=5)("C")
fieldD = timestamp("timestamp")
fieldE = Word(nums)("E")
fieldF = signedInteger("latitude").setParseAction(lambda t : int(t[0])/
1000.0)
fieldG = signedInteger("longitude").setParseAction(lambda t :
int(t[0])/1000.0)
fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
fieldI = signedInteger("I")
fieldJ = Word(alphas)("J")
dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
fieldF + fieldG + fieldH + fieldI + fieldJ

res = dataFields.parseString(data)
print res.dump()

prints:

['0139', '725635', '99999', '2000-01-01T00:53', '4',
42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
- A: 0139
- B: 725635
- C: 99999
- E: 4
- H: FM-15
- I: +1198
- J: KAIA
- latitude: 42.05
- longitude: -102.8
- timestamp: 2000-01-01T00:53