Hi,
I was wondering if/how it is possible to apply a converter function to more than one column, just like "parse_dates" does.
I have a dataset of hundreds of thousands of rows that looks like this:
https://gist.github.com/anonymous/7714935
I'm reading it using:
###
import pandas as pd
import datetime as dt
filename = 'https://gist.github.com/anonymous/7714935/raw/19e1a3810b564616b2990651e1e91d11c185b9a4/data.dat'
def parse(UTCDate, UTCTime):
    # UTCDate is a 'month/day/year' string, UTCTime is seconds since midnight
    return dt.datetime.strptime(UTCDate, '%m/%d/%Y') + dt.timedelta(seconds=float(UTCTime))
names=['Remote','Master','AntHgt','UTCDate','UTCTime','Week','Date','GPSTime','LatD','LatM','LatS','LonD','LonM','LonS','H-Ell','H-MS',
'Easting','Northing','X-LL','Y-LL','Z-LL','SDEast','SDNorth','SDHeight','StdDev','HgtDiff','H-Sep ','AmbDrift',' Q','PDOP',
'NS','L1Rms','SolType','VNorth','VEast','VUp','AccNrth','AccEast','AccUp','Easting','Northing','Grid-Z']
df = pd.read_csv(filename, delimiter=r'\s+', skiprows=25, index_col='date', parse_dates={'date':['UTCDate','UTCTime']}, names=names, date_parser=parse)
###
Unfortunately, this code splits the "Latitude" and "Longitude" columns into 6 columns:
'LatD','LatM','LatS','LonD','LonM','LonS'
so I cannot use my converter dms2deg1:
###
def dms2deg1(latitude):
    # latitude = '54 35 56.46'
    deg, min, sec = latitude.split()
    sgn = float(deg) / abs(float(deg))
    return sgn * (abs(float(deg)) + (float(min) + (float(sec)/60) ) / 60 )
###
With the previous read_csv command, I would need something like:
###
def dms2deg(deg, min, sec):
    sgn = float(deg) / abs(float(deg))
    return sgn * (abs(float(deg)) + (float(min) + (float(sec)/60) ) / 60 )
###
but this takes multiple columns as input, and I do not know if/how this can be done in pandas.
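One possible workaround, as a minimal sketch only: do the conversion after read_csv instead of inside it, working on whole columns at once. The helper name dms2deg_columns and the tiny sample frame are made up for illustration; the column names are the ones from my names list. Unlike dms2deg above, this treats 0 degrees as positive, so it does not divide by zero.

```python
import numpy as np
import pandas as pd

def dms2deg_columns(deg, minutes, sec):
    """Convert degree/minute/second columns to decimal degrees (hypothetical helper)."""
    deg = pd.to_numeric(deg)
    minutes = pd.to_numeric(minutes)
    sec = pd.to_numeric(sec)
    # The sign is taken from the degrees column; 0 degrees counts as positive.
    sgn = np.where(deg < 0, -1.0, 1.0)
    return sgn * (deg.abs() + (minutes + sec / 60.0) / 60.0)

# Made-up sample data standing in for the real file.
df = pd.DataFrame({
    'LatD': [54, -12], 'LatM': [35, 30], 'LatS': [56.46, 0.0],
    'LonD': [-3, 7], 'LonM': [15, 45], 'LonS': [30.0, 36.0],
})

df['Latitude'] = dms2deg_columns(df['LatD'], df['LatM'], df['LatS'])
df['Longitude'] = dms2deg_columns(df['LonD'], df['LonM'], df['LonS'])
# Drop the six intermediate columns once the two combined ones exist.
df = df.drop(columns=['LatD', 'LatM', 'LatS', 'LonD', 'LonM', 'LonS'])
```

This does not answer whether read_csv itself can feed several columns to one converter; it only sidesteps the problem. Note also that a value such as "-0 30 0" would lose its sign once the degrees are parsed as a number.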
I was also trying to avoid passing the names list to read_csv manually.
The file has a header on line 24, so I was thinking of generating the names list with:
###
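import linecache
# r is assumed to be the path to a local copy of the data file
headerline = linecache.getline(r, 24)
# split() with no arguments already collapses runs of whitespace
header = headerline.split()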
###
but this will not work with the previous code because "Latitude" and "Longitude" are no longer columns; instead I have:
'LatD','LatM','LatS','LonD','LonM','LonS'
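A rough sketch of how the header mismatch might be handled, assuming the header line really contains the single tokens "Latitude" and "Longitude": expand each of them into its three sub-column names while building the list. The function name expand_header and the sample header string are invented for illustration, not taken from the real file.

```python
def expand_header(headerline):
    """Build a names list from a header line, expanding the DMS columns (hypothetical helper)."""
    # Each of these header tokens corresponds to three whitespace-separated data columns.
    expanded = {
        'Latitude': ['LatD', 'LatM', 'LatS'],
        'Longitude': ['LonD', 'LonM', 'LonS'],
    }
    names = []
    for token in headerline.split():
        # Tokens without an entry in `expanded` are kept unchanged.
        names.extend(expanded.get(token, [token]))
    return names

# Made-up header line, much shorter than the real one.
sample_header = "UTCDate UTCTime Latitude Longitude H-Ell"
names = expand_header(sample_header)
```

The resulting list could then be passed as names= to read_csv in place of the hand-written one.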
I also posted this on Stack Overflow:
http://stackoverflow.com/questions/20296430/read-csv-in-pandas-how-to-use-a-specific-row-as-header
Thanks a lot for any help!
Massimo.