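import pandas as pd

# Read the semicolon-separated file as-is and inspect the inferred dtypes.
df1 = pd.read_csv(r'c:\temp\input1.csv', sep=';')
print(df1)
print(df1.dtypes)
print('-----------')
# Converter that swaps the decimal comma for a dot (it still returns a string).
f = lambda x: x.replace(",", ".")
print(f('12,55'))
print('-----------')
# Apply the converter to the three numeric columns while parsing.
converter = {'Number1': f, 'Number2': f, 'Number3': f}
df2 = pd.read_csv(r'c:\temp\input1.csv', sep=';', converters=converter)
print(df2)
print(df2.dtypes)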
Hi Luc,
Using f = lambda x: float(x.replace(",", ".")) worked for me with the
development version of pandas but breaks in 0.6.1, so this is clearly a
bug that I have since fixed:
In [6]: df2
Out[6]:
   Id  Number1    Number2  Text1  Text2  Number3
0   1     1521  1.871e+05    ABC    poi    4.739
1   2    121.1   1.49e+04    DEF    uyt   0.3773
2   3    878.2   1.08e+05    GHI    rez    2.736

In [9]: df2.dtypes
Out[9]:
Id           int64
Number1    float64
Number2    float64
Text1       object
Text2       object
Number3    float64
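
For anyone who wants to reproduce this without the original file, here is a minimal sketch of the float-returning converter. The sample rows and the name comma_to_float are made up for illustration (they are not the contents of input1.csv), and the print calls use Python 3 syntax. The second read uses the decimal parameter of read_csv, which I am assuming is available in your pandas version; recent releases have it.

```python
import io
import pandas as pd

# Hypothetical sample data shaped like the file in this thread:
# semicolon-separated fields with a comma as the decimal mark.
SAMPLE = (
    "Id;Number1;Number2;Text1;Text2;Number3\n"
    "1;1521,5;187101,25;ABC;poi;4,75\n"
    "2;121,25;14897,5;DEF;uyt;0,375\n"
    "3;878,125;108026,0;GHI;rez;2,5\n"
)

def comma_to_float(text):
    """Convert a string such as '12,55' to the float 12.55."""
    return float(text.replace(",", "."))

number_columns = ["Number1", "Number2", "Number3"]
converters = {name: comma_to_float for name in number_columns}

# Option 1: convert each value explicitly while parsing.
df_conv = pd.read_csv(io.StringIO(SAMPLE), sep=";", converters=converters)

# Option 2 (assumes a pandas version with the `decimal` parameter):
# let the parser interpret the decimal comma itself.
df_dec = pd.read_csv(io.StringIO(SAMPLE), sep=";", decimal=",")

print(df_conv.dtypes)
print(df_dec.dtypes)
```

Both reads should give float columns for Number1 through Number3; the converter route is the one discussed in this thread, while decimal=',' avoids writing a converter at all.
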
It's certainly safe to install and use the version from GitHub.
Otherwise you can hang on a few more days for the official 0.7.0
release.
The exception you show when the converter returns a string still happens
in the current git version, though. I'm creating an issue for it and will
fix it shortly:
https://github.com/wesm/pandas/issues/583
Thanks,
Wes
OK, I fixed the issue:
https://github.com/wesm/pandas/commit/cd4636bb6b69dde447b2529b8d940f459a9c598e
After the fix, using
f = lambda x: x.replace(",", ".")
works, and the resulting DataFrame has floating-point columns for Number1,
Number2, and Number3.
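
If you would rather not depend on the parser re-inferring types from the strings a converter returns (that behaviour may differ between pandas versions), one option is to convert the columns explicitly after reading. This is only a sketch: the sample rows and the name to_dot are hypothetical, and it assumes a pandas version that provides pd.to_numeric.

```python
import io
import pandas as pd

# Hypothetical sample data with a decimal comma (not the real input1.csv).
SAMPLE = (
    "Id;Number1;Number2;Text1;Text2;Number3\n"
    "1;1521,5;187101,25;ABC;poi;4,75\n"
    "2;121,25;14897,5;DEF;uyt;0,375\n"
)

# String-returning converter, as in the thread: '12,55' -> '12.55'.
to_dot = lambda x: x.replace(",", ".")

cols = ["Number1", "Number2", "Number3"]
df = pd.read_csv(io.StringIO(SAMPLE), sep=";",
                 converters={c: to_dot for c in cols})

# Explicit conversion, so the result does not rely on type inference
# being applied to the converter output.
df[cols] = df[cols].apply(pd.to_numeric)

print(df.dtypes)
```
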
Thanks for reporting and the test case (often with fixing bugs the
hardest part is writing the test case).
- Wes