DataFrames: the most efficient way to read strings with commas as floats


Alexander Flyax

Feb 22, 2015, 7:37:56 PM
to julia...@googlegroups.com
I have a CSV file with a lot of numbers stored as strings with commas as thousands separators, e.g., "123,456". I want to read them in as floats. What is the most efficient way to do this? In Python, for example, I can do:

income_df = pd.read_csv("income_2013_dollars.csv", sep='\t', thousands=',')

Is there an automatic equivalent in `DataFrames` of the `thousands` argument? If not, what's the most "julian" way of doing that? Thanks...

Milan Bouchet-Valat

Feb 23, 2015, 8:49:18 AM
to julia...@googlegroups.com
You can simply use
readtable("income_2013_dollars.csv", separator='\t', decimal=',')

See:
http://dataframesjl.readthedocs.org/en/latest/io.html

(BTW, tab-delimited fields go against the pseudo-standard for a .csv
file. Otherwise, readtable would even have guessed the arguments for you
based on the file extension.)

Regards
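
For reference, treating the comma as a decimal mark and treating it as a thousands separator give different numbers for the same string. A minimal Python sketch of the two readings (plain string handling, not DataFrames or pandas code; both function names are made up for illustration):

```python
def parse_with_decimal_comma(s: str) -> float:
    """Read the comma as a decimal mark: '123,456' is about 123 and a half."""
    return float(s.replace(",", "."))


def parse_with_thousands_comma(s: str) -> float:
    """Read the comma as a digit-group separator: '123,456' is a six-digit number."""
    return float(s.replace(",", ""))


value = "123,456"
as_decimal = parse_with_decimal_comma(value)
as_thousands = parse_with_thousands_comma(value)
```

With the string from the question, the first reading gives 123.456 and the second gives 123456.0, so it is worth checking which of the two a given reader option actually applies.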

Eureka Zhu

Feb 23, 2015, 8:54:24 AM
to julia...@googlegroups.com
How about using something like sed to preprocess the CSV file?
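
A rough sketch of that kind of preprocessing, written in Python rather than sed. It assumes the grouped numbers are double-quoted, as in "123,456", and that every group after the first has exactly three digits; the names `strip_thousands` and `preprocess` are hypothetical.

```python
import re

# A double-quoted field holding a comma-grouped number,
# e.g. "123,456" or "-1,234,567.89". Other quoted text is left alone.
QUOTED_NUMBER = re.compile(r'"(-?\d{1,3}(?:,\d{3})+(?:\.\d+)?)"')


def strip_thousands(line: str) -> str:
    """Replace each quoted, comma-grouped number with its bare digits."""
    return QUOTED_NUMBER.sub(lambda m: m.group(1).replace(",", ""), line)


def preprocess(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path line by line, stripping thousands separators."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_thousands(line))
```

The cleaned file can then be read with an ordinary tab-separated reader, with no special handling of the numeric columns.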


Pritam Prasad

Nov 16, 2015, 5:12:04 AM
to julia-stats
Hi,
I tried reading with decimal=',' but it's unsupported, even though it's there in the readtable manual. Can you please help me here?

Code:
data = readtable("dataFile.csv",separator=';',decimal=',')

Benjamin Deonovic

Nov 20, 2015, 9:40:07 AM
to julia-stats
Don't use CSV; it is such an insane file type. The fact that any programming language can figure out how to parse it is a miracle. Consider:

a,b,1,000,2,000,5,6,345

Is that 9 columns with the 4th and 6th being 000? Is it 7 columns with the last column being 345? Is it 6 columns with the last column being 6,345?

If for some reason you have to use CSV because you can't find any way to save your data in another format just give up on life and become a sailor or something.
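
For what it's worth, the ambiguity in the example line goes away once fields that contain commas are quoted. A small sketch using Python's standard `csv` module; the quoted variant of the line and the helper names `parse_row` and `to_number` are assumptions made for illustration, picking the 6-column reading.

```python
import csv
import io

unquoted = 'a,b,1,000,2,000,5,6,345\n'
# Hypothetical quoted version of the same line (the 6-column reading).
quoted = 'a,b,"1,000","2,000",5,"6,345"\n'


def parse_row(text: str) -> list:
    """Parse a single CSV line into a list of string fields."""
    return next(csv.reader(io.StringIO(text)))


def to_number(field: str):
    """Convert a field to float after dropping thousands commas; keep non-numeric text."""
    try:
        return float(field.replace(",", ""))
    except ValueError:
        return field


unquoted_fields = parse_row(unquoted)   # every comma splits: 9 fields
quoted_fields = parse_row(quoted)       # quotes protect the commas: 6 fields
values = [to_number(f) for f in quoted_fields]
```

Without quotes the reader has no choice but to split on every comma; with quotes it returns six fields, and the grouped numbers can be converted afterwards.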