better to pre-split large csv files prior to calling drRead.csv


jeremiah rounds

Oct 15, 2015, 5:40:54 PM
to Tessera-Users
Hi,

Is it best to split very large CSV files (millions of lines) prior to calling drRead.csv? drRead.csv worked, but I have noticed that the very next MapReduce job has a single map task that takes a very long time. I am backtracking on it and wondering whether the best thing to do is to use fread to chunk up a large CSV before putting it on HDFS and before calling drRead.csv.


I don't want to write the script unless it will help though.
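
For concreteness, the pre-splitting script I have in mind would look roughly like the sketch below. It is only an illustration: split_csv, out_dir and lines_per_chunk are made-up names, it uses base R readLines instead of fread to stay short, and it assumes no quoted field contains an embedded newline. Whether this actually helps with the slow map is exactly what I am unsure about.

```r
# Hypothetical helper (not part of datadr): split one large CSV into
# smaller files, repeating the header line at the top of each piece.
# Assumption: rows never span multiple lines (no newlines inside quotes).
split_csv <- function(path, out_dir, lines_per_chunk = 100000L) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  con <- file(path, open = "r")
  on.exit(close(con))

  header <- readLines(con, n = 1L)   # first line is the column header
  out_files <- character(0)

  repeat {
    # Read the next block of data lines; an empty result means end of file.
    chunk <- readLines(con, n = lines_per_chunk)
    if (length(chunk) == 0L) break

    out <- file.path(out_dir, sprintf("part-%05d.csv", length(out_files) + 1L))
    writeLines(c(header, chunk), out)
    out_files <- c(out_files, out)
  }

  out_files   # paths of the pieces, in order
}
```

The returned paths would be the files to copy to HDFS before calling drRead.csv.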

