better to pre-split large csv files prior to calling drRead.csv
jeremiah rounds
Oct 15, 2015, 5:40:54 PM
to Tessera-Users
Hi,
Is it best to split very large csv files (millions of lines) prior to calling drRead.csv? drRead.csv worked, but I have noticed that the very next MapReduce job has a single Map task that takes a very long time. I am back-tracking on it and wondering whether the best thing to do is to use fread to chunk up a large csv prior to putting it on the HDFS and prior to drRead.csv.
I don't want to write the script unless it will help though.
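For context, below is a minimal sketch of the kind of pre-splitting script described above, using data.table::fread to read the big csv in fixed-size chunks and write each chunk out as its own smaller csv before it is put on HDFS. The function name, chunk size, and file/directory names are illustrative assumptions, not an existing datadr utility.

library(data.table)

## Split one large csv into smaller csv parts of roughly chunk_rows lines each.
split_csv <- function(infile, outdir, chunk_rows = 1e6) {
  dir.create(outdir, showWarnings = FALSE)
  header <- names(fread(infile, nrows = 0))   # read column names only
  skip <- 1                                    # skip the header line of the input
  part <- 1
  repeat {
    chunk <- tryCatch(
      fread(infile, skip = skip, nrows = chunk_rows,
            header = FALSE, col.names = header),
      error = function(e) NULL)                # fread errors once skip passes end of file
    if (is.null(chunk) || nrow(chunk) == 0) break
    write.csv(chunk,
              file.path(outdir, sprintf("part_%04d.csv", part)),
              row.names = FALSE)
    skip <- skip + nrow(chunk)
    part <- part + 1
    if (nrow(chunk) < chunk_rows) break        # last (short) chunk reached
  }
}

## e.g. split_csv("big.csv", "big_parts", chunk_rows = 1e6)
## then copy big_parts to HDFS and point drRead.csv at that directory,
## so the read is spread over many input files instead of one.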