efficiency in processing a csv...


Mark Locklear

Feb 28, 2014, 2:27:45 PM
to ashevill...@googlegroups.com, ashevi...@googlegroups.com
Hey folks! I have a question regarding the best way to process a CSV. I have a Rails app that will process a CSV file. Imagine something like...

person_id, name, start_date
1111, busta, 1/1/14
1111, busta, 1/4/14
1111, busta, 1/7/14
2222, mista, 1/3/14
2222, mista, 1/1/14
2222, mista, 1/11/14


...and I am building arrays of students, so busta might look something like...

    1111 => ["1/1/14", "1/4/14", "1/7/14"]

...and I am going to need to find the 'earliest start date', etc. The CSVs will have in the neighbourhood of 10K rows. I'm just wondering whether it would be more efficient to 'process' one student at a time. That is to say, I build up an array for a single student, then hand that off to my 'process' method, which finds the earliest start date. Or should I build up the big honk'n array of all my students, then pass that to my process method? I have a few other date fields I will have to do 'stuff' to. Just wondering what people's thoughts are.
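For reference, the "build everything, then process" option can be sketched roughly as below. This is only a sketch under assumptions: the method names `start_dates_by_person` and `earliest_start_dates` are made up for illustration, and the dates are assumed to be month/day/two-digit-year as in the sample rows.

```ruby
require 'csv'
require 'date'

SAMPLE_CSV = <<~DATA
  person_id, name, start_date
  1111, busta, 1/1/14
  1111, busta, 1/4/14
  1111, busta, 1/7/14
  2222, mista, 1/3/14
  2222, mista, 1/1/14
  2222, mista, 1/11/14
DATA

# Group every start date under its person_id: { 1111 => [Date, Date, ...], ... }
# (hypothetical helper, not part of the app described above)
def start_dates_by_person(csv_text)
  rows = CSV.parse(csv_text)
  rows.shift # drop the header row
  rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, grouped|
    # The sample has a space after each comma, so strip every field.
    person_id, _name, start_date = row.map(&:strip)
    # Assumption: month/day/two-digit-year, as in the sample data.
    grouped[person_id.to_i] << Date.strptime(start_date, '%m/%d/%y')
  end
end

# The "process" step: reduce each student's dates to the earliest one.
def earliest_start_dates(grouped)
  grouped.transform_values(&:min)
end

earliest = earliest_start_dates(start_dates_by_person(SAMPLE_CSV))
earliest.each { |id, date| puts "#{id}: #{date}" }
```

Grouping first means the rows do not have to arrive sorted by student, which the per-student approach would quietly depend on.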

Just for eye candy I'm attaching a white board of the app.
--
J. Mark Locklear
locklear.me
-Philippians 4:13 gives you the muscle, but YOU have to flex it!

-He who works with his hands is a laborer.
-She who works with her hands and her head is a craftswoman.
-Those who work with their hands, their head, and their hearts are artists.
t4app.jpg

Andy Vanasse

Feb 28, 2014, 3:15:30 PM
to Asheville Ruby Users Group, ashevill...@googlegroups.com
I don't think that 10K rows is going to be a big challenge for the CSV class (or FasterCSV if you're on Ruby prior to 1.9).

With my limited knowledge of what's going on, I'd be inclined to build a hash with the student ids as keys and arrays as values. Each array would contain hashes that represent the rest of the values of each line. I'd come to this arrangement with the assumption that the additional values are fairly strongly related.

I'd also plan on converting the date fields (and ints) when you import.
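A rough sketch of converting at import time, in case it helps. The column names (`student_id`, `last_attended`, `grade`, `end_date`), the ISO 8601 date format, and the `import_grades` helper are all assumptions for illustration; the real file may look different.

```ruby
require 'csv'
require 'date'

# Hypothetical sample; headers and date format are assumed, not taken from the real file.
GRADES_CSV = <<~DATA
  student_id,last_attended,grade,end_date
  1111,2014-01-15,F,2014-05-01
  1111,2014-01-01,F,2014-05-01
  1111,2014-01-08,U,2014-05-01
DATA

DATE_COLUMNS = %i[last_attended end_date].freeze

# Returns { student_id (Integer) => [ { last_attended: Date, grade: String, end_date: Date }, ... ] }
def import_grades(csv_text)
  grades = Hash.new { |hash, key| hash[key] = [] }
  CSV.parse(csv_text, headers: true).each do |row|
    # Symbolize the headers and strip stray whitespace from every value.
    record = row.to_h.map { |key, value| [key.strip.to_sym, value.to_s.strip] }.to_h
    # Convert the id to an Integer and the date columns to Date objects once, up front.
    student_id = Integer(record.delete(:student_id))
    DATE_COLUMNS.each { |column| record[column] = Date.iso8601(record[column]) }
    grades[student_id] << record
  end
  grades
end

grades = import_grades(GRADES_CSV)
```

With real Date objects in place, later comparisons such as `min_by` sort chronologically rather than as strings.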



require 'date'

students = {
  1111 => 'busta'
}

# Dates are stored as Date objects (converted on import) so they compare chronologically.
grades = {
  1111 => [
    { last_attended: Date.new(2014, 1, 15), grade: 'F', end_date: Date.new(2014, 5, 1) },
    { last_attended: Date.new(2014, 1, 1),  grade: 'F', end_date: Date.new(2014, 5, 1) },
    { last_attended: Date.new(2014, 1, 8),  grade: 'U', end_date: Date.new(2014, 5, 1) }
  ]
}

grades.each do |student_id, grade_list|
  # The record with the earliest :last_attended date for this student.
  earliest = grade_list.min_by { |grade| grade[:last_attended] }
  # ...
end


