randomly sample from dataframe

Arshak Navruzyan

Sep 20, 2014, 4:20:08 PM
to julia...@googlegroups.com
I'd like to pull a 10% random sample of the rows of a DataFrame with multiple columns, keeping the values in each row together. It looks like the StatsBase sample method expects a 1D array. Thanks!


John Myles White

Sep 20, 2014, 4:23:23 PM
to julia...@googlegroups.com
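Sample the row indices (a 1D array, so StatsBase's sample accepts it), then index the DataFrame with those rows, which keeps every column aligned. Here’s a 20% sample: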

julia> using DataFrames

julia> using StatsBase

julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
|-----|----|----|
| Row | A  | B  |
| 1   | 1  | 2  |
| 2   | 2  | 4  |
| 3   | 3  | 6  |
| 4   | 4  | 8  |
| 5   | 5  | 10 |
| 6   | 6  | 12 |
| 7   | 7  | 14 |
| 8   | 8  | 16 |
| 9   | 9  | 18 |
| 10  | 10 | 20 |

julia> df[sample(1:size(df, 1), ceil(Int, 0.2 * size(df, 1)), replace=false), :]
2x2 DataFrame
|-----|---|----|
| Row | A | B  |
| 1   | 4 | 8  |
| 2   | 5 | 10 |
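For current Julia (1.x), the same idea can be wrapped in a reusable function. This is a minimal sketch assuming only the Random standard library: the name sample_rows is hypothetical (not part of DataFrames or StatsBase), and randperm is swapped in for StatsBase's sample. It draws distinct row indices without replacement, sorts them so the original row order is kept, and applies the same index vector to every column, so each sampled row stays intact.

```julia
using Random

# Hypothetical helper (not from DataFrames or StatsBase): return a random
# fraction `frac` of the rows of `tbl`, sampled without replacement.
# It only needs `size(tbl, 1)` and `tbl[rows, :]`, so it accepts a Matrix
# and, assuming standard DataFrames row indexing, a DataFrame.
function sample_rows(tbl, frac::Real; rng::AbstractRNG = Random.default_rng())
    0 <= frac <= 1 || throw(ArgumentError("frac must be between 0 and 1"))
    n = size(tbl, 1)
    k = ceil(Int, frac * n)       # number of rows to keep, rounded up
    rows = randperm(rng, n)[1:k]  # k distinct row indices, no duplicates
    return tbl[sort(rows), :]     # same indices for every column keeps rows intact
end

M = [1:10 2:2:20]                 # 10x2 matrix; column 2 is twice column 1
S = sample_rows(M, 0.2; rng = MersenneTwister(42))
size(S)                           # (2, 2): 20% of 10 rows
```

For the original question, a 10% sample of a DataFrame would then be sample_rows(df, 0.1). If an approximate fraction is acceptable, Random.randsubseq(1:n, p) is an alternative: it keeps each index independently with probability p, so the sample size varies from run to run.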