First of all, if your records aren't too big or wide and you've got a reasonable amount of memory, 2 million rows is easily doable in Refine, and then you can just subset using a facet on rowIndex modulo your sampling factor. I was working on a narrow (5 columns?) data set of 3.3 million rows in Refine the other day with no problem at all.
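As a sketch of that facet trick (assuming GREL's mod() function, the row.index variable, and a made-up sampling factor of 10): create a custom facet with an expression along the lines of

```
mod(row.index, 10) == 0
```

then select "true" in the resulting facet to keep roughly every 10th row.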
If you really need to sample on import, we don't have that currently. Reservoir sampling (as well as just straight probability sampling) is something that's been on my mind to add, but, although it's straightforward, there are many things ahead of it on the priority list. I've heard that one of the Yahoo researchers has done work in this space, but they haven't yet contributed it to the community.
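For what it's worth, reservoir sampling (Algorithm R) is simple enough to do outside Refine in a single awk pass; here's a sketch, where the filenames and the sample size k are made up for illustration:

```shell
# Generate a throwaway 1000-line input file (hypothetical name: big.txt)
seq 1 1000 > big.txt

# Reservoir sampling (Algorithm R): keep a uniform random sample of k lines
# in one pass, without knowing the total line count in advance.
awk -v k=10 'BEGIN { srand() }
  NR <= k { pool[NR] = $0; next }    # fill the reservoir with the first k lines
  { i = int(rand() * NR) + 1         # pick a random slot in 1..NR
    if (i <= k) pool[i] = $0 }       # replace a kept line with probability k/NR
  END { for (j = 1; j <= k; j++) print pool[j] }' big.txt > sample.txt

wc -l < sample.txt   # 10 lines
```

Each line of the input ends up in the sample with equal probability, which is exactly the property you want when you can't afford a second pass over a huge file.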
Fortunately, the Unix (or Cygwin) command line can do this trivially with a command like those referenced in
this StackOverflow answer. The nice thing about the command line is that you can also use the
cut command to include only the columns of interest, cutting down the amount of data you need to deal with even more.
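As a sketch of that workflow (the filename data.csv, the column layout, and the every-10th-row factor are just for illustration; the actual commands in the StackOverflow answer may differ):

```shell
# Build a small hypothetical CSV to demonstrate on: 100 data rows plus a header
printf 'id,name,score\n' > data.csv
for i in $(seq 1 100); do printf '%s,row%s,%s\n' "$i" "$i" "$((i*2))" >> data.csv; done

# Systematic sample: keep the header plus every 10th data row
awk 'NR == 1 || (NR - 1) % 10 == 0' data.csv > sample.csv

# Then use cut to keep only the columns of interest (here, fields 1 and 3)
cut -d, -f1,3 sample.csv > sample_narrow.csv

wc -l < sample_narrow.csv   # 11 lines: header + 10 sampled rows
```

Swapping the awk step for shuf -n (where available) gives a random rather than systematic sample; either way, cutting the unused columns first or afterwards shrinks the file well before it ever reaches Refine.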
Tom