[Proposal] Implement #sample on ActiveRecord::Relation to get a random record via database, not ruby

108 прегледа
Пређи на прву непрочитану поруку

Vitaly Shvedchenko

непрочитано,
25. 7. 2016. 23:38:4325.7.16.
– Ruby on Rails: Core
Hi, fellows!

Sometimes we need to get a random record from some table. The common way to get it is to use Array#sample method on a relation. Or, if your table is quiet large, you can use one of the ways your DB provides to do such thing. It could be 
MyModel.order('RAND()').limit(1)
if you use MySQL, or
MyModel.order('RANDOM()').limit(1)
if you use Postgresql.

This way is much more fast than using #sample method, even if your table is less then 1000 rows. But it has two downsides. 

First, this code is now database-dependent. So if your code will some day migrate on an another DB, you will have to change it. 
Second, it is not obvious for some rails novices, that doing `MyModel.all.sample` you actually use Array#sample method, that loads all relation collection in memory. If rails tend to be easy-to-start and clear for developer, it should not act this way. 

So, my suggestion is to implement #sample method somewhere in ActiveRecord so that it will generate database query to get random row. There are functions to do it in all common DB distributions (MySQL, Postgresql, MS SQL, IBM DB2, Oracle). The usage could look like this:

MyModel.sample # returns one random record
MyModel.sample(10) # returns 10 random records
MyModel.all.sample # works identical to the first two examples; and this is not the Array#sample method.
MyModel.some_scope.sample # returns one random record from a scope
and so on.

I think the performance benefit from such enhancement is quite obvious, but anyway, here is a benchmark for a Postgres table with 180 000 rows. 
2.1.5 (main):0 > Benchmark.bm do |x|
2.1.5 (main):0 *   x.report('sample') { LessonVersion.all.sample }
2.1.5 (main):0 *   x.report('RANDOM()') { LessonVersion.order('RANDOM()').limit(1) }
2.1.5 (main):0 * end
          user       system     total      real
sample     9.810000         8.690000        18.500000       ( 29.282437)
RANDOM()  0.000000   0.000000   0.000000   (0.000187)
=> [
  [0] #<Benchmark::Tms:0x007fe7c699c568 @label="sample", @real=29.282437, @cstime=0.0, @cutime=0.0, @stime=8.69, @utime=9.81, @total=18.5>,
  [1] #<Benchmark::Tms:0x007fe7c69a4420 @label="RANDOM()", @real=0.000187, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>
]

I've already set up rails-dev-box on my laptop and cloned rails repository, but there is a CONTRIBUTING.md file, that recommends to get positive feedback before writing code. So, here is my proposal. I think I can write it myself, but ofcourse implementation suggestions are very welcome. Especially since I'm not very comfortible with ActiveRecord sources yet. 

James Coleman

непрочитано,
26. 7. 2016. 09:06:0826.7.16.
– rubyonra...@googlegroups.com
`ORDER BY random()` is actually not very performant at the DB level unless you have a small number of rows. This is because the database (at the very least Postgres) must implement this with a full table scan, since there is no other way to determine what all physical rows on disk are visible to the query.

This is such a big problem that the SQL standard itself provides a special way (rather than ordering by random) to sample tables. Postgres added support for some of these features to 9.5. See http://blog.2ndquadrant.com/tablesample-in-postgresql-9-5-2/

I'm not personally a fan of trying to abstract every database feature into AR. I don't think it's at all plausible that any decent-sized application that's doing anything interesting in the database will really be able to use a drop-in-replacement database kind of mentality.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-co...@googlegroups.com.
To post to this group, send email to rubyonra...@googlegroups.com.
Visit this group at https://groups.google.com/group/rubyonrails-core.
For more options, visit https://groups.google.com/d/optout.

Vitaly Shvedchenko

непрочитано,
26. 7. 2016. 10:07:1126.7.16.
– Ruby on Rails: Core
> `ORDER BY random()` is actually not very performant at the DB level

It is, but it is definitely much more performant than load a whole query result into a ruby array.

Vitaly Shvedchenko

непрочитано,
5. 8. 2016. 09:00:225.8.16.
– Ruby on Rails: Core
Dear sirs contributors, 

Was the answer "no, you shouldn't write this code", or I should wait a little for somebody with more than 8 commits to answer my proposal?
Одговори свима
Одговори аутору
Проследи
0 нових порука