$ time -p ruby -e '
require "sqlite3"
db = SQLite3::Database.new("sample.db")
select_sql = <<-eof
select samp2
from sample
order by samp2
;
eof
db.execute(select_sql) do |row|
puts row[0] # surrogate for real processing here
end
' | wc -l
1000000
real 175.21
user 106.20
sys 13.47
But for some reason the sort takes a very long time in sqlite3.
Piping the data from sqlite3 into the sort command is much faster
(~20x):
$ time -p { grep -v ^# <<eof
select samp2
from sample
;
eof
} | sqlite3 sample.db |
LC_ALL=C sort -k1,1n |
ruby -e '
STDIN.each do |row|
puts row # surrogate for real processing here
end
' | wc -l
1000000
real 7.55
user 7.20
sys 0.56
My question is, what would be a "clean" way of sending a query string
to sqlite3, piping the output through sort, and then reading it into
ruby, all within ruby? Is io.popen the way to go?
BTW, below is a script that creates the sqlite3 sample.db with 1 MM records.
Regards,
- Robert
---- script ---
#!/bin/bash
echo "=== load data"
time -p echo {1..1000000}$'\t'${RANDOM} |
tr ' ' '\n' |
sqlite3 -init <(echo '
CREATE TABLE sample ( samp1 INTEGER, samp2 INTEGER)
;
.mod tabs
.imp "/dev/stdin" "sample"
select count(*) from sample
;
') sample.db .quit
Here's my stab at it using IO.popen. It works and works fast. Any
suggestions on how to make it more ruby-ish?
Regards,
- Robert
-----
$ time -p ruby -e '
require "sqlite3"
dbfile = "sample.db"
db = SQLite3::Database.new(dbfile)
sql = <<-eof.gsub(/^\s+/,"")
.mod tabs
select samp2
from sample
;
eof
command = <<-eof.gsub(/^\s+/,"")
echo "#{sql}" |
sqlite3 #{dbfile} |
LC_ALL=C sort -k1,1n
eof
IO.popen(command, "r") do |pipe|
pipe.each do |line|
puts line
end
end
' | wc -l
1000000
real 7.69
user 7.58
sys 0.49