I have a large result set which I'm streaming to CSV files. I know I could use #sql to get the SQL, and run that through psql and output plain old CSV files. I'm trying to avoid that, one of the reason being at the point where I know the query, I have no access to the username / password.
Relevant parts are:
io = File.open("output.csv", "w")
DB = Sequel.connect("postgres://localhost/db")
ds = DB[:large_table].filter(...) # millions of rows are returned
CSV(io, :headers => false, :col_sep => ",") do |csv|
ds.each do |row|
values = [] # some code to do transformations
csv << values
end
end
I naively assumed #each didn't load the whole result set in memory. Is there a way to get 'streaming' results? I'm using PostgreSQL 9.1 from Ruby 1.9.2 and Sequel 3.39.0, sequel_pg 1.6.0 and pg 0.14.1.
> I have a large result set which I'm streaming to CSV files. I know I could
> use #sql to get the SQL, and run that through psql and output plain old CSV
> files. I'm trying to avoid that, one of the reason being at the point where
> I know the query, I have no access to the username / password.
I'm not sure you want to go down this road - but you can invoke the
PostgreSQL copy command from the raw driver and have it return the CSV data
to you directly that you could push out to a file.
The relevant pieces of code are scattered inside the "psql" sample for the
ruby-pg driver (search for handlecopyout).
Basically, you send out the copy command to stdout via the driver and then
keep calling .getline on the connection until it finishes. It's actually
pretty straight forward once you get the pieces together - I don't have a
sample on hand so I'm sorry that the description is a little fuzzy at best
:)
On Monday, October 22, 2012 12:28:35 PM UTC-7, François Beausoleil wrote:
> Hi!
> I have a large result set which I'm streaming to CSV files. I know I could > use #sql to get the SQL, and run that through psql and output plain old CSV > files. I'm trying to avoid that, one of the reason being at the point where > I know the query, I have no access to the username / password.
> Relevant parts are:
> io = File.open("output.csv", "w") > DB = Sequel.connect("postgres://localhost/db") > ds = DB[:large_table].filter(...) # millions of rows are returned > CSV(io, :headers => false, :col_sep => ",") do |csv| > ds.each do |row| > values = [] # some code to do transformations > csv << values > end > end
> I naively assumed #each didn't load the whole result set in memory. Is > there a way to get 'streaming' results? I'm using PostgreSQL 9.1 from Ruby > 1.9.2 and Sequel 3.39.0, sequel_pg 1.6.0 and pg 0.14.1.
Sequel 3.28+ supports Database#copy_table using the postgres adapter with the pg driver, which can be used to create CSV files from datasets very quickly, and it does stream results from the server.
> On Monday, October 22, 2012 12:28:35 PM UTC-7, François Beausoleil wrote:
> Hi!
> I have a large result set which I'm streaming to CSV files. I know I could use #sql to get the SQL, and run that through psql and output plain old CSV files. I'm trying to avoid that, one of the reason being at the point where I know the query, I have no access to the username / password.
> Relevant parts are:
> io = File.open("output.csv", "w") > DB = Sequel.connect("postgres://localhost/db") > ds = DB[:large_table].filter(...) # millions of rows are returned > CSV(io, :headers => false, :col_sep => ",") do |csv| > ds.each do |row| > values = [] # some code to do transformations > csv << values > end > end
> I naively assumed #each didn't load the whole result set in memory. Is there a way to get 'streaming' results? I'm using PostgreSQL 9.1 from Ruby 1.9.2 and Sequel 3.39.0, sequel_pg 1.6.0 and pg 0.14.1.
> Sequel 3.28+ supports Database#copy_table using the postgres adapter with the pg driver, which can be used to create CSV files from datasets very quickly, and it does stream results from the server.
Hmm, this is Database#copy_table, not Dataset#copy_table. I specifically need a Dataset. I've gone down the system("psql -c 'COPY ( SELECT * FROM ... ) TO stdout' > /path/to/outfile ...") route, so I'm good for now.
On Monday, October 22, 2012 8:06:04 PM UTC-7, François Beausoleil wrote:
> Hmm, this is Database#copy_table, not Dataset#copy_table. I specifically > need a Dataset. I've gone down the system("psql -c 'COPY ( SELECT * FROM > ... ) TO stdout' > /path/to/outfile ...") route, so I'm good for now.
Sorry about that, you are right that it is Database#copy_table, but you can pass a dataset to it, so you don't need to shell out to psql if you don't want to.
> On Monday, October 22, 2012 8:06:04 PM UTC-7, François Beausoleil wrote:
> Hmm, this is Database#copy_table, not Dataset#copy_table. I specifically need a Dataset. I've gone down the system("psql -c 'COPY ( SELECT * FROM ... ) TO stdout' > /path/to/outfile ...") route, so I'm good for now.
> Sorry about that, you are right that it is Database#copy_table, but you can pass a dataset to it, so you don't need to shell out to psql if you don't want to.
Oh, interesting! Didn't know about that, since it doesn't appear on http://sequel.rubyforge.org/rdoc/classes/Sequel/Database.html. Is there a way to make the docs appear for it? I see the method's definition is guarded with SEQUEL_POSTGRES_USES_PG, which may not be set when the docs are generated.
I opened the gem and saw the implementation. This is perfect, as it's exactly what I need.
On Tuesday, October 23, 2012 1:19:58 PM UTC-7, François Beausoleil wrote:
> Le 2012-10-23 à 12:11, Jeremy Evans a écrit :
> On Monday, October 22, 2012 8:06:04 PM UTC-7, François Beausoleil wrote:
>> Hmm, this is Database#copy_table, not Dataset#copy_table. I specifically >> need a Dataset. I've gone down the system("psql -c 'COPY ( SELECT * FROM >> ... ) TO stdout' > /path/to/outfile ...") route, so I'm good for now.
> Sorry about that, you are right that it is Database#copy_table, but you > can pass a dataset to it, so you don't need to shell out to psql if you > don't want to.
> Oh, interesting! Didn't know about that, since it doesn't appear on > http://sequel.rubyforge.org/rdoc/classes/Sequel/Database.html. Is there a > way to make the docs appear for it? I see the method's definition is > guarded with SEQUEL_POSTGRES_USES_PG, which may not be set when the docs > are generated.
AFAIK, RDoc doesn't actually use if statements to see if a method definition should be documented, since it doesn't actually execute ruby code (just attempts to parse it).