Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Streaming large result sets to disk?
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  8 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
François Beausoleil  
View profile  
 More options Oct 22 2012, 3:28 pm
From: François Beausoleil <francois.beausol...@gmail.com>
Date: Mon, 22 Oct 2012 15:28:23 -0400
Local: Mon, Oct 22 2012 3:28 pm
Subject: Streaming large result sets to disk?
Hi!

I have a large result set which I'm streaming to CSV files. I know I could use #sql to get the SQL, and run that through psql and output plain old CSV files. I'm trying to avoid that, one of the reason being at the point where I know the query, I have no access to the username / password.

Relevant parts are:

io = File.open("output.csv", "w")
DB = Sequel.connect("postgres://localhost/db")
ds = DB[:large_table].filter(...) # millions of rows are returned
CSV(io, :headers => false, :col_sep => ",") do |csv|
  ds.each do |row|
    values = [] # some code to do transformations
    csv << values
  end
end

I naively assumed #each didn't load the whole result set in memory. Is there a way to get 'streaming' results? I'm using PostgreSQL 9.1 from Ruby 1.9.2 and Sequel 3.39.0, sequel_pg 1.6.0 and pg 0.14.1.

Thanks!
François


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John W Higgins  
View profile  
 More options Oct 22 2012, 4:55 pm
From: John W Higgins <wish...@gmail.com>
Date: Mon, 22 Oct 2012 13:55:36 -0700
Local: Mon, Oct 22 2012 4:55 pm
Subject: Re: Streaming large result sets to disk?

Afternoon,

On Mon, Oct 22, 2012 at 12:28 PM, François Beausoleil <

francois.beausol...@gmail.com> wrote:
> Hi!

> I have a large result set which I'm streaming to CSV files. I know I could
> use #sql to get the SQL, and run that through psql and output plain old CSV
> files. I'm trying to avoid that, one of the reason being at the point where
> I know the query, I have no access to the username / password.

I'm not sure you want to go down this road - but you can invoke the
PostgreSQL copy command from the raw driver and have it return the CSV data
to you directly that you could push out to a file.

The relevant pieces of code are scattered inside the "psql" sample for the
ruby-pg driver (search for handlecopyout).

Basically, you send out the copy command to stdout via the driver and then
keep calling .getline on the connection until it finishes. It's actually
pretty straight forward once you get the pieces together - I don't have a
sample on hand so I'm sorry that the description is a little fuzzy at best
:)

John


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Bharanee Rathna  
View profile  
 More options Oct 22 2012, 5:23 pm
From: Bharanee Rathna <deepfr...@gmail.com>
Date: Tue, 23 Oct 2012 08:22:39 +1100
Local: Mon, Oct 22 2012 5:22 pm
Subject: Re: Streaming large result sets to disk?

i'm the author of the swift-db-postgres driver

https://github.com/deepfryed/swift-db-postgres#data-io
http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html

if you're using the swift-db-postgres adapter, you should be able to stream
to files simply as shown in the example.

hope this helps

- b


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeremy Evans  
View profile  
 More options Oct 22 2012, 5:37 pm
From: Jeremy Evans <jeremyeva...@gmail.com>
Date: Mon, 22 Oct 2012 14:37:52 -0700 (PDT)
Local: Mon, Oct 22 2012 5:37 pm
Subject: Re: Streaming large result sets to disk?

Sequel 3.28+ supports Database#copy_table using the postgres adapter with
the pg driver, which can be used to create CSV files from datasets very
quickly, and it does stream results from the server.

Thanks,
Jeremy


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
François Beausoleil  
View profile  
 More options Oct 22 2012, 11:06 pm
From: François Beausoleil <francois.beausol...@gmail.com>
Date: Mon, 22 Oct 2012 23:05:33 -0400
Local: Mon, Oct 22 2012 11:05 pm
Subject: Re: Streaming large result sets to disk?

Le 2012-10-22 à 17:37, Jeremy Evans a écrit :

Hmm, this is Database#copy_table, not Dataset#copy_table. I specifically need a Dataset. I've gone down the system("psql -c 'COPY ( SELECT * FROM ... ) TO stdout' > /path/to/outfile ...") route, so I'm good for now.

Thanks for pointers everyone!
François


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeremy Evans  
View profile  
 More options Oct 23 2012, 12:11 pm
From: Jeremy Evans <jeremyeva...@gmail.com>
Date: Tue, 23 Oct 2012 09:11:42 -0700 (PDT)
Local: Tues, Oct 23 2012 12:11 pm
Subject: Re: Streaming large result sets to disk?

On Monday, October 22, 2012 8:06:04 PM UTC-7, François Beausoleil wrote:
> Hmm, this is Database#copy_table, not Dataset#copy_table. I specifically
> need a Dataset. I've gone down the system("psql -c 'COPY ( SELECT * FROM
> ... ) TO stdout' > /path/to/outfile ...") route, so I'm good for now.

Sorry about that, you are right that it is Database#copy_table, but you can
pass a dataset to it, so you don't need to shell out to psql if you don't
want to.

Jeremy


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
François Beausoleil  
View profile  
 More options Oct 23 2012, 4:19 pm
From: François Beausoleil <francois.beausol...@gmail.com>
Date: Tue, 23 Oct 2012 16:19:43 -0400
Local: Tues, Oct 23 2012 4:19 pm
Subject: Re: Streaming large result sets to disk?

Le 2012-10-23 à 12:11, Jeremy Evans a écrit :

> On Monday, October 22, 2012 8:06:04 PM UTC-7, François Beausoleil wrote:
> Hmm, this is Database#copy_table, not Dataset#copy_table. I specifically need a Dataset. I've gone down the system("psql -c 'COPY ( SELECT * FROM ... ) TO stdout' > /path/to/outfile ...") route, so I'm good for now.

> Sorry about that, you are right that it is Database#copy_table, but you can pass a dataset to it, so you don't need to shell out to psql if you don't want to.

Oh, interesting! Didn't know about that, since it doesn't appear on http://sequel.rubyforge.org/rdoc/classes/Sequel/Database.html. Is there a way to make the docs appear for it? I see the method's definition is guarded with SEQUEL_POSTGRES_USES_PG, which may not be set when the docs are generated.

I opened the gem and saw the implementation. This is perfect, as it's exactly what I need.

Thanks!
François


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeremy Evans  
View profile  
 More options Oct 23 2012, 5:50 pm
From: Jeremy Evans <jeremyeva...@gmail.com>
Date: Tue, 23 Oct 2012 14:50:39 -0700 (PDT)
Local: Tues, Oct 23 2012 5:50 pm
Subject: Re: Streaming large result sets to disk?

It doesn't show up there, since the method is specific to the postgres
adapter.  It shows up at
http://sequel.rubyforge.org/rdoc-adapters/classes/Sequel/Postgres/Dat....  
This is similar to all of the other adapter-specific methods.

AFAIK, RDoc doesn't actually use if statements to see if a method
definition should be documented, since it doesn't actually execute ruby
code (just attempts to parse it).

Thanks,
Jeremy


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »