Export from Hive to Redshift

460 views
Skip to first unread message

dan.w...@gmail.com

unread,
Jul 20, 2015, 3:49:39 PM7/20/15
to luigi...@googlegroups.com
Hi,

I'm trying to take the result set from a Hive query run on an Amazon EMR cluster and load it into Redshift. I think the best route would be to export the result set to s3 as a single tsv file by running a query against Hive and setting the output as an S3Target. That looks something like:

class CommercialNodeQueryTask(hive.HiveQueryTask):

date = luigi.DateParameter(default = date.today() - timedelta(1))
client = S3Client(aws_access_key_id = configuration.get_config().get('aws','aws_access_key_id'), aws_secret_access_key = configuration.get_config().get('aws', 'aws_secret_access_key'))

def output(self):
return S3Target('s3://.../../../%s/%s/%s.tsv.gz' % (self.date
.strftime('%Y'), self.date
.strftime('%m'), self.date
.strftime('%Y-%m-%d')), client=self.client, format=format.Gzip)

def query(self):
return "select ... ; "

... but, while the root directory is created in s3, nothing gets written to the output location, and no error is generated. If I run the same HQL from the command line and add shell redirection, I get a nice tab separated file.

Anyone have a good strategy for making the results of a Hive query available to a redshift COPY load?

I'm using Luigi 1.3.0.

Thanks for your help!

Dan

Doug Daniels

unread,
Jul 22, 2015, 10:25:26 AM7/22/15
to dan.w...@gmail.com, luigi...@googlegroups.com
I haven't used the HiveQueryTask, so I can't help with that. One tip though for Redshift though is that you'll want to split your output into multiple files to let COPY run in parallel, particularly if you have a large data set. See the Redshift doc for more info on the proper number of files to produce for your cluster.


--
You received this message because you are subscribed to the Google Groups "Luigi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to luigi-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages