Redshift Configuration

459 views
Skip to first unread message

Joao Correia

unread,
May 21, 2014, 12:44:39 AM5/21/14
to snowpl...@googlegroups.com

Hi Snowplowers,

When using the storageloader (not superuser) I get an error and the job is not completed by lack of ANALYZE storageloader permissions.

root@ip-xxx:/snowplow/4-storage/storage-loader# bundle exec bin/snowplow-storage-loader -c config/redshift.yml

Loading Snowplow events into Redshift (Redshift cluster)...
INFO:  Load into table 'events' completed, 76541 record(s) loaded successfully.
PG::Error error executing ANALYZE atomic.events;: ERROR:  skipping "events" --- only table or database owner can analyze it
root@ip-xxxxxx:/snowplow/4-storage/storage-loader# 

So to get this working I got to use the power_user because ANALYZE is not on the privileges list.

Amazon Redshift supports the following privileges: SELECT, INSERT, UPDATE, DELETE, REFERENCES, CREATE, TEMPORARY, EXECUTE, and USAGE. Different privileges are associated with different object types. For information on database object privileges supported by Amazon Redshift, see the GRANT command.

Is there any fix for this?

Just one thing that got my head spinning a bit ... on redshift.yml the in: is the out of the emr-etl-runner ... of course! :P

Simon Rumble

unread,
May 21, 2014, 12:55:47 AM5/21/14
to snowpl...@googlegroups.com
Yeah I raised this a while back and it seems Yali has updated the documentation. See the note at the bottom of this section:

I'm not sure I understand the note though. It seems only the admin user can run ANALYZE and VACUUM, which are things you want to do reasonably often I thought. The note seems to indicate you don't need it after setting up and can use --skip analyze for production loads?


--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Simon Rumble <si...@simonrumble.com>

Joao Correia

unread,
May 21, 2014, 4:20:22 AM5/21/14
to snowpl...@googlegroups.com
Hi Simon, 

I cant seam to find that note on that page. 
The job seams to call ANALYZE every time the data is loaded. For the time being I just use power_user.

By the way do you know how to change the collector_timestamp?

Yali Sassoon

unread,
May 21, 2014, 5:34:01 AM5/21/14
to snowpl...@googlegroups.com
Hi Joao, Simon,

Apologies - I need to update the documentation - it's not at all clear.

To run the ANALYZE command you need to be logged in as the owner of the atomic.events table. So when you switch to using the `storageloader` user to load your data (rather than the admin / power user) you need to update the atomic.events table to make this user the owner:

ALTER TABLE atomic.events OWNER TO storageloader;

The StorageLoader will now be able to run the `ANALYZE` step.

Only an admin user can run the VACUUM or COMPUPDATES. Vacuuming a table is something you only need to do if you're deleted rows from the table. (It effectively reclaims that 'lost' disk space, so that disk scans over your data set are faster.) This is something that should only happen occasionally - in general you're only ever appending new data to atomic.events.

Similarly, COMPUPDATES is used to set the encoding for each column on the database, to optimize querying that column. This is something that's only done initially when the first data set is loaded.

So neither VACUUM or COMPUPDATES are done by default - to use them you need to explicitly include them when running the StorageLoader, as documented here. For that reason, you should be fine not running StorageLoader as an admin user.

Does that make sense? If so, I'll update the documentation accordingly...

Cheers,


Yali

Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7841 954 117
@yalisassoon

Simon Rumble

unread,
May 21, 2014, 5:42:40 AM5/21/14
to snowpl...@googlegroups.com
On 21 May 2014 18:20, Joao Correia <joao.c...@gmail.com> wrote:
I cant seam to find that note on that page. 
The job seams to call ANALYZE every time the data is loaded. For the time being I just use power_user.

I think you caught Yali mid update!
 
By the way do you know how to change the collector_timestamp?

Add something like this to your query and let the database handle it. You don't have to deal with the complexities of timezones.

CONVERT_TIMEZONE('Australia/Sydney', "collector_tstamp") AS "collector_tstamp Australia/Sydney"
Reply all
Reply to author
Forward
0 new messages