PostgresQL. Could not open file. Permission denied.

4,596 views
Skip to first unread message

Ivan Klimovich

unread,
May 22, 2014, 5:12:00 AM5/22/14
to snowpl...@googlegroups.com
Hi snowplowers, It's me again.

I've setup PostgresQL on the AWS EC2, using AMI shown in your guide.

What I did:
1) I can connect to my db using other_user, power_user and storageloader with Navicat easily;
2) While trying to fix issue all of those users were granted SUPERUSER status.
3) The file with events was downloaded from the "In bucket" (which is out bucket for ETL) to /snowplowfiles/ folder on that instance. (I can see it using WinSCP)
4) My default ec2-user has only one group right now - "root". So I assume he has all necessary privileges on server.



When I start storage-loader:

 [ec2-user@IP storage-loader]$ bundle exec bin/snowplow-storage-loader --config config/postgres.yml -s download

I get: 

Loading Snowplow events into PostgresQL-EC2-instance (PostgreSQL database)...
PG::Error error executing BEGIN;
COPY atomic.events FROM '/home/ec2-user/snowplowfiles/run=2014-05-20-07-55-32/part-00000' WITH CSV ESCAPE E'\x02' QUOTE E'\x01' DELIMITER '     ' NULL '';
COMMIT;: ERROR:  could not open file "/home/ec2-user/snowplowfiles/run=2014-05-20-07-55-32/part-00000" for reading: Permission denied


What kind of permissions it requires? 

Thanks in advance.

Ivan

Yali Sassoon

unread,
May 22, 2014, 7:45:10 AM5/22/14
to snowpl...@googlegroups.com
Hi Ivan,

StorageLoader needs read permissions on the file. It looks like the root user (which you appear to be running the StorageLoader as) doesn't have those permissions, which is a bit surprising. Can you confirm if that is the case, by SSHing into the instance, and then executing:

ls -l  /home/ec2-user/snowplowfiles/run=2014-05-20-07-55-32/part-00000

Then reply with the output from the console please?

Also another thing to quickly confirm - are you running StorageLoader on the same box as you're running Postgres?

Thanks

Yali


--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7841 954 117
@yalisassoon

Ivan Klimovich

unread,
May 22, 2014, 8:01:23 AM5/22/14
to snowpl...@googlegroups.com
Hi Yali.

Here is what I get:

-rwxr-xr-x 1 ec2-user root 63514015 May 22 05:26 /home/ec2-user/snowplowfiles/run=2014-05-20-07-55-32/part-00000  

Yes. Storage loader is running on the same instance where postgresQL is installed.

Best,
Ivan

Alex Dean

unread,
May 22, 2014, 10:24:41 AM5/22/14
to snowpl...@googlegroups.com
Can you list the permissions of all the parent folders?

A

Ivan Klimovich

unread,
May 22, 2014, 10:38:15 AM5/22/14
to snowpl...@googlegroups.com
Hello, Alex

Here we go:
-------------------------------------------------------------------------------------------------------------------------------------
[ec2-user@ip-172-31-47-82 ~]$ ls -l /home/ec2-user/snowplowfiles/run=2014-05-20-07-55-32/
total 62028
-rwxr-xr-x 1 ec2-user root 63514015 May 22 05:26 part-00000
-------------------------------------------------------------------------------------------------------------------------------------
[ec2-user@ip-172-31-47-82 ~]$ ls -l /home/ec2-user/snowplowfiles/               
total 4                                                                         
drwxr-xr-x 2 ec2-user root 4096 May 22 05:26 run=2014-05-20-07-55-32
-------------------------------------------------------------------------------------------------------------------------------------
[ec2-user@ip-172-31-47-82 ~]$ ls -l /home/ec2-user/                             
total 16
-rw-r--r-- 1 ec2-user ec2-user 1696 May 21 13:06 
-rw-r--r-- 1 ec2-user ec2-user 1464 May 21 14:10 
drwxr-xr-x 8 ec2-user root     4096 May 22 05:13 snowplow
drwxr-xr-x 3 ec2-user root     4096 May 22 05:26 snowplowfiles
-------------------------------------------------------------------------------------------------------------------------------------
[ec2-user@ip-172-31-47-82 ~]$ ls -l /home/                                      
total 4
drwx------ 8 ec2-user ec2-user 4096 May 22 05:23 ec2-user

Best
Ivan




                                                                 


--
You received this message because you are subscribed to a topic in the Google Groups "Snowplow" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/snowplow-user/qGCpM_ShJbk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to snowplow-use...@googlegroups.com.

Alex Dean

unread,
May 22, 2014, 10:42:32 AM5/22/14
to snowpl...@googlegroups.com
Ah - https://github.com/snowplow/snowplow/wiki/1-Installing-the-StorageLoader#download

"PostgreSQL's own postgres user must to be able to read every parent directory of the directory specified. This is necessary to ensure that PostgreSQL can read the data in the directory, when it comes to ingest it"

So create something like /opt/snowplow-data and make sure the whole tree is readable.

A

Ivan Klimovich

unread,
May 22, 2014, 11:12:04 AM5/22/14
to snowpl...@googlegroups.com
God. My instance became unaccessable.

OK. So I think I can give Read/Write properties by this command: chmod -R a+r /folder/name/here 
Will try ASAP.

Ivan Klimovich

unread,
May 23, 2014, 6:04:07 AM5/23/14
to snowpl...@googlegroups.com
If anyone sees this post  - giving "other users" rights to read ec2-user folder fixed the issue.

Thank you,
Alex!

Alex Dean

unread,
May 23, 2014, 6:05:11 AM5/23/14
to snowpl...@googlegroups.com
No worries, glad it worked!


--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alan D'Souza

unread,
Mar 18, 2015, 5:13:10 AM3/18/15
to snowpl...@googlegroups.com
Hi Ivan,

Im struggling with the same problem.  How did you give "other users" access?  Im using a ubuntu server so hoping your solution works for me as well.

Thanks,
Alan

ivan.klimovi...@gmail.com

unread,
Mar 18, 2015, 5:23:19 AM3/18/15
to snowpl...@googlegroups.com, snowpl...@googlegroups.com
Hi Alan.

I used WinSCP and just checked necessary boxes in folder properties.

Try this command : chmod -R a+rwx {/folder/name/here} this should do the work if you are using CLI

Best regards,
Ivan


Sent from Mailbox


You received this message because you are subscribed to a topic in the Google Groups "Snowplow" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/snowplow-user/qGCpM_ShJbk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to snowplow-use...@googlegroups.com.

Alan D'Souza

unread,
Mar 21, 2015, 1:09:27 PM3/21/15
to snowpl...@googlegroups.com
Thanks Ivan.  I tried your solution but it doesn't seem to work for me.  I have storageloader running on the same machine as postgres.

Here's the error:
Loading Snowplow events into My PostgreSQL database (PostgreSQL database)...

PG
::Error error executing BEGIN;
 
COPY atomic
.events FROM '/tmp/snwplw/run=2015-03-17-07-00-58/part-00000' WITH CSV ESCAPE E'\x02' QUOTE E'\x01'DELIMITER ' ' NULL ''; 
COMMIT
;: ERROR: must be superuser to COPY to or from a file 
HINT
: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.

I am trying to run the loader from:

/home/ubuntu/snowplow/4-storage/storage-loader

storing events in:

/tmp/snwplw

drwxrwxrwx 3 ubuntu ubuntu 4096 Mar 21 09:25 snwplw

drwxrwxrwt   3 root root  4096 Mar 21 09:25 tmp

Any help would be much appreciated. 

Thanks! - Alan

Robert

unread,
Mar 24, 2015, 3:07:19 PM3/24/15
to snowpl...@googlegroups.com
Im having a similar problem.  Files download no problem, and Im running into this:

PG::Error error executing BEGIN;

COPY atomic.events FROM '/tmp/snwplw/run=2015-03-24-06-28-11/part-00000' WITH CSV ESCAPE E'\x02' QUOTE E'\x01' DELIMITER ' ' NULL '';

COMMIT;: ERROR:  could not open file "/tmp/snwplw/run=2015-03-24-06-28-11/part-00000" for reading: No such file or directory


Any help would be appreciated!

Alex Dean

unread,
Mar 31, 2015, 2:20:19 PM3/31/15
to snowpl...@googlegroups.com
Hey Alan, Robert,

If you are still having problems, I would recommend using a Redshift storage target instead of Postgres. Almost everybody using Snowplow in production is using either Redshift or Elasticsearch (or both); Postgres doesn't scale well to the kinds of event volumes almost all users will generate.

We have a PR which should make Postgres loading less fiddly (and also support Amazon RDS), but it hasn't been a priority to test & merge yet:

https://github.com/snowplow/snowplow/pull/624/files

Cheers,

Alex

--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Miner

unread,
Mar 31, 2015, 3:10:04 PM3/31/15
to snowpl...@googlegroups.com
I ended up finding out this was a problem with the tmp directory, wrong permissions as stated before.  The entire tree must be read-write enabled for the user.

And yes, it took ALOT of fiddling around for this to work but I got it up and running.  I’ll take a look at the PR and see if I can test.  Thanks!
You received this message because you are subscribed to a topic in the Google Groups "Snowplow" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/snowplow-user/qGCpM_ShJbk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to snowplow-use...@googlegroups.com.

Alex Dean

unread,
Mar 31, 2015, 3:14:05 PM3/31/15
to snowpl...@googlegroups.com
Aha - glad you got it working Robert!

Alex

Robert Kingston

unread,
Apr 2, 2015, 3:56:07 AM4/2/15
to snowpl...@googlegroups.com
@Alan, Robert, I have to agree with Alex's recommendation above. 

We were on Postgres for a while and ran into countless problems - particularly around scaling but also the permissions issue. We were reluctant to move to Redshift, but not only has it turned out to be cheaper, it's also much faster. 

A query which used to take 5-10 minutes under Postgres now takes <1 minute on Redshift. Also, loading data is much faster - rather than ~30 minute loads, we're looking at <5 minute loads.

Well worth taking a look at cost-effective dw2.large instance for small workloads like ours.

Alan D'Souza

unread,
Apr 4, 2015, 4:32:31 PM4/4/15
to snowpl...@googlegroups.com
Thanks for the update Robert.  I ended up switching to Redshift as well.  The setup was seamless and performance has been great. 
Reply all
Reply to author
Forward
0 new messages