to Common Crawl
I am trying to get a listing of all the files in the commoncrawl-crawl-002 bucket from EC2. I have configured s3cmd with my key and secret and am trying to execute this:
s3cmd ls -r --add-header="x-amz-request-payer: requester" s3://commoncrawl-crawl-002
Result:
ERROR: Access to bucket 'commoncrawl-crawl-002' was denied
My motivation is to find the name of a .gz file I can test with BasicArcFileReaderSample.java. This is the error I get at the moment:
2012-05-25 00:17:47,335 ERROR org.commoncrawl.samples.BasicArcFileReaderSample: java.io.IOException: No input to process
    at org.commoncrawl.hadoop.io.ARCInputFormat.getSplits(ARCInputFormat.java:171)
    at org.commoncrawl.samples.BasicArcFileReaderSample.main(BasicArcFileReaderSample.java:64)
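For the stated goal (finding the name of a .gz file to feed to BasicArcFileReaderSample), a small filter over a listing that does succeed is enough. This is a minimal sketch, assuming that s3cmd ls -r prints one object per line with the S3 URL in the last column and that keys contain no spaces; the function name first_gz_key is hypothetical and not part of s3cmd or the Common Crawl code. It does not address the access-denied error itself.

```shell
# Hypothetical helper (not part of s3cmd): read `s3cmd ls -r` output on
# stdin and print the first S3 URL ending in ".gz", then stop.
# Assumes the URL is the last whitespace-separated field on each line,
# i.e. object keys contain no spaces.
first_gz_key() {
  awk '$NF ~ /\.gz$/ { print $NF; exit }'
}
```

Usage, with a placeholder bucket and prefix: s3cmd ls -r s3://BUCKET/PREFIX/ | first_gz_key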
Mat Kelcey
May 24, 2012, 9:01:32 PM
It's likely easier to use the version hosted under the Amazon Public Datasets.
That works, but I would like to view content from 2012; I only see 2009 and 2010 content there.
s3cmdo
May 25, 2012, 7:46:21 AM
Never mind, I see the blog post. I think Readme.md in the GitHub repo should be updated with this new information (in the part where you give an example usage of BasicArcFileReaderSample).
Hsiao Su
May 25, 2012, 2:24:11 PM
Which blog post are you referring to?
s3cmdo
May 25, 2012, 5:29:04 PM
However, that won't help you run the BasicArcFileReaderSample example, because that code expects gzipped ARC files (.gz). If you read the post, they have changed the format a bit and haven't updated their sample apps yet.
Hsiao Su
May 29, 2012, 7:53:04 PM