RE: AccessDeniedException to download data in gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json


Shriya Palsamudram

Jul 10, 2023, 1:24:25 PM
to zhenghua wang, public

Hi,

Currently, access is provided upon request, so thank you for reaching out. I just gave you read access to the Google Cloud Storage bucket: mlperf-llm-public2

We are also in the process of making the data public. It should be available very soon, and we will update the README to reflect this.

Thank you,

Shriya

From: pub...@mlcommons.org <pub...@mlcommons.org> On Behalf Of zhenghua wang
Sent: Friday, July 7, 2023 4:10 AM
To: public <pub...@mlcommons.org>
Subject: AccessDeniedException to download data in gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json

I am trying to follow https://github.com/mlcommons/training/blob/master/large_language_model/megatron-lm/README.md#data-download to download data from gs://mlperf-llm-public2 as follows:

gsutil cp -r gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json .

It fails with the following error message:

"AccessDeniedException: 403 zhenghua...@gmail.com does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)."

Could anyone suggest how to download gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json?

Thanks a lot

--
You received this message because you are subscribed to the Google Groups "public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to public+un...@mlcommons.org.
To view this discussion on the web visit https://groups.google.com/a/mlcommons.org/d/msgid/public/c6ea4f12-51ee-4c3d-81a9-7824227d18b1n%40mlcommons.org.
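For reference, the single-object download attempted above with gsutil can also be sketched with the Python `google-cloud-storage` client. This is a minimal sketch, not part of the MLCommons instructions. It assumes the package is installed, application-default credentials are configured, and the account has already been granted read access to mlperf-llm-public2; without that access the request still fails with a 403. The helper names `parse_gs_uri` and `download_object` are hypothetical.

```python
from typing import Tuple


def parse_gs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket_name, sep, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not sep or not object_name:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket_name, object_name


def download_object(uri: str, dest_path: str) -> None:
    """Download a single object from Google Cloud Storage to dest_path.

    Assumption: `google-cloud-storage` is installed, application-default
    credentials are configured, and the account has read access to the
    bucket; otherwise the client raises a 403 (Forbidden) error.
    """
    # Imported lazily so parse_gs_uri works without the package installed.
    from google.cloud import storage

    bucket_name, object_name = parse_gs_uri(uri)
    client = storage.Client()  # project is inferred from the environment
    # bucket() and blob() only build local references; the network request
    # happens in download_to_filename().
    blob = client.bucket(bucket_name).blob(object_name)
    blob.download_to_filename(dest_path)
```

Once access has been granted, usage would look like `download_object("gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json", "c4-validation_24567exp.json")`. Later in this thread the maintainers point to a public Seagate bucket instead, so this sketch only applies to the original GCS location.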

baodong wu

Jul 11, 2023, 2:42:46 PM
to public, Shriya Palsamudram

Hi, 

I also got the same error. Could you please authorize me? My GCP account is wub...@gmail.com.

Thank you very much!

Best Regards

Uday Kurkure

Jul 12, 2023, 12:17:12 PM
to Shriya Palsamudram, pub...@mlcommons.org

Shriya,

I did not get any response to my two previous emails.

Can you please give me access to the data described in https://github.com/mlcommons/training/blob/master/large_language_model/megatron-lm/README.md#data-download?

Thanks,

-Uday

From: Uday Kurkure <ukur...@vmware.com>
Date: Tuesday, July 11, 2023 at 10:39 AM
To: Shriya Palsamudram <spalsa...@nvidia.com>
Subject: Re: AccessDeniedException to download data in gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json

Shriya,

I did not get any response to my previous email. Can you please give me access to download the data?

Thank you.

-Uday

From: Uday Kurkure <ukur...@vmware.com>
Date: Monday, July 10, 2023 at 11:26 AM
To: Shriya Palsamudram <spalsa...@nvidia.com>
Subject: Re: AccessDeniedException to download data in gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json

Shriya,

Can you please give me (ukur...@vmware.com) access?

From: 'Shriya Palsamudram' via public <pub...@mlcommons.org>
Date: Monday, July 10, 2023 at 10:24 AM
To: zhenghua wang <zhenghua...@gmail.com>
Cc: public <pub...@mlcommons.org>
Subject: RE: AccessDeniedException to download data in gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json

Rajesh Muthu

Jul 12, 2023, 12:55:34 PM
to public
Hi, could you please provide access to gs://mlperf-llm-public2? I'm looking forward to testing the C4 dataset on H100.

Please let me know if you need any additional details.

Account: rajes...@gmail.com

Thanks, Rajesh

kk Huang

Jul 14, 2023, 1:04:20 AM
to public, Shriya Palsamudram, public, zhenghua wang
Hi Shriya:

I have the same problem. Could you give me read access?

Many thanks

On Tuesday, July 11, 2023 at 1:24:25 AM [UTC+8], Shriya Palsamudram wrote:

ke sun

Jul 17, 2023, 2:37:00 PM
to public, Shriya Palsamudram, public, zhenghua wang
Hi Shriya:

I have the same problem. Could you give me read access? My account is skcherryb...@gmail.com.
Thanks a lot
On Tuesday, July 11, 2023 at 01:24:25 UTC+8, Shriya Palsamudram wrote:

Paul Delestrac

Jul 20, 2023, 2:14:50 PM
to public, Shriya Palsamudram, public, zhenghua wang
Hello, 

I got the same error. Can you give me authorization?
My account is paul.de...@gmail.com

Regards,
Paul

On Monday, July 10, 2023 at 7:24:25 PM UTC+2 Shriya Palsamudram wrote:

Ritika Borkar

Jul 20, 2023, 2:20:57 PM
to Paul Delestrac, public, Shriya Palsamudram, public, zhenghua wang

Hello,

Can you please give the instructions in this PR a try: https://github.com/mlcommons/training/pull/674/files

We are in the process of updating the instructions to point to the public Seagate bucket, which will host the LLM dataset and checkpoints from now on.

Thanks,
Ritika

Paul Delestrac

Jul 24, 2023, 10:13:44 AM
to public, Ritika Borkar, Shriya Palsamudram, public, zhenghua wang, Paul Delestrac
That worked for me! Thanks!

I am downloading the processed dataset, currently at about 9 MiB/s.

yu yu

Jul 25, 2023, 4:42:56 AM
to public, Paul Delestrac, Ritika Borkar, Shriya Palsamudram, public, zhenghua wang
Hi Paul and MLCommons, do you have the files gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model and gs://mlperf-llm-public2/gpt3_spmd1x64x24_tpuv4-3072_v84_20221101/checkpoints/checkpoint_00004000? Could you give me a copy? I am working on this urgently. My Gmail is hugoy...@gmail.com. Thanks a lot!