Adding dataset from public aws s3


Myles McManus - NOAA Affiliate

Nov 13, 2025, 9:31:48 PM
to ERDDAP
I am attempting to add a dataset via EDDGridFromNcFiles from a public AWS S3 bucket:  https://noaa-goes17.s3.amazonaws.com/index.html#ABI-L1b-RadC/

I have attached a minimal datasets.xml; I have tried many formats to get this working. This particular bucket is only being used as a test, but I am able to add this data to ERDDAP locally using individual netCDF files downloaded from this bucket.

I am using the ERDDAP Docker container (erddap/erddap on Docker Hub) with version 2.29.

I have read the docs related to adding AWS S3 buckets to ERDDAP, but haven't had any luck in my implementation.

Within the attached datasets.xml file, a number of variations of the dataset's properties are commented out as different test cases to try. These include various changes to the cacheFromUrl, fileNameRegex, and pathRegex tags. Where these tests have been run, I have also included a comment with a snippet of the error output from `/erddapData/logs/log.txt` when ERDDAP attempts to load the dataset.
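One way to narrow down fileNameRegex/pathRegex variations like these is to test the regexes against real-looking S3 keys outside ERDDAP first. The minimal sketch below assumes ERDDAP requires each regex to match the whole string (as Java's `String.matches` does), with fileNameRegex tested against the file name and pathRegex against the directory part of the key; the helper `accepted_keys` and the sample keys are hypothetical, and Python's `re` syntax is close to, but not identical with, Java's.

```python
import re

def accepted_keys(keys, file_name_regex, path_regex=".*"):
    """Hypothetical helper: keep keys whose directory part fully matches
    path_regex and whose file name fully matches file_name_regex."""
    name_re = re.compile(file_name_regex)
    path_re = re.compile(path_regex)
    accepted = []
    for key in keys:
        # S3 has no real directories: split the key on its last "/".
        directory, _, name = key.rpartition("/")
        if path_re.fullmatch(directory + "/") and name_re.fullmatch(name):
            accepted.append(key)
    return accepted

# Illustrative keys shaped like the GOES-17 bucket layout (not real listings).
keys = [
    "ABI-L1b-RadC/2019/001/00/OR_ABI-L1b-RadC-M3C01_G17_s20190010002189_e20190010004562_c20190010005003.nc",
    "ABI-L1b-RadC/2019/001/00/index.html",
    "ABI-L1b-RadC/2019/002/00/OR_ABI-L1b-RadC-M3C01_G17_s20190020002189_e20190020004562_c20190020005003.nc",
]

print(accepted_keys(keys, r".*\.nc"))                     # both .nc keys
print(accepted_keys(keys, r".*\.nc", r".*/2019/001/.*"))  # only day 001
```

Under the whole-string assumption, a pattern such as `\.nc` on its own would accept nothing, which is an easy way to end up with "No matching files".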

Currently, test case 12 is the one uncommented and in use in datasets.xml. I'm hoping someone might be able to steer me in the right direction, as I'm out of ideas.


datasets.xml

Roy Mendelssohn - NOAA Federal

Nov 14, 2025, 10:16:15 AM
to Myles McManus - NOAA Affiliate, ERDDAP
I do not know as much about AWS as I should, but I will take a look. In the meantime, can you tell me if you started out by using GenerateDatasetsXml.sh as the first step for creating the XML?

Thanks,

-Roy

Myles McManus - NOAA Affiliate

Nov 14, 2025, 2:56:25 PM
to Roy Mendelssohn - NOAA Federal, ERDDAP
Hey Roy, 
I did not use GenerateDatasetsXml, but I did verify that the same dataset works in ERDDAP with a file local to the system. It only fails when I try to use the same dataset from AWS S3.


Thanks,

Myles McManus, P.E.

Data Scientist


Contractor - Team Alpha Omega for NCEI
NOAA's National Centers for Environmental Information (NCEI)

NCEI Data Stewardship Division

151 Patton Avenue, Asheville, NC 28801-5001 (E/NE5)
Email:  myles,b,mcm...@noaa.gov | Voice/Text:  (828) 419-1569

Roy Mendelssohn - NOAA Federal

Nov 14, 2025, 9:06:37 PM
to Myles McManus - NOAA Affiliate, ERDDAP
Hi Myles:

I started on this, but there is a lot you can do before I spend much time on it. Please reread the info on using files in AWS S3 at:

https://erddap.github.io/docs/server-admin/datasets?_highlight=aws#working-with-aws-s3-files

There are several things that struck me right away, the first being:

> In all cases, you will need an AWS account because the AWS SDK for Java (which ERDDAP™ uses to retrieve information about the contents of a bucket) requires AWS account credentials. (more on this below)

So, for starters, have you supplied the credentials even though it is a public dataset? That write-up also mentions that getting the URL right can be tricky and that it should be checked first by putting it in a browser. So the first one I see is:

https://noaa-goes17.s3.amazonaws.com/ABI-L1b-RadC/2019/001/00/

Put that in a browser and you get exactly the message that ERDDAP™ displays; that is, the problem has nothing to do with ERDDAP™. There is a section on the form of the URL required by ERDDAP; please play with those suggestions.

So the tl;dr version is you need to supply credentials and you need to test the URLs you give as described in the docs.
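The browser check above hinges on the difference between an S3 object URL and a bucket listing URL. A path such as `.../ABI-L1b-RadC/2019/001/00/` asks S3 for a single object with exactly that key, which does not exist because S3 has no real directories (hence `NoSuchKey`), while listing the objects under a prefix uses query parameters instead. The sketch below builds both forms from the public S3 REST conventions (virtual-hosted bucket URL, `ListObjectsV2` with `list-type=2`, `prefix`, `delimiter`); the helper names are hypothetical, and which form `cacheFromUrl` itself expects is set by the ERDDAP docs, not by this code.

```python
from urllib.parse import quote, urlencode

def object_url(bucket: str, region: str, key: str) -> str:
    """URL that GETs one object; a 'directory' path here yields NoSuchKey."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"

def listing_url(bucket: str, region: str, prefix: str, delimiter: str = "/") -> str:
    """URL for an S3 ListObjectsV2 request over the keys under prefix."""
    params = {"list-type": "2", "prefix": prefix}
    if delimiter:
        # With a delimiter, deeper "subdirectories" come back as CommonPrefixes.
        params["delimiter"] = delimiter
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{urlencode(params)}"

prefix = "ABI-L1b-RadC/2019/001/00/"
print(object_url("noaa-goes17", "us-east-1", prefix))
print(listing_url("noaa-goes17", "us-east-1", prefix))
```

Pasting the second URL into a browser should return a `ListBucketResult` document if the prefix holds files, which is a quick way to confirm the prefix before putting it in datasets.xml.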

See if any of these things get you going and let me know. There are a number of people on this list who I know have ERDDAPs working with AWS and know more about it than I do; hopefully someone will chime in with what they use.

Thanks,

-Roy

PS - I am very pleased to see NCEI using ERDDAP™

PPS - Are you really using version 2.29? That has not been officially released; the latest official release is 2.28.1, and that is what you should be using for a non-test system. I believe 2.29 will have better handling of S3 (the code exists; I don't know which version it will appear in). Chris, when he has time, can give a little more detail.

Myles McManus - NOAA Affiliate

Nov 14, 2025, 9:28:35 PM
to Roy Mendelssohn - NOAA Federal, ERDDAP
Thanks for the assistance Roy. NCEI is using an older version of
ERDDAP in production. I just pulled a recent docker image for testing
s3 as we're trying to migrate all our systems to AWS including ERDDAP
and the datasets it serves.

As I'm re-reading the docs at
https://erddap.github.io/docs/server-admin/datasets?_highlight=aws#working-with-aws-s3-files

I can see that goes17 is an example used there, and I am just trying
to use the same example bucket. I do have AWS credentials set up
within the Docker container; before I set up the credentials, I was
getting errors related to them.

I have tried a number of ways to set the <cacheFromUrl>, and you're
right that https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC/
returns an xml stating:

'''
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>ABI-L1b-RadC</Key>
<RequestId>5S94FQV0Q2ZZNBC1</RequestId>
<HostId>MxatNbBO3pZHv/0PxMBVLtQc58+AnIDp8+tLSDw01C2FjCCK3oYuTwhdIBNy+MrIrEoC8Da5bd0=</HostId>
</Error>
'''

However I have also tried
'<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/?prefix=ABI-L1b-RadC/</cacheFromUrl>'
as seen similarly in TestCase12 of my attached datasets.xml file, and
received the same error message:

'Reasons for failing to load datasets: No matching files at
https://noaa-goes17.s3.amazonaws.com/?prefix=ABI-L1b-RadC/'

and yet that page returns XML containing many keys whose values are
netCDF files.
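Since the `?prefix=` URL returns a `ListBucketResult` document, one way to double-check what ERDDAP ought to find is to pull the `.nc` keys out of that XML. The minimal sketch below assumes the standard S3 response namespace (`http://s3.amazonaws.com/doc/2006-03-01/`) and parses a trimmed, made-up response instead of fetching one; for a live check, you could pass it `urllib.request.urlopen(url).read().decode()`. S3 returns at most 1,000 keys per response, and `IsTruncated` signals that more pages exist, so one page may not show everything.

```python
import xml.etree.ElementTree as ET

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

def parse_listing(xml_text: str):
    """Return (nc_keys, child_prefixes, is_truncated) from a ListBucketResult."""
    root = ET.fromstring(xml_text)
    keys = [k.text for k in root.findall("s3:Contents/s3:Key", S3_NS)]
    nc_keys = [k for k in keys if k.endswith(".nc")]
    prefixes = [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    return nc_keys, prefixes, truncated

# A trimmed, illustrative response body (not real bucket output).
sample = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>noaa-goes17</Name>
  <Prefix>ABI-L1b-RadC/2019/001/</Prefix>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>ABI-L1b-RadC/2019/001/example_a.nc</Key></Contents>
  <Contents><Key>ABI-L1b-RadC/2019/001/readme.txt</Key></Contents>
  <CommonPrefixes><Prefix>ABI-L1b-RadC/2019/001/00/</Prefix></CommonPrefixes>
</ListBucketResult>"""

print(parse_listing(sample))
```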

All this to say, I think my request should be simplified to:

Can you please provide a <dataset> configuration that works for any
public bucket that I can use?

I very much appreciate you working on this and I hope that doing so
helps others too.

Thanks,
Myles McManus, P.E.

Chris John - NOAA Affiliate

Nov 17, 2025, 2:42:59 PM
to Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), Roy Mendelssohn - NOAA Federal, ERDDAP
There are some improvements to the S3 integration in 2.29.0 (currently an alpha test version available on Docker), thanks to @Champagne, Seth J CIV USN NRL DET SSC MS (USA). Since the 2.29 you have is a nightly/alpha build, you may want to check the exact version you are using. One thing to call out in the more recent builds of 2.29.0: for completely public buckets, you should now be able to use anonymous credentials (which are used if you don't set any).

I'm currently working on improving some of our AWS S3 tests. If you just want to see a dataset using s3, I believe the dataset defined in EDDTestDataset.xmlFragment_testAwsS3 works (search for xmlFragment_testAwsS3 on that page). I will hopefully have another example soon using goes17.

Christopher John (he/him)
NOAA Appointed Technical Director of ERDDAP™
Computer and Information Systems Manager, TSPi




Myles McManus - NOAA Affiliate

Nov 18, 2025, 1:19:58 PM
to Champagne, Seth J CIV USN NRL DET SSC MS (USA), ERDDAP, Chris John - NOAA Affiliate, Mendelssohn, Roy CIV (USA)
Hey all, first off, thank you so much for the support; this is awesome feedback.

I have implemented Chris's suggested dataset from:
https://raw.githubusercontent.com/ERDDAP/erddap/refs/heads/main/src/test/java/testDataset/EDDTestDataset.java.


I have also implemented Seth's suggested:
<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC-Reproc/2019/045/00/</cacheFromUrl>


The result is that both datasets still fail, but with a different error than I was getting before, and I wasn't able to get past it. The error in log.txt is:

n Datasets Failed To Load (in the last major LoadDatasets) = 2
    testAwsS3, noaa_goes17_abi_l1b_radc_table, (end)

Reasons for failing to load datasets:
testAwsS3: Cannot load from object array because "this.sourceAxisValues" is null
noaa_goes17_abi_l1b_radc_table: Cannot load from object array because "this.sourceAxisValues" is null

I have attached both the datasets.xml and the log.txt, and again, I really appreciate the assistance.

Thanks,
Myles McManus, P.E.

On Mon, Nov 17, 2025 at 2:37 PM Champagne, Seth J CIV USN NRL DET SSC MS (USA) <seth.j.cha...@us.navy.mil> wrote:

I was able to confirm locally that this worked in ERDDAP

<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC-Reproc/2019/045/00/</cacheFromUrl>

 

Thanks,

__________________________

Seth Champagne

Center for Geospatial Sciences

Naval Research Laboratory Code, 7342

Stennis Space Center, MS 39529

Office - (228) 688-4792

seth.j.cha...@us.navy.mil

 

From: Champagne, Seth J CIV USN NRL DET SSC MS (USA)
Sent: Monday, November 17, 2025 2:27 PM
To: Chris John - NOAA Affiliate <chris...@noaa.gov>; McManus, Myles B CTR (USA) <myles.b...@noaa.gov>
Cc: Mendelssohn, Roy CIV (USA) <roy.men...@noaa.gov>
Subject: RE: [Non-DoD Source] Re: [ERDDAP] Adding dataset from public aws s3

 

I did not notice this thread was from the Google Groups list, whoops.

Seth Champagne

From: Champagne, Seth J CIV USN NRL DET SSC MS (USA)
Sent: Monday, November 17, 2025 2:24 PM
To: 'Chris John - NOAA Affiliate' <chris...@noaa.gov>; McManus, Myles B CTR (USA) <myles.b...@noaa.gov>
Cc: Mendelssohn, Roy CIV (USA) <roy.men...@noaa.gov>; ERDDAP <erd...@googlegroups.com>
Subject: RE: [Non-DoD Source] Re: [ERDDAP] Adding dataset from public aws s3

 

Myles,

I think you can get this dataset working using ERDDAP with the prior release versions; you just have a mistake in your prefix configuration.

From what I can see at https://noaa-goes17.s3.us-east-1.amazonaws.com/, your cacheFromUrl should be the following:

<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC-Reproc/</cacheFromUrl>

I think the code will try to recurse through the bucket and cache the data, which can take a while for ERDDAP to complete. ERDDAP was designed to download the netCDF files locally for the dataset code to access, rather than reading them over the wire. S3 has intentional cost/performance tradeoffs that Bob Simons did his best to work around.
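To picture what "recursing through the bucket" involves, here is a sketch of a delimiter-based walk: list one prefix, collect the keys directly under it, then descend into each child prefix. It is not ERDDAP's code (ERDDAP uses the AWS SDK for Java); the `list_prefix` callback is a hypothetical stand-in for one `ListObjectsV2` call, faked here over an in-memory key list so the example runs offline, and pagination is ignored. With a year/day-of-year/hour layout there are roughly 8,760 hour-level prefixes per year, so starting from a high-level prefix means many listing calls.

```python
from typing import Callable, List, Tuple

# One listing call: (keys directly under prefix, child prefixes).
Lister = Callable[[str], Tuple[List[str], List[str]]]

def walk(list_prefix: Lister, prefix: str) -> List[str]:
    """Collect every key under prefix by descending through child prefixes."""
    keys: List[str] = []
    stack = [prefix]
    while stack:
        current = stack.pop()
        found_keys, child_prefixes = list_prefix(current)
        keys.extend(found_keys)
        stack.extend(child_prefixes)
    return sorted(keys)

def fake_lister(all_keys: List[str]) -> Lister:
    """Emulate S3 delimiter='/' semantics over an in-memory list of keys."""
    def list_prefix(prefix: str) -> Tuple[List[str], List[str]]:
        keys, children = [], set()
        for key in all_keys:
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if "/" in rest:
                # Anything deeper is reported as a child "directory" prefix.
                children.add(prefix + rest.split("/", 1)[0] + "/")
            else:
                keys.append(key)
        return keys, sorted(children)
    return list_prefix

bucket = [
    "ABI-L1b-RadC-Reproc/2019/045/00/a.nc",
    "ABI-L1b-RadC-Reproc/2019/045/01/b.nc",
    "ABI-L1b-RadC-Reproc/2019/046/00/c.nc",
]
print(walk(fake_lister(bucket), "ABI-L1b-RadC-Reproc/"))
```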

 

Try that change and see if you have better luck. If you send me your datasets.xml snippet, I can try to troubleshoot for you.

 

Thanks,

Seth Champagne

log.txt
datasets (1).xml

Roy Mendelssohn - NOAA Federal

Nov 18, 2025, 1:34:31 PM
to Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via, Chris John - NOAA Affiliate
Hi Myles:

If you are still using the Docker image with version 2.29, may I suggest you use the Docker image with 2.28.1? As Chris said, 2.29 is only at alpha level. This is just to make certain it is not a problem with 2.29.

I will try to take a look; hopefully Seth or Chris can chime in, because they know more about AWS than I do.

-Roy

Chris John - NOAA Affiliate

Nov 18, 2025, 2:27:50 PM
to Roy Mendelssohn - NOAA Federal, Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via
I'm looking into the issue with sourceAxisValues. There may be a way to work around it with some config changes, but that shouldn't be necessary. I'm hoping to have a proper fix soon.

Christopher John (he/him)



Chris John - NOAA Affiliate

Nov 19, 2025, 10:30:58 AM
to Roy Mendelssohn - NOAA Federal, Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via
I have a pull request that should resolve the sourceAxisValues issue. https://github.com/ERDDAP/erddap/pull/396

This also includes support for nc files with an axis variable (like time) that is constant per file (for example the goes17 data).
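For files like GOES-17's, where the time axis holds a single value per file, that value is usually also encoded in the file name: the GOES-R ABI naming convention carries the scan start as `sYYYYJJJHHMMSSt` (year, day of year, hour, minute, second, tenth of a second). The sketch below decodes it, which can help when checking which file should land at which time step. The file name is illustrative (built to follow the convention, not copied from the bucket), and whether ERDDAP takes the axis value from the in-file variable or from the name is a configuration choice this code does not cover.

```python
import re
from datetime import datetime, timedelta, timezone

# sYYYYJJJHHMMSSt: year, day of year, hour, minute, second, tenth of second.
START_RE = re.compile(r"_s(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(\d)_")

def scan_start(file_name: str) -> datetime:
    """Decode the scan start time from a GOES-R ABI style file name."""
    m = START_RE.search(file_name)
    if m is None:
        raise ValueError(f"no start-time field in {file_name!r}")
    year, doy, hh, mm, ss, tenth = (int(g) for g in m.groups())
    # Day of year is 1-based, so day 001 adds zero days to January 1.
    return (datetime(year, 1, 1, tzinfo=timezone.utc)
            + timedelta(days=doy - 1, hours=hh, minutes=mm,
                        seconds=ss, milliseconds=100 * tenth))

name = "OR_ABI-L1b-RadC-M3C01_G17_s20190450002189_e20190450004562_c20190450005008.nc"
print(scan_start(name).isoformat())
```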

Once it is merged there will be a new alpha Docker build created automatically.

Christopher John (he/him)



Chris John - NOAA Affiliate

Nov 24, 2025, 9:28:06 AM
to Roy Mendelssohn - NOAA Federal, Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via

Christopher John (he/him)



Myles McManus - NOAA Affiliate

Nov 30, 2025, 4:18:17 PM
to Chris John - NOAA Affiliate, Roy Mendelssohn - NOAA Federal, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via
Hey all, I have successfully added two datasets from AWS S3 sources:


Thank you very much for this!

I have attached the datasets.xml for anyone who may want it.

For the GOES-17 ABI-L1b-RadC data, I am using a cacheFromUrl at the deepest level, where the actual *.nc files are located. I would like to make available all the .nc files under the bucket/prefix "noaa-goes17/ABI-L1b-RadC-Reproc" rather than drilling all the way down to a single "directory" of .nc files: "noaa-goes17/ABI-L1b-RadC-Reproc/2019/045/00/".

@Roy Mendelssohn - NOAA Federal has previously mentioned that there is an ERDDAP using GOES-17 ABI-L1b-RadC data here: ERDDAP - files/awsS3NoaaGoes17/ABI-L1b-RadC/

This ERDDAP dataset does have a recursive directory listing. So a couple questions about this particular dataset's configuration in ERDDAP:
1. Is this using AWS S3 data directly as I'm doing with the attached datasets.xml, or is it a mounted filesystem?
2. If it is using AWS S3 data directly, can I please get a copy of that datasets.xml snippet?
3. If it isn't using AWS S3 data directly, is it even possible, with the current ERDDAP implementation, to have this type of recursive file listings of the GOES17 data like that shown at: ERDDAP - files/awsS3NoaaGoes17/ABI-L1b-RadC/ without mounting the data?
4. If mounting the data from S3 is required, are you using s3fs-fuse? 


Thanks,

Myles McManus, P.E.

Data Scientist


Contractor - Team Alpha Omega for NCEI
NOAA's National Centers for Environmental Information (NCEI)

NCEI Data Stewardship Division

151 Patton Avenue, Asheville, NC 28801-5001 (E/NE5)
Email:  myles,b,mcm...@noaa.gov | Voice/Text:  (828) 419-1569


datasets (2).xml

Chris John - NOAA Affiliate

Dec 1, 2025, 11:21:01 AM
to Myles McManus - NOAA Affiliate, Roy Mendelssohn - NOAA Federal, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via
awsS3NoaaGoes17 uses the dataset type EDDTableFromFileNames (with the ***fromOnTheFly type of fileDir), which does not serve data from within the files but serves the files themselves (you can read more at that link).

It is using the S3 data directly. I've attached the Java code used to generate the xml snippet for the dataset in testing (it might differ from the production dataset definition slightly).

Off the top of my head, I don't know the differences in how files are found by EDDTableFromFileNames vs. EDDGridFromNcFiles using a cache URL. It's possible a change in ERDDAP™ is needed to make EDDGridFromNcFiles "recursive" (in quotes because directories are mostly an illusion in S3).
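The point that directories are "mostly an illusion" in S3 can be made concrete: a bucket is a flat set of keys, and any directory tree is inferred by splitting keys on `/`. The sketch below derives that implied tree from a flat key list and picks out the deepest "directories" that actually hold .nc files, the level a single cacheFromUrl had to point at. The function name and sample keys are hypothetical; this only illustrates the storage model, not how EDDGridFromNcFiles or EDDTableFromFileNames enumerate files.

```python
from collections import Counter
from typing import Dict, Iterable

def implied_directories(keys: Iterable[str]) -> Dict[str, int]:
    """Map every implied 'directory' prefix to the number of .nc files
    directly inside it; S3 itself stores only the flat keys."""
    counts: Counter = Counter()
    for key in keys:
        parts = key.split("/")
        # Register every ancestor prefix so empty intermediate levels appear.
        for depth in range(1, len(parts)):
            counts["/".join(parts[:depth]) + "/"] += 0
        if key.endswith(".nc"):
            counts["/".join(parts[:-1]) + "/"] += 1
    return dict(counts)

keys = [
    "ABI-L1b-RadC-Reproc/2019/045/00/a.nc",
    "ABI-L1b-RadC-Reproc/2019/045/00/b.nc",
    "ABI-L1b-RadC-Reproc/2019/045/01/c.nc",
]
dirs = implied_directories(keys)
leaves = sorted(d for d, n in dirs.items() if n > 0)
print(leaves)
```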

Christopher John (he/him)



awsS3NoaaGoes17_JavaSnippet.txt

Roy Mendelssohn - NOAA Federal

Dec 1, 2025, 11:44:37 AM
to Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via, Chris John - NOAA Affiliate
What Chris sent is basically what we have for the essential parts of the xml:

<dataset type="EDDTableFromFileNames" datasetID="awsS3NoaaGoes17" active="true">
    <fileDir>***fromOnTheFly, https://noaa-goes17.s3.us-east-1.amazonaws.com/</fileDir>
    <fileNameRegex>.*</fileNameRegex>
    <recursive>true</recursive>
    <pathRegex>.*</pathRegex>
    <reloadEveryNMinutes>2880</reloadEveryNMinutes>
    <!-- other tags omitted -->
</dataset>

HTH,

-Roy

Myles McManus - NOAA Affiliate

Dec 1, 2025, 11:53:15 AM
to Roy Mendelssohn - NOAA Federal, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via, Chris John - NOAA Affiliate
Thank you guys very much; I will test this EDDTableFromFileNames snippet for recursive listings. However, I would definitely like to see this recursive listing functionality in EDDGridFromNcFiles, because we have a lot of CF-compliant gridded data for which we want to keep dimensional subsetting capabilities as we migrate to the cloud. Is this something I can advocate for here, or would it be better served by opening a GitHub issue? I'm happy to do so if it helps.

I truly appreciate the guidance on this, and I've learned a lot about ERDDAP in the process.

Thanks,

Myles McManus, P.E.


Chris John - NOAA Affiliate

Dec 1, 2025, 12:53:37 PM
to Myles McManus - NOAA Affiliate, Roy Mendelssohn - NOAA Federal, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via
I'd recommend filing an issue on GitHub. It's best if you create the issue so that you will be notified if there are questions and/or to be notified when the work is done.

Christopher John (he/him)



Myles McManus - NOAA Affiliate

Dec 3, 2025, 5:25:07 PM
to ERDDAP