Adding dataset from public aws s3


Myles McManus - NOAA Affiliate

Nov 13, 2025, 9:31:48 PM
to ERDDAP
I am attempting to add a dataset via EDDGridFromNcFiles from a public AWS S3 bucket: https://noaa-goes17.s3.amazonaws.com/index.html#ABI-L1b-RadC/

I have attached a minimal datasets.xml; I have tried many formats trying to get this working. This particular bucket is only being used as a test, and I am able to add this data to ERDDAP locally using individual netCDF files downloaded from it.

I am using the erddap/erddap Docker image from Docker Hub, version 2.29.

I have read the docs related to adding AWS S3 buckets to ERDDAP, but haven't had any luck in my implementation.

Within the attached datasets.xml file there are a number of variations of the dataset's properties, commented out as different test cases to try. These include various changes to the cacheFromUrl, fileNameRegex, and pathRegex tags. For each test case I have run, I have also included a comment with a snippet of the error output from `/erddapData/logs/log.txt` produced when ERDDAP attempts to load the dataset.

Currently, test Case 12 is the one uncommented and in use in the datasets.xml. I'm hoping someone might be able to steer me in the right direction, as I'm out of ideas.


datasets.xml

Roy Mendelssohn - NOAA Federal

Nov 14, 2025, 10:16:15 AM
to Myles McManus - NOAA Affiliate, ERDDAP
I do not know as much about AWS as I should, but I will take a look. In the meantime, can you tell me if you started out by using GenerateDatasetsXml.sh as the first step for creating the XML?

Thanks,

-Roy

Myles McManus - NOAA Affiliate

Nov 14, 2025, 2:56:25 PM
to Roy Mendelssohn - NOAA Federal, ERDDAP
Hey Roy, 
I did not use GenerateDatasetsXml.sh, but I did verify that the same dataset works in ERDDAP with a file stored locally on the system. It only fails when I try to use the same dataset from AWS S3.


Thanks,

Myles McManus, P.E.

Data Scientist


Contractor - Team Alpha Omega for NCEI
NOAA's National Centers for Environmental Information (NCEI)

NCEI Data Stewardship Division

151 Patton Avenue, Asheville, NC 28801-5001 (E/NE5)
Email:  myles,b,mcm...@noaa.gov | Voice/Text:  (828) 419-1569

Roy Mendelssohn - NOAA Federal

Nov 14, 2025, 9:06:37 PM
to Myles McManus - NOAA Affiliate, ERDDAP
Hi Myles:

I started on this, but there is a lot you can do before I spend much time on it. Please reread the information on using files in AWS S3 at:

https://erddap.github.io/docs/server-admin/datasets?_highlight=aws#working-with-aws-s3-files

There are several things that struck me right away, the first being:

> In all cases, you will need an AWS account because the AWS SDK for Java (which ERDDAP™ uses to retrieve information about the contents of a bucket) requires AWS account credentials. (more on this below)

So, for starters, have you supplied credentials, even though it is a public dataset? That write-up also mentions that getting the URL right can be tricky and that it should be checked first by putting it in a browser. The first one I see is:

https://noaa-goes17.s3.amazonaws.com/ABI-L1b-RadC/2019/001/00/

Put that in a browser and you get exactly the message that ERDDAP™ displays; that is, the problem has nothing to do with ERDDAP™. There is a section on the form of the URL that ERDDAP™ requires; please experiment with those suggestions.

So the tl;dr version is that you need to supply credentials and you need to test the URLs you give, as described in the docs.

See if any of these things get you going and let me know. There are any number of people on this list who I know have ERDDAPs working with AWS and know more about it than I do; hopefully someone will chime in with what they use.

Thanks,

-Roy

PS - I am very pleased to see NCEI using ERDDAP™

PPS - Are you really using version 2.29? That has not been officially released; the latest official release is 2.28.1, and that is what you should be using for a non-test system. I believe 2.29 will have better handling of S3 (the code exists; I don't know which version it will appear in). Chris can give a little more detail when he has time.

Myles McManus - NOAA Affiliate

Nov 14, 2025, 9:28:35 PM
to Roy Mendelssohn - NOAA Federal, ERDDAP
Thanks for the assistance, Roy. NCEI is using an older version of
ERDDAP in production. I just pulled a recent Docker image to test
S3, as we're trying to migrate all our systems, including ERDDAP
and the datasets it serves, to AWS.

As I'm re-reading the docs at
https://erddap.github.io/docs/server-admin/datasets?_highlight=aws#working-with-aws-s3-files

I can see that goes17 is an example used there, and I am just trying
to use the same example bucket. I do have AWS credentials set up
within the Docker container; before I set them up, I was getting
credential-related errors.

I have tried a number of ways to set the <cacheFromUrl>, and you're
right that https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC/
returns an XML error stating:

'''
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>ABI-L1b-RadC</Key>
<RequestId>5S94FQV0Q2ZZNBC1</RequestId>
<HostId>MxatNbBO3pZHv/0PxMBVLtQc58+AnIDp8+tLSDw01C2FjCCK3oYuTwhdIBNy+MrIrEoC8Da5bd0=</HostId>
</Error>
'''

However, I have also tried
'<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/?prefix=ABI-L1b-RadC/</cacheFromUrl>',
similar to TestCase12 in my attached datasets.xml file, and
received the same error message:

'Reasons for failing to load datasets: No matching files at
https://noaa-goes17.s3.amazonaws.com/?prefix=ABI-L1b-RadC/'

even though that page returns an XML listing with many keys naming
netCDF files.
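The listing that the `?prefix=` URL returns can also be checked programmatically. This is a minimal sketch, assuming S3's standard ListObjectsV2 query parameters (`list-type=2`, `prefix`, `delimiter`) and assuming that ERDDAP™'s fileNameRegex must match the whole file name (the last path component); the helper names `listing_url` and `parse_listing` are hypothetical. `listing_url` builds an anonymous listing URL, and `parse_listing` pulls out the sub-"folders" and the file names a given regex would accept. Seeing matching names here does not by itself mean ERDDAP™ will accept the URL, since ERDDAP™ expects cacheFromUrl in its own form rather than as a `?prefix=` query.

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace used by S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(bucket: str, region: str, prefix: str) -> str:
    """Build an anonymous ListObjectsV2 URL (virtual-hosted style).

    With delimiter='/', S3 lists only the objects directly under `prefix`
    and reports deeper levels as CommonPrefixes.
    """
    query = urllib.parse.urlencode({"list-type": "2", "prefix": prefix, "delimiter": "/"})
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(body: str, file_name_regex: str) -> Tuple[List[str], List[str]]:
    """Return (sub-prefixes, file names matching the regex) from a ListBucketResult."""
    root = ET.fromstring(body)
    prefixes = [p.text or "" for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    pattern = re.compile(file_name_regex)
    names = []
    for key in root.findall("s3:Contents/s3:Key", S3_NS):
        name = (key.text or "").rsplit("/", 1)[-1]  # keep only the file name
        if pattern.fullmatch(name):  # assumed whole-name match, like Java's matches()
            names.append(name)
    return prefixes, names
```

Fetching `listing_url("noaa-goes17", "us-east-1", "ABI-L1b-RadC/")` with `urllib.request.urlopen` and passing the decoded body to `parse_listing(body, r".*\.nc")` should show whether files sit directly under that prefix or only under deeper sub-prefixes.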

All this to say, I think my request should be simplified to:

Can you please provide a <dataset> configuration that works for any
public bucket that I can use?

I very much appreciate you working on this and I hope that doing so
helps others too.

Thanks,
Myles McManus, P.E.

Chris John - NOAA Affiliate

Nov 17, 2025, 2:42:59 PM
to Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), Roy Mendelssohn - NOAA Federal, ERDDAP
Thanks to @Champagne, Seth J CIV USN NRL DET SSC MS (USA), there are some improvements to the S3 integration in 2.29.0 (currently an alpha test version available on Docker). Since the 2.29 you have is a nightly/alpha build, you may want to check the exact version you are using. One thing to call out in the more recent builds of 2.29.0 is that for completely public buckets, you should now be able to use anonymous credentials (which are used if you don't set any).

I'm currently working on improving some of our AWS S3 tests. If you just want to see a dataset using S3, I believe the dataset defined in EDDTestDataset.xmlFragment_testAwsS3 works (search for xmlFragment_testAwsS3 on that page). I will hopefully have another example soon using goes17.

Christopher John (he/him)
NOAA Appointed Technical Director of ERDDAP™
Computer and Information Systems Manager, TSPi




Myles McManus - NOAA Affiliate

Nov 18, 2025, 1:19:58 PM
to Champagne, Seth J CIV USN NRL DET SSC MS (USA), ERDDAP, Chris John - NOAA Affiliate, Mendelssohn, Roy CIV (USA)
Hey all, first off, thank you so much for the support; this is awesome feedback.

I have implemented Chris's suggested dataset from:
https://raw.githubusercontent.com/ERDDAP/erddap/refs/heads/main/src/test/java/testDataset/EDDTestDataset.java.


I have also implemented Seth's suggested:
<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC-Reproc/2019/045/00/</cacheFromUrl>


The result is that both datasets fail, but with a different error than I was getting before, and I wasn't able to get past it. The error in log.txt is:

n Datasets Failed To Load (in the last major LoadDatasets) = 2
    testAwsS3, noaa_goes17_abi_l1b_radc_table, (end)

Reasons for failing to load datasets:
testAwsS3: Cannot load from object array because "this.sourceAxisValues" is null
noaa_goes17_abi_l1b_radc_table: Cannot load from object array because "this.sourceAxisValues" is null

I have attached both the datasets.xml and the log.txt, and again, I really appreciate the assistance.

Thanks,
Myles McManus, P.E.

On Mon, Nov 17, 2025 at 2:37 PM Champagne, Seth J CIV USN NRL DET SSC MS (USA) <seth.j.cha...@us.navy.mil> wrote:

I was able to confirm locally that this works in ERDDAP:

<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC-Reproc/2019/045/00/</cacheFromUrl>

 

Thanks,

__________________________

Seth Champagne

Center for Geospatial Sciences

Naval Research Laboratory Code, 7342

Stennis Space Center, MS 39529

Office - (228) 688-4792

seth.j.cha...@us.navy.mil

 

From: Champagne, Seth J CIV USN NRL DET SSC MS (USA)
Sent: Monday, November 17, 2025 2:27 PM
To: Chris John - NOAA Affiliate <chris...@noaa.gov>; McManus, Myles B CTR (USA) <myles.b...@noaa.gov>
Cc: Mendelssohn, Roy CIV (USA) <roy.men...@noaa.gov>
Subject: RE: [Non-DoD Source] Re: [ERDDAP] Adding dataset from public aws s3

 

I did not notice this thread was from the Google Group, whoops.

 


From: Champagne, Seth J CIV USN NRL DET SSC MS (USA)
Sent: Monday, November 17, 2025 2:24 PM
To: 'Chris John - NOAA Affiliate' <chris...@noaa.gov>; McManus, Myles B CTR (USA) <myles.b...@noaa.gov>
Cc: Mendelssohn, Roy CIV (USA) <roy.men...@noaa.gov>; ERDDAP <erd...@googlegroups.com>
Subject: RE: [Non-DoD Source] Re: [ERDDAP] Adding dataset from public aws s3

 

Myles,

 

I think you can get this dataset working with prior ERDDAP release versions; you just have a mistake in your prefix configuration.

From what I can see at https://noaa-goes17.s3.us-east-1.amazonaws.com/, your cacheFromUrl should be the following:

<cacheFromUrl>https://noaa-goes17.s3.us-east-1.amazonaws.com/ABI-L1b-RadC-Reproc/</cacheFromUrl>

 

I think the code will try to recurse through the bucket and cache the data, and this process can take a while for ERDDAP to complete. ERDDAP was designed to download the netCDF files locally for the dataset code to access, rather than read them over the wire. S3 has intentional cost/performance tradeoffs that Bob Simons did his best to work around.
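As a sanity check before editing datasets.xml, the URL form described here can be built and checked with a few lines of Python. This sketch assumes, based on our reading of the ERDDAP™ S3 docs, that cacheFromUrl must look like `https://<bucket>.s3.<region>.amazonaws.com/<prefix>/`, with the region included and an optional prefix ending in `/`. The function names are hypothetical, and the regex only checks the shape of the URL, not whether the prefix exists in the bucket.

```python
import re

# Assumed form: https://<bucket>.s3.<region>.amazonaws.com/<optional prefix ending in "/">
# The bucket pattern approximates S3 bucket-name rules (3-63 chars, lowercase, digits, '.', '-').
CACHE_FROM_URL = re.compile(
    r"https://(?P<bucket>[a-z0-9][a-z0-9.-]{1,61}[a-z0-9])"
    r"\.s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<prefix>(?:[^?#]+/)?)"
)


def build_cache_from_url(bucket: str, region: str, prefix: str = "") -> str:
    """Assemble a cacheFromUrl in the assumed form, adding the trailing '/'."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"https://{bucket}.s3.{region}.amazonaws.com/{prefix.lstrip('/')}"


def looks_like_cache_from_url(url: str) -> bool:
    """True if url has the assumed shape (no query string, no '#', region present)."""
    return CACHE_FROM_URL.fullmatch(url) is not None
```

Under that assumption, the `?prefix=` and `index.html#` forms tried earlier in the thread both fail the check, while the URL confirmed to work in ERDDAP passes.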

 

Try that change and see if you have better luck. If you send me your dataset xml snippet, I can try to troubleshoot for you.

 

Thanks,

Seth Champagne

log.txt
datasets (1).xml

Roy Mendelssohn - NOAA Federal

Nov 18, 2025, 1:34:31 PM
to Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via, Chris John - NOAA Affiliate
Hi Myles:

If you are still using the Docker image with version 2.29, may I suggest switching to the 2.28.1 image? As Chris said, 2.29 is only at alpha level. This is just to make certain the problem is not specific to 2.29.

I will try to take a look; hopefully Seth or Chris can chime in, because they know more about AWS than I do.

-Roy

Chris John - NOAA Affiliate

Nov 18, 2025, 2:27:50 PM
to Roy Mendelssohn - NOAA Federal, Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via
I'm looking into the issue with sourceAxisValues. There may be a way to work around it with some config changes, but that shouldn't be necessary. I'm hoping to have a proper fix soon.

Christopher John (he/him)
NOAA Appointed Technical Director of ERDDAP™
Computer and Information Systems Manager, TSPi



Chris John - NOAA Affiliate

Nov 19, 2025, 10:30:58 AM
to Roy Mendelssohn - NOAA Federal, Myles McManus - NOAA Affiliate, Champagne, Seth J CIV USN NRL DET SSC MS (USA), erDDAP Bob Simons via
I have a pull request that should resolve the sourceAxisValues issue. https://github.com/ERDDAP/erddap/pull/396

This also includes support for nc files with an axis variable (like time) that is constant per file (for example the goes17 data).

Once it is merged there will be a new alpha Docker build created automatically.

Christopher John (he/him)
NOAA Appointed Technical Director of ERDDAP™
Computer and Information Systems Manager, TSPi


