datasetID naming format


Chuan-Yuan Hsu

Aug 13, 2021, 12:23:40 PM
to ERDDAP
Hi, 

I was having trouble loading one of the netCDF datasets in my ERDDAP (v2.02). After a couple of days of testing, I figured out that the error possibly comes from the naming structure of the datasetID. I tried datasetID=wmo-42907, which did not work; however, with datasetID=wmo_42907, the data ingestion works properly (tested on v2.02, v2.11, and v2.14).

Interestingly, in my ERDDAP, most of the datasetIDs I provide are named similarly to "wmo-42907" (see attached figure). Any thoughts on this? Is it a bug?

Screen Shot 2021-08-13 at 11.23.16 AM.png

Bob Simons

Aug 13, 2021, 1:28:44 PM
to ERDDAP
Why are you testing on 3 versions of ERDDAP?! Just use the latest version. Always.
There are security reasons why everyone should always use the latest version. 
Plus, I have no interest in trying to solve problems related to older versions, when the underlying bug (if any) may have been fixed in a newer version.
Just use the latest version. Always.

In general, if something isn't working, look for the error message (usually in log.txt). It will tell you what the problem is and you won't have to spend a couple of days trying to figure it out. If you can't figure out the error message, then include it in the message you post to the Google Group so others can try to explain the error message.
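
As a rough illustration of that advice, the sketch below scans log text for dataset load failures. The line pattern ("While trying to load datasetID=...") is taken from the log.txt excerpt posted later in this thread; the function name and the assumption that the exception message sits on the next non-blank line are mine, not part of ERDDAP.

```python
import re

# Pattern assumed from the log.txt excerpt in this thread:
#   "While trying to load datasetID=wmo_42907 (after 104492 ms)"
LOAD_FAILURE = re.compile(r"While trying to load datasetID=(\S+)")


def find_load_failures(log_text):
    """Return (datasetID, following exception line) pairs found in log text."""
    lines = log_text.splitlines()
    failures = []
    for i, line in enumerate(lines):
        match = LOAD_FAILURE.search(line)
        if not match:
            continue
        # Assumption: the exception message is the next non-blank line.
        detail = next((l.strip() for l in lines[i + 1:] if l.strip()), "")
        failures.append((match.group(1), detail))
    return failures


# Shortened sample based on the log excerpt in this thread.
sample = """datasets.xml error on line #177612
While trying to load datasetID=wmo_42907 (after 104492 ms)
java.lang.RuntimeException: datasets.xml error on or before line #177612: Your query produced too much data.
"""

for dataset_id, detail in find_load_failures(sample):
    print(dataset_id, "->", detail)
```

To use it on a real log, read the file into a string (for example with `open(path).read()`) and pass that to `find_load_failures`.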

Yes. The problem probably is the use of a hyphen in the datasetID.
The documentation says: 
"The datasetIDs should be a letter (A-Z, a-z) followed by any number of A-Z, a-z, 0-9, and _ (but best if <32 characters total)."
so don't use a hyphen in the datasetID.
(Different versions of ERDDAP may have allowed hyphens, but it's a bad idea so don't do it.)
I have no explanation for why other datasets seem to work (so you say) with a hyphen. Just don't use hyphens and these problems will go away.
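
The quoted rule can be checked before a datasetID ever goes into datasets.xml. This is a minimal sketch of that rule as a regular expression; `is_valid_dataset_id` and `suggest_dataset_id` are hypothetical helper names, not ERDDAP functions, and the suggestion logic (underscores for disallowed characters, a "d" prefix when the ID does not start with a letter) is only one possible convention.

```python
import re

# Rule quoted from the documentation: a letter, then any number of
# letters, digits, and underscores.
DATASET_ID = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_dataset_id(dataset_id):
    """True if dataset_id matches the documented character rule."""
    return DATASET_ID.fullmatch(dataset_id) is not None


def suggest_dataset_id(dataset_id):
    """Hypothetical helper: replace disallowed characters with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", dataset_id)
    # Prefix a letter if the ID would otherwise start with a digit or underscore.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "d" + cleaned
    return cleaned


for candidate in ["wmo-42907", "wmo_42907"]:
    print(candidate, is_valid_dataset_id(candidate), suggest_dataset_id(candidate))
```

The check does not enforce the "best if <32 characters" recommendation, since the documentation states it as a preference rather than a hard rule.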


Chuan-Yuan Hsu

Aug 16, 2021, 12:10:25 PM
to Bob Simons, ERDDAP
Thanks for the information. 
After renaming the datasetID, I am able to ingest the file into ERDDAP in my testbed. However, when I load all of the files, the troublesome dataset “wmo-42907” (now wmo_42907) again fails to load. The error message from DasDds.sh is:


*** An error occurred while trying to load wmo_42907:

*** An error occurred while trying to load wmo_42907:
java.lang.RuntimeException: datasets.xml error on or before line #177612: Your query produced too much data.  Try to request less data. [memory]  The request needs more memory (70612 MB) than is ever safely available in this Java setup (32268 MB). (TableWriterAll.cumulativeTable)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:471)
 at gov.noaa.pfel.erddap.dataset.EDD.oneFromDatasetsXml(EDD.java:551)
 at gov.noaa.pfel.erddap.dataset.EDD.testDasDds(EDD.java:11133)
 at gov.noaa.pfel.erddap.DasDds.doIt(DasDds.java:131)
 at gov.noaa.pfel.erddap.DasDds.main(DasDds.java:157)
Caused by: java.lang.RuntimeException: Your query produced too much data.  Try to request less data. [memory]  The request needs more memory (70612 MB) than is ever safely available in this Java setup (32268 MB). (TableWriterAll.cumulativeTable)
 at com.cohort.util.Math2.ensureMemoryAvailable(Math2.java:477)
 at gov.noaa.pfel.erddap.dataset.TableWriterAll.cumulativeTable(TableWriterAll.java:291)
 at gov.noaa.pfel.erddap.dataset.TableWriterDistinct.finish(TableWriterDistinct.java:106)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.getDataForDapQuery(EDDTableFromFiles.java:4013)
 at gov.noaa.pfel.erddap.dataset.EDDTable.subsetVariablesDataTable(EDDTable.java:12453)
 at gov.noaa.pfel.erddap.dataset.EDDTable.distinctSubsetVariablesDataTable(EDDTable.java:12519)
 at gov.noaa.pfel.erddap.dataset.EDDTable.ensureValid(EDDTable.java:730)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.<init>(EDDTableFromFiles.java:1919)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromNcFiles.<init>(EDDTableFromNcFiles.java:134)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.fromXml(EDDTableFromFiles.java:503)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:457)
 ... 4 more



And if I dig into log.txt, it shows:



getting metadata from /data/erddap/ntl/all-in-one/42907/gcoos_ioos-station-wmo-42907_2021_08.nc
  ftLastMod first=2020-11-11T22:23:40Z last=2021-08-15T19:06:24Z
  accessibleViaNcCF=[true]
{{{{#84 2021-08-15T20:50:09+00:00 (notLoggedIn) 114.119.132.202 GET /erddap/categorize/keywords/pulley/index.htmlTable?page=1&itemsPerPage=1000
Resource not found: unknown categoryName=pulley
*** lowSendError for request #84: isCommitted=false fullMessage=
Error {
    code=404;
    message="Not Found: (no details)";
}

}}}}#84 114.119.132.202 SUCCESS. TIME=1001ms

datasets.xml error on line #177612
While trying to load datasetID=wmo_42907 (after 104492 ms)
java.lang.RuntimeException: datasets.xml error on or before line #177612: Your query produced too much data.  Try to request less data. [memory]  The request needs more memory (70612 MB) than is ever safely available in this Java setup (50529 MB). (TableWriterAll.cumulativeTable)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:471)
 at gov.noaa.pfel.erddap.LoadDatasets.run(LoadDatasets.java:359)
Caused by: java.lang.RuntimeException: Your query produced too much data.  Try to request less data. [memory]  The request needs more memory (70612 MB) than is ever safely available in this Java setup (50529 MB). (TableWriterAll.cumulativeTable)
 at com.cohort.util.Math2.ensureMemoryAvailable(Math2.java:477)
 at gov.noaa.pfel.erddap.dataset.TableWriterAll.cumulativeTable(TableWriterAll.java:291)
 at gov.noaa.pfel.erddap.dataset.TableWriterDistinct.finish(TableWriterDistinct.java:106)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.getDataForDapQuery(EDDTableFromFiles.java:4013)
 at gov.noaa.pfel.erddap.dataset.EDDTable.subsetVariablesDataTable(EDDTable.java:12453)
 at gov.noaa.pfel.erddap.dataset.EDDTable.distinctSubsetVariablesDataTable(EDDTable.java:12519)
 at gov.noaa.pfel.erddap.dataset.EDDTable.ensureValid(EDDTable.java:730)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.<init>(EDDTableFromFiles.java:1919)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromNcFiles.<init>(EDDTableFromNcFiles.java:134)

The files (XML files, log.txt, netCDF files) can be accessed at https://erddap.gcoos.org/temporary/error_messages_erddap/

Thanks,
Chuan-Yuan Hsu, Ph.D.
+—————————————————————+
Postdoctoral Research Scientist,
Product Developer, Data Scientist,
Gulf of Mexico Coastal Ocean Observing System (GCOOS)
Texas A&M University — Department of Oceanography

Office: 979-845-3956
Mobile: 734-926-5394
Email: ch...@tamu.edu
Web: http://gcoos.org


Bob Simons

Aug 16, 2021, 7:02:33 PM
to ERDDAP
Looking at the stack trace, I see that the "too much data" error occurred while creating the subsetVariables table. The most likely problem is that there are variables listed in <subsetVariables> that shouldn't be. It should just have the "outer"/platform/station/trajectory variables, not the observation variables. GenerateDatasetsXml just makes an educated guess. A human really needs to think about this and set it correctly.
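
To see why the choice matters: the stack trace shows ERDDAP building a table of distinct subsetVariables combinations (distinctSubsetVariablesDataTable), and that table stays small only if the listed variables repeat across rows. The sketch below uses made-up rows for a single station to compare the two cases; the variable names, coordinate values, and `count_distinct_combinations` are illustrative assumptions and do not come from the actual dataset.

```python
def count_distinct_combinations(rows, variables):
    """Count the distinct value combinations of the given variables."""
    return len({tuple(row[name] for name in variables) for row in rows})


# Made-up observations for one station: the station-level values repeat on
# every row, while the observation values change on every row.
rows = [
    {"station": "42907", "latitude": 27.5, "longitude": -90.0,
     "time": hour, "sea_water_temperature": 20.0 + 0.01 * hour}
    for hour in range(1000)
]

outer = ["station", "latitude", "longitude"]
with_observations = outer + ["time", "sea_water_temperature"]

print(count_distinct_combinations(rows, outer))              # station-level only
print(count_distinct_combinations(rows, with_observations))  # grows with row count
```

With only the station-level variables, the number of combinations is independent of how many observations exist; once an observation variable is included, it grows with the number of rows, which is consistent with the memory error above.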

Chuan-Yuan Hsu

Aug 17, 2021, 3:18:37 PM
to Bob Simons, ERDDAP
Thanks Bob, this does solve my problem.

Best,

Chuan-Yuan Hsu, Ph.D.