Datasets go missing.


imcs...@gmail.com

Feb 12, 2025, 11:47:15 AM
to ERDDAP
Hi there, 

A few datasets on our ERDDAP server regularly vanish from the archive, even though they all load normally at first.

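e.g.

https://erddap.marine.rutgers.edu/erddap/tabledap/VIISTAI_2024_06_KM2408_LISST_PROFILES.graph

and

https://erddap.marine.rutgers.edu/erddap/tabledap/OLIGO_DELBAY_MOOR_D1_ADCP_2024.graph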

Both of these load normally. But after some period of time (I haven't quite figured out how long yet), they vanish from the server with the following error:


VIISTAI_2024_06_KM2408_LISST_PROFILES: datasets.xml error on line #4234
While trying to load datasetID=VIISTAI_2024_06_KM2408_LISST_PROFILES (after 8 ms)
java.lang.RuntimeException: datasets.xml error on or before line #4234:
ERROR in Test.ensureTrue:
No valid data files were found. See log.txt for details.
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:655)
 at gov.noaa.pfel.erddap.LoadDatasets.parseUsingSimpleXmlReader(LoadDatasets.java:617)
 at gov.noaa.pfel.erddap.LoadDatasets.run(LoadDatasets.java:219)
Caused by: java.lang.RuntimeException:
ERROR in Test.ensureTrue:
No valid data files were found. See log.txt for details.
 at com.cohort.util.Test.error(Test.java:23)
 at com.cohort.util.Test.ensureTrue(Test.java:54)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.<init>(EDDTableFromFiles.java:2105)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromMultidimNcFiles.<init>(EDDTableFromMultidimNcFiles.java:101)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.fromXml(EDDTableFromFiles.java:666)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:629)
 ... 2 more



and 


OLIGO_DELBAY_MOOR_D1_ADCP_2024: datasets.xml error on line #3341
While trying to load datasetID=OLIGO_DELBAY_MOOR_D1_ADCP_2024 (after 54 ms)
java.lang.RuntimeException: datasets.xml error on or before line #3341:
ERROR in Test.ensureTrue:
No valid data files were found. See log.txt for details.
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:655)
 at gov.noaa.pfel.erddap.LoadDatasets.parseUsingSimpleXmlReader(LoadDatasets.java:617)
 at gov.noaa.pfel.erddap.LoadDatasets.run(LoadDatasets.java:219)
Caused by: java.lang.RuntimeException:
ERROR in Test.ensureTrue:
No valid data files were found. See log.txt for details.
 at com.cohort.util.Test.error(Test.java:23)
 at com.cohort.util.Test.ensureTrue(Test.java:54)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.<init>(EDDTableFromFiles.java:2105)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromMultidimNcFiles.<init>(EDDTableFromMultidimNcFiles.java:101)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.fromXml(EDDTableFromFiles.java:666)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:629)
 ... 2 more

If I restart the server or reload the dataset using the hardFlag directory, the datasets are visible again. 
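
For readers who have not used the flag mechanism mentioned here, the following is a minimal sketch of scripting it. It assumes the usual ERDDAP layout in which a file named after the datasetID, placed in the `flag` or `hardFlag` subdirectory of bigParentDirectory, asks ERDDAP to reload that dataset. The function name `set_flag` and the example path are hypothetical, so check the location against your own setup.xml.

```python
from pathlib import Path


def set_flag(big_parent_directory, dataset_id, hard=False):
    """Create an empty flag file named after dataset_id.

    Assumption: ERDDAP watches <bigParentDirectory>/flag and
    <bigParentDirectory>/hardFlag and reloads the dataset whose ID
    matches the file name. Returns the path of the created file.
    """
    flag_dir = Path(big_parent_directory) / ("hardFlag" if hard else "flag")
    if not flag_dir.is_dir():
        # Fail loudly instead of creating a directory ERDDAP does not watch.
        raise FileNotFoundError(f"flag directory not found: {flag_dir}")
    flag_file = flag_dir / dataset_id
    flag_file.touch()
    return flag_file


# Example call (hypothetical path):
# set_flag("/erddapData", "OLIGO_DELBAY_MOOR_D1_ADCP_2024", hard=True)
```

Calling this from the job that writes new data files is one way to trigger a reload only when the data have actually changed.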

I'm trying to track down when the datasets fail, but I thought I'd ask here in case this is something others have seen. 

Eli


Tuomo Saari

Feb 12, 2025, 12:44:24 PM
to imcs...@gmail.com, ERDDAP
Hi Eli,

I saw a similar issue after upgrading from ERDDAP v2.23 to v2.25_1.

It seems that the new version has stricter requirements for datasets (at least for the config, datasets.xml, possibly for the source files themselves too). I switched back to the older version just to make things work. I am planning to test things out on my development server before reinstalling the latest ERDDAP.

Some more background information: I contacted the ERDDAP developers about my issue, and some of their findings (that I haven't tested out yet) about my setup were:
  • The chunk size attribute (e.g. <att name="_ChunkSizes" type="uint">1024</att> ) for the variables was different when generated with the older ERDDAP vs. the newer ERDDAP, using the tool GenerateDatasetsXml.sh. So now I'm trying to think of the best / easiest way to regenerate the XML snippets for the datasets, because we have to do some manual edits to the output after that tool runs. 
  • Stricter <fileNameRegex> tag format: I had .*.nc for some datasets; it needs to be .*\.nc (escaping the period character).
In addition, I had issues with some datasets whose source file was in a subfolder under the defined <fileDir> directory, even if the <recursive> tag was set to true. In the <fileNameRegex> tag I had not escaped the period, e.g. I had gom_ssh_2004.nc. I wonder if that now also needs to be escaped in the new ERDDAP version, because this is RegEx and not just a constant string.
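
To make the escaping question concrete, here is a small sketch comparing the escaped and unescaped patterns. It uses Python's `re.fullmatch` as a stand-in for ERDDAP's Java regex matching, which is an assumption; for these simple patterns the two engines should agree. All file names except gom_ssh_2004.nc are invented for illustration.

```python
import re

# File names used only for illustration.
names = ["gom_ssh_2004.nc", "gom_ssh_2004.nc.bak", "gom_ssh_2004_nc", "notes.txt"]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in candidates if compiled.fullmatch(name)]


# Unescaped period: '.' stands for any single character.
loose = matching(r".*.nc", names)

# Escaped period: '\.' stands for a literal period only.
strict = matching(r".*\.nc", names)

print("unescaped:", loose)
print("escaped:  ", strict)
```

In plain regex terms the unescaped form is the more permissive of the two: it still matches gom_ssh_2004.nc and additionally accepts names such as gom_ssh_2004_nc. So if escaping does change behavior in the newer version, that would point to a change in how ERDDAP handles the pattern, not to the regex itself excluding valid files.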

Also, the error messages in the log file were not really helpful in my case. I wouldn't have guessed that the above things were the issue based on the log files. In some cases the error messages complained about the dataset dimensions. 

Again, I haven't had the time to test this out so I cannot say that I have a solution yet. 

Tuomo Saari | Research Specialist II
Gulf of Mexico Coastal Ocean Observing System (GCOOS)
Department of Oceanography | Texas A&M University
(979) 393-2480 | tuomo...@gcoos.org | tsa...@tamu.edu



On Wed, Feb 12, 2025 at 10:47 AM imcs...@gmail.com <imcs...@gmail.com> wrote:
Hi there, A few datasets on our ERDDAP server seem to regularly vanish from the archive. Even if they all load normally. e.g. https://erddap.marine.rutgers.edu/erddap/tabledap/VIISTAI_2024_06_KM2408_LISST_PROFILES.graph and https://erddap.marine.rutgers.edu/erddap/tabledap/OLIGO_DELBAY_MOOR_D1_ADCP_2024.graph

Roy Mendelssohn - NOAA Federal

Feb 12, 2025, 1:06:29 PM
to imcs...@gmail.com, ERDDAP
Hi:

Yes, we have seen this problem (assuming this is not an issue with first loading the dataset). The likely chain of events is that ERDDAP is doing a dataset update and, for a variety of reasons, cannot access the dataset at that moment (this presumes it worked for a while), so it throws that message. We have found two things that cause this:

1. The volume disappears on the Linux system.

2. Access to the volume is temporarily very busy, so access for the dataset fails.

Next time it happens, can you check that you can actually access the dataset, and check the settings in setup.xml to see the times that have been set for when to stop a reload (search for the part that looks like "<loadDatasetsMinMinutes>15</loadDatasetsMinMinutes>" and read through the settings there). Then look at log.txt for when ERDDAP first started the update and follow through on other references to get some idea of how long it is taking. For example, if a user happens to be making heavy use of the dataset as the update starts, it may fail. One thing you can also do is adjust how often that dataset needs to be updated. Depending on how you add data to the dataset, you might increase the time between reloads and set a flag when an actual dataset update occurs.
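
As a rough aid for the first check above, the sketch below tests whether a data directory is reachable, counts the files whose names match a regex, and reports how long the listing took. The function name `count_matching_files` and the example path are hypothetical; the directory and regex should be the dataset's <fileDir> and <fileNameRegex> values from datasets.xml. It only approximates what ERDDAP does when it scans for files.

```python
import os
import re
import time


def count_matching_files(file_dir, file_name_regex, recursive=True):
    """Count files under file_dir whose names fully match file_name_regex.

    Returns (count, seconds_elapsed). Raises FileNotFoundError when the
    directory is not reachable, e.g. because a volume is unmounted.
    """
    pattern = re.compile(file_name_regex)
    start = time.monotonic()
    if not os.path.isdir(file_dir):
        raise FileNotFoundError(f"data directory not reachable: {file_dir}")
    count = 0
    for _root, dirs, files in os.walk(file_dir):
        count += sum(1 for name in files if pattern.fullmatch(name))
        if not recursive:
            dirs.clear()  # stop os.walk from descending into subdirectories
    return count, time.monotonic() - start


# Example call (hypothetical path):
# n, seconds = count_matching_files("/data/adcp/2024", r".*\.nc")
# print(f"{n} matching files listed in {seconds:.2f} s")
```

Running this on a schedule and recording the results would show whether the count drops to zero or the listing becomes very slow around the time a dataset disappears, which corresponds to the two causes listed above.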

This is just the kind of thing that needs tracking through a lot of logs, which is painful, but other than the two reasons above we have not seen anything else cause this.
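
To take some of the pain out of the log tracking, here is a minimal sketch that pulls every line mentioning a given datasetID out of log.txt, along with a few following lines of context. It makes no assumption about the log's line format beyond the datasetID appearing as plain text. The function name `find_mentions` and the example path are hypothetical.

```python
def find_mentions(log_path, dataset_id, context=2):
    """Return (line_number, text) pairs for each line containing dataset_id,
    plus up to `context` lines that follow each such line."""
    hits = []
    remaining = 0
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if dataset_id in line:
                remaining = context + 1  # this line plus the context lines
            if remaining > 0:
                hits.append((number, line.rstrip("\n")))
                remaining -= 1
    return hits


# Example call (hypothetical path):
# for number, text in find_mentions("/erddapData/logs/log.txt",
#                                   "VIISTAI_2024_06_KM2408_LISST_PROFILES"):
#     print(number, text)
```

Reading the output in order should show when the update started and where it first failed.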

HTH,

-Roy

imcs...@gmail.com

Feb 18, 2025, 8:52:57 AM
to ERDDAP
Thanks Roy, 

I haven't seen the problem again. So it was probably, as you mentioned, a temporary problem with the file server.  But thanks for the tip on tracking it down. I'll keep an eye on it. 

Eli
