ERDDAP memory usage slowly growing over time?


Benjamin Adams

Apr 23, 2024, 12:05:07 PM
to ERDDAP
Hi, has anyone seen ERDDAP memory usage slowly growing over time? We've been collecting metrics on an ERDDAP process's memory usage on the Glider DAC server and have seen it slowly grow. We have been updating the Glider DAC, so we restarted the ERDDAP process 3 days ago. By about next week I should have a line graph with a clearer trend line demonstrating what I believe is happening, namely that memory allocations grow over time. We're currently running ERDDAP 2.23 with a max heap size of 10 GiB and 16 vCPUs, mostly serving aggregations of 100+ NetCDF files. I would estimate we have around a terabyte of data files, or slightly less. We're not seeing conditions such as excessive load averages that could lead to memory pressure, so we suspect there might be a memory leak. Has anyone experienced similar issues with ERDDAP?
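One simple way to put a number on "slowly growing" is to fit a least-squares line to periodic memory samples and report the slope. A minimal Python sketch, assuming the samples are (hours since restart, GiB used) pairs exported from whatever metrics system is collecting them; the function name is hypothetical and not part of ERDDAP.

```python
from typing import Sequence, Tuple


def growth_rate_gib_per_day(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of memory (GiB) against time (hours), scaled to GiB/day.

    `samples` is a sequence of (hours_since_restart, gib_used) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Ordinary least squares: slope = cov(t, m) / var(t)
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var * 24.0  # slope is in GiB/hour; convert to GiB/day
```

A slope that stays clearly positive over several days, rather than flattening out, is the pattern that would support a leak rather than normal warm-up.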

Roy Mendelssohn - NOAA Federal

Apr 23, 2024, 12:30:14 PM
to Benjamin Adams, ERDDAP, Chris John - NOAA Affiliate
Yes, we are seeing the same thing and are investigating the cause. One thing is that we only see it in ERDDAPs that get heavy use; our other ERDDAPs have no problems. Scans that are either multithreaded or coming from multiple IPs at once, particularly if they make a lot of large or very large requests repeatedly, tend to produce the same behavior. My working hunch is that something is not getting garbage collected properly, for example that if there is not enough memory for the young generation, objects there don't get cleared, but don't quote me on that. I have also been playing with Tomcat settings to see if that helps. I would be curious what happens if you increase the max heap size by a fair amount, if you are willing to try. Right now I think I have our main one at 15GB.

tl;dr: yes, we are aware of it; no, we don't see it on all ERDDAPs; no, we don't have a fix at the moment.

-Roy


**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new street address***
110 McAllister Way
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Men...@noaa.gov www: https://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


Joeseph Smith

Jun 5, 2025, 2:39:48 PM
to ERDDAP
Hey folks,

I'm going to revive this thread: are there any updates?

I'm trying to diagnose related ERDDAP problems for GLOS, so I decided to reproduce them locally with just the affected datasets. They're a set of small NetCDF files in an AWS S3 bucket, each holding three days' worth of buoy data for one platform (not large data). A representative dataset's XML header is the following:

<dataset type="EDDTableFromMultidimNcFiles" datasetID="obs_304_latest" active="true">
    <nThreads>1</nThreads>
    <reloadEveryNMinutes>20</reloadEveryNMinutes>
    <updateEveryNMillis>-1</updateEveryNMillis>
    <fileDir>/local/directory/I/deposit/to/</fileDir>
    <fileNameRegex>.*obs_304_latest\.nc</fileNameRegex>
    <recursive>true</recursive>
    <pathRegex>.*</pathRegex>
    <metadataFrom>last</metadataFrom>
    <standardizeWhat>0</standardizeWhat>
    <removeMVRows>true</removeMVRows>
    <sortFilesBySourceNames />
    <fileTableInMemory>false</fileTableInMemory>
    <cacheSizeGB>-1</cacheSizeGB>
    <!-- remaining elements (attributes, dataVariable definitions, etc.) omitted -->
</dataset>

So I set up Tomcat locally (Linux Mint 22.1, 32 GB memory, 12th Gen Intel i5), set up ERDDAP and the datasets, got them loading and downloading, and could get HTML table output. Great.

I set up a script that pokes all of the potential datasets every 30 seconds (some exist and some don't yet, since it may still be early in the season here in the Great Lakes) just to make sure the memory issue isn't tied to demand. Memory consumption remains stable through those pokes. However, probably in line with the `reloadEveryNMinutes` setting, memory consumption goes up 1-2 GB or more every 20-30 minutes. Having run this for a few hours now, the memory occupied surpasses what I thought I had specified: ~12 GB instead of 8 GB. Incidentally, the Kubernetes pod running the live GLOS Seagull ERDDAP server restarts about every 3-5 hours after an overnight scheduled restart.
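The polling script itself isn't shown in the thread. A minimal Python sketch of one, assuming a local ERDDAP at http://localhost:8080/erddap and requesting each dataset's standard tabledap `.htmlTable` response; the base URL and function names are placeholders. Datasets that don't exist yet typically come back as an HTTP error, which is recorded rather than treated as fatal.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

BASE_URL = "http://localhost:8080/erddap"  # assumption: local Tomcat/ERDDAP


def table_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """URL of a dataset's HTML table view (tabledap .htmlTable response)."""
    return f"{base_url}/tabledap/{dataset_id}.htmlTable"


def fetch_status(url: str, timeout: float = 60.0) -> int:
    """Return the HTTP status code; HTTP errors are returned, connection failures give -1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the server does the full request
            return resp.status
    except urllib.error.HTTPError as err:  # must precede URLError (it is a subclass)
        return err.code
    except urllib.error.URLError:
        return -1


def poll_once(dataset_ids: Iterable[str],
              fetch: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Request every dataset once and map dataset ID -> HTTP status."""
    return {ds: fetch(table_url(ds)) for ds in dataset_ids}


def poll_forever(dataset_ids: Iterable[str], interval_s: float = 30.0) -> None:
    """Poll all datasets every `interval_s` seconds until interrupted."""
    ids = list(dataset_ids)
    while True:
        print(time.strftime("%H:%M:%S"), poll_once(ids))
        time.sleep(interval_s)
```

For example, `poll_forever(["obs_304_latest"])` would request that dataset every 30 seconds until interrupted; the `fetch` parameter exists only so the loop can be exercised without a running server.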

Not knowing exactly where to look in the configuration, since everything initially works without problems, I turned to a profiler, VisualVM, and poked around with that. A couple of things look interesting (screenshots attached): the heap consumption pattern, and a large and growing number of AWSEventLoop threads.

That's where I'm at. I thought I would share some observations and see what catches the eye of everyone else here.

Thanks so much and take care

-Joe
[Attachments (VisualVM screenshots): erddap_profiler_20250605.png, erddap_profiler_3_20250605.png]

Joeseph Smith

Jun 5, 2025, 2:40:38 PM
to ERDDAP
Forgot: this is running ERDDAP 2.26. Apologies for the multiple emails.

Roy Mendelssohn - NOAA Federal

Jun 5, 2025, 3:22:27 PM
to Joeseph Smith, ERDDAP
Hi Joe:

OK, this just makes me more confused. If I understand you correctly, you have moved the datasets to be local and are now testing locally. However, the threads you show indicate that you are running or accessing something on AWS (our main ERDDAP, which runs all sorts of software, does not show those threads, but we do not use AWS).

So, stepping back, can you give a more complete description of your setup?

Second, as I have stated before, Java now uses a lot more memory than just the heap, so heap usage alone will not necessarily tell you much. For example, on our main ERDDAP the heap is at 20GB, but Java's memory usage is at 45GB. You have to track the process's total memory usage, not just heap usage, and total usage can be much larger than the heap size.
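On Linux, the total resident memory of the Java process (the RSS figure that `pidstat -r` reports) can also be read directly from /proc/<pid>/status, independent of the heap numbers VisualVM shows. A minimal Python sketch, assuming Linux and permission to read the Tomcat process's /proc entry; the function names are hypothetical.

```python
from pathlib import Path


def vm_rss_kib(status_text: str) -> int:
    """Return VmRSS (resident set size) from /proc/<pid>/status text, in KiB."""
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key == "VmRSS":
            return int(rest.split()[0])  # the kernel reports this field in kB (KiB)
    raise ValueError("no VmRSS line found (kernel thread or non-Linux system?)")


def process_rss_gib(pid: int) -> float:
    """Resident memory of a whole process in GiB (Linux only).

    For a JVM this covers the heap plus everything outside it: metaspace,
    thread stacks, direct buffers, native libraries, and so on.
    """
    return vm_rss_kib(Path(f"/proc/{pid}/status").read_text()) / (1024 ** 2)
```

Sampling `process_rss_gib(<Tomcat PID>)` alongside the heap graph gives a rough measure of how much memory lives outside the heap.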

Also, you need to check whether swap space is being used. If ERDDAP starts swapping, it will eventually grind to a halt. And you need to check that the interval between major reloads is long enough that each reload is certain to finish before the next one starts. This is set in setup.xml.
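On Linux, system-wide swap usage can be read from /proc/meminfo as SwapTotal minus SwapFree. A minimal Python sketch, assuming Linux; the function names are hypothetical.

```python
from pathlib import Path


def swap_used_gib(meminfo_text: str) -> float:
    """Swap in use (SwapTotal - SwapFree) from /proc/meminfo text, in GiB."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key] = int(parts[0])  # values are reported in kB (KiB)
    return (fields["SwapTotal"] - fields["SwapFree"]) / (1024 ** 2)


def current_swap_used_gib() -> float:
    """Read the live value from this machine (Linux only)."""
    return swap_used_gib(Path("/proc/meminfo").read_text())
```

A value that keeps rising while ERDDAP runs is the early warning sign described above.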

Finally, our experience is that memory usage grows for a while as load increases, until it plateaus. For example, the ERDDAP mentioned above now pretty much stays around 45GB, but right after startup it is at 26GB and takes a while to grow.
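Whether a series of memory samples has plateaued, as opposed to still climbing, can be checked by comparing peaks rather than raw values, which tolerates the saw-tooth pattern from garbage collection and periodic reloads. A Python sketch; the window size and tolerance are hypothetical defaults, and choosing them (for example, a window spanning several reload cycles) is a judgment call.

```python
from typing import Sequence


def has_plateaued(samples_gib: Sequence[float], window: int = 12,
                  tolerance_gib: float = 0.5) -> bool:
    """True if the peak in the newer half of the last `window` samples is no more
    than `tolerance_gib` above the peak in the older half.

    Samples are assumed to be evenly spaced in time (e.g. one every 10 minutes).
    """
    if window < 2 or len(samples_gib) < window:
        return False  # not enough history to judge
    recent = samples_gib[-window:]
    half = window // 2
    older_peak = max(recent[:half])
    newer_peak = max(recent[half:])
    return newer_peak - older_peak <= tolerance_gib
```

Comparing half-window peaks means a single large reload spike does not count as growth unless later peaks keep exceeding it.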

Also, what I use to track things are the commands 'sar' and 'pidstat'; you can get these by installing 'sysstat', though top or htop should also do the trick.

What I would suggest is to set up a purely local ERDDAP with no AWS involved, and just a couple of datasets to start, so that you can be fairly certain that you stay within your memory limits and that the reloads finish, and then track memory usage. If that seems to be working, add some more datasets. Also, I believe EDDTableFromMultidimNcFiles requires a good bit more work than other dataset types. So as you test this local-only version, look at the ERDDAP logs to see if there are any issues loading the datasets.

So the short version is: first remove anything to do with AWS, in case that is where the problem is arising (it may well be); track your Java memory usage, not just heap usage; track swap usage; and start with a much smaller number of datasets to see if you get the same behavior.

HTH,

-Roy


Joeseph Smith

Jun 5, 2025, 3:38:26 PM
to Roy Mendelssohn - NOAA Federal, ERDDAP
Thanks Roy, and apologies for the confusion.

The ERDDAP 2.26 setup is local. The NetCDF data are still in AWS; they get downloaded to the local machine running ERDDAP via the usual cacheFromUrl process.

Additionally: running java-21-openjdk and Tomcat 10.1.41.

Good note on the overall memory situation; I'll mull that over. So your whole Java process is using 65 GB of memory, with 20 GB of that specifically in the ERDDAP heap, if I'm reading it right. And that's leaving out the rest of the OS.

I'll work on a purely local setup for this experiment.

Appreciate the tips!

-Joe

Joeseph Smith

Cyberinfrastructure Engineer

734-972-8096

www.glos.org

LinkedIn | Facebook | Instagram | YouTube


Roy Mendelssohn - NOAA Federal

Jun 5, 2025, 4:25:42 PM
to Joeseph Smith, ERDDAP, Chris John - NOAA Affiliate
Our machine has 64GB. Our heap is set to 20GB. Actual Java memory from pidstat is 45GB, though VisualVM does not show us going over the heap. Now that I know you are getting data from AWS, that could be a source of the problem. It would help us if we could isolate that, i.e. get a purely local implementation and see if it exhibits the same problems. If it works OK locally, then try the same with AWS. If needed, send me some files and I can test a purely local implementation; also, to save me some time, send the relevant datasets.xml snippets for those datasets.

Can you tell me one more thing: is the storage on AWS block storage or object storage? I do know that the object storage code is being reworked to be both more robust and not limited to AWS. But for now, let's see if the problem persists when there is no AWS involved and when we know there is both plenty of memory and plenty of time for the dataset loads to complete.

-Roy

PS - What is your setting for <useSharedWatchService>?



Joeseph Smith

Jun 5, 2025, 10:49:40 PM
to ERDDAP
To close this out for the evening: I ran it for a couple of hours without the AWS S3 connection. Memory consumption stayed below 2 GB (~1.7 GB), compared with about 8 GB by the same point with the connection. The thread count was stable. My guess is that if this were the setup we ran, ERDDAP wouldn't be restarting every 3-5 hours.

Thanks all, and have a good evening.

[Attachments (VisualVM screenshots): erddap_profiler_no_awss3_3_20250605.png, erddap_profiler_no_awss3_4_20250605.png]

Joeseph Smith

Jun 5, 2025, 10:51:15 PM
to ERDDAP
Oh, I forgot your question, Roy: AWS S3 is object storage.

Roy Mendelssohn - NOAA Federal

Jun 5, 2025, 11:01:51 PM
to Joeseph Smith, erDDAP Bob Simons via
Hi Joe:

OK, that is helpful. If nothing changes by tomorrow, can you try accessing the AWS object store again, but only for a small subset of the data there, and I mean small? I want to see if the memory use roughly stabilizes at some value. It is a long story, but I am somewhat familiar with your setup, and if memory serves you have a lot of multidimensional NetCDF datasets. It is possible that those types of datasets require a lot of memory, or that there is a problem somewhere in the AWS code. These are the things I am trying to isolate.

Thanks,

-Roy

PS - How many of the multidimensional datasets are in the configuration you mentioned?


Joeseph Smith

Jun 5, 2025, 11:56:03 PM
to Roy Mendelssohn - NOAA Federal, erDDAP Bob Simons via
I’ll try the 1 or 2 file scheme tomorrow for sure.

All of the datasets we're looking at here (currently 53) use the EDDTableFromMultidimNcFiles class.

-Joe



Joeseph Smith

Jun 9, 2025, 10:52:10 AM
to ERDDAP
Hey Folks,

Just to conclude this thread (or my portion of it): I ran a local ERDDAP with just 3 of the datasets, connected to S3, all Friday evening. Memory usage topped out at 2 GB, so with just a few datasets this appears to be OK. The AWS thread count was still climbing, however.

In the end, I re-routed the NetCDF files to the AWS Elastic File System (EFS) attached to our ERDDAP and dropped the S3 part. Over the weekend I periodically checked in on things: no auto-restarts, and datasets seem to be updating OK.

Appreciate putting brains together here. Thanks all!

-Joe

Chris John - NOAA Affiliate

Jun 9, 2025, 11:18:48 AM
to Joeseph Smith, ERDDAP
This does not sound like the desired behavior. Based on this information, I've created an issue on GitHub: https://github.com/ERDDAP/erddap/issues/302
