This may be the same issue that I run into too. When I dig into the resource manager on EMR (using AWS EMR), it looks like the Spark driver is crashing. I typically hit this problem when I try to iterate over more than 1k individual time-series lists to perform individual auto-fits and forecasts.
Maybe it is a resource issue and something is not being released or cleaned up, which is crashing the JVM?
Has anyone tried performing individual forecasting with a map function? I wonder whether my code is at fault and there is a better way to iterate over large sets of time-series lists.
Here is the pseudocode of the way I am doing it:
def createForecast(fKey: String, valuesList: List[Double]): Vector = {
  val ts = Vectors.dense(valuesList.toArray)
  // Search orders up to p = 5, d = 3, q = 5
  val arimaModel = ARIMA.autoFit(ts, maxP = 5, maxD = 3, maxQ = 5)
  arimaModel.forecast(ts, futureSampleCount.toInt)
}
val finalCollection = uniqueKeyList.map { fKey =>
  val values = activityDf
    .filter($"forecastKey" === fKey)
    .select("value") // assuming the measurement column is named "value"
    .collect()
    .map(_.getDouble(0))
    .toList
  createForecast(fKey, values)
}
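On the question of a better way to iterate: one pattern worth considering is grouping all values by key once, then mapping a forecast over each group, instead of filtering the full DataFrame once per key. Here is a minimal plain-Scala sketch of that idea. The `fitAndForecast` function is a hypothetical stub standing in for `ARIMA.autoFit` plus `forecast`, and the `(key, value)` row shape is an assumption mirroring the DataFrame above:

// Sketch of the "group once, then map" pattern in plain Scala collections.
// fitAndForecast is a stub standing in for ARIMA.autoFit + forecast.
def fitAndForecast(values: List[Double], horizon: Int): List[Double] = {
  // Naive stand-in: extend the series by repeating the last observed value.
  val last = values.lastOption.getOrElse(0.0)
  values ++ List.fill(horizon)(last)
}

// Rows shaped like (forecastKey, value), mirroring the DataFrame columns.
val rows: List[(String, Double)] = List(
  ("a", 1.0), ("a", 2.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)
)

// One pass to bucket values per key, instead of one filter per key.
val byKey: Map[String, List[Double]] =
  rows.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

// Map a forecast over each group.
val forecasts: Map[String, List[Double]] =
  byKey.map { case (k, values) => k -> fitAndForecast(values, horizon = 2) }

In Spark terms the equivalent would be something like a `groupByKey` (or `groupBy` + aggregation to a list) followed by a `mapValues`-style transformation on the executors, which also avoids collecting every series to the driver one at a time.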
Thanks.