Spark-ts ARIMA getting into deadlock


itiss...@gmail.com

Oct 30, 2016, 7:50:02 AM
to Time Series for Spark (the spark-ts package)
I am running the autoFit function on a big dataset (a few million ARIMA models). It seems that some of them are running into a deadlock. Do you know how to resolve this problem (or at least make it exit with an exception)?

Best regards,
Amit

Sandy Ryza

Oct 31, 2016, 12:22:59 AM
to itiss...@gmail.com, Time Series for Spark (the spark-ts package)
Hi Amit,

I haven't observed this problem.  Do you know the nature of what's taking the time?

-Sandy

--
You received this message because you are subscribed to the Google Groups "Time Series for Spark (the spark-ts package)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-ts+unsubscribe@googlegroups.com.
To post to this group, send email to spar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spark-ts/3ffea815-5a93-4195-bde3-ab3d28df3b7b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

pa...@ucdavis.edu

Oct 31, 2016, 6:31:59 AM
to Time Series for Spark (the spark-ts package), itiss...@gmail.com
Hi Sandy,

Thanks for replying. I do not know the nature of the problem, but you can use this simple snippet to replicate the error. Please note that the vector I am using is very simple (not a good time series, but in a big dataset some such cases do arise).

Best regards,
Amit

// ts must be an MLlib Vector; an ML Vector will not work (as of spark-ts 0.4.0)
import org.apache.spark.mllib.linalg.Vectors
import com.cloudera.sparkts.models.ARIMA

// A degenerate, constant series is enough to trigger the hang
val a = Array(1.0, 1.0, 1.0, 1.0)
val ts = Vectors.dense(a)
// Auto-fit the best ARIMA model, with p, d, q each ranging from 0 to 4
val arimaModel = ARIMA.autoFit(ts, 4, 4, 4)
// Forecast the next two values using the fitted model
val forecast = arimaModel.forecast(ts, 2)

pa...@ucdavis.edu

Nov 3, 2016, 2:19:37 PM
to Time Series for Spark (the spark-ts package), itiss...@gmail.com, Sandy Ryza
Sandy,

Any idea how to get past this issue?

Amit


Sandy Ryza

Nov 16, 2016, 7:51:28 PM
to pa...@ucdavis.edu, Time Series for Spark (the spark-ts package), Amit Pande
Apologies, but I'm not aware offhand of what could be going on here.

-Sandy

kgie...@gmail.com

Dec 3, 2016, 8:01:19 PM
to Time Series for Spark (the spark-ts package), pa...@ucdavis.edu, itiss...@gmail.com
All,

I am also experiencing the same issue: an infinite loop in ARIMA. For me, this is a critical bug.

Regards,
Karl

Eric Patterson

Aug 8, 2017, 8:49:15 AM
to Time Series for Spark (the spark-ts package), pa...@ucdavis.edu, itiss...@gmail.com
This may be the same issue that I run into too. When I dig into the resource manager of the EMR cluster (using AWS EMR), it looks like the Spark server has crashed. I typically run into this problem when I try to iterate over more than 1k individual time series lists to perform individual autofits and forecasts.

Maybe it is a resource issue and something is not getting released or cleaned up, which is crashing the JVM?

Has anyone tried to perform individual forecasting with a map function? I wonder if my code is bad and there is a better way to iterate over large sets of time series lists.

Here is the pseudocode of the way I am doing it:

def createForecast(fKey: String, valuesList: List[Double]): Vector = {
  val ts = Vectors.dense(valuesList.toArray)
  val arimaModel = ARIMA.autoFit(ts, 5, 3, 5)
  arimaModel.forecast(ts, futureSampleCount.toInt)
}

// One driver-side pass per key; each filter triggers a separate Spark job
val finalCollection = uniqueKeyList.map { fKey =>
  val values = activityDf.filter($"forecastKey" === fKey)
    .collect().map(_.getDouble(0)).toList
  createForecast(fKey, values)
}
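[Editorial note: one possible alternative to the driver-side loop above is to group the series by key once and fit each model on the executors. This is a sketch only, not spark-ts API guidance; the column names `forecastKey` and `value`, and `futureSampleCount`, are assumptions carried over from the pseudocode above.]

```scala
import org.apache.spark.mllib.linalg.Vectors
import com.cloudera.sparkts.models.ARIMA

// Shuffle each key's values to one executor, then fit and forecast there,
// instead of launching one filter-and-collect job per key on the driver.
val forecasts = activityDf
  .select($"forecastKey", $"value")
  .rdd
  .map(row => (row.getString(0), row.getDouble(1)))
  .groupByKey()
  .mapValues { values =>
    val ts = Vectors.dense(values.toArray)
    ARIMA.autoFit(ts, 5, 3, 5).forecast(ts, futureSampleCount.toInt)
  }
```

Note that `groupByKey` does not preserve row order, so a real version would need a time or sequence column to sort each group before building the vector.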

Thanks.

Amit Pande

Aug 8, 2017, 10:14:22 AM
to Eric Patterson, Time Series for Spark (the spark-ts package), amit
Eric,

It actually works for me now, in the partial sense that it no longer hangs indefinitely. I use the following workaround:

var forecast = some_fallback_value
scala.util.Try {
  val arimaModel = ARIMA.autoFit(ts, 4, 4, 4)
  forecast = arimaModel.forecast(ts, futureSampleCount.toInt)
}
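[Editorial note: `Try` only catches exceptions, so by itself it cannot stop a genuine infinite loop. One possible sketch, using only the Scala standard library, is to run the fit in a `Future` and give up after a timeout; `some_fallback_vector` and `futureSampleCount` are placeholders, not spark-ts names.]

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

// Await.result throws a TimeoutException if the fit has not finished
// within the deadline; Try converts that (or any fit error) into the fallback.
val forecast = Try {
  Await.result(
    Future {
      val model = ARIMA.autoFit(ts, 4, 4, 4)
      model.forecast(ts, futureSampleCount.toInt)
    },
    30.seconds)
}.getOrElse(some_fallback_vector)
```

Caveat: the timeout only unblocks the caller; the hung fit keeps running on its thread, so on a run with millions of series this can still exhaust the thread pool.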


Amit

Eric Patterson

Aug 8, 2017, 10:19:37 AM
to Time Series for Spark (the spark-ts package), eric.pa...@gfs.com, itiss...@gmail.com, pa...@ucdavis.edu
Thank you Amit!

That is interesting, because I have a normal try { } catch { case e: Exception => } around my block of autoFit and forecast, like you are suggesting. This makes me think that it is not the ts library deadlocking on me but something else; I will wrap a bunch more blocks independently to find out where.

Thanks.