ARIMA Forecast using TimeseriesRDD

Geoffray Bories

Aug 10, 2017, 8:57:43 AM
to spar...@googlegroups.com, Robert Suhada, Simon Ondracek

Hello spark-ts support team,

I'm working with IoT sensor data and trying to use the spark-ts package to run distributed ARIMA forecasts.

I was able to create an observations DataFrame and convert it to a TimeSeriesRDD, using the functions the API provides for building the DateTimeIndex and the TimeSeriesRDD itself.

However, when I try to call the ARIMA forecast inside a map function, it expects a Vector as its parameter. I couldn't find how to extract that Vector from the TimeSeriesRDD structure, and I get a compile error if I pass it the whole TimeSeriesRDD variable.

I saw that TimeSeriesRDD.keys returns the array of all the distinct keys, but I couldn't find its Vector counterpart.

Is there a specific way of getting it? (I might have missed it in the API, in which case I apologize.)

Also, once you have your forecast, how do you integrate it with the initial TimeSeriesRDD variable? Do you need to create another TimeSeriesRDD?

Many thanks in advance for your feedback.

Best regards,

Geoffray BORIES

Data Scientist

FOXCONN 4TECH (Foxconn CZ s.r.o.)

K Žižkovu 813/2, 190 00 Praha 9, Czech Republic

Mobile (CZ): +420 733 781 523 (INT VPN 527-22 523)

geoffra...@foxconn4tech.com


Eric Patterson

Aug 11, 2017, 8:33:50 AM
to Time Series for Spark (the spark-ts package), robert...@foxconn4tech.com, simon.o...@foxconn4tech.com, geoffra...@foxconn4tech.com
Hey Geoffray,

I read your post fairly quickly, but once you have your values in an array, you can wrap that array in a dense vector. Example:
import org.apache.spark.mllib.linalg.Vectors
val ts = Vectors.dense(timeSampleDf.toArray)

As for your next question, on lining the forecasts back up with the original data: what I did was add row indexes (through the zipWithIndex function) to both the initial series and the forecast output. A quick join on those row indexes then gives me the initial and forecast values side by side. Keep in mind that the forecast output will be longer than the input, since it gives you back forecast values for all of your initial points plus whatever extra future points you requested in the forecast call.
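
The index-and-join idea described above can be sketched roughly as follows. This is a minimal illustration on local Scala collections rather than RDDs, so it runs without Spark; `AlignForecast` and `align` are hypothetical names, and the sample numbers are made up. On RDDs the same steps would use zipWithIndex followed by a join keyed on the index.

```scala
object AlignForecast {
  // Pair each forecast value with the observed value at the same position.
  // Positions past the end of the observed series (the future points)
  // have no observed value, so they carry None.
  def align(observed: Seq[Double],
            forecast: Seq[Double]): Seq[(Int, Option[Double], Double)] = {
    // Index the observed values by row position.
    val observedByIdx: Map[Int, Double] =
      observed.zipWithIndex.map { case (v, i) => i -> v }.toMap
    // Index the forecast the same way and look up the matching observation.
    forecast.zipWithIndex.map { case (f, i) => (i, observedByIdx.get(i), f) }
  }

  def main(args: Array[String]): Unit = {
    val observed = Seq(10.0, 11.0, 12.0)               // made-up history
    val forecast = Seq(10.1, 10.9, 12.2, 12.8, 13.1)   // history + 2 future points
    align(observed, forecast).foreach(println)
  }
}
```

Keeping the observed side as an Option makes the extra future rows explicit instead of dropping them, which an inner join would do.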

Eric



geoffra...@gmail.com

Aug 16, 2017, 7:12:33 AM
to Time Series for Spark (the spark-ts package), robert...@foxconn4tech.com, simon.o...@foxconn4tech.com, geoffra...@foxconn4tech.com
Hello Eric,

Thanks for those details. I actually found another way of doing it: it turns out I can use the mapSeries function and make the forecast directly, as long as I pass it the specific DateTimeIndex that I created for the forecast horizon.
Sharing is caring :), so here is how I proceeded:

val modelsrdd = Tsrdd.mapSeries(v => ARIMA.fitModel(1, 0, 1, v).forecast(v, horizon.toInt), dtIndexForcast) 

Of course, you need to define horizon and dtIndexForcast accordingly.
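
As noted earlier in the thread, the forecast output covers the initial points plus the extra future points, so the index passed alongside it presumably has to span the observed length plus the horizon. A rough sketch of that arithmetic in plain Scala follows; it uses java.time rather than the spark-ts index helpers, `ForecastIndex` and `timestamps` are hypothetical names, and evenly spaced observations are an assumption.

```scala
import java.time.{Duration, ZoneId, ZonedDateTime}

object ForecastIndex {
  // Timestamps for the observed points followed by `horizon` future points,
  // assuming one observation every `step`.
  def timestamps(start: ZonedDateTime,
                 step: Duration,
                 observed: Int,
                 horizon: Int): IndexedSeq[ZonedDateTime] =
    IndexedSeq.tabulate(observed + horizon) { i =>
      start.plus(step.multipliedBy(i.toLong))
    }

  def main(args: Array[String]): Unit = {
    // Made-up example: 3 hourly observations and a horizon of 2.
    val start = ZonedDateTime.of(2017, 8, 1, 0, 0, 0, 0, ZoneId.of("UTC"))
    val idx = timestamps(start, Duration.ofHours(1), observed = 3, horizon = 2)
    idx.foreach(println)
  }
}
```

The first `observed` entries line up with the original series and the remaining `horizon` entries are the new future timestamps.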

Thanks for the help anyway
Best Regards

Eric Patterson

Aug 16, 2017, 7:17:47 AM
to Time Series for Spark (the spark-ts package), geoffra...@gmail.com
  
Thank you Geoffray!

That looks both neat and simple. The next time I update my work, I will certainly give the mapSeries function a try.

Thanks,
Eric
