ARIMA Implementation for PySpark


ASAD SALEEM

Feb 3, 2020, 12:28:44 AM
to spar...@googlegroups.com
Hello Sir, 

I am trying to implement ARIMA in PySpark, and your library is the only resource for it that I have found.

This is my first time using Spark or PySpark, and I have found it really hard to understand what each function does. So far I have been able to clean my data and prepare train_df and test_df in Spark.

It's a simple univariate time series with a Time timestamp column and a numeric AQI column. The aim is to predict the AQI (Air Quality Index) value.

What I can't figure out is how to use your library to fit the ARIMA model (or any model) and then predict with it.

I have read all your documentation, but since this is my first time, I feel stuck and out of my depth trying to apply it in my Jupyter notebook.

A small example would be much appreciated, given the following:

df = [Time: timestamp format, AQI: integer format]
train_df = 80% of the df
test_df = 20% of the df
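For forecasting, the 80/20 split above should be chronological: the earliest 80% of the rows by Time go to training and the latest 20% to testing. A random split (for example Spark's DataFrame.randomSplit) would mix future readings into the training set. A minimal sketch in plain Python, assuming the (Time, AQI) rows have been collected to the driver as a list of (datetime, int) pairs; the function name chronological_split and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split (timestamp, value) rows into train/test by time, not at random.

    `rows` is assumed to be a list of (datetime, int) pairs, e.g. the
    (Time, AQI) rows of the DataFrame after collecting them to the driver.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # oldest observation first
    cut = int(len(ordered) * train_fraction)    # index of the first test row
    return ordered[:cut], ordered[cut:]


# Hypothetical hourly AQI readings, only for illustration.
start = datetime(2020, 1, 1)
rows = [(start + timedelta(hours=h), 50 + (h % 7)) for h in range(10)]
train_rows, test_rows = chronological_split(rows)
print(len(train_rows), len(test_rows))  # 8 2
```

Inside Spark the same cut can be made without collecting, for example with pyspark.sql.functions.percent_rank over a Window ordered by Time; the principle (train on the past, test on the future) is the same.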

(What do I import, and what do I need to implement for ARIMA?)

PS: I have already done this project in plain Python and have the ARIMA order parameters (p, d, q), but in PySpark I am stuck on what to import and how to implement it.
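The fit/predict steps being asked about can be illustrated without any Spark-specific API. A single univariate AQI series is usually small enough to bring to the driver (for example with train_df.orderBy("Time").select("AQI").collect(), assuming it fits in memory), after which ARIMA is an ordinary single-machine computation. The sketch below is a simplified stand-in, not the library's API: it fits ARIMA(p, d, 0), i.e. it assumes q = 0 (no moving-average terms), by differencing d times and solving a least-squares autoregression with NumPy, then forecasts and undoes the differencing. The function names and the toy series are hypothetical.

```python
import numpy as np


def fit_arima_p_d_0(series, p, d):
    """Fit ARIMA(p, d, 0) by least squares on the d-times differenced series.

    Simplified illustration: no moving-average terms (q = 0). This is not
    the estimator used by any particular Spark library.
    Returns [intercept, phi_1, ..., phi_p].
    """
    y = np.diff(np.asarray(series, dtype=float), n=d)
    n = len(y)
    # Design matrix: a constant column plus the lags y_{t-1}, ..., y_{t-p}.
    columns = [np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef


def forecast_arima_p_d_0(series, coef, p, d, steps):
    """Forecast `steps` future values, then undo the differencing."""
    # levels[0] is the raw series, levels[i] is the i-th difference.
    levels = [np.asarray(series, dtype=float)]
    for _ in range(d):
        levels.append(np.diff(levels[-1]))
    y = list(levels[-1])
    diff_preds = []
    for _ in range(steps):
        lags = y[len(y) - p:][::-1]  # most recent lag first
        nxt = coef[0] + np.dot(coef[1:], lags)
        y.append(nxt)
        diff_preds.append(nxt)
    out = np.array(diff_preds)
    # Integrate back one level at a time, starting from the last observed value.
    for level in reversed(levels[:-1]):
        out = level[-1] + np.cumsum(out)
    return out


def rmse(actual, predicted):
    """Root-mean-square error between test values and forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical noise-free series whose first difference halves each step.
diffs = [8 * 0.5 ** k for k in range(8)]            # 8, 4, ..., 0.0625
series = [100.0] + list(100.0 + np.cumsum(diffs))  # 9 values
coef = fit_arima_p_d_0(series, p=1, d=1)            # close to [0.0, 0.5]
preds = forecast_arima_p_d_0(series, coef, p=1, d=1, steps=2)
actual_next = [115.96875, 115.984375]
print(round(rmse(actual_next, preds), 6))  # 0.0
```

On the AQI data, one would fit on the training values, forecast with steps equal to the number of test rows, and compare the forecasts with the test values using rmse. Once MA terms are needed (q > 0), the estimation becomes iterative, and a full ARIMA implementation such as the one in statsmodels is the more practical choice on the driver.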

I have been at it for two days now, your library is the only resource I have, and I feel too inexperienced to implement it on my own.

Any help would be appreciated.