conceptual question


Benjamin Bluhm

Mar 9, 2018, 8:20:31 AM
to spar...@googlegroups.com
Hi,

I have been working on a project with spark-ts. More recently, I have been working on a time series project for which I implemented my own Spark distribution logic, as follows: I first create a partitioned RDD of time series IDs. I then map these partitions to the worker nodes, where I import the time series data and perform model training in Python. The advantage of this approach is that I can use the full range of Python machine learning libraries.
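For concreteness, here is a minimal sketch of the pattern described above. It is not my actual code: `load_series`, `fit_model`, `train_partition`, and `run_on_spark` are hypothetical names, the loader returns synthetic data, and the "model" is just the series mean standing in for whatever Python library is used. Only `parallelize`, `mapPartitions`, and `collect` are real PySpark calls.

```python
from statistics import mean


def load_series(series_id):
    # Hypothetical loader: stands in for the real data import done on the worker.
    # Returns a short synthetic series derived from the ID so the sketch is runnable.
    base = len(series_id)
    return [float(base + t) for t in range(5)]


def fit_model(values):
    # Placeholder "model": the series mean. In practice this is where any
    # Python machine learning library would be called.
    return {"mean": mean(values)}


def train_partition(series_ids):
    # Called once per partition on a worker node; series_ids is an iterator
    # over the time series IDs assigned to that partition.
    for series_id in series_ids:
        values = load_series(series_id)
        yield series_id, fit_model(values)


def run_on_spark(all_ids, num_partitions=4):
    # pyspark is imported here so the functions above can be run without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("per-series-training").getOrCreate()
    # Partitioned RDD of time series IDs (assumed to fit in driver memory).
    ids_rdd = spark.sparkContext.parallelize(all_ids, num_partitions)
    # Each partition is processed independently; only IDs and fitted models
    # travel between driver and workers.
    return ids_rdd.mapPartitions(train_partition).collect()
```

The per-partition function can be checked locally without a cluster, for example with `list(train_partition(iter(["ab", "xyz"])))`.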

What is the disadvantage of this approach relative to using spark-ts?

Many thanks and kind regards,
Benjamin
