Requesting Example usage of the Spark-ts library


Rohan Dattatray Kulkarni

Mar 7, 2016, 11:10:28 PM
to spar...@googlegroups.com
Hello,

My name is Rohan Kulkarni and I am a Master's student in Computer Science at Columbia University in New York. I am interested in using the "spark-ts" library in Python to perform time series analysis on stock data for a course project. I saw Mr. Sandy Ryza's talk at the NYC Spark Summit last month and am interested in learning more about the library.

However, the repository on GitHub does not provide any example Python code, nor is the Python documentation available, so it is hard to start development. Could you please provide me with some example code illustrating simple time series prediction in Python so that I can build upon it? Or a link to comprehensive documentation on the same?

Hoping for a quick response!

Thanks and Regards,
Rohan Dattatraya Kulkarni

Sandy Ryza

Mar 9, 2016, 2:22:58 AM
to rohan.k...@columbia.edu, spar...@googlegroups.com
Hi Rohan,

Glad to hear of your interest in the library.  Regrettably, I haven't had the chance to write up a good set of Python examples, and the Python documentation is in poor shape as well.

If you're OK with digging into the source, the relevant bits are in these two files:

In general, I haven't had the chance to spend as much time on Python as Scala, so I'd only try to use it if you have an appetite to deal with possible bugs and missing functionality.

-Sandy

--
You received this message because you are subscribed to the Google Groups "Time Series for Spark (the spark-ts package)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-ts+u...@googlegroups.com.
To post to this group, send email to spar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spark-ts/CAKpFSpznTUFjOkfB%2BjrjEKEvRU%3DCZZmPdd%3DbY6XUHBSE7dvY3g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

paul...@gmail.com

Mar 16, 2016, 11:15:11 AM
to Time Series for Spark (the spark-ts package), rohan.k...@columbia.edu
Hey Rohan -

This week, I've been working on porting some of Sandy's Scala examples to Python.  The "Stocks" example is already up in the spark-ts-examples repository, and I should have examples for some of the model classes like AR, ARIMA, and EWMA completed soon.  I've also updated the README in that repo to describe how to run the examples for both Python and the JVM languages.  Please let me know if you run into any problems with either the examples or the docs.
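For anyone finding this thread later, running the Python Stocks example looks roughly like the sketch below. The repository URL and jar name are assumptions based on the project's conventions at the time; adjust them to the versions you actually have.

```shell
# Clone the examples repository (URL assumed from the project name)
git clone https://github.com/sryza/spark-ts-examples.git
cd spark-ts-examples/python

# Run the Stocks example; the sparkts fat jar must be on the driver
# classpath so the Python bindings can reach the Scala implementation
spark-submit \
  --driver-class-path sparkts-0.4.1-jar-with-dependencies.jar \
  Stocks.py
```

This is a command-line sketch, not something that will run without a working Spark installation and the sparkts jar in place.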

--paul

Jay Tang

Aug 17, 2016, 3:31:06 PM
to Time Series for Spark (the spark-ts package), rohan.k...@columbia.edu
Hey Paul, 

I also tried to run the Stocks.py example in the spark-ts-examples repository.  In my case, to run it successfully, I needed to create an egg file based on the setup.py in the spark-ts Python library and pass it to spark-submit using --py-files.  Is this something that can be added to the README, or is there a better way to do this?
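The steps Jay describes can be sketched as follows. The directory layout and the egg filename are illustrative assumptions; the actual names depend on your checkout of the sparkts source and your Python version.

```shell
# Build an egg from the spark-ts Python bindings
# (assumes a checkout of the sparkts source with setup.py in python/)
cd spark-ts/python
python setup.py bdist_egg   # produces an egg under dist/

# Ship the egg to the executors with --py-files and put the fat jar
# on the driver classpath so the bindings can reach the Scala side
spark-submit \
  --py-files dist/sparkts-0.4.1-py2.7.egg \
  --driver-class-path sparkts-0.4.1-jar-with-dependencies.jar \
  Stocks.py
```

This is a command fragment under the stated assumptions, not a verified recipe; the egg name in particular will vary with your environment.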

Thanks, 
Jay

kc3...@columbia.edu

Apr 19, 2017, 11:26:31 AM
to Time Series for Spark (the spark-ts package), rohan.k...@columbia.edu
Hey Jay,

Can you kindly guide me on how to run the examples repository successfully?

I tried both the Stocks example and the ARIMA test example. It seems that I didn't configure the package correctly.

Best,
Kelly

dzra...@gmail.com

Aug 29, 2017, 8:23:53 AM
to Time Series for Spark (the spark-ts package), rohan.k...@columbia.edu
@Jay
Can you please elaborate in more detail on what you did to run the Stocks.py example?
I am getting the following error when running: spark-submit --driver-class-path sparkts-0.4.1-jar-with-dependencies.jar Stocks.py

Traceback (most recent call last):
  File "/home/doma.rd/Stocks.py", line 28, in <module>
    sc = SparkContext(appName="Stocks")
  File "/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
  File "/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 172, in _do_init
  File "/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
  File "/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
  File "/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoSuchMethodError: scala.runtime.VolatileByteRef.create(B)Lscala/runtime/VolatileByteRef;
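For what it's worth, a NoSuchMethodError on scala.runtime.VolatileByteRef.create typically indicates a Scala binary-version mismatch: that method exists in the Scala 2.11 runtime but not in 2.10, so a jar compiled against 2.11 (as the sparkts 0.4.1 fat jar appears to be) will fail this way on a Spark build that ships Scala 2.10, such as the Spark 1.6 bundled with CDH 5.10. A quick way to check both sides (commands are illustrative; the jar name is taken from the spark-submit invocation above):

```shell
# See which Scala version your Spark build was compiled against;
# look for the "Using Scala version ..." line in the banner
spark-submit --version

# Inspect the sparkts fat jar's manifest for build metadata
unzip -p sparkts-0.4.1-jar-with-dependencies.jar META-INF/MANIFEST.MF
```

If the versions differ, the usual fix is to use a sparkts jar built against the cluster's Scala version, or to run against a Spark build on the matching Scala version.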

