Re: Can I train and deploy on different machine


Pat Ferrel

Jun 13, 2016, 12:58:35 PM
to jinyang zhou, predictionio-user, actionml-user
What template? The UR can be deployed on a machine that didn’t do the training but it will need to connect to the same EventServer where train got the data and the same Elasticsearch where the model was stored after training. Other templates will vary.

On Jun 12, 2016, at 12:49 AM, jinyang zhou <pandacho...@gmail.com> wrote:

Dear Sir,

My AWS machine doesn't have enough memory for training, but it should be fine for deployment. Can I use my local server to build and train the model and then move the model file to AWS for deployment?

If I can, which files should I move?

Thanks very much!

Pat Ferrel

Jun 15, 2016, 9:34:58 AM
to Matan Goldman, predictionio-user, Alex Simes, actionml-user
Yes, thanks. A user just pointed out that Chrome 50 is broken. There is a workaround we are implementing on the site side now. Very sorry that Chrome is so crappy; can't say how often I have to fall back to Firefox, but I digress.

On Jun 15, 2016, at 1:29 AM, Matan Goldman <goldma...@gmail.com> wrote:

Hi Pat,

There is definitely some problem with the docs site.

I am getting the same message in Chrome; it works fine in Firefox.


On Monday, June 13, 2016 at 8:02:14 PM UTC+3, Pat Ferrel wrote:
No, the link works, and the GitHub docs are being deprecated/moved. The live site is best, but you can submit PRs for the new docs at https://github.com/actionml/docs.actionml.com


On Jun 13, 2016, at 6:08 AM, Alex Simes <alex.s...@gmail.com> wrote:

Looking at the link, I'm guessing they meant to direct to this guide:


google is your friend ;)

On Sunday, June 12, 2016 at 11:12:31 PM UTC+2, Gopal Patwa wrote:
The above link gives me this error:

Oops, looks like there's no route on the client or the server for url: "http://actionml.com/docs/small_ha_cluster."

On Sunday, June 12, 2016 at 1:09:27 AM UTC-7, Federico Reggiani wrote:
You can try with this http://actionml.com/docs/small_ha_cluster


jinyang zhou

Jun 16, 2016, 6:06:26 AM
to predictionio-user, pandacho...@gmail.com, action...@googlegroups.com
Yep, I'm using UR. So what's the right way to connect the serving machine and computing machine?
Thanks very much.

Pat Ferrel

Jun 16, 2016, 11:03:24 AM
to jinyang zhou, predictionio-user, action...@googlegroups.com
The Small HA Cluster setup is most similar to what you want. You have EventServers and PredictionServers (UR query servers). These are set up like the Small HA Cluster but do not really need to use Spark; any time you need to run something requiring Spark, run it with `-- --master local[4]` to force it to use only the localhost. These may be the same machine or a cluster of machines running all services but Spark.

Then you will create a Spark driver machine, a Spark master, and Spark executors. These are also set up using the Small HA Cluster instructions and, like the other machines, all point to the EventServer/PredictionServer machine(s) for HBase, HDFS, and Elasticsearch. I would do this with at least 2 machines, one for the driver (which actually runs `pio train`) and one for an executor, though these may need to be scaled horizontally for training speed.

The Spark driver and executors will need roughly the same amount of memory, and it can grow large with large data. From the driver machine, in the UR directory, run the training with `pio train -- --master spark://<spark-master-ip>:7077 --driver-memory 10g --executor-memory 10g` or whatever you need for memory.

You can spin up the driver and executor machines, train, and terminate them afterward. They will create the model and store it in Elasticsearch, so there will be no disruption to serving recommendations or ingesting events.
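As a sketch of the commands involved, based on the steps above; the engine directory, the Spark master address, and the memory sizes are illustrative placeholders:

```bash
# On the Spark driver machine, from the UR engine directory
# (path, master address, and memory sizes are illustrative):
cd ~/universal-recommender
pio build

# Train on the external Spark cluster:
pio train -- --master spark://<spark-master-ip>:7077 \
  --driver-memory 10g --executor-memory 10g

# On machines that should NOT use the cluster, force local Spark:
pio train -- --master local[4]
```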

We are working on automated creation and tear-down of the Spark part so you can temporarily use very large instances at minimal cost. But that will come later in the summer.


--
You received this message because you are subscribed to the Google Groups "actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to actionml-use...@googlegroups.com.
To post to this group, send email to action...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/2746619d-255a-41f9-8007-e47745e766ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marius Rabenarivo

Mar 28, 2017, 2:44:51 AM
to actionml-user
Hello,

For the pio train command, I understand that I can use another machine with PIO, Spark Driver, Master and Worker.

But is it possible to deploy on a machine without Spark installed locally, given that spark-submit is used during deployment and
org.apache.predictionio.workflow.CreateServer
references a SparkContext?

I'm using UR v0.4.2 and PredictionIO 0.10.0

Regards,

Marius

Pat Ferrel

Mar 28, 2017, 11:26:33 AM
to Marius Rabenarivo, actionml-user
Better check versions, because UR v0.5.0 is the first to run with Apache PIO v0.10.0.


Marius Rabenarivo

Mar 28, 2017, 11:32:26 AM
to actionml-user
I use the master branch from here: https://github.com/actionml/universal-recommender

And in the Version Changelog, I see that v0.4.2 is the latest version :)

Where can I find more accurate information about versions, please?

Thank you

Marius

Pat Ferrel

Mar 29, 2017, 3:26:15 PM
to actionml-user


Begin forwarded message:

From: Pat Ferrel <p...@occamsmachete.com>
Subject: Re: Can I train and deploy on different machine
Date: March 29, 2017 at 12:25:54 PM PDT

The machine running the PredictionServer should not be configured to connect to the Spark cluster.

This is why I explained that we use a machine for training that is a Spark cluster “driver” machine. The driver machine connects to the Spark cluster but the PredictionServer should not. 

The PredictionServer should have a default config that does not know how to connect to the Spark cluster. In this case it will default to running spark-submit with MASTER=local, which puts Spark in the same process as the PredictionServer, and you will not get the cluster error. Note that the PredictionServer should be configured to connect to Elasticsearch, HBase, and optionally HDFS; only Spark needs to be local. Note also that no config in pio-env.sh needs to change; Spark local setup is done in the Spark conf and has nothing to do with PIO setup.

After running `pio build` and `pio train`, copy the UR directory to *the same location* on the PredictionServer. Then, with Spark set up to be local, run `pio deploy` on the PredictionServer machine. From then on, if you do not change `engine.json`, newly trained models will be hot-swapped into all PredictionServers running the UR.
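A sketch of that hand-off, with the hostname and paths as hypothetical placeholders:

```bash
# On the driver machine, in the UR engine directory:
pio build
pio train -- --master spark://<spark-master-ip>:7077

# Copy the engine directory to *the same location* on the
# PredictionServer (hostname and path are hypothetical):
rsync -az ~/universal-recommender prediction-server:~/

# On the PredictionServer, with Spark left at its default local config:
cd ~/universal-recommender
pio deploy
```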


On Mar 29, 2017, at 11:57 AM, Marius Rabenarivo <mariusra...@gmail.com> wrote:

Let me be more explicit.

What I want is to avoid using the host where the PredictionServer will run as a slave in the Spark cluster.

When I do this I get an "Initial job has not accepted any resources" error message.

2017-03-29 22:18 GMT+04:00 Pat Ferrel <p...@occamsmachete.com>:
yes

My answer below was needlessly verbose.


On Mar 28, 2017, at 8:41 AM, Marius Rabenarivo <mariusra...@gmail.com> wrote:

But I want to run the driver outside the server where I'll run the PredictionServer, since Spark will be used only for launching there.

Is it possible to run the driver outside the host where I'll deploy the engine? I mean, for deployment.

I'm reading the Spark documentation right now to get insight into how I can do it, but I want to know if someone has tried something similar.

2017-03-28 19:34 GMT+04:00 Pat Ferrel <p...@occamsmachete.com>:
Spark must be installed locally (so spark-submit will work), but Spark is only used to launch the PredictionServer. No job is run on Spark by the UR during query serving.

We typically train on a Spark driver machine that is effectively part of the Spark cluster and deploy on a server separate from the Spark cluster. This is so the cluster can be stopped when not training and no AWS charges are incurred.

So yes you can and often there are good reasons to do so.

See the Spark overview here: http://actionml.com/docs/intro_to_spark


On Mar 27, 2017, at 11:48 PM, Marius Rabenarivo <mariusra...@gmail.com> wrote:

Hello,

For the pio train command, I understand that I can use another machine with PIO, Spark Driver, Master and Worker.

But is it possible to deploy on a machine without Spark installed locally, given that spark-submit is used during deployment and
org.apache.predictionio.workflow.CreateServer
references a SparkContext?

I'm using UR v0.4.2 and PredictionIO 0.10.0

Regards,

Marius

P.S. I also posted in the ActionML Google group forum : https://groups.google.com/forum/#!topic/actionml-user/9yNQgVIODvI






Pat Ferrel

Mar 30, 2017, 1:58:36 PM
to us...@predictionio.incubator.apache.org, actionml-user
To run Spark locally, in the same process as pio, delete those files and do not launch Spark as a daemon; use only PIO commands.

We do not "re-deploy"; we hot-swap the model that predictions are made from, so the existing deployment works with the new data automatically and without any downtime.

Re-deploying means stopping the deployed process and restarting it. This is never necessary with the UR unless engine.json config is changed.


On Mar 30, 2017, at 12:47 AM, Bruno LEBON <b.l...@redfakir.fr> wrote:

"Spark local setup is done in the Spark conf, it has nothing to do with PIO setup.  "

Hi Pat,

So when you say the above, which files do you refer to? The "masters" and "slaves" files? Should I put localhost in those files instead of the DNS names I configured in /etc/hosts?
Once this is done, I'll be able to launch
"nohup pio deploy --ip 0.0.0.0 --port 8001 --event-server-port 7070 --feedback --accesskey 4o4Te0AzGMYsc1m0nCgaGckl0vLHfQfYIALPleFKDXoQxKpUji2RF3LlpDc7rsVd -- --driver-memory 1G > /dev/null 2>&1 &"
with my Spark cluster off?

Also, I have the feeling that once training is done, the new model is automatically deployed; is that so? In the Ecommerce Recommendation template the log explicitly said the model was being deployed, whereas with the Universal Recommender the log doesn't mention an automatic deploy right after training finishes.

Marius Rabenarivo

Mar 30, 2017, 2:14:29 PM
to us...@predictionio.incubator.apache.org, actionml-user
For the host where we run the training, do we have to put the path to ES_CONF_DIR and HADOOP_CONF_DIR in pio-env.sh even if we use remote ES and Hadoop clusters?

2017-03-30 22:09 GMT+04:00 Marius Rabenarivo <mariusra...@gmail.com>:
Replace Haddop by Hadoop in the previous mail

2017-03-30 22:08 GMT+04:00 Marius Rabenarivo <mariusra...@gmail.com>:
For the host where we run the training, do we have to put the path to ES_CONF_DIR and HADOOP_CONF_DIR in pio-env.sh even if we use remote ES and Haddop clulsters?


Pat Ferrel

Mar 30, 2017, 2:52:16 PM
to Marius Rabenarivo, us...@predictionio.incubator.apache.org, actionml-user
In the thread below I answered this. 

"Note that the PredictionServer should be configured to know how to connect to Elasticsearch and HBase and optionally HDFS, only Spark needs to be local. Note also that no config in pio-env.sh needs to change, Spark local setup is done in the Spark conf, it has nothing to do with PIO setup."
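A minimal sketch of what that means in practice on the PredictionServer; the paths are illustrative and the exact variable names depend on your pio-env.sh:

```bash
# pio-env.sh on the PredictionServer: unchanged, still pointing at the
# shared remote stores (paths are illustrative):
SPARK_HOME=/usr/local/spark            # local install; used only by spark-submit
ES_CONF_DIR=/usr/local/elasticsearch/config
HBASE_CONF_DIR=/usr/local/hbase/conf   # conf files name the remote HBase cluster

# What is deliberately absent: any Spark conf naming the cluster master.
# With no cluster master configured, spark-submit defaults to MASTER=local
# and Spark runs inside the PredictionServer process.
```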


