Running the experiment/dashboard server remotely


jessica....@gmail.com

Oct 21, 2013, 8:06:34 PM
to psi...@googlegroups.com
Hi,

I'm a little confused about the separability of the dashboard and server -- I know that running `psiturk` launches the dashboard, and then from the dashboard you can launch the experiment server. I've been setting up my experiment and testing it using the dashboard/server locally, but now that I'm actually ready to deploy it, I'm not exactly sure what the correct way to set it up is.

The server that I would like to run it on is remote, and has no GUI. When I run `psiturk`, it launches the dashboard server as I would expect, but also launches Links (a terminal browser, which is the only browser actually installed), which isn't terribly usable. If I run `psiturk -i <server> -p <port>`, then I can access the dashboard over the internet, but then it's publicly accessible and I don't really like the idea of running it unsecured like that.

I thought maybe I would be able to run just the experiment server, but when I try to do so with `psiturk-server`, it can't find the `experiment` app:

```
Traceback (most recent call last):
  File "/home/cocosci/jhamrick_python/lib/python2.6/site-packages/gunicorn/arbiter.py", line 495, in spawn_worker
    worker.init_process()
  File "/home/cocosci/jhamrick_python/lib/python2.6/site-packages/gunicorn/workers/base.py", line 106, in init_process
    self.wsgi = self.app.wsgi()
  File "/home/cocosci/jhamrick_python/lib/python2.6/site-packages/gunicorn/app/base.py", line 114, in wsgi
    self.callable = self.load()
  File "/home/cocosci/psiturk/psiturk/experiment_server.py", line 42, in load
    return util.import_app("experiment:app")
  File "/home/cocosci/jhamrick_python/lib/python2.6/site-packages/gunicorn/util.py", line 354, in import_app
    __import__(module)
ImportError: No module named experiment
```

I think maybe I'm confused about the way I'm supposed to be using PsiTurk. Is the idea that you should be able to have X-server access to the server that you're running it on, so you can view the dashboard locally/privately and still run the experiment server publicly? What do you think the best option is for me, if I want to run it on a server that has no GUI, but still have access to the dashboard? Perhaps it would be possible to add an option to secure the dashboard interface?

Thanks!
Jess

Jay Martin

Oct 21, 2013, 9:20:50 PM
to psi...@googlegroups.com
Hi Jess,

You have the right idea. Basically, launch psiTurk on the external server, so that both the experiment and dashboard servers launch. Yet, as you noticed, this is a major security issue as the dashboard will be publicly exposed. I've been meaning to add authentication to the dashboard (see https://github.com/NYUCCL/psiTurk/issues/21), but haven't gotten around to it. In the meantime, a super hacky way around it is to kill the dashboard port 22361 after you launch psiTurk. This trick should prevent anyone from modifying your configuration or submitting payments.
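For anyone following this workaround: a quick way to confirm from another machine that the dashboard port (22361, as above) is no longer reachable is a plain TCP connection attempt. This is a minimal Python 3 sketch using only the standard library; the hostname in the usage line is a placeholder, and a successful connection only shows that something accepts TCP on that port, not what it is.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or DNS failure: treat the port as closed.
        return False
```

Run from a machine outside your network, `port_is_open("your-server.example.org", 22361)` should return `False` once the port is blocked.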

Thanks for reminding me about this issue. It also reminds me that we need to polish the process of managing remote servers and/or databases.

Jay

jessica....@gmail.com

Oct 22, 2013, 12:03:17 AM
to psi...@googlegroups.com
Hey Jay,

Ok, cool, for now I'll just block the port, then.

It looks like there's some code to support auth in `experiment.py` -- I imagine it wouldn't be too hard to use that as a template for the dashboard, too. If I have time, I may poke at it a bit!

Cheers,
Jess

Jay Martin

Oct 22, 2013, 12:13:46 AM
to psi...@googlegroups.com
This snippet is pretty helpful too (http://flask.pocoo.org/snippets/8/). I'll just need to add login/user text fields to the dashboard. 
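The linked snippet adds HTTP Basic Auth to Flask views. The core check is small enough to show without Flask, so this is a framework-agnostic sketch of the same idea (not the snippet's own code); the username and password are hypothetical placeholders. It decodes the `Authorization: Basic ...` header and compares credentials with `hmac.compare_digest` so a wrong guess does not leak timing information.

```python
import base64
import hmac


def parse_basic_auth(header_value):
    """Decode an HTTP Basic `Authorization` header into (username, password), or None if malformed."""
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # bad base64 (binascii.Error) or bad UTF-8 (UnicodeDecodeError)
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


def check_auth(header_value, expected_user, expected_password):
    """True only if the header carries exactly the expected credentials."""
    creds = parse_basic_auth(header_value)
    if creds is None:
        return False
    # compare_digest avoids revealing, via timing, how much of a guess was correct.
    user_ok = hmac.compare_digest(creds[0].encode("utf-8"), expected_user.encode("utf-8"))
    pass_ok = hmac.compare_digest(creds[1].encode("utf-8"), expected_password.encode("utf-8"))
    return user_ok and pass_ok
```

In a Flask view the decoded credentials are also available as `request.authorization`. A failed check should answer with status 401 and a `WWW-Authenticate: Basic realm="..."` header so the browser prompts for a login. Basic Auth only base64-encodes the password, so it protects the dashboard only when served over HTTPS.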

jessica....@gmail.com

Nov 18, 2013, 6:02:35 PM
to psi...@googlegroups.com
Hi all,

Coming back to this issue a little bit, I was wondering if you all would be interested in me working on a more substantial contribution regarding the interface between dashboard and experiment, especially when the experiment is on a different machine and/or is using a different server. For example, the default use case (the way PsiTurk is set up currently) is to have the dashboard and experiment run on the same machine, ideally locally. Another use case (which is how I'm currently doing it) is to run the dashboard on the same machine that my experiment is running on, but I'm not running the gunicorn experiment server (I have my server set up to run the flask app itself).

I think it would be cool to think a bit more about what different use cases there are/might be and adjust the interface accordingly to make it really easy to set up regardless of what people are trying to do. For example, if the server is something other than the gunicorn server, the "launch server" button should be disabled, but you should still be able to launch test runs.

I am taking a class on scientific Python this term, and the final project is to either contribute to an open source project or do something that would be useful for your research. A project like this would actually fulfill both of those requirements for me, and I'd really like to contribute more regardless, too.

Anyway, let me know what you think!

Cheers,
Jess

Jay Martin

Nov 18, 2013, 6:22:49 PM
to psi...@googlegroups.com
Hi Jess,

We're very open to contributions. The current plan is to replace the dashboard with a command line interface (see CLI branch), which might address some of your issues, while making it easier for new people to contribute. Take a look at it, and see if you have any thoughts. If you feel that the dashboard is a valuable resource to maintain, we can discuss that option too.

Jay

jessica....@gmail.com

Nov 18, 2013, 7:05:45 PM
to psi...@googlegroups.com
Ah, I see, I had seen that branch but didn't realize it was meant to be a replacement entirely for the dashboard. I personally would give my +1 for the web dashboard interface -- especially in terms of dealing with creating HITs and paying people, it makes it way easier than using Turk directly. I also personally get frustrated with CLIs sometimes, because I always forget what all the various options are, but with a GUI it's easier to figure out what you're supposed to do.

How is the CLI currently being built? Is it something like:

experiment app --> API of some sort --> CLI

If so, then you could easily have as many front ends as you would like, e.g.

experiment app --> API of some sort --> (1) dashboard
                                    --> (2) CLI
                                    --> (3) etc.
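A minimal sketch of the layering described here, with hypothetical names throughout (`ExperimentAPI`, `dashboard_view` and `cli_view` are not psiTurk code): all state and actions live in one API object, and each front end is a thin translation layer that only calls API methods.

```python
class ExperimentAPI:
    """Hypothetical core API: front ends call these methods and never each other."""

    def __init__(self):
        self.server_running = False  # stand-in for real server state

    def start_server(self):
        self.server_running = True
        return "server started"

    def stop_server(self):
        self.server_running = False
        return "server stopped"

    def status(self):
        return {"server_running": self.server_running}


def dashboard_view(api, action):
    """Web-dashboard-style front end: maps a button name to an API call, returns JSON-able data."""
    actions = {"start": api.start_server, "stop": api.stop_server}
    if action in actions:
        actions[action]()
    return api.status()


def cli_view(api, line):
    """CLI-style front end: parses a typed command and returns text for the terminal."""
    command = line.strip()
    if command == "start":
        return api.start_server()
    if command == "stop":
        return api.stop_server()
    if command == "status":
        return "running" if api.status()["server_running"] else "stopped"
    return "unknown command: " + command
```

Because neither front end touches the server directly, adding a third one (the "etc." branch) means writing another thin function rather than duplicating logic.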

Or were you all thinking of it in a different way?

Cheers,
Jess

Jay Martin

Nov 18, 2013, 7:10:10 PM
to psi...@googlegroups.com
Exactly. There's currently an undocumented API (my fault), which has methods to start/stop servers, pay participants, etc.

jessica....@gmail.com

Nov 18, 2013, 7:56:07 PM
to psi...@googlegroups.com
Jay: you mean the API in `amt_services.py`?

Also, Todd sent me an example of a session using the CLI, but I guess it didn't go through to the Google group -- it looks very cool!

I think though what I'm thinking of with respect to the dashboard (or CLI)/experiment server might still apply. Currently my understanding of how it works is:

1. If you want to interface with the experiment server, you have to use the gunicorn server and talk to it through the controller defined in `experiment_server_controller.py`.

2. You can still otherwise use the dashboard/CLI, but you can't do anything with relation to the experiment server.

What I'm thinking of would be something like this:

1. Define a new route in `experiment.py` (e.g. `/status`) that allows someone to ping the server for its status via that route, rather than doing the port check with the controller.

2. Have the API keep track of whether it started the server, or whether the server was already running. If the server was already running, then disable the option to shut it down. Otherwise, you'll still have the experiment controller and can restart/shutdown/etc.

3. Potentially do something like check to see if the experiment server address is local or not. If it's not, then disable the launch option, because then that would require something like SSHing into the remote server to start it, which might be too complicated at least for the moment.
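Points 2 and 3 amount to a little bookkeeping in the API. A minimal sketch, assuming a hypothetical `ExperimentServerHandle` object that the API creates for a configured experiment URL: it remembers whether the API itself launched the server and only offers "launch" for a server on this machine. Treating "local" as a loopback address, `localhost`, or this machine's hostname is a simplifying assumption.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_local_address(url):
    """Return True if the URL's host is a loopback address, 'localhost', or this machine's hostname."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Literal IPs: 127.0.0.0/8 and ::1 count as local.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP: compare against this machine's own hostname (simplifying assumption).
        return host == socket.gethostname().lower()


class ExperimentServerHandle:
    """Hypothetical API-side record of one experiment server."""

    def __init__(self, url, already_running):
        self.url = url
        self.local = is_local_address(url)
        self.already_running = already_running  # was it up before the API touched it?
        self.started_by_us = False

    def can_launch(self):
        # Only offer "launch" for a local server that is not already up.
        return self.local and not self.already_running and not self.started_by_us

    def mark_launched(self):
        self.started_by_us = True

    def can_shutdown(self):
        # Never offer shutdown for a server something else started (e.g. Apache + WSGI).
        return self.started_by_us
```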

In other words, I think it would be easier to setup/understand if the experiment server was decoupled from the dashboard. The dashboard/CLI should offer the ability to interface with *any* experiment server you might define (as long as it has routes that will let you check the status and stuff), and then also offer the option of launching its own experiment server (if you don't want to set up your own) -- but it shouldn't necessarily be limited to that.
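Point 1, a `/status` route, is what would let a dashboard or CLI talk to any experiment server. In `experiment.py` (a Flask app) it would be one more view registered with `@app.route("/status")`; this sketch shows the same idea as a bare WSGI callable so it runs with the standard library alone. The JSON payload `{"status": "ok"}` is a hypothetical format, not an existing psiTurk response.

```python
import json


def status_app(environ, start_response):
    """Minimal WSGI app: GET /status returns a small JSON document, everything else is 404."""
    if environ.get("PATH_INFO") == "/status" and environ.get("REQUEST_METHOD") == "GET":
        body = json.dumps({"status": "ok"}).encode("utf-8")  # hypothetical payload
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    body = b"not found"
    start_response("404 Not Found", [("Content-Type", "text/plain"),
                                     ("Content-Length", str(len(body)))])
    return [body]
```

For a quick local check, `wsgiref.simple_server.make_server("", 8000, status_app).serve_forever()` serves it on port 8000.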

Similarly, maybe routes like `/data` in the dashboard server should be moved (with authentication enabled) to the experiment server. Technically, the dashboard shouldn't really need to worry about how the data is saved into the database -- the experiment server knows, and so the experiment server should also be the one to extract it. There could still be a command to download the data from the dashboard, but in the background it could query the experiment server instead.
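If `/data` moved to the experiment server behind authentication, the dashboard or CLI would just be an authenticated HTTP client. A minimal sketch under that assumption: the `/data` path on the experiment server, the credentials, and the example URL are all hypothetical, and the request uses HTTP Basic Auth, which should only be sent over HTTPS.

```python
import base64
import urllib.request


def build_data_request(experiment_url, username, password):
    """Build (but don't send) an authenticated GET for the experiment server's /data route."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(experiment_url.rstrip("/") + "/data")
    req.add_header("Authorization", "Basic " + token)
    return req


def download_data(experiment_url, username, password, timeout=30.0):
    """Fetch the raw /data payload; raises urllib.error.URLError/HTTPError on failure."""
    req = build_data_request(experiment_url, username, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```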

Would something like that sound reasonable? Definitely let me know if this is not at all how you're thinking of structuring things -- I don't want to jump in and mess up the flow you all already have!

Jay Martin

Nov 18, 2013, 8:19:44 PM
to psi...@googlegroups.com
Most of the API I was referring to is currently located in dashboard_server.py, which includes modules like amt_services.

I think your MVC approach sounds very reasonable, and I agree, we need to decouple the concerns of the dashboard/CLI and exp server better. Ideally, the dashboard/CLI should just be a "view," but they're currently acting like a view-controller. Todd is leading much of the refactor, and he might have a different approach in mind. Might be good to get his thoughts on this too...

Todd Gureckis

Nov 18, 2013, 9:32:40 PM
to psi...@googlegroups.com
Well, just to add to this.  Here is the model that I sort of had in mind:

1. ssh to a remote server (or open terminal on your local machine)
2. start 'screen' - a nice program that captures a log of your terminal but can be detached and left running when you log out of ssh
3. open the psiturk CLI (currently, psiturk-shell)
4. configure your hits, start/stop experiment server, check balance, etc..   basically everything current web dashboard does
5. detach from the screen session leaving the server running if you so choose (by essentially not quitting the psiturk shell)
6. come back some time later and ssh to the remote server, reattach to your screen session, continue interactions with psiturk, pay people, etc...

this may fall too close to my own use case and not take into account how other people might find it useful to work.  however, i do think that starting/stopping experiment servers on remote machines AND starting/stopping dashboard/CLI type processes on different machines could add complexity/confusion.

the main advantage of the interactive shell approach is to:
1. provide a more user-friendly reminder of commands, command history, tab completion, etc...
2. make it so it is pretty easy/intuitive to run everything on a remote machine
3. make running remotely or locally be basically the same set of instructions
4. i was getting worn out making screenshots of the dashboard in the docs!  a text-based interactive shell can be documented really easily and updated! :)
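Python's standard `cmd` module already provides the pieces listed in points 1 to 3: a `help` command built from docstrings, plus command history and tab completion when `readline` is available. This is a hypothetical sketch of a psiTurk-style shell on top of it (the class name and the `server` command are made up for illustration, and the "server" is just a flag); it is not how psiturk-shell is actually implemented.

```python
import cmd


class PsiturkLikeShell(cmd.Cmd):
    """Hypothetical psiTurk-style shell; cmd.Cmd provides `help`, history and tab completion."""

    intro = "Type help or ? to list commands."
    prompt = "[sketch] $ "

    def __init__(self):
        super().__init__()
        self.server_running = False  # stand-in for a real experiment server

    def do_server(self, arg):
        """server on|off|status: control the (pretend) experiment server."""
        if arg == "on":
            self.server_running = True
        elif arg == "off":
            self.server_running = False
        elif arg not in ("status", ""):
            self.stdout.write("usage: server on|off|status\n")
            return
        self.stdout.write("server is %s\n" % ("on" if self.server_running else "off"))

    def complete_server(self, text, line, begidx, endidx):
        """Tab completion for the server subcommands."""
        return [opt for opt in ("on", "off", "status") if opt.startswith(text)]

    def do_quit(self, arg):
        """quit: leave the shell (returning True stops cmdloop)."""
        return True
```

`PsiturkLikeShell().cmdloop()` starts the interactive loop; started inside `screen`, it keeps running after the SSH session detaches.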


-T

jessica....@gmail.com

Nov 19, 2013, 4:20:58 PM
to psi...@googlegroups.com
Oh, yeah, I totally agree with all your points about the interactive shell approach! :)

What I'm thinking of in terms of separating the experiment server from the dashboard wouldn't really affect that, I think. Let me see if I can explain better how I have my experiment server set up, I think that will maybe make it clearer what I'm thinking about.

Our lab has an offsite server that we use to serve pretty much everything: our lab's website, lab member webpages, and the experiments we run, etc. I *can* run the psiturk experiment (gunicorn) server to serve my experiments, but that's a little weird since I'm running that server on a server that is already configured with Apache and stuff (also, I need to use the main server if I want https, which I need so that browsers will actually display my experiment through the iframe on turk). So it makes more sense for me to not use gunicorn, and just set up the main server with WSGI to point at the psiturk Flask app (i.e. I have an entry point Python file that runs `from psiturk.experiment import app as application`). So, my experiment is technically always online, and only really requires the `experiment.py` part of Psiturk to run. For my use case, there's not really any starting or stopping of the experiment server because (1) it's not started by Psiturk and (2) it's always on in any case.
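This also explains why `psiturk-server` failed earlier in the thread: gunicorn is handed the string `experiment:app`, which names a top-level module called `experiment`, and Python could not find one on `sys.path` in that process. An entry-point file sidesteps this by importing the full dotted path. A minimal sketch of how a "module:attribute" spec is resolved (a simplified illustration using `importlib`, not gunicorn's own code):

```python
import importlib


def load_wsgi_app(spec):
    """Resolve a "module:attribute" string (the form gunicorn accepts) to a Python object."""
    module_name, _, attr = spec.partition(":")
    # Raises ImportError (ModuleNotFoundError) if module_name is not importable from sys.path,
    # which is the failure seen earlier in the thread for "experiment".
    module = importlib.import_module(module_name)
    # Default to "application", the name mod_wsgi looks for in a .wsgi file.
    return getattr(module, attr or "application")
```

With psiTurk installed, `load_wsgi_app("psiturk.experiment:app")` would return the same object as `from psiturk.experiment import app as application`.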

But, it would be great if Psiturk could still check on the experiment server to make sure it's functioning correctly (which it could do by adding a `/status` like I mentioned) and to run debug subjects on it (which currently you can only do through the dashboard when the gunicorn experiment server is on, or manually if I type in a valid URL, but that's kind of annoying to do). Further, because the experiment is really the only thing tying my running of the dashboard/CLI to an external server (i.e., interfacing with the turk API should work from any machine), it would be awesome if I could just run the CLI on my local machine, have it check the status of my remote experiment server, download data from it, etc.
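On the client side, checking the status of a remote experiment server could be a single HTTP request. A minimal sketch, assuming the experiment server exposes a hypothetical `/status` route returning JSON (as proposed earlier in the thread); any network error, timeout, HTTP error status, malformed URL, or non-JSON reply is reported as `None`.

```python
import json
import urllib.request


def check_status(base_url, timeout=5.0):
    """GET <base_url>/status and return the decoded JSON, or None if anything goes wrong."""
    url = base_url.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers malformed URLs and non-JSON bodies.
        return None
```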

I think it is fine (and probably best!) for there to be a default behavior of launching a gunicorn server, but if you have a separate server configured, Psiturk should be configurable to understand that and still be useful in running debug subjects/etc. Ideally, users wouldn't even notice a difference and shouldn't have to worry about the potential for it to be on a different server, unless they care to change it (like me). The only difference in practice would be just specifying a remote URL for the experiment server, I think, rather than a local one.

I think this is actually similar to how IPython works, too, because the IPython shell/notebook/etc. are just interfaces to the underlying kernel -- which can be running locally (the normal use case) or running remotely.

Todd Gureckis

Nov 19, 2013, 7:03:00 PM
to jessica....@gmail.com, psi...@googlegroups.com
Hey Jess,

I think I like parts of what you are suggesting here, but it might be helpful to explain a bit of the design philosophy we 
had taken when starting this project which isn't explicit in the code but in our lab conversations over many months. ;)

The very core idea to the project was that running/managing a web server is a hassle for many people.  Some (ahem... most) 
people find the idea of installing apache and patching it with fast_cgi or WSGI extensions totally gobbly-gook.  

To avoid all that we wanted to give the users a software program which they run (usually) on their personal computer
which would let them collect data online.  In theory, someone using psiTurk never has to know about apache, WSGI
or even have an always-on server (hosted off "campus" or otherwise), so long as they have a desktop that can stay online
as long as they want to be collecting data.

The concept that we have is of a "player" (similar to a tape player).  When you run psiturk-setup-example it will
give you a basic project with the necessary files (your blank tape).  Launching 'psiturk' in that folder will basically
let you run (i.e., "play") it on AMT.  Like a tape player on your stereo you get controls like the ability to start/stop the server, 
configure things, pay people, etc...  

Under this model you can quickly switch between two experiments (switch tapes).  If you had two folders on your desktop
like 'experiment A' and 'experiment B' launching psiturk in either folder would allow you to "run" that experiment.
Shutting it down and moving to another folder allows you to switch (although we'd probably like to allow multiple 
concurrent psiturk processes soon).  This is a little easier than uploading all the HTML/JS to a remote machine, etc...
esp. for novices since everything is basically local (i.e., testing uses same URLs/ files as going live).

Eventually we will provide an "experiment exchange" on the psiturk website where we hope people can download
other people's psiturk-compatible experiments and "run" them (to replicate or to extend the work).  If the API is
relatively stable I think this could really speed up things and give people a nice base of code to begin their own 
projects with.  I'm skeptical of GUI-based experiment builders and think that, practically, just providing lots of 
example code in a common language/format probably is the best help to most scientists.

Re: the secure certificate thing... this is another hurdle for novices.  Recently, I tried to get NYU's IT dept. to give me a
valid SSL certificate for our local server to no avail.  Thus, if we went down that route, my lab would be forced to sign
up with a hosting service, register/pay for our own certificate, get WSGI/Passenger working...  
Again, a lot of gobbly-gook for most psychologists.  I dread explaining all that in the documentation!

To address this, the features/ssl branch we've been working on is attempting to make a "secure ad server" system 
which will allow people using psiTurk to host SSL-signed ads without having to actually have a secure server 
with WSGI/SSL configured (it would host all ads for psiturk users off https://psiturk.org which has been setup with a certificate).   
Ultimately, I hope this is a temporary solution until Amazon makes some change to stop the <iframe> madness.

Anyway, maybe this email is helpful to other people who didn't get to sit around and chat about it in our
lab.  Maybe the motto is something like "removing as many barriers which count as 'web/internet technology'
from the process of running experiments online while also providing many useful features that would be a pain to
code up yourself." There's still a ways to go to make it actually reach that point, but that's the vision...

On the other hand, I think it would be nice if people could use it in a way that works best for them.  Thus, I think 
we're all definitely open to thinking more about separation/modularity issues.  One lesson I think
Jay and I found in your email is that the code really could be more modular, not only for allowing the kind of
configuration you describe, but also because it will allow us to update things in the future more easily
(e.g., when something better than Flask comes along, maybe easier to update code to use this if the
code was more separated).   We should definitely strive to balance making it easy/simple but also work
for as many different use cases as possible...

T
-- 
You received this message because you are subscribed to a topic in the Google Groups "PsiTurk" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/psiturk/8KbjZ5f4_44/unsubscribe.
To unsubscribe from this group and all its topics, send an email to psiturk+u...@googlegroups.com.
To post to this group, send email to psi...@googlegroups.com.
Visit this group at http://groups.google.com/group/psiturk.
For more options, visit https://groups.google.com/groups/opt_out.

Todd Gureckis

Nov 20, 2013, 9:13:56 AM
to jessica....@gmail.com, psi...@googlegroups.com
I thought about this a bit more last night… 

While the overall philosophy of “removing barriers which count as ‘web/internet technology’” is 
good to have out in public, as you said, there’s no reason why the system can’t ALSO work in the 
way you are suggesting, Jess.  It does have added advantage of enforcing clear modularity between
the API and the experiment server itself.

If you wanted to look into what is required to make the CLI/dashboard work with an experiment 
server hosted on a different machine, it may be useful for multiple people.   My guess is a mode would need to
be added to the CLI/dashboard that removes start_server and stop_server as possible commands/features
and a new config option would need to point to where the server is at (plus the /status type stuff you
described).

It would just be like another “mode” of using psiTurk.  The overall pitch of the project still can be that
you don’t need a WSGI server/SSL cert/etc… (if you don’t already have that stuff or know what it is;
if you do, then great).

Maybe the nicest version of it would be not to put it in the config file (another option to explain
to novices) but simply invoking this mode on the command line like:

psiturk --remote-server=myserver.org:5002
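`--remote-server` is only a proposed flag here, not an existing psiTurk option. A minimal `argparse` sketch of how it could switch the CLI into a remote mode: it validates `HOST:PORT` and drops `start_server`/`stop_server` from the command set. The other command names (`status`, `debug`, `pay`) are hypothetical placeholders.

```python
import argparse


def parse_remote_server(value):
    """argparse type: turn 'host:port' into (host, port), rejecting malformed values."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit() or not 0 < int(port) < 65536:
        raise argparse.ArgumentTypeError("expected HOST:PORT, got %r" % value)
    return host, int(port)


def build_parser():
    parser = argparse.ArgumentParser(prog="psiturk")
    parser.add_argument("--remote-server", type=parse_remote_server, default=None,
                        metavar="HOST:PORT",
                        help="use an already-running experiment server instead of launching one")
    return parser


def available_commands(args):
    """In remote mode, drop the commands that only make sense for a locally launched server."""
    commands = {"start_server", "stop_server", "status", "debug", "pay"}  # hypothetical names
    if args.remote_server is not None:
        commands -= {"start_server", "stop_server"}
    return commands
```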

On the server side, there probably isn’t an easier way than the 
`from psiturk.experiment import app as application`
you are using (and of course that can vary from system to system depending on which type of
WSGI your hosting service or server is set up to use e.g., Passenger, mod_python, Python 3, etc..)

-T



jessica....@gmail.com

Jan 1, 2014, 4:13:51 PM
to psi...@googlegroups.com, jessica....@gmail.com
Hi all,

Sorry for the lack of response on this! I got totally bogged down at the end of term. I'll probably be pretty busy for the next month, too, what with the CogSci deadline (and I imagine you all will be as well), but definitely plan to keep contributing after that.

> The very core idea to the project was that running/managing a web server is a hassle for many people.  Some (ahem... most) 
> people find the idea of installing apache and patching it with fast_cgi or WSGI extensions totally gobbly-gook.  
>
>
> To avoid all that we wanted to give the users a software program which they run (usually) on their personal computer
> which would let them collect data online.  In theory, someone using psiTurk never has to know about apache, WSGI
> or even have an always-on server (hosted off "campus" or otherwise), so long as they have a desktop that can stay online
> as long as they want to be collecting data.

I am totally on board with this philosophy! I think it's really important to be able to provide these sorts of tools to people who aren't experts on web deployment. Honestly, even for someone who is familiar with all the details, it's nice to not have to think about them.

How does this work if you want to deploy the experiment on turk, but run it locally? You'd still need a public-facing IP address or URL, right? Maybe that's less difficult at universities where it's more likely you'll get a publicly accessible unique IP -- at MIT, that's what happens, though not at Berkeley. Maybe interfacing with something like localtunnel (http://progrium.com/localtunnel/) would be a solution to that, so people don't have to worry about setting up a static IP.

> Eventually we will provide an "experiment exchange" on the psiturk website where we hope people can download
> other people's psiturk-compatible experiments and "run" them (to replicate or to extend the work).  If the API is
> relatively stable I think this could really speed up things and give people a nice base of code to begin their own 
> projects with.  I'm skeptical of GUI-based experiment builders and think that, practically, just providing lots of 
> example code in a common language/format probably is the best help to most scientists.

That's awesome -- I can't wait for the day when you can just say "I want to replicate X experiment", and be able to run it online in a matter of minutes :-)

I am also kind of skeptical of GUI-based builders. Now that I've had more time to think about it, I actually think that the CLI that you're working on is exactly the right idea -- that will be the most flexible way to interface with PsiTurk. It might still be nice to have some kind of web interface, but working on that should probably not be the focus until the actual API as accessed through the CLI is stable.

> To address this, the features/ssl branch we've been working on is attempting to make a "secure ad server" system 
> which will allow people using psiTurk to host SSL-signed ads without having to actually have a secure server 
> with WSGI/SSL configured (it would host all ads for psiturk users off https://psiturk.org which has been setup with a certificate).   
> Ultimately, I hope this is a temporary solution until Amazon makes some change to stop the <iframe> madness.

That sounds like a great solution. I'm happy that browsers are getting pickier about loading unauthenticated external content, but it's also frustrating in this sort of scenario...

> If you wanted to look into what is required to make the CLI/dashboard work with an experiment 
> server hosted on a different machine, it may be useful for multiple people.   My guess is a mode would need to
> be added to the CLI/dashboard that removes start_server and stop_server as possible commands/features
> and a new config option would need to point to where the server is at (plus the /status type stuff you
> described).
>
> Maybe the nicest version of it would be not to put it in the config file (another option to explain
> to novices) but simply invoking this mode on the command line like:
>
> psiturk --remote-server=myserver.org:5002

Yeah, that seems like a good way to do it. If this is still something you guys are open to, then I'll write up a design doc/spec or something (probably after the CogSci deadline) and we can iterate on that before I actually make any changes in the code. Also, I see there's been a lot of development in the last month or so that I need to go familiarize myself with :-)

Happy new year!

Cheers,
Jess