Advice needed on working via remote server

Luke Salvato

May 24, 2021, 4:19:47 PM
to davi...@googlegroups.com
Hi D-RUG,

I am hoping to start working on one of the UC Davis cloud servers using RStudio. Does anyone have experience with this? How do I start, who should I contact, etc.? I reached out to the FARM cluster (farm...@ucdavis.edu) just now and am waiting to hear back. (I'm a PhD student in Plant Sciences.)

Thanks to much help from D-RUG with my R skills, I'm now working with large spatial data sets on a new project, and I'm eager to learn the skills for working via a remote server. Additionally, my laptop was stolen last week, and for now I'm back to working on my older MacBook Air with 8 GB of RAM, which isn't enough to work with my data set. I'm really hoping to figure out this new skill set quickly so I can get back to work!

Please let me know if you have any guidance on how to proceed. This server thing is totally new to me...

Thanks for everything,
Luke


Rathin Raval

May 24, 2021, 5:36:54 PM
to davi...@googlegroups.com
I can help you set up a remote RStudio Server that you can connect to from a browser through an SSH tunnel.
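
For anyone following along, here is a minimal sketch of what "connect via a browser through ssh" usually means: an SSH tunnel that forwards a port on your laptop to RStudio Server on the remote machine. The user name and host name below are placeholders, not real accounts, and the sketch assumes the server runs RStudio Server on its default port, 8787. It only prints the command so you can check it before running it yourself.

```shell
#!/bin/sh
# Sketch: forward a local port to RStudio Server on a remote machine over SSH.
# REMOTE_USER and REMOTE_HOST are placeholders, not real accounts or servers.
REMOTE_USER="your_username"
REMOTE_HOST="server.example.edu"
RSTUDIO_PORT=8787   # RStudio Server's default port (assumed unchanged)
LOCAL_PORT=8787     # any free port on your laptop

# Build the tunnel command:
#   -N  open the tunnel only, run no remote command
#   -L  forward LOCAL_PORT on this machine to RSTUDIO_PORT on the server
tunnel_cmd() {
    printf 'ssh -N -L %s:localhost:%s %s@%s\n' \
        "$LOCAL_PORT" "$RSTUDIO_PORT" "$REMOTE_USER" "$REMOTE_HOST"
}

# Print the command so it can be checked before running it by hand.
tunnel_cmd
```

Once the printed command is running in a terminal, pointing a browser at http://localhost:8787 should bring up the RStudio login page, provided the server is set up that way.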

--
Check out our R resources at https://d-rug.github.io/
And please post questions/comments at https://d-rug.discourse.group/
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/davis-rug/CADkWfV9ECZsy18hajOrX-r%3DK%3DdkkQi5gtW5uGKLJr0u1CyLBtA%40mail.gmail.com.


--
Thanks

Rathin Raval


Lauren Mabe

May 24, 2021, 5:36:57 PM
to davi...@googlegroups.com
Hi Luke, 
I'm a PhD student in Geography and have had similar questions about the FARM server in the past. If I recall correctly, you will need to get in contact with a lab/PI that has bought in to the FARM server in order to access it.

If you (like me) are unable to get access to FARM, you may be able to get your computing needs met through a cloud service such as Amazon Web Services or Microsoft Azure. I use Microsoft Azure Batch for parallel computing via the doAzureParallel package and found it relatively simple to use (there's an excellent tutorial on GitHub covering installation and use). It does cost money, but if you set up your configuration file to use low-priority nodes, you can reduce the cost to a reasonable amount (depending on the job size, that is).

Hopefully someone more knowledgeable on D-RUG can give you more info on FARM. I actually hope I'm wrong about the "only get access through a lab" thing, because I would love to use the FARM cluster as well!

Thanks, 
Lauren Mabe


--
Lauren Mabe
Ph.D. Student | Geography Graduate Group
University of California, Davis
she/her

Lisa Rosenthal

May 24, 2021, 5:44:11 PM
to davi...@googlegroups.com
I'm super interested in this whole thread too. I actually just emailed FARM to get an account set up. I was under the impression that if your lab is in the College of Ag & Env (which includes Plant Sciences, LAWR, and Plant Pathology, which is what I'm in), you have free access to FARM. Is that not true? I definitely know that my PI doesn't invest in anything related to high-performance computing.

Danielle Stevens

May 24, 2021, 5:52:21 PM
to davi...@googlegroups.com
I can provide some insight into this.

Yes and no. If you or your lab is in Ag Sci, you can get some access to FARM for free, but you are limited in which nodes you can submit jobs to. To get full access to FARM, including the 'big mem' node, your lab has to buy in (a decent amount of $$$). I agree with Lauren that if you need decent computing power for a short time period, AWS may be the way to go. I know that with AWS you can run both Jupyter notebooks and RStudio Server. I have no idea whether FARM supports that, though.

I'm familiar with setting up RStudio Server with DDNS for remote access on a desktop, but that's the limit of my knowledge on it.

Best of luck,
Dani



--
Danielle (Dani) Stevens
Ph.D. Candidate | Integrative Genetics & Genomics
Coaker Lab | UC Davis Plant Pathology

Karen Atkins

May 24, 2021, 6:00:42 PM
to davi...@googlegroups.com
Hi All,
Another option that I have used is the HPC clusters that the engineering school has. If you are trying to get access, keep in mind that it doesn't have to be your PI. For example, I got access to FARM via a coauthor who bought in. Network!

I also use R (and brms, Lisa) on the clusters. Let me know if you need sample bash scripts etc.

--
Karen Atkins
Ph.D. Candidate
Hydrologic Sciences Graduate Group
University of California Davis
Pronouns: She/her/hers

Lisa Rosenthal

May 24, 2021, 6:00:53 PM
to davi...@googlegroups.com
Thanks Danielle, AWS then might be what I'll ultimately need to use. Assuming you used AWS, did you find any tutorials in particular that were helpful in getting you started?

Michael Culshaw-Maurer

May 24, 2021, 6:05:56 PM
to davi...@googlegroups.com
Hi all,

If anyone here is interested in getting stuff started on the FARM, I wrote a post on some basics to get you up and running: https://mcmaurer.github.io/farm-cluster-intro/. Hope it’s helpful to anyone who needs FARM access.

I would also plug a free cloud computing platform called CyVerse (full disclosure, I work for CyVerse now). There are plenty of ways to get analyses running on there, including cloud RStudio or Jupyter instances, and I’d be happy to offer some help to anyone who wants to get stuff going there.

Cheers,

Michael


Michael Culshaw-Maurer
Grad Group in Ecology
Rosenheim/Schreiber Labs
Briggs 320

Luke Salvato

May 24, 2021, 6:08:29 PM
to davi...@googlegroups.com

Pamela L Reynolds

May 24, 2021, 6:44:05 PM
to davi...@googlegroups.com

In case useful to other HPC-seekers, here’s a distilled list of resources (some are duplicates from those shared in this thread – thanks all!).

Other long-format guides re: remote computing that may be worth checking out:

· https://github.com/hpc-carpentry/hpc-intro
· https://kaust-vislab.github.io/introduction-to-conda-for-data-scientists/
· https://github.com/kaust-vislab/introduction-to-conda-for-data-scientists

 

For AWS, there were two introductory workshops last summer; the recordings are linked from the DataLab workshop archive (https://datalab.ucdavis.edu/archive/). The UCD campus AWS contact is a former Aggie, Kevin Murikoshi.

 

Re: FARM, it used to be possible for non-CAES researchers who don’t have funding to submit jobs, but they’d only run when there was downtime and all other jobs had been completed. Hence, I support Karen’s networking suggestion!

 

---
Pamela L. Reynolds, PhD
Associate Director
DataLab: Data Science and Informatics
University of California, Davis
plrey...@ucdavis.edu
https://datalab.ucdavis.edu

Lisa Rosenthal

May 24, 2021, 6:48:20 PM
to davi...@googlegroups.com
Thanks Pamela! This was a really quick and useful thread! I just got me a sponsor, so yay networking. 

Matthieu Stigler

May 24, 2021, 8:54:53 PM
to davi...@googlegroups.com
Hi everyone

I am surprised this is all so complicated. In my (previous) department, Agricultural and Resource Economics (ARE), the admin had set up RStudio Server on the internal department servers. Every PhD student was able to log in to RStudio on the server from their web browser, and did not need any knowledge of ssh/rsync/bash/slurm to use the full capacity of the server (though believe me, everyone quickly picked up the `top` command to start fighting over CPU usage). OK, transferring files would require some more work, but using RStudio's built-in system terminal, one could even run terminal commands on the server without having to figure out how to ssh into it.

Given the apparently high demand for this, it would be worth asking your department whether a similar workflow can be used, setting up RStudio Server either on an internal server they own or on FARM. It seems, though, that right now there are as many solutions as there are departments/labs?

@Michael, your tutorial (https://mcmaurer.github.io/farm-cluster-intro/) seems to indicate that RStudio can't be used on the FARM server? RStudio Server is super easy to install on AWS/Google Cloud; is it not possible to do the same on FARM too?

Best,

Matthieu

Michael Culshaw-Maurer

May 24, 2021, 9:12:30 PM
to davi...@googlegroups.com
Hi Matthieu,

I’m certainly not an expert on this, but I believe the FARM doesn’t allow for RStudio cloud instances, and I can think of several reasons for this. The FARM is a pretty big cluster with a ton of users, and as you saw with your departmental server, computational resources get gobbled up very quickly. To deal with this, the FARM uses a job scheduler called SLURM which manages submitted jobs, balances work being done across nodes of the cluster, etc.

Even if you could run an RStudio cloud instance on the FARM, you would have to know how many resources to request and for how long. With an interactive RStudio session, you could potentially tie up a lot of cluster resources on work that could be done on your own computer. In other words, you might end up using a racecar to go pick up groceries, and burn a lot of gas doing so.
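
To make the "how many resources and for how long" point concrete, here is a rough sketch of a SLURM batch script for a single R job. It is not taken from the FARM documentation: the script name fit_model.R, the time limit, the core count, and the memory request are all made-up values, and the sketch assumes Rscript is already available on the compute node (many clusters need a module loaded first). The snippet just writes the batch file to disk.

```shell
#!/bin/sh
# Sketch: write a SLURM batch script that runs one R script.
# fit_model.R, the resource values, and the file names are hypothetical.
write_job_script() {
    cat <<'EOF'
#!/bin/bash
# Label shown in the queue
#SBATCH --job-name=fit_model
# Wall-clock limit (HH:MM:SS); the job is killed when it runs out
#SBATCH --time=02:00:00
# CPU cores for the job
#SBATCH --cpus-per-task=4
# Memory for the job
#SBATCH --mem=16G
# Log file; %j expands to the job ID
#SBATCH --output=fit_model_%j.log

# Assumes Rscript is on the PATH of the compute node.
Rscript fit_model.R
EOF
}

# Save the batch script next to the R script.
write_job_script > fit_model.sbatch
```

On the cluster, `sbatch fit_model.sbatch` would submit it and `squeue -u $USER` would show where it sits in the queue. Smaller, shorter requests generally get scheduled sooner, which is part of why it pays to estimate these values rather than over-ask.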

The way I’ve used the FARM, which seems to make the best use of interactive IDEs like RStudio and the computational power of the FARM, is to develop scripts on my computer in a GitHub repository, and then have that repo on the FARM as well. Larger files like data and fitted models get moved around with rsync, so they don’t make the GitHub repo too massive. I will usually test models on a small subset of data on my computer, and when I’m ready to run the whole thing, which would take a long time on my computer, I do that step on the FARM.
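
As a rough illustration of the rsync step in that workflow, the sketch below builds a dry-run command for pushing a local data folder to the cluster. The user name, host name, and directory names are placeholders, and it assumes the data folder is kept out of the GitHub repo (for example via .gitignore). It prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: copy large files to the cluster with rsync, keeping them out of git.
# The user, host, and directory names are placeholders.
REMOTE="your_username@cluster.example.edu"
REMOTE_DIR="my_project/data/"
LOCAL_DIR="data/"

# Build the rsync command:
#   -a  archive mode (recurse, keep timestamps and permissions)
#   -v  list files as they are transferred
#   -z  compress data in transit
#   -n  dry run: show what would be copied without copying anything
# The trailing slash on LOCAL_DIR copies the folder's contents, not the folder.
sync_cmd() {
    printf 'rsync -avzn %s %s:%s\n' "$LOCAL_DIR" "$REMOTE" "$REMOTE_DIR"
}

# Print the command so it can be checked before running it by hand.
sync_cmd
```

Dropping the `n` flag performs the real transfer, and swapping the source and destination pulls fitted models back to your own computer. The scripts themselves still travel through git as described above.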

The other reason for using RStudio cloud instances is ease of use, as you don’t have to set anything up on your own computer. I think this can be really valuable, but is, to my mind, outside the scope of the FARM’s goals.

All in all, if you want to use the computational power of the FARM, I think it’s worth saving it for the most intensive steps of your workflow rather than attempting to do the whole thing there. This minimizes the use of shared computational resources AND makes it more likely your intensive jobs will run on the FARM sooner.

Happy to discuss any/all of this further, or to discuss other cloud computing avenues!

Cheers,

Michael



Michael Culshaw-Maurer
Grad Group in Ecology
Rosenheim/Schreiber Labs
Briggs 320

Luke Salvato

May 25, 2021, 3:27:55 PM
to davi...@googlegroups.com
Thanks everyone for your input and for the nice discussion.

Based on Michael's points, which I learned a lot from (thanks!), it seems the FARM is a rather 'high-end' solution for my current needs.

I was hoping there would be a middle ground between using my own laptop and using more complex high-performance machines like FARM. The solution at the Ag Econ Department, making RStudio Server available to every PhD student, seems very streamlined and would allow me (and many other students) to quickly take advantage of server capacity without too much time spent learning command-line tools, etc. It seems there is a lot to learn from how different departments and labs use servers. Pamela, I wonder if the DataLab could initiate such a discussion among departments and promote a set of recommendations and best practices for making computing power easily available to students, possibly following the example of the Ag Econ Department workflow? It could make computing power a lot more equitable and accessible to a wider range of students, especially because graduate students are typically required to provide their own laptop for work.

Michael, I looked into CyVerse and it seems like a great option and a cool idea. Thanks for sharing. 

Thanks,
Luke
