Re: [knitr] setwd() globally for all chunks?


Yihui Xie

Jun 15, 2012, 1:11:38 AM
to Roey Angel, kn...@googlegroups.com
setwd() is bad, dirty, ugly... you should never use it inside your Rnw
document; use this convention: data files go with source files, and
always start R in the directory of the source files. Whenever you want
to manipulate files, they are assumed to be under the same directory
as your source (e.g. Rnw documents). Then you can always use relative
paths and you will never need to setwd(). Using setwd() contradicts
the principle of reproducibility, e.g. you use setwd('foo/bar/')
and the directory may not exist on other people's computers. See FAQ
7: https://github.com/yihui/knitr/blob/master/FAQ.md

1. RStudio always uses the directory of the Rnw file as the working directory;
2. opts_chunk is supposed to set chunk options, and working directory
is NOT a chunk option;
3. dependson is completely irrelevant here; have you read the
documentation? http://yihui.name/knitr/options
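
In practice the convention above might look roughly like the sketch below. The project/ layout and the counts.csv file are made up for illustration, and the setwd() call only stands in for starting R in the source directory; it would not appear in the Rnw document itself.

```r
# Hypothetical layout (illustrative names, not from this thread):
#   project/report.Rnw
#   project/data/counts.csv
project <- file.path(tempdir(), "project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, count = c(10, 20, 30)),
          file.path(project, "data", "counts.csv"), row.names = FALSE)

# Stand-in for "start R in the directory of the source files".
old_wd <- setwd(project)

# What the document itself would contain: a relative path only.
counts <- read.csv(file.path("data", "counts.csv"))

# Undo the stand-in so this demo leaves the session as it found it.
setwd(old_wd)
```

Because the path is relative, the same chunk should work for anyone who copies the whole project folder, wherever they put it.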

Regards,
Yihui
--
Yihui Xie <xiey...@gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA


On Thu, Jun 14, 2012 at 6:58 AM, Roey Angel <an...@mpi-marburg.mpg.de> wrote:
> Hi,
> I'm just starting to shift from Sweave to Knitr so this might be basic.
> I'm running a knitted code from a different directory than the one holding
> the files needed for the analysis by R.
> As it is now I'm forced to call setwd() in every chunk in order to direct R
> to the right working directory (even though it's the same directory
> throughout).
> Isn't there a way to have a setwd() call apply for the whole document?
> I tried:
> 1. setting the work dir in RStudio before running the code
> 2. placing the setwd() call in the same chunk as opts_chunk$set
> 3. using dependson="chunk-with-setwd()-call"
>
> but none worked.
>
> Any suggestions?
>
> Thanks
> Roey
>
>

Roey Angel

Jun 15, 2012, 6:22:48 AM
to kn...@googlegroups.com, Roey Angel
Thanks Yihui and Michael,
Well, I typically don't hand out my .Rnw files (only the PDF), but all right, I'll use relative paths then.
As for dependson, I assumed the dependent chunk gets everything from the chunk it depends on (objects, functions and paths).

Cheers
Roey

Yihui Xie

Jun 15, 2012, 11:22:19 AM
to Roey Angel, kn...@googlegroups.com
No, that is not what dependson means. Please read the manual (section
3.3): https://github.com/downloads/yihui/knitr/knitr-manual.pdf

You do not need to set the dependson option in order to get objects
from previous chunks; all the chunks are evaluated in the same R
session, so previous objects are naturally available to later chunks.
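
A rough analogy in plain R, only to illustrate the point about the shared session: this is not how knitr evaluates chunks internally, and the chunk names and contents are invented for the example.

```r
# Each string stands in for the code of one chunk (hypothetical contents).
chunks <- list(
  setup = "x <- c(2, 4, 6)",
  later = "m <- mean(x)"
)

# One shared environment plays the role of the single R session.
session <- new.env()
for (code in chunks) {
  eval(parse(text = code), envir = session)
}

# The later "chunk" saw x without any dependson-like declaration.
m <- get("m", envir = session)
```

As far as I understand the manual, dependson is about invalidating cached chunks when an earlier cached chunk changes, which is a separate matter from object visibility.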

Regards,
Yihui


Jon Keane

Jun 15, 2012, 11:47:29 AM
to Yihui Xie, Roey Angel, kn...@googlegroups.com
To chime in here, I completely understand and have bought into the
thou-shalt-not-setwd(). But I have recently run into a situation where
I really wish there was some way for knitr documents to get at files from
elsewhere within a directory structure. Here's the use case that really got me:

Working on my dissertation proposal I have a huge document with many
sub .Rnw files (that's fine, I just used \include{} in one master tex file
so each section was segregated enough). For each section I generally
have one R script that imports data, cleans it up, fits models (let's
call this setup.r), and then a bunch of individual R scripts for each
plot. When I went to make slides for my defense I wanted to use the
setup.r for each section in place so I didn't have to copy it along
with all of the data into the slides folder, just to use it in the
slides. Allowing for something like this would of course make the
slides a little bit harder to reproduce (they would depend on a
directory structure elsewhere), but it makes the entire project more
coherent (and possibly reproducible as well?): the data and processing
scripts are all in a single place, no need to diff each to see if
there are any differences between them. Additionally if the data is
nontrivially large, having multiple copies of it lying around could
be problematic.
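
One possible workaround for this kind of layout, sketched with base R only: source() has a chdir argument that temporarily switches to the script's own directory. The folder and file names below are invented for the sketch, the setwd() call only simulates knitting from the slides folder, and whether this fits a given project is a judgment call.

```r
# Hypothetical layout (names invented for this sketch):
#   thesis/proposal/section1/setup.r and data.csv
#   thesis/slides/            <- where the slides are knitted
base <- file.path(tempdir(), "thesis")
section <- file.path(base, "proposal", "section1")
slides <- file.path(base, "slides")
dir.create(section, recursive = TRUE, showWarnings = FALSE)
dir.create(slides, showWarnings = FALSE)

write.csv(data.frame(y = c(1, 2, 3)), file.path(section, "data.csv"),
          row.names = FALSE)
# setup.r refers to its data with a path relative to its own directory.
writeLines('dat <- read.csv("data.csv"); fit_mean <- mean(dat$y)',
           file.path(section, "setup.r"))

old_wd <- setwd(slides)   # stand-in for knitting from the slides folder
wd_before <- getwd()

# chdir = TRUE: source() changes to the script's directory while it runs and
# changes back afterwards. local = TRUE keeps the created objects in the
# calling environment.
source(file.path("..", "proposal", "section1", "setup.r"),
       chdir = TRUE, local = TRUE)

wd_after <- getwd()
setwd(old_wd)             # restore the original working directory
```

The path is still relative, so the slides depend on the directory structure but not on any machine-specific absolute location.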

I've run into something similar a few other times, and will admit this
particular use case might actually be an edge case. I'm not sure
there's a way to implement something like this without making it very
tempting for others to set paths like
/root/directories/that/will/be/deleted/foo.R

-Jon

Yihui Xie

Jun 15, 2012, 11:53:27 AM
to Jon Keane, Roey Angel, kn...@googlegroups.com
If there is a good justification, I will certainly try to support it.
I will think about your case later. Thanks!

Regards,
Yihui


Yihui Xie

Jun 23, 2012, 6:32:16 PM
to Jon Keane, Roey Angel, kn...@googlegroups.com
Please see if the package option 'root.dir' helps:
https://github.com/yihui/knitr/issues/277#issuecomment-6528846
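
A minimal sketch of setting the option, modeled on the opts_knit$set(root.dir = ...) call quoted later in this thread. The shared-data directory is invented for the example, and the requireNamespace() guard is only there so the snippet also runs where knitr is not installed.

```r
# Directory name is made up for this sketch; in a real project it would be the
# folder holding the shared data and scripts.
shared_dir <- file.path(tempdir(), "shared-data")
dir.create(shared_dir, showWarnings = FALSE)

current_root <- NA_character_
if (requireNamespace("knitr", quietly = TRUE)) {
  # root.dir is a package option (opts_knit), not a chunk option (opts_chunk).
  knitr::opts_knit$set(root.dir = shared_dir)
  current_root <- knitr::opts_knit$get("root.dir")
}
```

In a document this would normally go in an uncached setup chunk near the top, so that the chunks after it are evaluated with the new root directory.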

Regards,
Yihui


Roey Angel

Jun 25, 2012, 4:32:14 AM
to kn...@googlegroups.com, Jon Keane
Many thanks for the effort Yihui, although I couldn't get it to work on my machine.

These are my knitr options (pretty much copied from what you used for the knitr manual):

%% Knitr option
<<Knitr_setup, include=FALSE, cache=FALSE>>=
opts_chunk$set(fig.path='figures/figure-', cache.path='cache/cache-',
               error=TRUE, warning=TRUE, fig.align='center', fig.show='hold',
               fig.pos="H", fig.lp="Figure: ", par=TRUE)
# set code/output width to be 90
options(replace.assign=TRUE, width=90)
# set root dir
opts_knit$set(root.dir = "/media/Documents/Work/H2O.SIP/Pyrosequencing/pyrosequencing.data/RTL2011.08/Analysed.data/BAC/Tflows/BAC_my-uni/")
# tune details of base graphics
knit_hooks$set(par=function(before, options, envir){
  if (before && options$fig.show!='none')
    par(mar=c(4,4,.1,.1), cex.lab=.95, cex.axis=.9, mgp=c(2,.7,0), tcl=-.3)
})
@

Then my chunk is

<<hist_1, echo=FALSE, fig=TRUE, cache=TRUE, fig.width=5, fig.height=3, fig.cap="Sequence length distribution: extracted flows.">>=
data.file <- "RTL2011.08.shhh.fasta.summary"
seq.summ <- read.table(data.file, header = TRUE)
plot.seq.summ(seq.summ) # call plotting function
@

but knitting the document gives the error:


Also, if I add print(getwd()) to the chunk, the output is still the folder where the .Rnw document is.

Am I missing something here?

Thanks again
Roey


R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8     
 [2] LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF-8      
 [4] LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=en_US.UTF-8  
 [6] LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=C               
 [8] LC_NAME=C                
 [9] LC_ADDRESS=C             
[10] LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics
[3] grDevices utils   
[5] datasets  methods 
[7] base    

loaded via a namespace (and not attached):
[1] tools_2.15.1

Yihui Xie

Jun 25, 2012, 10:13:15 AM
to Roey Angel, kn...@googlegroups.com, Jon Keane
Did you install the development version from GitHub? You need to
library(knitr) before you report sessionInfo().

Regards,
Yihui


Roey Angel

Jun 25, 2012, 1:43:30 PM
to kn...@googlegroups.com, Roey Angel, Jon Keane
Now I did and it works.
(I saw a newer version than what I had in CRAN and thought that was it).

Thanks

Yihui Xie

Jun 25, 2012, 1:51:22 PM
to Roey Angel, kn...@googlegroups.com, Jon Keane
Great. Whenever I say something should work now, I usually mean the
feature was just added to the development version.

Regards,
Yihui


Charlotte Récapet

Oct 5, 2016, 6:29:14 AM
to knitr, an...@mpi-marburg.mpg.de
Well, is setwd() itself the culprit, or rather absolute paths?
Using absolute paths is definitely the last thing to do in any script, unless you are planning to work alone and on the same OS all your life. However, organizing your projects beyond putting all data, output and source files in the same directory is absolutely necessary in most large projects. That's the whole point of using RStudio projects. Thus, a better integration between RStudio and knitr, so that a knitr script can retrieve the project directory instead of all knitr documents having to sit in the project root directory, would definitely be helpful. These guys wrote something along these lines (https://github.com/geneorama/geneorama/blob/master/R/set_project_dir.R), but you still have to enter the project name yourself, because knitr cannot detect from which project it was launched.
Does that sound like a useful and implementable feature?
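
To make the idea concrete, here is a rough sketch of what such a lookup could do without asking for the project name. find_project_root() is a hypothetical helper, not part of knitr or RStudio; it assumes the project root is the nearest parent directory that contains an .Rproj file.

```r
# Hypothetical helper: walk up from `start` until a directory containing an
# .Rproj file is found, and return that directory.
find_project_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(dir, pattern = "\\.Rproj$")) > 0) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) stop("no .Rproj file found above ", start)
    dir <- parent
  }
}

# Demo with a throwaway project tree (names invented for the example).
root <- file.path(tempdir(), "myproject")
reports <- file.path(root, "analysis", "reports")
dir.create(reports, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "myproject.Rproj"))

found <- find_project_root(reports)
```

A knitr document in analysis/reports/ could then build its paths from the returned directory with file.path(), so nothing absolute needs to be typed in by hand.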