Documentation for ProjectTemplate refers to two types of configuration: "add.config ... Enables project specific configuation (sic) to be added to the global config object" (p. 4). So apparently there's project-specific configuration and global configuration. This makes sense to me.
- ProjectTemplate configuration are the settings which alter how load.project() behaves when executed. For example, whether to have logging enabled.
- Project specific configuration are the settings which make sense only to a particular project, but you would like to change them easily in munge scripts. For example, you may define plot_footnote = "My Proj" to control a consistent look and feel for plots.
Both types are stored in the config object accessible from the global environment. The function project.config() will display the current configuration, including project specific configuration.
It seems to me the documentation is talking about three kinds of configuration, but ProjectTemplate actually implements two. Here is how I understand the language:
- Global configuration should apply to all projects using ProjectTemplate. But either there is no global configuration or it's not documented in either source cited above.
- Project-specific configuration actually is what's stored in an individual project's config/global.dcf file. The documentation variously refers to this as "global configuration" or "ProjectTemplate configuration."
- Project-specific options are what the documentation refers to as project-specific configuration. I say this because the documentation associates this with the add.config function, which allows the user to examine specific options that are being used for a particular run (or instance) of the project.
Perhaps an example will make this more clear.
Suppose someone, or perhaps even a team of researchers, decides to use ProjectTemplate for all their projects using R. Early on they decide to always use a single graphics idiom, namely ggplot2, for all their work. The standard default libraries in the config/global.dcf file are reshape2, plyr, tidyverse, stringr, and lubridate. So the user or team leader changes the libraries line in a truly global (i.e. one located outside any individual project's file space) config.dcf file to:
libraries: reshape2, plyr, tidyverse, stringr, lubridate, ggplot2
Alternatively, there could be an executable R script, again located outside the file space of any individual project but run whenever load.project() is run in any of this user's individual projects, that would have code like the following:
libraries = paste(config$libraries, "ggplot2", sep = ", ")
This second approach has the advantage of incorporating changes in the default libraries list without modifying the global configuration file. One big disadvantage is that, because it can be used repeatedly, the global configuration can be modified multiple times and in multiple places, which may make maintaining the project very difficult.
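To make the append approach concrete, here is a minimal base-R sketch. It makes two assumptions that are not from ProjectTemplate itself: `config` is a plain list standing in for the real config object, and `append_library()` is a hypothetical helper. The helper appends a package name only when it is not already listed, which softens the repeated-modification problem described above.

```r
# Stand-in for ProjectTemplate's config object (assumption: a plain list
# whose `libraries` entry is a comma-separated string, as in the DCF file).
config <- list(libraries = "reshape2, plyr, tidyverse, stringr, lubridate")

# Hypothetical helper: append a package unless it is already listed.
append_library <- function(libraries, pkg) {
  current <- trimws(strsplit(libraries, ",", fixed = TRUE)[[1]])
  if (pkg %in% current) {
    return(libraries)  # already present, leave the string untouched
  }
  paste(c(current, pkg), collapse = ", ")
}

config$libraries <- append_library(config$libraries, "ggplot2")
config$libraries <- append_library(config$libraries, "ggplot2")  # no change the second time
print(config$libraries)
```

Because the helper is idempotent, it does not matter how many scripts call it, although it still does nothing to show where a given entry came from.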
Now suppose our user is working on one project that makes heavy use of time-series data, so she always wants the xts package loaded when working in this project. With the current nomenclature and implementation, she would change the libraries line in the project's config/global.dcf file from:
libraries: reshape2, plyr, tidyverse, stringr, lubridate
to:
libraries: reshape2, plyr, tidyverse, stringr, lubridate, xts
Notice several things here. First, since the libraries line continues to exist within the project in the default form, it is overriding any implementation of a truly global configuration. This will be true of all options specified in the config/global.dcf file. Second, when the user adds xts to the default list, the ggplot2 entry is overridden. So this might be a good place to use add.config() as shown above in that it does not destroy any earlier or higher-level options:
libraries = paste(config$libraries, "xts", sep = ", ")
But this runs into some other problems. If genuinely project-specific options are to be set this way, what's the config/global.dcf file for? In its current implementation it's still going to override any options set by a genuinely global option. Furthermore, the two methods -- a dcf file with option:value pairs and a programmatic function call -- make things unnecessarily complex. While there's a place for a programmatic method, this example is not one. I'd prefer a solution that uses a project-specific dcf file with option:value pairs and that has some way to distinguish instructions to override global options from instructions to augment global options. Individual projects' config/global.dcf would be replaced with a more appropriately named config/project_options.dcf file, which would have only entries for options that are being overridden or augmented in this project. For example, if it has a line like:
the only library loaded when the project loads would be xts. But if instead it had a different option with the same value,
it would simply append "xts" to the list of libraries specified in the actually global configuration file.
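The override-versus-augment idea can be sketched in base R. Everything here is an assumption made for illustration: the field name `libraries_append` is hypothetical (it stands in for whatever option name the augmenting line would use), the helper functions are not part of ProjectTemplate, and temporary files stand in for the global and project-level dcf files.

```r
# Split a comma-separated DCF value into a character vector of names.
split_list <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])

# Read a single-record DCF file into a named list of strings.
read_options <- function(path) {
  m <- read.dcf(path)
  setNames(as.list(m[1, ]), colnames(m))
}

# Hypothetical rule: `libraries` overrides, `libraries_append` augments.
# `[[` is used instead of `$` to avoid partial matching of names.
resolve_libraries <- function(global, project) {
  if (!is.null(project[["libraries"]])) {
    return(split_list(project[["libraries"]]))
  }
  libs <- split_list(global[["libraries"]])
  if (!is.null(project[["libraries_append"]])) {
    libs <- union(libs, split_list(project[["libraries_append"]]))
  }
  libs
}

# Temporary files stand in for the global and project-level dcf files.
global_path <- tempfile(fileext = ".dcf")
writeLines("libraries: reshape2, plyr, tidyverse, stringr, lubridate, ggplot2", global_path)
override_path <- tempfile(fileext = ".dcf")
writeLines("libraries: xts", override_path)
augment_path <- tempfile(fileext = ".dcf")
writeLines("libraries_append: xts", augment_path)

global <- read_options(global_path)
overridden <- resolve_libraries(global, read_options(override_path))
augmented <- resolve_libraries(global, read_options(augment_path))
print(overridden)
print(augmented)
```

In this sketch the project file only needs the lines that differ from the global file, which is the property the proposed config/project_options.dcf is meant to have.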
Project-specific Options & Run-specific Specification
Besides the standard options, an individual project may need its own options. The add.config() function and the config$<option> mechanism allow this. But while these programmatic methods may be necessary for implementing new, project-specific options, they make things overly complex for specifying such options; I think specification should follow the practices used for global and project-specific options. Let's modify the example for project-specific configuration given on the ProjectTemplate web page, with the following in the project's lib/globals.R file:
With the currently implemented mechanism, this will set the header to the default, "Draft", and allow the user to override the default with:
Good enough, but I'd also like to see a method that more closely resembles how global and project-specific options are selected elsewhere. I don't see any way to get around programming extra options with add.config(), unless ProjectTemplate were to adopt the convention that specifying an unknown (i.e., unprogrammed) option will add the option to config and set it to NULL (e.g. config$header=NULL). This would be dangerous because misspellings could go undetected. But once the programming is available, perhaps a file such as lib/run_config.cfg could hold run-specific values for project-specific options. Continuing with the header example, the run_config.cfg might have an entry like
header: "Draft only; please do not quote or cite without permission"
for most early runs of the project's analyses and then, when the analysis comes closer to the final product, change this to
header: "Draft, please comment"
and then finally, the analyst would issue the following command
This would require ProjectTemplate to read the run_config.cfg file and set values for the different options. If we trust what's in the file, this behavior could create and initialize configuration options. In other words, ProjectTemplate would add a module to read the file and use add.config() to create configuration options and set them.
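A minimal sketch of such a reader, in base R, might look like the following. It is written under several assumptions: `apply_run_config()` is a hypothetical function, not an existing ProjectTemplate one; the file uses the same option: value layout as a dcf file; and `config` is a plain list standing in for the real config object. Passing a vector of `known` option names guards against the misspelling problem, while leaving it NULL trusts whatever is in the file.

```r
# Hypothetical reader for a run-specific options file.
# known = NULL trusts the file; otherwise unknown option names are an error.
apply_run_config <- function(config, path, known = NULL) {
  m <- read.dcf(path)
  values <- setNames(as.list(m[1, ]), colnames(m))
  if (!is.null(known)) {
    unknown <- setdiff(names(values), known)
    if (length(unknown) > 0) {
      stop("Unknown option(s) in run config: ", paste(unknown, collapse = ", "))
    }
  }
  for (name in names(values)) {
    # Strip one pair of surrounding double quotes, if present.
    config[[name]] <- sub('^"(.*)"$', "\\1", values[[name]])
  }
  config
}

# Stand-in config object with the default set programmatically.
config <- list(header = "Draft")

# A temporary file stands in for lib/run_config.cfg.
run_path <- tempfile(fileext = ".cfg")
writeLines('header: "Draft, please comment"', run_path)

config <- apply_run_config(config, run_path, known = "header")
print(config$header)
```

In a real project the assignment inside the loop would presumably go through add.config() rather than writing to a list directly; the sketch avoids that call so it can run without ProjectTemplate installed.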
I realize this has gone beyond expressing my frustration at the language the documentation uses, and perhaps I'm missing some things. But I hope that at a minimum I'm pointing out some ambiguities in the language used to describe the current implementations. Beyond this, maybe this points to an approach that is more consistent with other tools having similar issues (e.g., how LaTeX uses a search path for different customizations) and that would make ProjectTemplate more powerful.