Confusion over configuration terminology


Marshall Feldman

Jun 8, 2018, 4:58:13 PM
to ProjectTemplate
Documentation for ProjectTemplate refers to two types of configuration: "add.config ... Enables project specific configuation (sic) to be added to the global config object" (p. 4). So apparently there's project-specific configuration and global configuration. This makes sense to me.

But the ProjectTemplate web site gives a slightly different explanation:
    • ProjectTemplate configuration are the settings which alter how load.project() behaves when executed. For example, whether to have logging enabled.
    • Project specific configuration are the settings which make sense only to a particular project, but you would like to change them easily in src or munge scripts. For example, you may define plot_footnote = "My Proj" to control a consistent look and feel for plots.

Both types are stored in the config object accessible from the global environment. The function project.config() will display the current configuration, including project specific configuration.


It seems to me the documentation is really talking about three kinds of configuration, while ProjectTemplate implements only two. Here is how I understand the language:
  • Global configuration should apply to all projects using ProjectTemplate. But either there is no global configuration or it isn't documented in either of the sources cited above.
  • Project-specific configuration is actually what's stored in an individual project's config/global.dcf file. The documentation variously refers to this as "global configuration" or "ProjectTemplate configuration."
  • Project-specific options are what the documentation calls project-specific configuration. I say this because the documentation associates them with the add.config function, which allows the user to set specific options used for a particular run (or instance) of the project.
Perhaps an example will make this clearer.

Global Configuration
Suppose someone, or perhaps a team of researchers, decides to use ProjectTemplate for all their R projects. Early on they decide to always use a single graphics idiom, namely ggplot2, for all their work. The standard default libraries in the config/global.dcf file are reshape2, plyr, tidyverse, stringr, and lubridate. So the user or team leader changes the libraries line in a truly global config.dcf file (i.e., one located outside any individual project's file space) to:

libraries: reshape2, plyr, tidyverse, stringr, lubridate, ggplot2

Alternatively, there could be an executable R script, again located outside the file space of any individual project but run whenever load.project() is called in any of this user's projects, containing code like the following:

add.config(
  libraries = paste(config$libraries, "ggplot2", sep = ", ")
)

This second approach has the advantage of picking up changes to the default libraries list without modifying the global configuration file. One of its big disadvantages is that, because add.config() can be called repeatedly, the global configuration can be modified many times and in many places, which may make the project very difficult to maintain.
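The repeated-modification risk is easy to see with plain strings. The sketch below is base R only and does not call ProjectTemplate; append_libraries() is a hypothetical helper, and, like the add.config() call above, it assumes libraries is a single comma-separated string. A naive paste() run twice lists ggplot2 twice, while a de-duplicating append gives the same result however many times it runs.

```r
# Hypothetical helper (not part of ProjectTemplate): append entries to a
# comma-separated libraries string only if they are not already listed.
append_libraries <- function(libraries, new) {
  # Split "a, b, c" into c("a", "b", "c"), dropping surrounding spaces
  current <- trimws(strsplit(libraries, ",", fixed = TRUE)[[1]])
  current <- current[nzchar(current)]
  # unique() keeps the first occurrence, so existing order is preserved
  paste(unique(c(current, trimws(new))), collapse = ", ")
}

defaults <- "reshape2, plyr, tidyverse, stringr, lubridate"

# Naive paste(): running it twice lists ggplot2 twice
naive <- paste(defaults, "ggplot2", sep = ", ")
naive <- paste(naive, "ggplot2", sep = ", ")

# De-duplicating append: running it twice gives the same result as once
once  <- append_libraries(defaults, "ggplot2")
twice <- append_libraries(once, "ggplot2")
```

Idempotence does not solve the maintenance problem of options being changed in many places, but it does keep repeated calls from compounding.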


Project-specific Configuration
Now suppose our user is working on a project that makes heavy use of time-series data, so she always wants the xts package loaded when working in this project. With the current nomenclature and implementation, she would change the libraries line in the project's config/global.dcf file from:

libraries: reshape2, plyr, tidyverse, stringr, lubridate
to
libraries: reshape2, plyr, tidyverse, stringr, lubridate, xts

Notice two things here. First, because the libraries line still exists in the project's config/global.dcf in its default form, it overrides any truly global configuration. This is true of every option specified in config/global.dcf. Second, when the user adds xts to the default list, the ggplot2 entry from the global configuration is lost. So this might be a good place to use add.config() as shown above, since it augments rather than destroys any earlier or higher-level options:

add.config(
  libraries = paste(config$libraries, "xts", sep = ", ")
)
 
But this runs into some other problems. If genuinely project-specific options are to be set this way, what is the config/global.dcf file for? In its current implementation it will still override any options set in a genuinely global configuration. Furthermore, having two methods, a dcf file with option:value pairs and a programmatic function call, makes things unnecessarily complex. While there's a place for a programmatic method, this example is not one. I'd prefer a solution that uses a project-specific dcf file with option:value pairs and has some way to distinguish instructions that override global options from instructions that augment them. Individual projects' config/global.dcf would be replaced with a more appropriately named config/project_options.dcf file, which would contain entries only for options that are overridden or augmented in this project. For example, if it has a line like:

libraries: xts

the only library loaded when the project loads would be xts. But if it instead had a differently named option with the same value,

project_libraries: xts
 
it would simply append "xts" to the list of libraries specified in the truly global configuration file.
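The override-versus-augment rule could be prototyped with base R's read.dcf(). This is only a sketch of the proposal, not ProjectTemplate behavior: read_options(), merge_options(), the project_ prefix handling, and the temporary files standing in for the truly global file and config/project_options.dcf are all hypothetical, and libraries is assumed to be a single comma-separated string.

```r
# Hypothetical sketch of the proposed override/augment rule (not ProjectTemplate code).

# Read a single-record DCF file into a named list of strings.
read_options <- function(path) {
  m <- read.dcf(path)
  opts <- as.list(m[1, ])
  names(opts) <- colnames(m)
  opts
}

# Split "a, b, c" into c("a", "b", "c").
split_csv <- function(x) {
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  parts[nzchar(parts)]
}

# "name: value" replaces the global value; "project_name: value" appends to it.
merge_options <- function(global, project, prefix = "project_") {
  merged <- global
  is_aug <- startsWith(names(project), prefix)
  # Pass 1: plain entries override
  for (key in names(project)[!is_aug]) {
    merged[[key]] <- project[[key]]
  }
  # Pass 2: prefixed entries augment the option named after the prefix
  for (key in names(project)[is_aug]) {
    base <- substring(key, nchar(prefix) + 1)
    values <- c(if (!is.null(merged[[base]])) split_csv(merged[[base]]),
                split_csv(project[[key]]))
    merged[[base]] <- paste(unique(values), collapse = ", ")
  }
  merged
}

# Temporary files stand in for the truly global config file and
# the project's config/project_options.dcf.
global_path  <- tempfile(fileext = ".dcf")
project_path <- tempfile(fileext = ".dcf")
writeLines("libraries: reshape2, plyr, tidyverse, stringr, lubridate, ggplot2",
           global_path)

# Augment: xts is appended to the global list
writeLines("project_libraries: xts", project_path)
augmented <- merge_options(read_options(global_path), read_options(project_path))

# Override: the global list is replaced entirely
writeLines("libraries: xts", project_path)
overridden <- merge_options(read_options(global_path), read_options(project_path))
```

Overrides are applied before augmentations, so a file containing both libraries: and project_libraries: gives the same result regardless of line order; that ordering is a design choice of this sketch, not something the proposal specifies.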


Project-specific Options & Run-specific Specification
Besides the standard options, an individual project may need its own options. The add.config() function and the config$<option> mechanism allow this. But while these programmatic methods may be necessary for implementing new, project-specific options, they make specifying such options overly complex; I think specification should follow the practices used for global and project-specific options. Let's modify the example of project-specific configuration given on the ProjectTemplate web page, with the following in the project's lib/globals.R file:

add.config(
  header = "Draft"
)

With the currently implemented mechanism, this sets the header to a default of "Draft" and allows the user to override that default with:

> load.project(header = "Final Report")

Good enough, but I'd also like to see a method that more closely resembles how global and project-specific options are specified elsewhere. I don't see any way to avoid programming extra options with add.config(), unless ProjectTemplate were to adopt the convention that specifying an unknown (i.e., unprogrammed) option adds the option to config and sets it to NULL (e.g., config$header = NULL). This would be dangerous because misspellings could go undetected. But once the programming is in place, perhaps a file such as lib/run_config.cfg could hold run-specific values for project-specific options. Continuing with the header example, run_config.cfg might have an entry like

header: "Draft only; please do not quote or cite without permission"

for most early runs of the project's analyses and then, as the analysis nears the final product, change this to

header: "Draft, please comment"

and finally, the analyst would issue the following command:

> load.project(header = "Final Report")

This would require ProjectTemplate to read the run_config.cfg file and set values for the different options. If we trust what's in the file, this mechanism could also create and initialize configuration options. In other words, ProjectTemplate would add a module that reads the file and uses add.config() to create configuration options and set their values.
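A minimal sketch of the proposed run_config.cfg step, in base R and independent of ProjectTemplate. Everything here is hypothetical: apply_run_config(), its create_unknown flag, and treating the file as DCF (read with base R's read.dcf()) are assumptions; overrides stands in for the named arguments to load.project(), and a plain list stands in for config. Unknown names are rejected by default, which addresses the misspelling concern raised above.

```r
# Hypothetical sketch of the proposed lib/run_config.cfg step (not ProjectTemplate code).
apply_run_config <- function(config, path, overrides = list(),
                             create_unknown = FALSE) {
  # Treat the run file as DCF, e.g. a line such as: header: "Draft, please comment"
  m <- read.dcf(path)
  run <- as.list(m[1, ])
  names(run) <- colnames(m)
  # DCF keeps quotes literally, so strip one surrounding pair of double quotes
  run <- lapply(run, function(v) sub('^"(.*)"$', "\\1", v))
  # Explicit overrides (standing in for load.project(header = ...)) beat the file
  values <- utils::modifyList(run, overrides)
  # Refuse names that were never declared, so misspellings are caught
  unknown <- setdiff(names(values), names(config))
  if (length(unknown) > 0 && !create_unknown) {
    stop("Unknown configuration option(s): ", paste(unknown, collapse = ", "))
  }
  utils::modifyList(config, values)
}

# A plain list stands in for config after add.config(header = "Draft")
config <- list(header = "Draft")
run_path <- tempfile(fileext = ".cfg")
writeLines('header: "Draft, please comment"', run_path)

during_review <- apply_run_config(config, run_path)
final <- apply_run_config(config, run_path,
                          overrides = list(header = "Final Report"))
# during_review$header is "Draft, please comment"; final$header is "Final Report"
```

Setting create_unknown = TRUE corresponds to the "if we trust what's in the file" case, where the file may create new options rather than only set declared ones.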

I realize this has gone beyond expressing my frustration at the language the documentation uses, and perhaps I'm missing some things. But I hope that, at a minimum, I'm pointing out some ambiguities in the language used to describe the current implementation. Beyond this, maybe it points to an approach that is more consistent with other tools facing similar issues (e.g., how LaTeX uses a search path for different customizations) and that would make ProjectTemplate more powerful.
 

Kenton White

Jun 19, 2018, 1:26:27 PM
to project...@googlegroups.com
Hi Marshall,

As with your previous request, could you add this as an issue to the GitHub repository https://github.com/KentonWhite/ProjectTemplate? Thanks!

Kenton White

