JMLR Special Topic on Causality
Large-scale Experiment Design and Inference of Causal Mechanisms
Deadline: March 15, 2014
Please submit your papers to JMLR.org and send an email with your SUBMISSION NUMBER to the guest editors at causality <at> chalearn <dot> org.
The problem of attributing causes to effects is pervasive in science, medicine, economics, and almost every aspect of everyday life involving human reasoning and decision making. What affects your health? The economy? Climate change? The gold standard for establishing causal relationships is the randomized controlled experiment. However, experiments are costly, while non-experimental "observational" data, collected routinely around the world, are readily available. Unraveling potential cause-effect relationships from such observational data could save a lot of time and effort by allowing us to prioritize confirmatory experiments. This could be complemented by new strategies of incremental experimental design combining observational and experimental data.
So far, much of machine learning has concentrated on analyzing data that have already been collected, rather than on collecting data. While experimental design is a well-developed discipline of statistics, data collection practitioners often neglect to apply its principled methods. As a result, the data collected and made available to data analysts, who are in charge of explaining them and building predictive or causal models, are not always of good quality and are often plagued by experimental artifacts. In reaction to this situation, some researchers in machine learning have become interested in experimental design, in order to close the gap between data acquisition or experimentation and model building. In parallel, researchers in causal studies have started raising awareness of the differences between passive observation, active sampling, and intervention. In this domain, only interventions qualify as true experiments capable of unraveling cause-effect relationships.
This special topic will include methods of experimental design that involve machine learning in the process of data collection. Experiments require intervening on the system under study, which is usually expensive and sometimes unethical or impossible. Changing the course of the planets to study the tides is impossible, forcing people to smoke to study the influence of smoking on health is unethical, and modifying the placement of ads on web pages to optimize revenue may be expensive. In the latter case, recent methods proposed by Léon Bottou and others minimally perturb the process with small random interventions, collect interventional data around the operating point, and extrapolate to estimate the effect of various interventions. Presently, a profusion of other algorithms is being proposed, mostly evaluated on toy problems. One of the main challenges in causal learning is developing strategies for objective evaluation. This includes, for instance, methods for acquiring large, representative data sets with known ground truth. This, in turn, raises the question of the extent to which regularities observed in such data sets also apply to data sets of interest whose causal structure is unknown, since data sets with known ground truth may not be representative.
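For illustration only, here is a minimal Python sketch of this idea under strong simplifying assumptions: a one-dimensional operating parameter, Gaussian logging perturbations with known density, and a toy system response standing in for the real process. All names and the simulator are hypothetical, not Bottou et al.'s actual system.

import numpy as np

rng = np.random.default_rng(0)

def outcome(theta, noise):
    # Hypothetical system response at operating parameter theta
    # (unknown to the analyst; used here only to simulate logs).
    return -(theta - 1.0) ** 2 + noise

# Logging phase: operate near theta0, adding small random interventions.
theta0, sigma = 0.5, 0.1
thetas = rng.normal(theta0, sigma, 10_000)
ys = outcome(thetas, rng.normal(0.0, 0.05, thetas.size))

def gaussian_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Estimation phase: expected outcome under a counterfactual operating
# point theta_star, via importance weights w = p_star(theta) / p_log(theta).
def counterfactual_value(theta_star):
    w = gaussian_pdf(thetas, theta_star, sigma) / gaussian_pdf(thetas, theta0, sigma)
    return np.sum(w * ys) / np.sum(w)  # self-normalized estimate

for t in (0.5, 0.6, 0.7):
    print(f"estimated outcome at theta={t:.1f}: {counterfactual_value(t):+.3f}")

Note that the estimate is only reliable for interventions close to the operating point, where the importance weights remain well behaved; this is precisely why small perturbations, rather than large ones, are used.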
Specific areas relevant to the special topic include, but are not limited to:
a. Methods to discover causal structure from data and to perform causal inference (e.g., estimate causal effects, predict the effects of actions, produce most probable causal explanations, perform inference with counterfactuals, etc.). Methods based on the use of multiple types of data (e.g., observational, experimental, case-control) and methods combining knowledge (e.g., in the form of constraints or prior beliefs) with data are encouraged. Such methods may be based on Bayesian Networks and other Probabilistic Graphical Models, Markov Decision Processes, Structural Equation Models, Propensity Scoring, Information Theory, Granger Causality, or other appropriate frameworks (a minimal propensity-scoring sketch follows this list). Also of interest are new methods of experimental design capitalizing on massive amounts of available observational data and minimal interventions, as well as methods of pseudo-experiments, quasi-experiments, and natural experiments.
b. Theory:
- Identifiability of causal relationships from observational data or a combination of observational and experimental data.
- Definitions of causality bridging the gap between data generative definitions, interventional definitions, and counterfactuals.
- Formal criteria (e.g., statistical tests of significance of causal relationships, confidence intervals, model scoring measures) for causal model selection.
- Properties (e.g., soundness/consistency, stability, sample efficiency, computational efficiency) of existing and novel causal discovery methods.
- Formal connections relevant to experiment design and causal discovery among diverse fields such as Statistics, Artificial Intelligence, Decision Theory, Econometrics, Markov Decision Processes, Control Theory, Operations Research, Planning, etc.
c. Assumptions for causal discovery. Theoretical and empirical study of:
- Violations of typical assumptions for causal discovery (e.g., the Causal Faithfulness Condition, the Causal Markov Condition, Causal Sufficiency, causal graph sparseness, linearity, specific parametric forms of data distributions, etc.).
- Prevalence and severity of assumption violations, and worst-case and average-case effects of such violations.
- Novel or modified assumptions and their properties.
d. Evaluation methods, including the study of appropriate performance measures, research designs, benchmarks, etc., for empirically assessing the performance and the pros and cons of experimental design and causal discovery methods.
e. Real-world applications and benchmarking of experimental design and causal discovery algorithms, including rigorous studies of highly innovative software environments.
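To make one of the frameworks named under (a) concrete, here is a minimal, hedged Python sketch of propensity scoring (inverse probability of treatment weighting) on simulated observational data. The data-generating process, variable names, and use of scikit-learn are illustrative assumptions, not a prescription.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000

x = rng.normal(size=(n, 2))                   # observed confounders
p_treat = 1 / (1 + np.exp(-(x[:, 0] - 0.5)))  # treatment depends on x
t = rng.binomial(1, p_treat)                  # observed treatment
y = 2.0 * t + x[:, 0] + rng.normal(size=n)    # true causal effect is 2.0

# A naive difference of means is confounded by x.
naive = y[t == 1].mean() - y[t == 0].mean()

# Fit a propensity model e(x) = P(T=1 | x) and reweight each group.
e = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
ate = np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

print(f"naive: {naive:.2f}   IPW estimate: {ate:.2f}   (truth: 2.00)")

Because treatment here depends only on observed covariates, the weighted estimate recovers the true effect while the naive comparison does not; with unobserved confounding, no reweighting of observational data alone would suffice.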
Guest Editors:
Isabelle Guyon, ChaLearn, Berkeley, California, USA.
Alexander Statnikov, New York University, New York, USA.
For further instructions about the submission procedure, please read the JMLR policies or send an email to the special topic guest editors at causality <at> chalearn <dot> org.
Recommendations to competitors of the cause-effect pairs challenge who are invited to write a JMLR paper:
The papers will be judged according to the following criteria:
(1) Performance in the challenge,
(2) Novelty/Originality,
(3) Sanity (correct proofs, good experiments),
(4) Insight, and
(5) Clarity of presentation.
Papers merely describing the steps taken to produce a challenge entry will not be judged favorably. Please address the following points in your submissions:
- The choices and advantages of the methods employed should be supported by a literature overview and by qualitative and quantitative comparisons with other methods on the challenge data and, possibly, on other data.
- The various building blocks of the presented methods should be analyzed separately, and the key novel elements that contribute significantly to performance should be singled out.
- Authors are also encouraged to motivate new approaches in a principled way and to draw insights that go beyond the framework of the challenge.
JMLR is a very selective publication, and your paper will undergo a regular journal review. Your chances of acceptance will be increased if you:
- clearly motivate your approach from a practical and theoretical standpoint
- present a consistent set of experiments (using the development data) showing a significant advantage over other methods
- report your final evaluation results from the challenge
- make sure that your paper is well organized and well written, with good references, figures, and tables
We recommend that papers not exceed 20 pages.