Supplementary cause-effect pair challenge data

Isabelle

May 21, 2013, 2:29:47 AM
to causalitychallenge
We are providing two new training sets of artificial data (~6000 pairs
each) to address the data normalization problem pointed out by one
participant: in the original data, the mean and scale of the variables
and the number of quantization levels were imbalanced across the 4
classes (A->B, B->A, A|B, A-B).

- SUP1data: Only numerical continuous variables, normalized as
follows: they were standardized (subtracted mean and divided by
stdev), multiplied by 10000 and rounded to the nearest integer. Human
performance on such data is ~0.7.
- SUP2data: A mix of numerical variables that are continuous or
discrete, categorical variables, and binary variables. All numerical
variables (but not the binary or categorical ones) are normalized as
in SUP1data. The quantization is balanced across the 4 classes. We
intend to compose the final test data, on which the participants will
be evaluated, from real data examples and data generated similarly to
SUP2data.
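
For participants who want to apply the same kind of normalization to their own data, the steps described for SUP1data can be sketched as follows. This is only an illustration, not the organizers' script: the function name `normalize_like_sup1` is hypothetical, and the use of the population standard deviation and of Python's built-in `round` (which sends exact ties to the nearest even integer) are assumptions, since the announcement does not specify either.

```python
import statistics


def normalize_like_sup1(values, scale=10000):
    """Standardize, multiply by `scale`, and round to the nearest integer.

    Assumption: population standard deviation (statistics.pstdev); the
    announcement does not say which estimator was used.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        raise ValueError("a constant variable cannot be standardized")
    # Subtract the mean, divide by the stdev, rescale, then round.
    return [round(scale * (v - mean) / stdev) for v in values]


example = normalize_like_sup1([1.0, 2.0, 3.0, 4.0, 5.0])
# The same variable after a shift and a positive rescaling.
shifted = normalize_like_sup1([10.0 * v + 5.0 for v in [1.0, 2.0, 3.0, 4.0, 5.0]])
```

Under these assumptions, `example` and `shifted` come out identical, which is the point of the normalization: the original mean and scale of a variable no longer carry information.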

Hence, the final test data will differ from the original training and
validation data with respect to normalization and quantization of
variables. Algorithms that are invariant with respect to shift and
scale of variables and that do not exploit information related to
variable quantization should perform similarly on the original
training and validation data and on the supplementary data.
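
One way to check whether a given feature depends on shift or scale is to compute it before and after an affine transformation of the variable. The sketch below is an assumption-laden illustration: `is_shift_scale_invariant` and `skewness` are hypothetical helpers, the shift and scale values are arbitrary, and only positive scale factors are considered (a negative factor would flip the sign of the skewness).

```python
import math
import statistics


def skewness(values):
    """Third standardized moment; unchanged by shift and positive rescaling."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / stdev) ** 3 for v in values])


def is_shift_scale_invariant(feature, values, shift=3.0, scale=250.0, tol=1e-9):
    """Compare a feature on the raw values and on scale * values + shift."""
    transformed = [scale * v + shift for v in values]
    return math.isclose(feature(values), feature(transformed),
                        rel_tol=tol, abs_tol=tol)


sample = [0.1, 0.4, 0.5, 1.2, 2.7, 3.1]
skew_ok = is_shift_scale_invariant(skewness, sample)          # invariant feature
mean_ok = is_shift_scale_invariant(statistics.fmean, sample)  # depends on shift and scale
```

A feature that fails this kind of check, such as the raw mean here, may behave differently on the supplementary data than on the original data. The check says nothing about quantization, which would need a separate test.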

The new training data may be downloaded from http://www.causality.inf.ethz.ch/CEdata/
or https://www.kaggle.com/c/cause-effect-pairs/data. We remind the
participants that they are allowed to train their systems on data
other than those provided by the organizers.