QUESTION: Is A/B testing between two benign conditions without participants' knowledge ok?


Jenessa Peterson

Jun 23, 2021, 11:44:57 AM
to Learning Engineering

Hi all,

Someone brought this up in yesterday’s Learning@Scale workshop on educational A/B testing (a great workshop, by the way), and I’ve been thinking about it ever since! 

In 2018, an article came out in The Washington Post with the headline, “Pearson conducts experiment on thousands of college students without their knowledge”. The article explains how Pearson ran what sounds like a pretty simple but large-scale A/B test on users of its learning software. Some users were shown “growth mindset” encouragement messages during quizzes, while others interacted with the software in its standard form without the messages. 

Here are some quotes from the article to give you a feel for the overall tone. The author seems to frame it as a negative that Pearson was doing this kind of experimenting on its users without their knowledge.

“Pearson, the largest education company in the world, conducted a “social-psychological” experiment on thousands of college students in the United States — without asking for permission — by adding language into some of its software programs and then tracking how much the messages affected problem-solving.”

“Student privacy advocates have long been concerned with education publishing companies using students as ‘guinea pigs.’”

But why the negativity / concern?

Even if they didn’t ask for permission, neither of these conditions, I would think, would be harmful to students. And interestingly, most people, if asked, would probably say that either condition alone would be ok to offer students. Most people would probably say it’s fine for Pearson to offer their software to all of their users with the encouragement messages and that it’s also fine for Pearson to offer their software to all of their users in its usual form without messages. If both treatments are acceptable alone, why then would it be unacceptable to conduct an experiment, even without participants’ knowledge, to see which treatment leads to better learning outcomes? 

In the conversation about this at the workshop yesterday, Steve Ritter shared this interesting related study, “Objecting to experiments that compare two unobjectionable policies or treatments”. In this study, the researchers “...randomly assigned participants to rate the appropriateness of a fictitious agent’s decision to implement one policy (the A condition), implement another policy (the B condition), or conduct a randomized experiment comparing A and B (the A/B condition).”  What they found was surprising! Even when participants considered both the A condition and the B condition appropriate alone, they often disapproved of A/B testing these conditions!  What is happening here? 

Here’s what I wonder:

  • Have you run into issues with something like this in your work? What do you do about it?

  • Why, do you think, are we generally ok with social media platforms A/B testing absolutely everything and at the same time seemingly much less ok with A/B testing (between two benign, acceptable conditions) within structured learning? 

  • Should we do anything to try to change this perception of A/B testing where we encounter it? And if so, how?

  • How and where do we draw the line between what's acceptable and what's not when A/B testing without participants' knowledge? Are there criteria we can point to on this? What do your IRBs say?

Thank you for sharing your thoughts on this!

Yours,

Jenessa Peterson

Director of Learning Engineering at The Learning Agency

jen...@the-learning-agency.com


Shivang Gupta

Jun 23, 2021, 12:15:11 PM
to Jenessa Peterson, Learning Engineering
Hi Jenessa,

This question actually came up quite frequently during class discussions in my Masters in Education Technology program at CMU. One proposed scenario for discussion was as follows: let's say something 'benign', like the wording of lecture notes, is A/B tested across two groups, and the experiment shows that the wording used for group A results in better learning than that used for group B. This means that those in group A were essentially given better instruction than those in group B. If both groups had received a single intervention, this would not be an issue; however, since there were two interventions (one of which was more successful than the other), it creates an equity issue. 

From what I recall from my lessons, there are a number of experimental design patterns that can be used to mitigate these issues. One that stands out is a 'Now and Later' design, where the group used as the control in the initial test is later given the same intervention as the treatment group, thus ensuring that everyone eventually receives the superior intervention. The image below can help explain this concept, and I would be happy to hear your thoughts on this as well as what others think about your original question. 

[image: diagram of the 'Now and Later' design]
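For anyone who finds code clearer than a diagram, here's a minimal sketch of how a 'Now and Later' (waitlist-style) assignment might work. The function name and structure are just illustrative assumptions on my part, not from any real system:

```python
import random

def now_and_later_assignment(student_ids, seed=0):
    """Split students into 'now' and 'later' groups at random.

    Phase 1: the 'now' group receives the new intervention while the
    'later' group continues with the standard experience (control).
    Phase 2: the 'later' group receives the same intervention, so
    every student eventually gets whichever version proved superior.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"now": shuffled[:half], "later": shuffled[half:]}
```

The comparison itself happens in Phase 1; Phase 2 exists purely so that no one is left with the inferior version.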

Best,
Shiv

Shivang Gupta (he/him/his)
Product Manager, Personalized Learning²

412-708-5956  | shiv...@andrew.cmu.edu

http://personalizedlearning2.org



--
You received this message because you are subscribed to the Google Groups "Learning Engineering" group.
To unsubscribe from this group and stop receiving emails from it, send an email to learning-enginee...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/learning-engineering/b40f5b35-17ef-4be0-9647-b6a6b688ff9fn%40googlegroups.com.

Collin Lynch

Jun 23, 2021, 12:20:29 PM
to Jenessa Peterson, Learning Engineering
Jenessa, How do you know it is benign until you test it?

And once you test it, what do you do if it was *not* benign? Do you
then inform people you lied to them? Or do you leave them to deal with
the consequences?

I would argue that any research study worth its salt should be
addressing open questions, and if the impact is an open question then
you do not know whether it is benign or not. That being the case, the
ethical thing to do is to inform people in advance, unless deception is
somehow absolutely necessary for the work.

Collin.



--
ArgLab & Center for Educational Informatics
Department of Computer Science
North Carolina State University

https://research.csc.ncsu.edu/arglab/people/cflynch.html

Janet Kolodner

Jun 23, 2021, 12:30:24 PM
to Collin Lynch, Jenessa Peterson, Learning Engineering
I think that’s the point, Collin, that you plan for now and later so everyone gets it the way that works better.

Janet

Collin Lynch

Jun 23, 2021, 12:45:48 PM
to Janet Kolodner, Jenessa Peterson, Learning Engineering
Oh I agree, just to elaborate on my points and respond to Jenessa's questions:

* Have you run into issues with something like this in your work? What
do you do about it?

I have not conducted experiments where we lied to students, but I have
seen experiments where seemingly insignificant changes (e.g. Min Chi's
work on low-level tutoring decisions) did yield long-term effects.


* Why, do you think, are we generally ok with social media platforms
A/B testing absolutely everything and at the same time seemingly much
less ok with A/B testing (between two benign, acceptable conditions)
within structured learning?

Speaking personally, I am not ok with it. I never want to be
experimented on without my knowledge, but since my opinion of social
media companies is quite low, I don't get too worked up over it. More
importantly, in structured learning environments the goal is to teach
everyone. A/B approaches, particularly those based upon deception, are
experimental by nature, and that means you are subjecting one group to
a different treatment than others. Unless you have a structured way to
compensate for the possible loss, you are putting some people at a
disadvantage and potentially getting in the way of the overall goal:
learning for everyone. Even if you give them the training later, you
have still burned time.


* Should we do anything to try to change this perception of A/B
testing where we encounter it? And if so, how?

To my mind, answering this depends on the goal. If you want to change
the perception to enable more A/B testing, then you have to do so by
showing that you are delivering benefits for *everyone* and that no one
is being left behind. That is a study design problem.

Collin.

Jeff Dieffenbach

Jun 23, 2021, 12:48:06 PM
to Collin Lynch, Janet Kolodner, Jenessa Peterson, Learning Engineering
I don't know if Pearson has an Institutional Review Board, but universities do. The IRB acts on behalf of the study participant, but will allow "subterfuge" within reason.

There's of course a continuum, but I suspect that most A/B tests that an education company would run would be benign. Yes, if the difference between A and B is significant, there could be an educational harm, but that harm is likely (although not guaranteed to be) small and temporary.

The studies run in our lab approach this question along the lines of what Shivang shared. We'll do A then B for one group and B then A for the other. In one recent experiment, A was a literacy intervention and B was a mindfulness intervention. In others, B was a math or computer science intervention.

Best,
Jeff Dieffenbach
MIT Integrated Learning Initiative

Christopher Brooks

Jun 23, 2021, 3:59:10 PM
to Jenessa Peterson, Learning Engineering
Hi Jenessa,

Here’s what I wonder:

  • Have you run into issues with something like this in your work? What do you do about it?

I'm not sure which issue, but I have regularly had discussions with our IRB about how we inform students that we are engaging in experimentation. Frankly, Pearson attracted lightning for their comments, but every web-scale vendor is doing this, in or out of ed tech. There are questions of (a) law and (b) ethics, which I think need to be separated for discussion. And both of these vary by jurisdiction and community. You describe the treatment as benign, but I think "minimal risk" might be another phrase to look at, as it aligns well with how the Common Rule frames much research. For instance, minimal risk studies can have a waiver of informed consent from the IRB (see 45 CFR 46.116(f)). I have requested this in the past when I perceived that informed consent would change the subject response rate and bias the data.
  • Why, do you think, are we generally ok with social media platforms A/B testing absolutely everything and at the same time seemingly much less ok with A/B testing (between two benign, acceptable conditions) within structured learning? 

I think it depends upon who "we" are in your sentence above. I think there are a lot of people who are ok with RCTs in educational platforms without informed consent. In my experience the sensitivity comes up (a) when there is no oversight, or perception of oversight, of the research, and (b) with individuals who don't see the value in experimentation.

For (a), I think this is one of the places Pearson attracted heat that others might not, in part because they are a commercial organization and may or may not have contractual obligations to those they provide services to. I don't know enough about how they work to ensure oversight of the research, but I would expect that if someone asked me this at a conference, the very next question would be whether an IRB approved or exempted the study. With (b), I'll be more provocative and suggest that there are large groups of researchers and educators who don't value RCTs as a method in field studies. 
  • Should we do anything to try to change this perception of A/B testing where we encounter it? And if so, how?

Yes. I think there are multiple approaches depending on the stakeholder who has that perception. I'll be honest, an archived public mailing list won't generate the most authentic and vulnerable discussion on this, but let me pitch two of my particular views on the issue:

1. We should recognize, communicate, and contextualize an obligation to participate in studies and experiments as a social responsibility of learners. There is great literature on this in the field of medical studies -- both for and against the question of obligation -- and it should be a template for us to understand the issues within the field of education. A Google search for "obligation medical research" brings forth great depth and breadth of perspective.

2. We should give learners access to data. Make it commonly known that the data is collected and used. This is part of the *informed* part of informed consent. This gives them at a minimum some agency to ask questions and be involved in the research (and their learning!) in new ways.

  • How and where do we draw the line between what's acceptable and what's not when A/B testing without participants' knowledge? Are there criteria we can point to on this? What do your IRBs say?

My IRB wants to know (non-exclusive list):
a) Is it minimal risk?
b) What is the risk to the study if you collect informed consent?
c) What is the payoff to the participant and/or society at large?

Collin mentioned deception, but I'm not sure where that fits into the conversation -- A/B testing without getting informed consent is not deception by itself. Deception is different, and my IRB has specific questions with respect to deception. I've done one deception study -- again, minimal risk, and the deception was necessary to avoid participant response bias. As part of this work the IRB asked us to inform the learners about the study.

Regards,

Chris
--
Christopher Brooks
Assistant Professor, School of Information

E-Mail: broo...@umich.edu
Web: http://christopherbrooks.ca

School of Information
University of Michigan
4439 North Quad
105 S. State St.
Ann Arbor, MI 48109-1285

John Whitmer

Jun 23, 2021, 4:12:28 PM
to Christopher Brooks, Jenessa Peterson, Learning Engineering

+1 to Chris’s response, calling out a few points that I strongly agree with:

  1. We should consult/consider the literature around participating in medical studies, which have very similar dynamics and interactions
  2. The concern around potential respondent bias if we are to require notification for all studies
  3. The reality that this is being conducted already by most web-scale companies under the mantra of usability and “analytics”, with in-depth participatory research requiring IRB and informed assent/consent.

 

I spoke with a Google engineer once who was appalled that we don’t conduct A/B testing for educational technology feature development – his response was “how do you know if anything works if you don’t test it”?  He was a bristly fellow, but that response has always stuck with me.

 

Best, John


Peter Bergman

Jun 23, 2021, 4:18:11 PM
to John Whitmer, Christopher Brooks, Jenessa Peterson, Learning Engineering
Just another framing, but one can be doing significant harm by *not* running an experiment as well. Put another way, as I once heard it: you're always running an experiment; sometimes it's designed in a way that you can readily learn something from it. 

Collin Lynch

Jun 23, 2021, 4:37:01 PM
to Christopher Brooks, Jenessa Peterson, Learning Engineering
> Colin mentioned deception but I'm not sure where that fits into the conversation -- A/B testing without getting informed consent is not deception by itself.

Chris, I think this is an area that varies from IRB to IRB. Some past
IRB officers I have worked with have viewed failure to disclose an A/B
test as a form of deception, even if it is considered minimal risk and
signed consent is waived.

Best,
Collin.



Collin Lynch

Jun 23, 2021, 4:56:28 PM
to Peter Bergman, John Whitmer, Christopher Brooks, Jenessa Peterson, Learning Engineering
Peter, while there is merit in focusing on how we design it, I would
argue that there is a difference between whether we are learning from
experience and what methods we use. In my opinion, education should
never be static, and all educators should be learning from experience
if not from experiments. My retort to John's Google engineer would be
that we are learning, just by different means.

Personally, I see A/B testing as one very useful method, but I think it
does raise important ethical and legal issues surrounding risk to
students, fairness, and other aspects. The fact is that in education,
particularly compulsory education, students are in a very different
context than users of Facebook or Google. We have to keep that in
mind when designing and evaluating our studies.

Best,
Collin.

Shivang Gupta

Jun 23, 2021, 5:06:12 PM
to Collin Lynch, Peter Bergman, John Whitmer, Christopher Brooks, Jenessa Peterson, Learning Engineering
I often heard the same quote as John, about not knowing what works without testing it, at my previous job. Many of the things they wanted to A/B test were actually questions that had already been tested and answered by other researchers, so one valid approach, in my opinion, is to check whether others have tested the idea before and to rely on prior research and knowledge of what works best. This may not always work for innovative new designs, but many ideas are not new enough to be worth A/B testing.

Best,
Shiv

Ritter, Steve

Jun 23, 2021, 6:19:32 PM
to Learning Engineering

Well, this is a great discussion! You all should go to the Educational A/B testing workshop next year:)

 

Since I inadvertently kicked this off, I’ll put in my 2 cents. As an educational software publisher, we make changes all the time. Some changes are at the request of a customer (and so might be based on a teacher’s experience); others are made internally based on a desire to introduce new features, improve outcomes, or address technical issues. We don’t intend any of these changes to be harmful but, as Collin points out, any of them could be. The point of John’s and Peter’s quotes is that our choice is really between knowing whether they’re harmful (or beneficial!) and not knowing. A/B tests at least give us the possibility of knowing (and of correcting our errors, if we need to).

 

We’re never going to A/B test everything, but I think we have an obligation to, as much as possible, know whether we’re moving in the right direction.

 

To me, the interesting thing about the “objecting to experiments” paper that Jenessa referenced is that it shows how most people frame the nature of expertise. The experimentation mindset requires the humility to say, “I’m an expert. I know what experts in my field know. I also know how to learn some things that I don’t know (and other experts don’t know).” But I think many people think of experts as people who know everything; to them, wanting to run an experiment is a sign of weakness.

 


Steve Ritter, PhD

Founder and Chief Scientist



sri...@carnegielearning.com
F: (412) 690-2444
www.carnegielearning.com
www.scilearn.com
www.zorbitsmath.com

Scientific Learning and Zorbit's Math are part of Carnegie Learning


David Porcaro

Jun 24, 2021, 12:51:29 PM
to Learning Engineering
Sorry, slow to the conversation (swamped with work recently, so getting behind on this forum). I was actually directly involved in this study at Pearson, so I can add some context to what's coming up in the conversation.

1) We went through extensive legal review on this one beforehand (the irony is that we started this all with the clear statement that we didn't want to become a news story like the then-recent Facebook experiment...and we know what great paving stones good intentions make). We were also working with an organization (the Behavioral Insights Team) that had done extensive behavioral nudges and experiments, including many with the UK government (such as testing what kinds of mailings contribute to greater voter turnout). If I remember correctly (it's been a long time), we concluded that alerting students to the intervention would have impacted the outcomes (some in this forum may disagree with that, which is a healthy point of view). We worked with some well-known and well-respected behavioral psychology researchers on the study design and the wording of the interventions (in other words, this wasn't just a couple of engineers trying things in a corner).

2) Legal review concluded we didn't need to go through IRB for this study, because it used normal product usage and the research was for product improvement and well within the conditions of the current EULA. This is really important, because this is going to be interpreted differently in just about every IRB. What is considered product improvement? What is considered generalized knowledge? Do you need IRB approval every time you present something at a conference? (This experiment became a problem when the results were presented at AERA.) The guidance on this isn't always clear, so I encourage anyone working on A/B testing of content to be clear with the universities or school districts you work with and their IRB process. 
While the results of the study were not as impactful as everyone had hoped (echoing much of the recent research showing how context matters in applying growth mindset messages in education settings), everyone included in this study (including Kristen DiCerbo, Khan Academy's CLO, who I believe is in this group) learned a lot about where people are and aren't comfortable in educational A/B testing.

I share this context because I have talked with many organizations that have been doing A/B testing in education, and have seen them struggle with many of these design decisions. It's useful to see what one organization did (and what the consequences were). I could imagine doing the exact same things in another organization (one that isn't a lightning rod) and getting a much more welcome response. I could also see organizations taking other steps around IRB, informed consent, open data, etc. that may have different outcomes (both on the experiment itself and on the trust built with users and the community). 

I'll just point out one other issue I have seen a lot recently around A/B testing of content. It feels like most students, parents, teachers, and administrators are ok with A/B testing of supplemental course content (homework, non-graded content, electives, informal learning, etc.). But people have pretty big concerns when you experiment on graded or testable material (are all students in my class or my school getting an equal experience?). A/B : B/A testing (giving both groups both versions in different sequences) may help. But as with all experiment designs, it's not perfect.
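To make the A/B : B/A idea concrete, here's a rough sketch of a counterbalanced assignment. The function name and structure are mine and purely illustrative, not anything from the Pearson study:

```python
import random

def counterbalanced_assignment(student_ids, seed=0):
    """Give every student BOTH versions, in a randomized order.

    Half the students see version A first, then B; the other half see
    B first, then A. Everyone eventually experiences both conditions,
    which addresses the equity concern for graded material (at the
    cost of possible order effects).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule = {s: ("A", "B") for s in shuffled[:half]}
    schedule.update({s: ("B", "A") for s in shuffled[half:]})
    return schedule
```

The comparison is then between sequences rather than between groups that only ever saw one version.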

As many in this thread are concluding, the logic of A/B testing in education is upside down: if we do a structured experiment, learn from it, improve the learning experience, and share that learning with others, people get really scared. But if you run an unstructured experiment (in other words, a new feature launch, a tweak to content, or an impromptu instructional change by a teacher), don't learn from it, and assume it's benign or better for students than the status quo, even if it is indeed causing harm, we're all comfortable with that. (Go figure!) This mindset, and not technical or procedural issues, has been the biggest barrier I've seen to the widespread adoption of learning engineering. I would love to see more companies and organizations be more open in their research, but some education organizations are still very scared to do this. I'm grateful for groups like Carnegie Learning and ASSISTments who are helping normalize this kind of thinking!

David

Ken Koedinger

Jun 24, 2021, 2:47:55 PM
to David Porcaro, Learning Engineering
Thank you, David.  Super helpful to hear more details of the history.  I
want to particularly highlight and give a double thumbs up to this part
of your post:

> As many in this thread are concluding, the logic of A/B testing in
> education is upside down, because if we do a structured experiment and
> learn from that and improve the learning experience and share that
> learning with others, people get really scared. *But*, if you run an
> unstructured experiment (in other words a new feature launch, or a
> tweak to content, or an impromptu instructional change by a teacher),
> and *don't* learn from it, and assume it's benign or better for
> students than the status quo, even if it is indeed causing harm,
> we're all comfortable with that. (Go figure!)
Cheers,

Ken Koedinger

On 6/24/21 12:51 PM, David Porcaro wrote:
> Sorry, slow to the conversation (swamped with work recently so getting
> behind on this forum).  I was actually directly involved in this study
> at Pearson so I can add some context that's coming up in the
> conversation.  1) we went through /extensive/ legal review on this one
> before hand (the irony was we started this all with the clear
> statement that we didn't want to become a news story like the then
> recent Facebook experiment...and we know what great paving stones good
> intentions make). Also we were working with an organization
> (Behavioral Insights Team) who had done /extensive /behavioral nudges
> and experiments, including many with the UK government (such as
> testing what kinds of mailings contribute to greater voter turnout). 
> If I remember correctly (it's been a long time), we concluded alerting
> students of the intervention would have impacted the outcomes (some in
> this forum may disagree with that, which is a healthy point of view).
> We worked with some well known and well respected behavioral
> psychologist researchers on the study design and the wording of the
> interventions (in other words, this wasn't just a couple of engineers
> trying things in a corner). Legal review concluded we /didn't/ need to
> go through IRB for this study because it was using normal product
> usage and the research was for product improvement and was well within
> the conditions of the current EULA.  This is really important because
> this is going to be interpreted differently in just about every IRB. 
> What is considered product improvement? What is considered generalized
> knowledge? Do you need IRB every time you present something at a
> conference (this experiment became a problem when the results were
> presented at AERA). The guidance on this isn't always clear, so I
> encourage anyone working on A/B testing of content to be clear with
> the universities or school districts you work with and their IRB
> process. While the results of the study were not as impactful as
> everyone had hoped (echoing much of the recent research showing how
> context matters in applying growth mindset messages in education
> settings), all included in this study (including Kristen DiCerbo, Khan
> Academy's CLO who I believe is in this group), learned a lot about
> where people are and aren't comfortable with in educational A/B testing.
>
> I share this context because I have talked with many organizations who
> have been doing A/B testing in education, and have seen them struggle
> with many of these design decisions. It's useful to see what
> /one/ organization did (and what the consequences were).  I could
> imagine doing the exact same things in another organization (that
> isn't a lightening rod) and getting a much more welcome response.  I
> could also see organizations taking other steps around IRB, informed
> consent, open data, etc. that may have different outcomes (both on the
> experiment itself and on the trust built with users and the community).
>
> I'll just point out one other issue I have seen a lot recently around
> A/B testing of content.  It feels that most students, parents,
> teachers and administrators are ok with A/B testing of content on
> supplemental course content (homework, non-graded content, electives,
> informal learning, etc.).  But people have pretty big concerns when
> you experiment on graded or testable material (are all students in my
> class or my school getting an equal experience). A/B: B/A testing
> (giving both conditions both versions in different sequences) may
> help.  But as with all experiment designs, it's not perfect.
>
> As many in this thread are concluding, the logic of A/B testing  in
> education is upside down, because if we do a structured experiment and
> learn from that and improve the learning experience and share that
> learning with others, people get really scared. /But, /if you run an
> unstructured experiment (in other words a new feature launch, or a
> tweak to content, or an impromptu instructional change by a teacher),
> and /don't /learn from it, and assume it's benign or better for
> *Steve Ritter, PhD*
> **Founder and Chief Scientist
> *
> *
>
> sri...@carnegielearning.com
> F: (412) 690-2444 <tel:(412)%20690-2444>
> www.carnegielearning.com <http://www.carnegielearning.com/>
> www.scilearn.com <http://www.scilearn.com>
> www.zorbitsmath.com <http://www.zorbitsmath.com>
>
> *Scientific Learning and Zorbit's Math are part of Carnegie Learning*
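The A/B : B/A counterbalanced design Steve Ritter mentions above can be sketched in a few lines. This is a minimal illustration with invented student IDs and seed, not any particular platform's implementation:

```python
import random

def counterbalance(student_ids, seed=7):
    """Assign each student an order: half experience condition A then B,
    half B then A, so everyone eventually gets both versions.
    Illustrative only -- the IDs and seed are made up."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {sid: ("A-then-B" if i < half else "B-then-A")
            for i, sid in enumerate(ids)}

orders = counterbalance(range(100))
```

As the quoted message notes, this design mitigates (but does not eliminate) fairness concerns, since order effects can still differ between the two sequences.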

John Whitmer

unread,
Jun 24, 2021, 4:12:55 PMJun 24
to David Porcaro, learning-e...@googlegroups.com

Fascinating, David; thank you for sharing that context and history.

 

It is indeed very illustrative and aligns with my experience (although perhaps I’ve got confirmation bias going). When I led R&D at Blackboard, we did a lot of research on features, but without A/B testing. To be honest, while we always operated clearly within our EULA, I had more than a few sleepless nights worrying about potential blowback for doing research on users. We never ran A/B tests, only post-hoc or simulation experiments, and only once got a semi-threatening message from a student newspaper reporter (who changed their mind after learning about the study and its aim of improving the application).

 

The biggest irony to me is that the orgs that get this blowback headline treatment are the ones that go public with their results so that other researchers (among others) can learn from them. Maybe someday ESSA and best practices will require evidence, but for now it feels like the honest players are the ones who end up in hot water. Stay quiet, keep your findings confidential or proprietary, and you won’t make the headlines. The same was true of the Facebook story, which also came to light through a conference presentation.

 

I vote for the next workshop theme to be “Turning Educational A/B Testing Rightside Up”

Kristen DiCerbo

unread,
Jun 25, 2021, 11:56:17 AMJun 25
to Learning Engineering
Hi all,

Thanks, David, for providing the context here; your memory is spot on about how things happened. Yes, that's me who had the fun of doing all the media interviews surrounding this.

I continue to be fascinated by this article in which participants "approve of untested policies or treatments (A or B) being universally implemented but disapprove of randomized experiments (A/B tests) to determine which of those policies or treatments is superior." I think there is a major issue here that runs even deeper than questions of IRB, etc.: the general public needs to be convinced of the value and logic of experimentation.

Kristen

Kripa Sundar

unread,
Jun 25, 2021, 2:03:51 PMJun 25
to Kristen DiCerbo, Learning Engineering
Yet another fascinating conversation. Thanks for sharing the article, Kristen. This hesitation is also prevalent in schools and among teachers, which lands us in a catch-22 of sorts. Administrators and teachers want to know that a product will "work," but they are often concerned about between-group studies, even in instances where the comparison condition receives some activity for ethical reasons (similar to what's being discussed here more generally). If the stated reason for the between-group study is to evaluate the effectiveness of the treatment, teacher buy-in drops, with the argument: why test something that might not work when I already have something that works? The stakes are higher when school boards ask to justify ROI on edtech purchases, expecting causality without allowing for the conditions necessary to establish it. With the influx of funding (juxtaposed with teacher attrition and burnout) and post-2020-21 school year experiences, I am still processing what this means for edtech pilots in school settings.

My go-to in applied settings has been to use mixed methods, which can alleviate some of these concerns (and get better feedback for schools and edtech firms alike). Are there other possibilities that can address this conundrum? Any guidance on the boundary conditions for practical rigor?

Kripa

Jenessa Peterson

unread,
Jun 25, 2021, 3:06:53 PMJun 25
to Kripa Sundar, Kristen DiCerbo, Learning Engineering
This conversation is so interesting! Thank you for your thoughts about this, all of you! And how enlightening to hear from people who were involved in the original Pearson study, Kristen and David, as well!

Reading all of your comments, I’m starting to wonder if creating a set of shared protocols around this would help address this “catch-22”, as Kripa calls it. I’m talking about something like the Student Privacy Pledge or the Ethical Framework for AI in Education.

Does something like this already exist? From your comments, it sounds as though each institution’s IRB currently has or creates its own standards each time it evaluates a proposed study.

I wonder what you all think about the idea to create a shared set of protocols. The reasoning for this might be two-fold:

  1. A guide to mitigate as much harm as possible: A set of shared protocols could act as a checklist for any institution doing A/B testing, listing factors to consider when designing studies to mitigate potential harm (a “Now and Later” design where possible, as Shivang and Jeff describe; random assignment at the level of schools or districts instead of individuals; clear criteria to define “minimal risk”; etc.).

  2. Some protection from media blowback to encourage sharing: As John and Christopher point out, many institutions *are* doing A/B testing with and without explicit user knowledge, and it’s only those who think to share their findings, like Pearson did, who potentially find themselves in the hot seat to explain their methods and reasoning. If we had a set of industry standard protocols about this, organizations wishing to make their findings public could point to this to help deflect *some* of the potential media blowback, and this partial protection might make it easier and more likely for institutions to share their findings. (But I also wonder how often this type of media blowback happens. How much does this kind of concern prevent your institution from experimenting or sharing results?)
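The school- or district-level random assignment mentioned in point 1 (cluster randomization) can be sketched in a few lines of Python. This is purely illustrative: the school names and seed are invented, and a real study would also stratify by school characteristics:

```python
import random

def assign_schools(school_ids, seed=2021):
    """Randomly assign whole schools (clusters) to condition A or B,
    so all students within a school share the same experience.
    Purely illustrative -- IDs and seed are made up."""
    rng = random.Random(seed)
    schools = list(school_ids)
    rng.shuffle(schools)
    half = len(schools) // 2
    return {s: ("A" if i < half else "B") for i, s in enumerate(schools)}

assignment = assign_schools(["oak", "pine", "elm", "maple", "cedar", "birch"])
```

Assigning at the cluster level answers the "are all students in my class getting an equal experience?" concern within each school, at the cost of needing more clusters for statistical power.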

I’m really interested to hear your thoughts on this!

Thanks,
Jenessa

piotr

unread,
Jun 27, 2021, 9:24:12 PMJun 27
to Learning Engineering
I apologize up-front for the long, rambly email, but....

A) There is a distinction between:

1. What is considered ethical to us (grounded in our cultures)?
2. What is considered ethical in the diverse cultures of our student body?
3. What set of checks and balances would guarantee that things stay ethical?

It's not hard for me to design ethical A/B experiments where I believe there is no potential for subject harm. What's harder is designing experiments where subjects from diverse cultures agree there is no potential harm. Harder still is designing guidelines and mechanisms which unethical researchers won't game to do unethical things. And hardest of all is designing policies which research subjects will trust and believe.

B) I do think it's helpful to point back to the Nuremberg Code here, which laid out agreed-upon international norms three quarters of a century ago. It starts:

1. The voluntary consent of the human subject is absolutely essential.

This means that the person involved should have legal capacity to give consent; should be so situated as to be able to exercise free power of choice, without the intervention of any element of force, fraud, deceit, duress, over-reaching, or other ulterior form of constraint or coercion; and should have sufficient knowledge and comprehension of the elements of the subject matter involved as to enable him to make an understanding and enlightened decision. This latter element requires that before the acceptance of an affirmative decision by the experimental subject there should be made known to him the nature, duration, and purpose of the experiment; the method and means by which it is to be conducted; all inconveniences and hazards reasonably to be expected; and the effects upon his health or person which may possibly come from his participation in the experiment.

The duty and responsibility for ascertaining the quality of the consent rests upon each individual who initiates, directs or engages in the experiment. It is a personal duty and responsibility which may not be delegated to another with impunity.

Perhaps it's okay to ignore this -- ethical norms evolve -- but if you do, it's helpful to articulate why. It's worth noting that most IRBs do not follow the Nuremberg Code (it's not a legally binding document), and a lot of subjects are involved in research they don't know about.

I'm not quite sure why. It's easy enough to have platform-wide consent when users register and opt-outs on a settings page. (Disclaimer: all of the platforms I built to date informed participants of research, but none had real means to opt out; in retrospect, that was a mistake.)

C) Getting back to checks-and-balances:
  1. Most university IRB policies are full of loopholes. University faculty can and regularly do engage in research outside of IRB review. For example, at many universities, faculty can do research in "non-work" time (as if professing were a 9-5 job). I've seen this loophole used to conduct unethical studies by faculty at schools like Harvard, Albany, Berkeley, Syracuse, Georgia, and many others. Ignoring ethics accelerates research, publications, and careers. When we did edX, MIT had a clever corporate structure which made edX fall outside of IRB purview as well.
  2. IRB applications are just pieces of paper. In many cases, I see PIs promising things which they don't have the technical means to implement. They count on participants not making requests, and on no one really checking that what was promised is what was done. The same is true of contracts; SDPC/A4L is a paper tiger. Companies promise all sorts of things in school contracts, and then promptly ignore them. In both cases, people rarely get caught, and when they do, the punishment is a slap on the wrist.

D) Regarding media blowback, I think most researchers won't care. The odds of blowback on any particular piece of research are minimal. In the intense competition for tenure slots, grants, and high-impact papers, that's not a risk rationally worth considering. When the harm is to the community but the benefit is to the individual, you run into a tragedy of the commons.

In conclusion, a lot of this would feel pretty hollow without considering implementation. There's an analogue to the research-practice gap here. Researchers are incentivized to write papers; they're not incentivized to care about things like ethics or research integrity. Over time, behaviors and cultures converge to incentive structures. The key question is how we set meaningful incentives that won't land the Learning Engineering community in hot water or harm students. Good incentive structures rarely directly mirror common-sense policy. Policies need to be simple, understandable, loophole-free, enforceable, etc., all of which imposes complex constraints beyond the underlying ethical questions.

Best,
Piotr

Motz, Benjamin Alan

unread,
Jun 30, 2021, 7:08:55 AMJun 30
to Learning Engineering
Howdy y'all.

Normally I'd have replied within minutes of Jenessa's original email, but I've been out on a family road trip.  It was great, we were biking all around Acadia, and I hardly ever opened email.  I mention this only to say: It's been fascinating reading this thread from a distance, and I think this distance has caused me to notice an elephant in the room.

Something awesome about the learning engineering community is that it blurs the line between research and practice, bringing together scholars, teachers, technologists, and entrepreneurs in the pursuit of improving education.  But scholars, teachers, technologists, and entrepreneurs clearly have different motivations.  For example (according to Piotr), researchers want to write academic articles, but (and this is the elephant) companies usually want to make money.  Pearson definitely wants to make money.

So why might a national mindset experiment conducted by David Yeager be acceptable, while the same basic experiment conducted by Kristen DiCerbo and David Porcaro in a Pearson product causes massive blowback?  Maybe Yeager is perceived to be building generalizable knowledge, while Pearson is perceived to be in it for profit.  When a teacher modifies their class in an uncontrolled experiment, they're perceived as just trying to help.

So our question "Is A/B testing between two benign conditions without participants' knowledge ok?" may find multiple answers, depending on the perceived purpose of the testing.  

Yes, companies are doing experimental research all the time, but come on, modifying the photo on a cereal box or the layout of a digital ad isn't the same as modifying an educational experience.  I'm also not saying that researchers always have good intentions -- some researchers are motivated by profit just the same.

But I am trying to say that perceived intentions matter (and holy crap there's a ton of psychology research to support this statement -- don't get me started).  Perhaps there'd be some value in reflecting on this, and on how companies might do a better job of communicating their intentions when it comes to experimental research?  Clearly (as Collin, Piotr, Chris, and Jeff all point out), existing principles of research ethics are helpful in these considerations, and transparency is a key virtue.

Warmly,
Ben






Piotr Mitros

unread,
Jun 30, 2021, 5:50:12 PMJun 30
to Motz, Benjamin Alan, Learning Engineering
Hi Ben,

I do want to clarify one important point -- central to my email. What researchers want is irrelevant. My email was about incentive structures.

There's a delightful book -- The Dictator's Handbook -- that talks about how incentive structures constrain behavior. The gist of it was that as you move up in power structures (or any competitive system), your flexibility to do what you want actually goes down.

I'll give a non-controversial example. Most researchers would like to have their research openly readable by anyone. Most researchers need to publish in journals with large impact factors, and Elsevier has marketing $$$ to drive up impact factors. Ergo, most researchers are doing something other than what they would really want. Funding or regulatory pressure requiring open-access is liberating, rather than constraining. That's true of a lot of regulation.

Companies also don't want anything. Companies are collections of individuals, with individual drives. A typical Google employee wants a large paycheck, interesting technical work, and work-life balance. How Google does as a whole is secondary. Middle managers want to maximize headcount and internal visibility. The CEO wants a fat paycheck, a golden parachute, strong personal brand, and to be set up for a good follow-up job. And so on. Organizations act as aggregates of those. If a business loses money, it will disappear, but large businesses compete with other large businesses, all of which share the same set of incentive structure problems.

Selective pressures are also agnostic to what researchers want. Whether I:
  • make a mistake and publish an incorrect but politically-popular result;
  • happen to believe an incorrect but politically-popular result, and let my biases influence my research;
  • bake data to bias a result "playing the academic game;" or
  • fabricate data outright
The result is the same. I'll get a better job than a researcher who publishes a politically unpopular but honest result. Worse, low-quality research is faster to generate than high-integrity research.

To answer your question, there might be a lot of factors at work:
  • The mindset work sends the same message as half of the Disney movies: If you believe hard enough, you'll do okay. If you can generate data to support a belief like that, people will love you for it, and you'll be a rockstar scientist;
  • The mindset work is intermingled with social justice. If you criticize it, you're liable to get cancelled;
  • Yeager, etc. might sit on your grant, peer review, or letter-writing committees. There is no upside to saying bad things about them, and a lot of upside to, if you have criticism, shutting up;
  • Businesses moved into a post-truth era with HBS reforms in the eighties. Academia didn't make that move until recently; people still trust academia;
  • ... and so on
There's also a huge random factor to what will and will not generate a scandal.

These are systems which can be mathematically modeled, analyzed, and engineered to achieve the outcomes we want. My hope for LE was that it would give a place to do this kind of work. I can draw feedback loops for incentive structures. My concern with LE as a field is that it's left engineering in the backyard. It seems to just be ed-tech. Every piece of the educational system can and should be approached by this field with quantitative engineering rigor, not just software. And even software, to really be learning engineering, needs an engineering design approach.
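The claim that these incentive systems can be mathematically modeled can be made concrete with a toy replicator-dynamics sketch. This is entirely an illustration (the publication rates are invented, not estimates): if low-integrity research is produced faster, selection favors it even when no individual researcher prefers it.

```python
def replicator_step(share_fast, rate_fast=3.0, rate_slow=1.0):
    """One generation of a toy selection model: the 'fast, low-integrity'
    strategy produces output at rate_fast, the careful strategy at
    rate_slow; the next generation's share of each strategy is
    proportional to its output. Rates are illustrative assumptions."""
    out_fast = share_fast * rate_fast
    out_slow = (1.0 - share_fast) * rate_slow
    return out_fast / (out_fast + out_slow)

share = 0.10  # start with 10% of researchers cutting corners
history = [share]
for _ in range(10):
    share = replicator_step(share)
    history.append(share)
```

Under these assumed rates, the corner-cutting share grows every generation and approaches 100%, which is the feedback-loop argument in miniature: the equilibrium is set by the incentive structure, not by anyone's preferences.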

I'm convinced more tech widgets, in isolation, won't achieve the change we want.

Best,
Piotr