Peer Evaluation for OERu courses


Akash Agarwal

Mar 21, 2014, 11:14:19 AM
to OE...@googlegroups.com
Hi,

I am a third year student at IIIT Hyderabad and plan to implement peer evaluation for OERu this summer as part of my Google Summer of Code project. I thought it would be nice to explain it briefly here so that I could get some critical feedback on how it could be implemented in the courses you deliver or are a part of.

OERu needs a scalable way to grade students. Peer evaluation will allow the learners and other people involved in the course to grade their peers. Each learner will be assigned a fixed number of activities of a particular kind to grade (3-5 may turn out to be a good number). Others, for example students who have completed the course, tutors and community volunteers, can choose any blog post they want to evaluate.


For large-scale courses with tens of thousands of learners it becomes impractical for moderators or instructors to grade every submission. For such courses either machine grading or peer evaluation can be used. Peer evaluation not only serves this purpose but also benefits learners in several other ways: they participate actively instead of engaging in purely passive, read-only activities; it increases their responsibility and autonomy; it pushes them to strive for a better and more advanced understanding of subject matter, skills and processes; and it involves them in critical reflection.

My thoughts on how it could be implemented and possible challenges: 

Reward system

  • There is a credit system in place in some current WikiEducator courses where learners are given some form of credit for writing various kinds of posts.
  • Every learner will be assigned a fixed number of peers to evaluate. This evaluation will also carry some weight towards the course grade, and learners will be given some credits for evaluating.

Challenges

  • Peer evaluation may cause learners to rate their friends highly and give low grades to others. They may form small groups and try to grade only within them. Some learners may also tend to give everyone the same grade. These effects can be significantly reduced by randomly assigning the peers that each learner has to grade. In addition, more than one evaluation will be collected per submission and the final score will be the mean of the individual assessments, so that unusually high or low assessments carry less weight (a rough sketch of this assignment and aggregation follows this list).
  • Peer evaluation would strictly require deadlines to be met, both for submitting the activities (so that they are available for assessment by others) and for the evaluation itself.
  • There may be cases where learners do not review the assigned posts. The reward system described above should reduce this significantly.
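As a rough illustration only, the assignment and aggregation could look something like the Python sketch below (the function names, data shapes and value of k are assumptions for illustration, not an existing WikiEducator API):

import random
from statistics import mean

def assign_reviewers(submissions, k=3, seed=None):
    """Randomly assign k reviewers per submission, drawn only from
    learners who themselves submitted, and never the author."""
    rng = random.Random(seed)
    authors = list(submissions)            # submissions: {learner_id: post_url}
    assignments = {}
    for author in authors:
        candidates = [a for a in authors if a != author]
        assignments[author] = rng.sample(candidates, min(k, len(candidates)))
    return assignments                      # {author: [reviewer_id, ...]}

def final_score(grades):
    """Aggregate several peer grades; averaging dampens single outliers."""
    return mean(grades) if grades else None

# Example
subs = {"alice": "http://example.org/a", "bob": "http://example.org/b",
        "carol": "http://example.org/c", "dave": "http://example.org/d"}
print(assign_reviewers(subs, k=2, seed=1))
print(final_score([7, 8, 4]))  # 6.33...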

Basic Prototype


I would be very grateful to hear your thoughts on Peer Evaluation and how OERu can use it in the most effective way.

Best regards,
Akash Agarwal
IIIT Hyderabad

Wayne Mackintosh

Mar 21, 2014, 7:37:18 PM
to OE...@googlegroups.com
Dear Akash,

Thanks for sharing your ideas for your GSoC proposal with the OERu community. 

I think peer evaluation can add considerable value to the OERu delivery model, especially for those courses where authentic participation is a requirement for successful completion or awarding of open badges for achieving a designated threshold for participation (eg number of posts).

Notwithstanding the recent developments in computer grading of essay-like assessments, the technology is perhaps still not sufficiently mature for using these approaches for summative assessment for university credit - however, that's not the focus of your GSoC proposal. I think the lessons we can learn from automating peer evaluation will inform our next iteration in using these technologies.     

Drawing on our OCL4Ed prototype courses, where we have been trialling "certification for participation", one of the requirements is to complete designated blog activities. In the prototype, learners "registered" their blog post and there was a manual process to check that the post related to the question. This was a participation activity, so we did not attempt to grade the quality of the answer. The design of the course was such that learners could include selected posts for final assessment, which would be graded by a qualified assessor for those learners pursuing formal credit. The practical problems with this method included:

  • Learners not providing the correct URL for the individual blog post (for example, providing the URL for the edit view of the blog, or the URL for the blog homepage rather than the post concerned). It would be better to automate the identification of the specific blog post, for example by requiring the learner to use a unique tag for the post.
  • Linking to posts which had nothing to do with the question. This is where rudimentary automation could help, for example a word count and a few keyword searches of the text (a rough sketch of such a check follows these bullets). Moreover, peer evaluation would help to address these challenges.
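To make the idea concrete, a minimal sketch of such a check might look like the following, assuming the learner's blog exposes an RSS/Atom feed and the post carries a unique activity tag (the use of the feedparser library, the keyword list and the word-count threshold are illustrative assumptions):

import feedparser  # third-party: pip install feedparser

def find_post_by_tag(feed_url, activity_tag):
    """Return the first feed entry carrying the unique activity tag."""
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        tags = {t.term.lower() for t in entry.get("tags", []) if t.term}
        if activity_tag.lower() in tags:
            return entry
    return None

def rudimentary_check(text, keywords, min_words=200):
    """Word count plus a few keyword searches, as a first filter only."""
    hits = [kw for kw in keywords if kw.lower() in text.lower()]
    return len(text.split()) >= min_words and len(hits) > 0

# Example (hypothetical feed URL and tag)
# entry = find_post_by_tag("http://learner.example.org/feed", "ocl4ed-activity-3")
# if entry and rudimentary_check(entry.summary, ["copyright", "licence"]):
#     print("Post looks on-topic enough to queue for peer evaluation")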
A few reflections and practical things to think about 

  • In OERu courses, learners may participate for self-interest, that is they do not intend to complete all the activities. So we will need to think about solutions to avoid assigning peer-evaluation tasks to these learners. One way to do this is to only assign peer-evaluation tasks to those learners who are pursuing certification for participation. I don't think a pre-course survey is a good way to identify these learners because they may change their minds as the course progresses. Developing an algorithm which distributes peer assessment among those learners who actually post activities would be a good way to solve this challenge.
  • The reward system for completing a peer review is a good idea and I agree that this will encourage participation. Initially, because this form of assessment is intended to gauge authentic participation (rather than awarding a grade for the course), we could focus on generic assessment criteria, for example "Has the learner completed the task?" or "Does the post meet the requirements for this activity?", using a simple scale.
  • I think it is important that the assessment grade and the name of the assessor are transparent.
  • We need to think about an "appeal" mechanism where a learner can flag posts where they feel the peer-reviewers have not assessed the post reliably. An interesting thought is that peer-assessors could lose kudos points for invalid assessments if they are found to be gaming the system (a rough sketch of such a mechanism follows this list).
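Purely as a sketch, the appeal flag and kudos penalty could be wired together roughly like this (the field names and penalty value are assumptions):

def flag_assessment(appeals, assessment, learner_id, reason):
    """A learner flags a peer assessment they feel was unreliable.
    `assessment` is expected to carry at least an 'assessor' key."""
    appeals.append({"assessment": assessment, "flagged_by": learner_id,
                    "reason": reason, "upheld": None})

def resolve_appeal(appeal, kudos, upheld, penalty=5):
    """If a moderator upholds the appeal, the assessor loses kudos points."""
    appeal["upheld"] = upheld
    if upheld:
        assessor = appeal["assessment"]["assessor"]
        kudos[assessor] = max(0, kudos.get(assessor, 0) - penalty)

# Example
kudos = {"reviewer42": 12}
appeals = []
flag_assessment(appeals, {"assessor": "reviewer42", "grade": 1}, "alice",
                "grade given with no justification")
resolve_appeal(appeals[0], kudos, upheld=True)
print(kudos)  # {'reviewer42': 7}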
Akash -- this is a great start to thinking about solutions as we progress.

Wayne






--
Wayne Mackintosh, Ph.D.
Director OER Foundation
UNESCO, COL and ICDE Chair in OER, Otago Polytechnic & OER Foundation
Skype: WGMNZ1
Twitter: Mackiwg

Brian Mulligan

Mar 24, 2014, 9:17:29 AM
to OE...@googlegroups.com
Hi Akash.

Are you planning to write code to add to the functionality of WikiEducator? It might be good to look at the functionality of other systems such as the Workshop activity in Moodle. I find this to be quite good. It addresses some of the issues mentioned by Wayne (e.g. allowing you to restrict reviews to those who have taken the trouble to submit, and only grading those who grade others) and some you mentioned (no problem with friends grading each other in large courses with random allocation). Another method of reducing such cheating is to partially base a student's grade on their accuracy at grading others. If their grade differs from the average given by others, their own personal grade is reduced (this can also be done with their grading of themselves - i.e. not only should they know the "stuff", they should have a good idea of how well they know the stuff).
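As an illustration of the accuracy-based grading described above (the linear penalty and weights below are illustrative only; this is not how Moodle's Workshop actually computes its grades):

from statistics import mean

def assessment_accuracy(my_grade, other_grades, max_grade=10):
    """Return a 0..1 accuracy score: 1 if my grade matches the average
    of the other reviewers, falling off linearly with the deviation."""
    if not other_grades:
        return 1.0
    deviation = abs(my_grade - mean(other_grades))
    return max(0.0, 1.0 - deviation / max_grade)

def combined_grade(submission_grade, accuracy, w_submission=0.8, max_grade=10):
    """Blend the grade received with the grade earned for grading others."""
    return w_submission * submission_grade + (1 - w_submission) * accuracy * max_grade

print(assessment_accuracy(7, [8, 8, 9]))   # ~0.87
print(combined_grade(7.5, 0.87))           # ~7.74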

One thing that I'd like to see in peer evaluation (which I don't think Moodle can do) is some way of calibrating a student's standard of grading. I think this can be done by putting up a sample assignment and asking them to grade it, but I have an idea for a more sophisticated algorithm (which so far I have failed to design), where instructors grade a sample of submissions and then the students who have graded those submissions are "calibrated" and their other reviews modified accordingly. Then the other students who have graded submissions graded by this first group are calibrated from them, and so on until all students have been calibrated. Given that inaccuracies will increase throughout this process and that students could be calibrated from multiple primary calibrations, it will take some significant thought both to come up with an algorithm and then to test it. If you are interested, I'll work with you on this.
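To make the propagation idea a little more concrete, here is one naive sketch in Python (the simple per-reviewer offset model and all names are assumptions; as noted above, a real algorithm would need much more thought about how error accumulates and how to merge multiple calibration paths):

from statistics import mean

def calibrate(instructor_grades, reviews):
    """
    instructor_grades: {submission_id: grade} for a small instructor-marked sample.
    reviews: list of (reviewer_id, submission_id, grade).
    Returns {reviewer_id: offset}, how much each reviewer over-grades (+)
    or under-grades (-) relative to calibrated grades. Calibration spreads
    outwards: calibrated reviewers yield adjusted grades for new submissions,
    which in turn calibrate further reviewers.
    """
    known = dict(instructor_grades)   # submission -> calibrated grade
    offsets = {}                      # reviewer -> estimated offset
    changed = True
    while changed:
        changed = False
        # 1. Calibrate reviewers against submissions with known grades.
        for reviewer, sub, grade in reviews:
            if sub in known and reviewer not in offsets:
                offsets[reviewer] = grade - known[sub]
                changed = True
        # 2. Estimate grades of submissions reviewed by calibrated reviewers.
        for sub in {s for _, s, _ in reviews} - known.keys():
            adjusted = [g - offsets[r] for r, s, g in reviews
                        if s == sub and r in offsets]
            if adjusted:
                known[sub] = mean(adjusted)
                changed = True
    return offsets

# Tiny example: the instructor grades "s1"; reviewer "a" graded it, so "a"
# is calibrated, which in turn calibrates "s2" and then reviewer "b".
reviews = [("a", "s1", 8), ("a", "s2", 6), ("b", "s2", 7)]
print(calibrate({"s1": 7}, reviews))  # {'a': 1, 'b': 2}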

Best regards and good luck.

Akash Agarwal

Mar 25, 2014, 6:29:04 AM
to OE...@googlegroups.com
Hi Wayne,

Thanks for sharing your feedback. I will incorporate your suggestions in my project.

> Notwithstanding the recent developments in computer grading of essay-like assessments, the technology is perhaps still not sufficiently mature for using these approaches for summative assessment for university credit - however, that's not the focus of your GSoC proposal. I think the lessons we can learn from automating peer evaluation will inform our next iteration in using these technologies.
> • Linking to posts which had nothing to do with the question. This is where rudimentary automation could help, for example word count, and a few keyword searches of the text. Moreover, peer evaluation would help to address these challenges.

Although I kept it out of my GSoC proposal, I am very enthusiastic about computer grading of essay-like assessments and would implement a prototype of it along with the rudimentary automation you described. I plan to implement the peer evaluation part first, so, as you stated, the results from it in some prototype courses will help us identify the kind of automation most suitable for OERu.
 
> In OERu courses, learners may participate for self-interest, that is they do not intend to complete all the activities. So we will need to think about solutions to avoid assigning peer-evaluation tasks to these learners. One way to do this is to only assign peer-evaluation tasks to those learners who are pursuing certification for participation. I don't think a pre-course survey is a good way to identify these learners because they may change their minds as the course progresses. Developing an algorithm which distributes peer assessment among those learners who actually post activities would be a good way to solve this challenge.

I agree with you. It would be more practical to assign grading tasks only to those who have themselves written posts. This is how I assigned grading tasks in the basic prototype.



> We need to think about an "appeal" mechanism where a learner can flag posts where they feel the peer-reviewers have not assessed the post reliably. An interesting thought is that peer-assessors could lose kudos points for invalid assessments if they are found to be gaming the system.

Brian suggested some interesting ways to deal with this problem, such as a calibration algorithm. To start with, for a prototype we can have a system where a credibility score is maintained for each learner and updated dynamically after each activity. The actual grading can then take the credibility factor into account. We could also ask learners to rate themselves. If the grade they assign themselves and the grade assigned by their peers do not differ much, the average or maximum of the two could be taken. If the grades assigned by a peer, or by the learner themselves, are much higher or lower than the average, their credibility could be reduced (a rough sketch follows).
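A rough sketch of this credibility idea, with all thresholds and the 0-1 credibility scale chosen purely for illustration:

from statistics import mean

def final_grade(self_grade, peer_grades, tolerance=1.0):
    """If self and peer assessments roughly agree, combine them;
    otherwise fall back on the peer average alone."""
    peer_avg = mean(peer_grades)
    if abs(self_grade - peer_avg) <= tolerance:
        return max(self_grade, peer_avg)   # or the mean of the two
    return peer_avg

def update_credibility(credibility, reviewer, given, consensus,
                       step=0.1, tolerance=1.5):
    """Reduce a reviewer's credibility when their grade is far from the
    consensus for a submission; nudge it back up when they agree."""
    current = credibility.get(reviewer, 1.0)
    if abs(given - consensus) > tolerance:
        credibility[reviewer] = max(0.0, current - step)
    else:
        credibility[reviewer] = min(1.0, current + step / 2)

cred = {}
update_credibility(cred, "bob", given=9, consensus=5.5)
print(cred)  # {'bob': 0.9}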

Akash Agarwal

Mar 25, 2014, 7:05:43 AM
to OE...@googlegroups.com
Hi Brian, 

I do plan to write code and add to the functionality of WikiEducator. I will certainly look at other systems in place and learn from them. The Workshop module of Moodle does address some of the challenges we face. It gives two grades for each activity, a submission grade and an assessment grade, and the final grade for each activity is a combination of both. The assessment grade is based on how accurate a candidate's self and peer assessments are; the main measure of this is whether the grades they assigned are close to the average for the particular submission. This is one approach we could build on. Also, Coursera (not open source), which has already proven peer evaluation effective for its courses (some of which have millions of users), uses a combination of peer evaluation and self evaluation and then assigns grades based on it.

I would be very interested in working on the calibration algorithm. One thought I have now is that we could make it dynamic in nature, so the grading gets better as a course progresses. For the first activity we could use a combination of the submission grade and an assessment grade based on the difference from the average for a particular submission. The calibration algorithm is initiated at this first activity. Then, as the course progresses, the effect of the calibration can be gradually increased as the algorithm learns and becomes more accurate. Also, if possible in the OERu context, we could make this universal, i.e. the algorithm estimates the credibility of users and the kind of grades they tend to give (whether generally too high or too low) based on all the courses they are a part of. This could address the challenge that some learners have a general tendency to assign grades that are too high or too low.
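One possible way to ramp up the effect of calibration over a course, sketched with an arbitrary linear schedule (the cap and ramp are assumptions for illustration):

def calibration_weight(activity_index, total_activities, cap=0.5):
    """Start with no calibration on the first activity and ramp its
    influence linearly up to `cap` by the end of the course."""
    if total_activities <= 1:
        return cap
    return cap * (activity_index / (total_activities - 1))

def adjusted_grade(raw_peer_grade, calibrated_grade, activity_index, total_activities):
    w = calibration_weight(activity_index, total_activities)
    return (1 - w) * raw_peer_grade + w * calibrated_grade

print(adjusted_grade(6.0, 7.0, activity_index=0, total_activities=5))  # 6.0
print(adjusted_grade(6.0, 7.0, activity_index=4, total_activities=5))  # 6.5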

Looking forward to working on this.

Best regards,
Akash

Brian Mulligan

Mar 25, 2014, 8:39:36 AM
to OE...@googlegroups.com
Hi Akash,

Your idea of measuring the grading credibility of students over time seems a great one.

It might also be worth including a feature that allows students to rate the usefulness of the feedback they receive from peers, with their performance in this going towards their final grade.

Brian

Roger Gabb

Jul 8, 2014, 10:35:20 PM
to OE...@googlegroups.com
We've been searching for a self and peer assessment (SAPA) tool that is not necessarily based on assessment of a submitted piece of work but can be used for assessment of teamwork (f2f and/or online) and provides the learner with effective feedback. 

Many existing tools (e.g. Turnitin's PeerMark or the Coursera tool) assume that the student has submitted a document (or some other artefact) for assessment while other team-oriented tools (e.g. WebPA) focus on splitting up the mark/score/grade for a team project between team members and provide almost no feedback to the team members other than the mark/score/grade. Tools that meet our needs are SPARKPLUS (from Keith Willis at University of Technology Sydney) and maybe iPeer (from UBC) BUT we want to use it with a class of 800+ students in a course built around our institutional LMS, which is presently D2L. We therefore need strong linkage between the LMS and the SAPA tool - allowing team membership to be passed from the LMS to the tool and marks/scores to be passed back to the gradebook in the LMS. When you're dealing with 800+ students, team membership and mark/score recording has to be automated. What we need is an IMS LTI compliant tool and this rules out most of the candidates, other than WebPA which is too limited in its functionality as far as we're concerned. SPARKPLUS does what we want it to do but it would require us to build an interface with the LMS from scratch. 

I tell this long story to make the point that if you want a SAPA tool to have legs in terms of broad uptake, you'll need to ensure that it is LTI compliant so it can hook up with whatever LMS is being used in the institution. It seems to me that there are some OK SAPA tools out there but that they currently lack interoperability. Will this new tool move in that direction?

Roger Gabb

Wayne Mackintosh

Jul 8, 2014, 11:38:59 PM
to OE...@googlegroups.com
Hi Roger,

Thanks for reaching out and sharing your thoughts. 

I can see that LTI compliance is important when designing standalone tools to facilitate interoperability across multiple LMSs. 

Given the scope of this development as a short Google Summer of Code project, I don't anticipate that we will be incorporating LTI compliance at this time.

Given the OERu use case of a wiki-based authoring environment for assembling OER learning pathways, which are designed for delivery to OERu learners using a PLE approach where learners choose their own interaction and portfolio tools, our needs are considerably less demanding than importing SAPA functionality into a local institutional LMS. Our peer evaluation tool is intended for formative assessment and peer learning support, and we don't currently have a need for data to be tracked within an LMS. Our learners are not "registered" in the sense that they're taking courses at an educational provider. We have a strong belief that all our OER materials should be openly accessible without the need to register for a course.

We are also experimenting with ways in which our tool will scale for large numbers of learners as we anticipate large course cohorts. 

We are also mindful to avoid the temptation of replicating LMS functionality -- it's better to use an LMS than for us to try and mirror those features.

As a small charitable organisation which has very limited code development capability -- we're keeping our tools simple but scalable. That said, who knows what the future holds.  A disaggregated tool set which could leverage federated technologies would take the OERu collaboration to another level -- but we'll need to be patient ;-).

W

 



Alex P.Real

Jul 10, 2014, 10:49:38 AM
to OE...@googlegroups.com
From an assessment perspective, I'd suggest developing scoring rubrics per assignment (as close to the sought learning outcomes as possible, maybe in a questionnaire format). Otherwise learners may not know how to assess their peers and thus, unwittingly, jeopardise the experience. A task/teaser/sample might be handy too. In addition, assessment professionals should rate at least 15-20% of submissions (double scoring) to check the accuracy of the results; unfortunately, significant divergence is bound to appear (non-experts, cultural and linguistic diversity, socio-economic status, psychological traits, etc.).
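For illustration only, a rubric in questionnaire format and a 15% double-scoring sample could be represented as simply as this (the criteria, scale and sample rate are assumed values):

import random

RUBRIC = [  # per-assignment criteria, phrased as close to the learning outcomes as possible
    ("Addresses the activity question directly", (0, 3)),
    ("Supports claims with examples or references", (0, 3)),
    ("Writing is clear and well organised", (0, 2)),
]

def rubric_total(scores):
    """Sum the per-criterion scores a peer entered against the rubric."""
    return sum(scores)

def double_scoring_sample(submission_ids, fraction=0.15, seed=None):
    """Pick roughly 15% of submissions for an assessment professional to
    re-score, so peer results can be checked for accuracy."""
    rng = random.Random(seed)
    n = max(1, round(len(submission_ids) * fraction))
    return rng.sample(list(submission_ids), n)

print(double_scoring_sample(range(100), fraction=0.15, seed=2))  # 15 ids for expert re-scoring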

If you need help, gimme a shout.

Good luck and keep us posted!

Alex P. Real

