Developing Grading Rubrics for Writing

Mark Stellmack

Mar 30, 2015, 10:29:21 AM
to psy_te...@umn.edu

Suppose you would like to assign writing in your class.  You need to be able to evaluate the writing that students produce and assign a score to it.  How are you going to assign scores?  Well, all you have to do is read each paper and decide whether or not it is "good writing."  Students are going to try to write "good" papers and you will judge how "good" the papers are.  We all know what "good writing" is, right?  (As Monty Python put it, "Nudge nudge, wink wink, say no more!")  And when you assign a score to a paper, you can be confident that the relative scores of different papers reflect their relative goodness, right?  And you can be sure that someone else reading the same papers would judge them the same way, right?  And, for that matter, you even know that if you were to grade the same papers again at some point in the future, you would assign them all the same scores as you did the first time...right?

Sounds like a classic measurement problem.  You've got a pile of papers in front of you and you want (or at least you SHOULD want) to assign scores to them in a way that exhibits reliability and validity.  This is particularly important if you are comparing students' scores to each other (when you assign course grades) or if you want to give feedback on writing in a way that is consistent across many graders in many different courses.

"What do you mean?"  The first step is to operationally define your variable:  "good writing."  One way to go about defining "good writing" is to develop a grading rubric that lists the criteria you will use to assign scores to the writing.  Then "good writing" is writing that is assigned a high score according to those criteria.  Of course, once you have your rubric, you should provide that rubric to the students before they do the writing assignment so that everyone is on the same page (so to speak) as to what you will call "good writing".  The students should know what target they're aiming at.

"Lather, rinse, repeat."  Suppose you want to give feedback that is consistent across many graders.  Clearly, a desirable feature of your rubric will be that when different people use the rubric to grade a particular paper, those people will all assign the same score, i.e., you want the rubric to exhibit good interrater agreement, a form of reliability.  So if you're going to do it right, the next step is to have a group of people grade the same set of papers and determine the extent to which their assigned scores agree.  If their scores don't agree, you can attempt to refine your rubric to resolve disagreements and produce greater agreement in future grading.  You may need to do this several times until your interrater agreement seems to have plateaued.  (This step is very labor-intensive and requires access to a lot of examples of student writing, which is probably why few people have the desire and/or resources to do it.)

"Am I right about writing right?"  Another question is whether what you are measuring with the rubric really is good writing (i.e., is the rubric a valid measure of good writing?)  Actually, when you developed the rubric, you were already addressing this issue because presumably you included criteria that reflect what you mean by "good writing".  One possibility is to compare your rubric to standard, established criteria for "good writing" and see how yours sizes up.  (Unfortunately, there are not a lot of standard, established criteria for "good writing", if any.)  You also can ask different people (ideally, people whose opinion about "good writing" you respect) to assess the criteria of your rubric and see if those people agree with your criteria or if anything seems to be missing.  If you revise your rubric, you may need to go back and assess whether it still provides reliable measurements.

The big problem in the whole process is reliability (the lathering and rinsing part).  Ideally, you would eventually get to the point where everyone agrees on the scores assigned to a batch of papers but, in reality, it is not likely to happen.  Several years ago, a group of collaborators and I sought to develop a rubric for grading student writing (APA-style introductions).  We repeatedly lathered and rinsed (graded many papers and compared scores) over the course of many months, revising the rubric as necessary between lathers to try to resolve disagreements.  At the point that we felt we had hit a plateau in terms of interrater agreement, we found that we agreed exactly in terms of score on only about 37% of the opportunities for agreement.  When the same graders graded the same papers a second time, they agreed with themselves (obtained the exact same score) on only 78% of the opportunities to do so.  I say "only" because those numbers are somewhat lower than I hoped for, but the judgment of whether the data represent adequate reliability is subjective.  However, those numbers are consistent with interrater agreement data reported by others for rubrics for writing in several subject areas, including Psychology.
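To illustrate what "opportunities for agreement" can mean when more than two graders are involved, here is one possible way to count them: treat every pairing of two graders on a given paper as one opportunity, and count how often that pair's scores match exactly.  The scores below are made-up numbers, and this pairwise counting scheme is shown only to illustrate the idea, not as a report of the actual data.

    from itertools import combinations

    # Hypothetical scores: each row is one grader's scores on the same five papers.
    scores_by_grader = [
        [12, 10, 14, 9, 11],   # grader 1
        [12, 11, 14, 8, 11],   # grader 2
        [13, 10, 14, 9, 10],   # grader 3
    ]

    def pairwise_exact_agreement(rows):
        # Each (pair of graders, paper) combination is one opportunity for agreement.
        matches = 0
        opportunities = 0
        for row_1, row_2 in combinations(rows, 2):
            for s1, s2 in zip(row_1, row_2):
                opportunities += 1
                matches += (s1 == s2)
        return matches / opportunities

    print(f"{pairwise_exact_agreement(scores_by_grader):.0%}")  # 47% for these fake scores

The same function applied to one grader's first-pass and second-pass scores on the same papers gives the intrarater (self-agreement) figure.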

The bottom line is that, even armed with a painstakingly developed rubric, there appears to remain a high degree of subjectivity in grading writing.  Although different graders may agree upon the criteria for judging writing, they will differ substantially in their application of those criteria.  As a result, it is difficult to conclude that you are ever assessing "good writing" in a truly objective sense.

[Details of the research described above can be found in Stellmack, M. A., Konheim-Kalkstein, Y. L., Manor, J. E., Massey, A. R., & Schmitz, J. A. P. (2009).  An assessment of reliability and validity of a rubric for APA-style introductions.  Teaching of Psychology, 36, 102-107.  Also see the materials at http://www.psych.umn.edu/psylabs/acoustic/rubrics.htm]
