Sunday, May 26, 2013

Assigning grades

In my series of extremely interesting grading-related posts, I want to raise a question about the assignment of grades.

Namely, how on earth are you supposed to assign grades? The ideal, at least at my institution, is that the grade distribution should be Gaussian after removing the fail grades. So far, there are three "schemes" I can think of:
  • Giving grades based on some "objective" scale - i.e., if students get ninety percent or more correct on their exam, they get an A, a B if it's between eighty and ninety, and so on.
  • Total relative adjustment of the scale: The top ten percent of the students get an A, the next twenty percent get a B, and so on.
  • A "hybrid" solution. You find natural cutoff points that are not too far away from the "objective" scale, chosen so that the number of students receiving each grade roughly matches the target distribution.
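To make the difference between the first two schemes concrete, here is a minimal Python sketch. The post only pins down the A and B rules (ninety percent or more and the top ten percent for an A, eighty to ninety and the next twenty percent for a B); every other cutoff and quota below, and the function names themselves, are assumptions made up for illustration. The relative scheme assumes the fail grades have already been removed from the list.

```python
from typing import List, Sequence, Tuple

# Hypothetical values: only the A and B entries come from the post,
# the rest are assumptions for the sake of a runnable example.
FIXED_CUTOFFS: List[Tuple[str, float]] = [
    ("A", 90.0), ("B", 80.0), ("C", 70.0), ("D", 60.0), ("E", 50.0),
]
QUOTAS: List[Tuple[str, float]] = [
    ("A", 0.10), ("B", 0.20), ("C", 0.40), ("D", 0.20), ("E", 0.10),
]


def grade_fixed(score: float) -> str:
    """Scheme 1: compare a percentage score against fixed cutoffs."""
    for letter, cutoff in FIXED_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


def grade_relative(scores: Sequence[float]) -> List[str]:
    """Scheme 2: rank the (passing) students and hand out grades by quota.

    Ties are not treated specially, so two students with the same score
    can land on different sides of a boundary.
    """
    n = len(scores)
    # Indices of the students, best score first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [""] * n
    start, cumulative = 0, 0.0
    for letter, share in QUOTAS:
        cumulative += share
        end = round(cumulative * n)
        for i in order[start:end]:
            grades[i] = letter
        start = end
    # Anyone left over because of rounding gets the lowest passing grade.
    for i in order[start:]:
        grades[i] = QUOTAS[-1][0]
    return grades


# A hard exam: nobody reaches ninety percent.
scores = [55, 62, 68, 71, 74, 77, 79, 83, 86, 88]
fixed_grades = [grade_fixed(s) for s in scores]
relative_grades = grade_relative(scores)
```

On this made-up class, the fixed scale hands out no A at all, while the relative scheme gives the A to the student with 88 percent.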
The first option is the ideal one in principle. It is also the one most likely to give rubbish results, the reason being that there is no such thing as an objective scale. First of all, an exam is in itself a random sample of questions based on the curriculum (at least in the natural sciences), not a complete assessment of the knowledge of the student. Second, what the teacher thinks the students should know is not what the students actually know. Sometimes a question is harder to answer than the teacher thought. To think that an objective scale like that can actually exist is naive. Finally, unless the teacher is actually the perfect intellect, the resulting distribution won't be Gaussian. I will concede, though, that in the social sciences and the humanities, where an exam's score isn't as easy to determine in terms of right/wrong answers, this scheme might be preferred. In low-attendance courses, it might also be possible to take the time to give a well-founded total percentage instead of just averaging over the percentages per question.

The second one is the most appealing to me, at least when the number of students taking the course is large, which is when you would expect something like a Gaussian distribution anyway. The main criticism against this scheme is that the grades become relative, so that an A one year is different from an A the next year. I posit that this is a problem with all of the above schemes. The "objective" scheme will be subject to variation because the teacher is not God, and because you typically don't give the same exam year after year. There is a class size below which the second scheme will give larger variations than the objective scheme, but as long as the number of students is high, the total relative adjustment scheme will be more robust than the objective scheme.
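The robustness argument can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are mine, not measured data: student ability is drawn from a Gaussian with mean 70 and standard deviation 10, each year's exam shifts every score by one random "difficulty" offset, and the function name and all parameters are hypothetical. For each simulated year it records the ability level a student needs for an A under each scheme, and then reports how much that level varies from year to year.

```python
import random
import statistics
from typing import Tuple


def a_threshold_spread(n_students: int, n_years: int = 200,
                       difficulty_sd: float = 3.0,
                       seed: int = 0) -> Tuple[float, float]:
    """Return (fixed_sd, relative_sd): the year-to-year standard deviation
    of the ability needed for an A under the fixed and relative schemes."""
    rng = random.Random(seed)
    fixed, relative = [], []
    # Size of the top ten percent, at least one student.
    k = max(1, round(0.10 * n_students))
    for _ in range(n_years):
        # Assumed model: score = ability + a shift common to the whole class.
        abilities = sorted(rng.gauss(70.0, 10.0) for _ in range(n_students))
        shift = rng.gauss(0.0, difficulty_sd)
        # Fixed scale: ability + shift >= 90  <=>  ability >= 90 - shift.
        fixed.append(90.0 - shift)
        # Relative scale: the common shift does not change the ranking, so
        # the bar is the ability of the weakest student in the top ten percent.
        relative.append(abilities[-k])
    return statistics.stdev(fixed), statistics.stdev(relative)


small_fixed, small_relative = a_threshold_spread(n_students=10)
large_fixed, large_relative = a_threshold_spread(n_students=500)
```

In this model the spread of the fixed-scale bar is set by how much exam difficulty varies and does not depend on class size, whereas the spread of the relative bar shrinks as the class grows. Where the two cross depends entirely on the assumed numbers.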

The third option is, I think, an OK compromise if you feel uneasy about the total adjustment scheme. I dislike it because of its non-automated nature - i.e., even once you have assigned a percentage to an exam, you still have to make subjective judgements. Also, it seems to me this method is prone to even more arbitrariness than either of the two previous ones.
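One way to picture what "finding a natural cutoff" could mean is the sketch below. It is a hypothetical heuristic, not the procedure we actually used: it looks at the scores within a few points of the nominal cutoff and puts the boundary in the middle of the widest gap between neighbouring scores. The function name and the `window` parameter are assumptions. It does not check whether the resulting grade counts look reasonable, so the subjective part of the scheme remains.

```python
from typing import Sequence


def natural_cutoff(scores: Sequence[float], nominal: float,
                   window: float = 5.0) -> float:
    """Midpoint of the widest gap between neighbouring distinct scores
    lying within `window` points of `nominal`; `nominal` itself if there
    are fewer than two such scores."""
    nearby = sorted(s for s in set(scores) if abs(s - nominal) <= window)
    best_gap, best_cut = 0.0, nominal
    for low, high in zip(nearby, nearby[1:]):
        if high - low > best_gap:
            best_gap, best_cut = high - low, (low + high) / 2
    return best_cut


scores = [78, 79, 84, 85, 86, 91, 92]
cut_a = natural_cutoff(scores, nominal=90.0)  # gap between 86 and 91
cut_b = natural_cutoff(scores, nominal=80.0)  # gap between 79 and 84
```

Even in this tiny example the choice of `window` changes which gap is found, which is exactly the kind of arbitrariness the scheme invites.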

It is worth noting that most grading systems explain grades in terms of level of understanding - i.e., an A means that "the student has an excellent command of the subject" etc. In these terms, the objective scheme is the preferred one - there should be no a posteriori tinkering with the results based on the distributions! However, it's impossible to know a priori where to draw the line. If you say that an A should be ninety percent correct or more, then you might end up with no students getting an A because your standards were too high. You might say "well, that's too bad for the students - we cannot lower the bar just because the students do badly." But the point is that you don't know whether you're lowering the bar, because the concept of an ideal 'objective' test is flawed from the outset! If you base your grades on the actual empirical distribution, there will still be an incentive for students to do well, because only the best ten percent of them will get an 'A'.

Ideally, then, one should change the whole meaning of the grading system. Instead of saying that grades reflect some kind of absolute skill level (which is a flawed concept anyway, unless you spend extreme amounts of time or unless the number of students is low), the grades should simply reflect which percentile you ended up in. That is, an 'A' should just mean that the student was among the top ten percent of the class, and so on.

I'm not sure yet what we'll end up using for these exams, although we have used the third approach before. If I have the time, I will write some code to do some statistics on the results of this one to see if there are interesting patterns to be found.
