Wednesday, May 29, 2013
Context in grading
This is another grading-related post, but it's also about context, which is one of my favourite terms.
I believe context is extremely important. Understanding context is what keeps humans from being machines. I think at least eighty percent of interpersonal conflict comes from disregarding or misunderstanding the context. I will probably say more about this later.
But this post is about context in grading. I don't have that much to say on the topic, but I need to point out that when you grade exams, the ultimate goal is to assess whether or not the student has grasped the curriculum.
A person who is bureaucratic by nature (i.e. has a tendency to ignore context) will simply look at whether the student has written down the correct answer (in the natural sciences, that is). If it's not on the paper, it is irrelevant for the grading process.
And by doing so, the bureaucrat has failed to accomplish the goal of grading - namely to assess the student's grasp of the curriculum.
That is because what is written is not the only source of information available to the grader. Being a human, the grader also has access to the context of what is written.
As an example: if a student, during an explanation of some kind, uses the wrong word for a key term, the bureaucrat will automatically count that as an error. Don't get me wrong - it might be an error, but only insofar as it demonstrates a lack of understanding by the student. How do we ascertain this? By examining the context. If the student otherwise clearly shows what he/she is talking about, demonstrating an excellent command of the subject matter, then this error in wording shouldn't be taken as a symptom of a lack of understanding, but simply as a symptom of momentary forgetfulness. However, if, along with this error, the student writes an explanation which shows that he/she has just been memorizing the curriculum, not really understanding what is going on, the error can be taken as a symptom of a lack of understanding. In other words, the context determines whether this error should be penalized or not!
More on context later.
Sunday, May 26, 2013
Assigning grades
In my series of extremely interesting grading-related posts, I want to raise a question about the assignment of grades.
Namely, how on earth are you supposed to assign grades? The ideal, at least in my institution, is that the grade distribution should be Gaussian after removing the fail grades. So far, there are three "schemes" I can think of:
- Giving grades based on some "objective" scale - i.e., if students get ninety percent or more correct on their exam, they get an A; if they get between eighty and ninety, a B; and so on (see the sketch just after this list).
- Total relative adjustment of the scale: the top ten percent of the students get an A, the next twenty percent get a B, and so on.
- A "hybrid" solution. You find natural cutoff points that are not too far away from the "objective" scale so that the number of people who get an A and so on are approximately correctly distributed.
The second one is the most appealing to me, at least when the number of students taking the course is large, which is when you would expect something like a Gaussian distribution anyway. The main criticism against this scheme is that the grades become relative, so that an A one year is different from an A the next year. I posit that this is a problem with all of the above schemes: the "objective" scheme will also be subject to variation, because the teacher is not God, and because you typically don't give the same exam year after year. Below some class size the second scheme will give larger variations than the objective scheme, but as long as the number of students is high, the total relative adjustment scheme will be more robust than the objective one.
The third option is, I think, an OK compromise if you feel uneasy about the total adjustment scheme. I dislike it because of its non-automated nature - i.e. even once you have assigned a percentage to an exam, you still have to make subjective judgements. It also seems to me that this method is prone to even more arbitrariness than either of the two previous ones.
It is worth noting that most grading systems describe grades in terms of level of understanding - i.e., an A means that "the student has an excellent command of the subject", etc. In those terms, the objective scheme is the preferred one - there should be no a posteriori tinkering with the results based on the distribution! However, it's impossible to know a priori where to draw the line. If you say that an A requires ninety percent correct or more, you might end up with no students getting an A simply because your standards were too high. You might say "well, that's too bad for the students - we cannot lower the bar just because the students do badly." But the point is that you don't know whether you're lowering the bar, because the concept of an ideal 'objective' test is flawed from the outset! And if you base your grades on the actual empirical distribution, there will still be an incentive for students to do well, because only the best ten percent of them will get an 'A'.
Ideally, then, one should change the whole meaning of the grading system. Instead of saying that grades reflect some kind of absolute skill level (which is a flawed concept anyway, unless you spend extreme amounts of time or unless the number of students is low), the grades should simply reflect which percentile you ended up in. I.e. an 'A' should just mean that the student was among the top ten percent of the class, and so on.
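As a rough sketch of what that could look like in code (assuming failing exams have already been removed, and assuming a 10/20/40/20/10 percent split across A-E; only the A and B fractions are given above, the rest are invented for illustration):

```python
# Sketch of total relative adjustment: letters assigned purely by class rank.
# Assumes failing exams were removed beforehand; the 10/20/40/20/10 split is
# an illustrative assumption (only the A and B fractions are given above).
def percentile_grades(scores):
    """Return a letter grade for each score, determined only by its rank."""
    fractions = [("A", 0.10), ("B", 0.20), ("C", 0.40), ("D", 0.20), ("E", 0.10)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    letters = [None] * len(scores)
    start = 0
    for letter, fraction in fractions:
        count = round(fraction * len(scores))
        for idx in order[start:start + count]:
            letters[idx] = letter
        start += count
    for idx in order[start:]:  # anyone left over due to rounding gets the lowest pass
        letters[idx] = fractions[-1][0]
    return letters

print(percentile_grades([92, 55, 71, 64, 88, 77, 60, 81, 69, 73]))
# ['A', 'E', 'C', 'D', 'B', 'C', 'D', 'B', 'C', 'C']
```

Ties and the rounding at the grade boundaries are handled arbitrarily here; a real implementation would need an explicit policy for students with identical scores.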
I'm not sure yet what we'll end up using for these exams, although we have used the third approach before. If I have the time, I will write some code to do some statistics on the results of this one to see if there are interesting patterns to be found.
Saturday, May 25, 2013
Grading as concentration practice
Grading exams is, as I have mentioned before, a mind-numbingly boring task. I am of the belief, though, that doing boring stuff can be good for you from time to time, especially if you use it for the right purposes.
I, for instance, have a slight problem concentrating on the task at hand. I'm surely not alone in this, even if I sometimes get the feeling that everyone else is much better at focusing than I am. My brain offers virtually zero resistance when being hijacked by the urge to check some social medium for updates. I need to teach my brain self-defense.
So far I haven't been very structured about it. I just learned about the Pomodoro technique, which I might try if I'm unable to hack this on my own.
But as of now, I am trying to hack this problem on my own - so I decided to use the grading process, in which I was stuck anyway, as a means to this end.
For the first couple of days of grading I didn't do this, and things basically degenerated to the point where, after each exam I graded, I would watch a YouTube video. Since every exam took about ten minutes to grade once I got up to speed, this made for a very attention-deficit-enhancing technique.
After that, though, I started setting limits, as in "no YouTube or social media before lunchtime, and constant grading until then". Yes, I told myself to grade for two to three hours straight with no breaks. For a task like this, which requires no creative input, I think that is defensible (you don't need a break to mull over what you're currently doing), and it promotes concentration for extended periods of time, which is currently my major weak spot when it comes to productivity. And another thing - many programmers talk about being in "the zone". How are you supposed to get into the zone with only 25 minutes (as per the Pomodoro technique) available at a time?
So how did the concentration practice go? I would still slip very often, though I did notice increased resistance from my brain when the impulse to check social media came. The slips were also shorter than usual, and I did find myself forcing my brain to accept that there would be no break after this exam, just another exam to grade. I was basically telling my brain to shut up and suck it up, because it would get no external stimuli, no rewards, until the time was up.

[Image: You little scumbag! I got your name! I got your ass! You will not laugh! You will not cry!]
All in all, I found it a good exercise. I imagine now that I am better at focusing. I have taken no breaks, for example, during the writing of this blog post. I found the method of mentally allocating time for a task a good thing, and I will try to combine this with another zen-like technique which I'll write about later.
Hopefully, this has been an important step in making my brain less addicted to outer stimuli, which I think is the basic problem I have. God willing, I'll be able to keep this up!
Friday, May 24, 2013
Grading: What is this I don't even
Grading can occasionally be a profound glimpse into the human psyche under pressure.
For instance, some people, when they don't know the answer to a question, will try to bullshit their way to one. This is much easier to pull off in an oral exam. In a written exam... not so much.
And the thing is, doing this might actually be harmful, for two reasons:
- It might ruin the overall impression of your exam - i.e. if you give a bullshit answer early on, the grader might be slightly predisposed to look less favorably upon the rest of your answers
- It might reveal your ignorance, thus actually giving you a lower grade than you would have gotten if you had just shut up.
As for the second item above, I'm not so sure that it is a problem. If someone writes a satisfactory answer to a question - containing the bare minimum of what is required, not demonstrating superior understanding but still answering correctly - I might give eighty percent (say) for that question, simply because I must assume that the person knows what they are talking about. However, if that person feels they haven't given a complete answer and then starts throwing in stuff they think might be true, I get a fairly definitive confirmation that this person indeed doesn't know what they are talking about - and thus I might actually lower the grade. To me, this makes sense - it's a kind of "innocent until proven guilty". Others take a more liberal stance, saying that as long as the right answer has been written down, it doesn't matter what else is also written.
It is of course important exactly what kind of bullshit has been written - if it's simply information that has no relevance to the question, such as demonstrating your knowledge about the human genome when asked about that of a pig, then I agree that you shouldn't be penalized - you're keeping the bullshit away from the breadbin. However, when the bullshit starts encroaching on the perfectly fine sandwich that is your basic answer, that sandwich, too, will start to smell.
Thursday, May 23, 2013
Grading
The last week I've been grading exams for a college-level introductory course to my field for non-scientists. I'm a TA for this course, and so we also have to step up when the time for grading comes.
The grading process is very boring. After doing it for extended periods of time, my mind has been noticeably numbed. While grading, though, one starts to notice a couple of things, and those can be interesting... if you're in that kind of mood.
For instance, how are you supposed to grade an exam? That of course depends on what type of exam it is. This particular one consisted of fifteen questions, and since it was an exam for a science course, the answers were relatively well-defined. Still, it being a qualitative science exam, there was a bit of leeway.
Even so, after grading around twenty exams, you start to notice patterns, and you stop reading the answers as carefully as you did in the beginning. Got the formula right? Check. Drew this graph correctly? Check. Included that particular process in the explanation? Oops, missed that one. That's a couple of points off.

[Image: How grading makes me feel]
When grading previous exams in the same course, I have tried to set up a checklist for each question and then go through the checklist, awarding points for each item present in the answer. However, there are a couple of problems with this approach (more on those below, after a quick sketch of what such a checklist looks like).
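Something like this, say, where the rubric items and point values are made up for illustration rather than taken from the actual exam:

```python
# A made-up checklist for a single question; the items and point values are
# illustrative, not the real rubric.
question_rubric = {
    "states the correct formula": 2,
    "draws the graph correctly": 2,
    "mentions the key process in the explanation": 1,
}

def score_question(rubric, items_present):
    """Fraction of the question's points earned, given which items the answer hit."""
    earned = sum(points for item, points in rubric.items() if item in items_present)
    return earned / sum(rubric.values())

# An answer that has the formula and the graph but misses the process:
print(score_question(question_rubric, {"states the correct formula",
                                       "draws the graph correctly"}))  # 0.8
```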
First of all, even though there is a solution provided by the main teacher, the main teacher doesn't really know what his students know, especially for such a low-level course. Therefore, a checklist built from the teacher's solution will prove to be a bad match for the actual exams, in the sense that you will typically emphasize things that no one knows, or things that everyone gets right.
This is why you need a training set. You need to look at a number of exams, going through the answers and identifying which parts separate the wheat from the chaff. And then, ideally, you should go through the same set again, this time using your checklist to actually grade those exams.
That's... not going to happen. At least not for me. What I used to do was grade and build up my checklist simultaneously, meaning that the first ten to twenty exams were probably a bit off. However! We are two people grading the same exams, so as long as the other person starts at a different point in the pile than I do, this approach is still pretty sound.
The second problem with the checklist approach is that the checklist doesn't convey enough information. Sometimes, you read an exam and you just know that this person has an excellent command of the material. And sometimes you read an exam and you realize this person has simply memorized the material, not really understanding what's going on. However, the checklist doesn't really differentiate between them, unless you put in some kind of checkpoint that says "Deep understanding: two points".
This could work, and I did something like that the last time I graded. However, this time around, I tried not using a checklist, rather trying to give a more "holistic" number of points for each question. That is, I tried to identify to what degree the person had understood what was going on.
This doesn't always work, since many questions are simply of the "regurgitate what you have learned" type. In those cases I would still follow something like a mental checklist. But some questions require more understanding, and in those cases I felt this approach was better. Admittedly, it means that two people who wrote the exact same thing might end up with a different percentage for that particular question, but since a) the exam is made up of fifteen questions, b) we are two graders, and c) you get a discretized letter grade anyway, I don't think this is a crucial problem.
This post is already pretty long... I think I will split this grading experience into several posts.