“[W]ith a real nightingale we can never tell what is going to be sung, but with this bird [a mechanical nightingale] everything is settled. It can be opened and explained, so that people may understand how the waltzes are formed, and why one note follows upon another.”
– from Hans Christian Andersen, “The Nightingale,” 1844.*
I. The Game of Writing.
Last Thursday, the NYT ran an article about recent innovations in mechanical essay grading. You’ve probably read it by now, but you know the gist even if you haven’t. Geeks everywhere want to spare faculty the burden of grading student essays so that we can concentrate on other things. [Technology isn’t just benign, it’s good for everybody!] I always thought that grading essays was the thing that we humanities professors were supposed to concentrate upon, but then again what do I know?
As this blog has become all-MOOCs, all the time, here is the part of the article that I found most interesting:
Two start-ups, Coursera and Udacity, recently founded by Stanford faculty members to create “massive open online courses,” or MOOCs, are also committed to automated assessment systems because of the value of instant feedback.
“It allows students to get immediate feedback on their work, so that learning turns into a game, with students naturally gravitating toward resubmitting the work until they get it right,” said Daphne Koller, a computer scientist and a founder of Coursera.
There is so much packed into those two extraordinary paragraphs that I barely know where to start. When MOOC providers champion “the value of instant feedback,” my first question is “value to whom?” I do a lot of grading of written essays in the course of my job, and I can tell you that the reason this process often takes so long is that I couldn’t possibly give instant feedback to students even if I wanted to.
Good essay questions are about ideas. The essays students write should be about ideas too. That means I have to sit and think about the ideas that students write in order to grade those essays. Instant feedback is therefore only a good thing if you think that writing assignments are something to get past rather than an opportunity for learning or, God forbid, reflection.
And then there’s that Koller quotation, one of a long series of quotes by Coursera’s founders that have continually left my jaw scraping the floor. Suppose I ask my students to explain the historical impact of the New Deal. What exactly is the “right” answer? I always tell my students that I don’t grade on the basis of what their argument is, I grade on the basis of how well they defend it. How is any artificial intelligence going to evaluate the inevitable issues of morality that good historical questions evoke from students? It won’t, of course, and that should be a problem.
Perhaps more importantly, when students keep revising and resubmitting, who exactly are they trying to please? Programmers? What do they know about good writing? What values do they bring to the table? Objectivity is not neutrality, as Thomas Haskell once explained. As I write these words, this comment is at the top of the “Reader Picks” section of the comments under that NYT article:
Last year when my daughter was in 7th grade, her teacher started using computer essay grading. She would write her essay at home, using the computer, and would get a score. My daughter loves to write but got frustrated because the computer insisted on correcting the grammatical errors of portions of the essay in which she used poetic language. In order to get a higher score, she begrudgingly changed her essay.
In short, computer grading destroys precisely the kind of creative thinking that writing is supposed to encourage.
Oh yeah, machines also aren’t very good at determining the accuracy of facts, which might be a problem in…you know…history courses.
II. Feedback Schmeedback
Reading that Koller quote also made me wonder exactly what kind of feedback students get when their essays are machine-graded. After all, when I force students to play the game of writing, I make them write drafts. On those drafts, I leave lots of comments. Those comments, in turn, serve as a guide to help students do better on their final papers.
So what kind of comments do students get back on machine-graded essays? Are they just blundering around in the dark? That sounds a lot more frustrating than fun. That NYT article suggests that the machines “provide general feedback, like telling a student whether an answer was on topic or not,” but what does that mean exactly?
In order to answer these questions, I did what any good 21st Century cyber-citizen does: I asked Twitter. Follow that link through a long series of tweets and there are some excellent responses (to go with the inevitable less-than-140-character wisecracks). Nevertheless, I still felt the need to dig deeper into this issue.
From what I can tell online, it appears that the big debate in the world of machine-grading is whether the scores that machines spit out match the scores awarded by human graders. Nowhere could I find anything about the machines giving comments, let alone comments that might actually prove useful. It’s all about numbers, as if the quality of any piece of writing could ever be reduced to a single digit and a couple of categorizations.
Almost none of these computer science geniuses seem to understand that humanities disciplines are humanities disciplines because the kinds of questions we ask don’t have easy answers. This is from Slate, published last year, discussing the problem of applying this technology to my actual field of expertise:
Compare and contrast the themes and argument found in the Declaration of Independence to those of other U.S. documents of historical and literary significance, such as the Olive Branch Petition.
Brown University computer scientist Eugene Charniak, an expert in artificial intelligence, says it could take another century for computer software to accurately score an essay written in response to a prompt like this one, because it is so difficult for computers to assess whether a piece of writing demonstrates real knowledge across a subject as broad as American history.
This may explain why Coursera offers peer-grading for one set of its courses, and is so enthusiastic about machine-grading essays for some others. Indeed, doing this work I realized that the machine-grading problem is just about the exact equivalent of the peer grading problem. They use these strategies because of the economics involved, not because they’re the best things to do for students. That’s what makes quotes like this (from the same NYT article) so incredibly infuriating:
“One of our focuses is to help kids learn how to think critically,” said Victor Vuchic, a program officer at the Hewlett Foundation. “It’s probably impossible to do that with multiple-choice tests. The challenge is that this requires human graders, and so they cost a lot more and they take a lot more time.”
Notice the sleight of hand involved there? Computer graders are much better than multiple-choice tests, not human graders. Maybe they are, but who says those are our only two options? As Mark B. Brown has argued, the fact that we’re even having this debate is an acknowledgement of permanent austerity. In order to prevent professors and students alike from getting up in arms about this entire discussion, the MOOC enthusiasts and computer science geniuses that enable them have to redefine what education means.
III. “[W]ith this bird everything is settled.”
My goal as a teacher is to get students to decide for themselves what they think about history. Do the proponents of mechanized grading even care about such things? The kind of feedback that students get on machine-graded essays (or on peer-graded essays for that matter) suggests no.
As Mark Cheathem has strongly suggested elsewhere, machine-graded essays and scare tactics go together like wine and cheese. “You must automate everything!” the profiteering vultures tell us. “Otherwise, the country will fall behind!” [Isn’t it really interesting that this strategy transcends national boundaries? You’d think that the international professoriate could all just slow down together and keep ourselves employed, but I don’t have my hopes up.] If you think this argument is effective on seasoned professors who should really understand the concept of source bias better, imagine how effective it would be on undergraduates.
I can just hear the pitch now: Don’t learn anything about critical thinking. Critical thinking can actually impede your job prospects. It’ll be just like The Organization Man all over again, only this time they’ll have studies to back them up:
[D]uring the great IT boom, the returns to cognitive skill rose. Since then, the process has gone into reverse: demand for cognitive tasks is falling. Perhaps this is because installing robots consumes more resources than maintaining them, or perhaps it’s simply that the robots are doing an increasing number of those cognitive tasks. But whatever the reason, we no longer want or need so many skilled workers doing non-routine tasks with a big analytical component. The workers who can’t get those jobs are taking less skilled ones. The lowest-skilled workers are dropping out entirely, many of them probably ending up on disability.
There are 115,000 janitors with college degrees in the United States. Therefore, anybody who gets one must be a sucker. Of course, not having a college degree will pretty much doom your chances of getting one of the remaining jobs that require critical thinking (and its corresponding pay level), but who wants to stand in the way of a newly emerging cliché?
What we do know is that cheapening education this way will assuredly put a lot of humanities professors (especially already-underemployed adjuncts without the protection of tenure) onto the unemployment line. I say if we fall for these scare tactics and accept the values that mechanized grading represents, then we deserve to be there. Instead, we need to make the case that the skills we teach are important irrespective of how much money students can earn by using them. Kind of like listening to the song of a real nightingale.
Certainly, mechanical nightingales have yet to replace real nightingales out in the world. After all, they’re far too expensive. However, the values that the mechanical nightingale represents have done enormous damage to other bird species. Take the Passenger Pigeon, for example. Tens of thousands of those birds used to darken American skies.
Now they’re gone. I, for one, feel like I’ve missed something, even if looking at a huge flock of birds has no commercial value.
In short, everything about the Passenger Pigeon is now settled – not in the same way that everything about the mechanical nightingale is settled, but settled nonetheless. Devalue critical thinking skills to the point that machines grading essays becomes acceptable and everything about education will be settled as well. Our students will be settled like the mechanical nightingale is settled, singing the same song every time. We humanities professors will be settled the same way that the Passenger Pigeon is settled, lucky if someone bothers to stuff us and display us anywhere since we’ll become forgotten relics of a bygone era.
But at least we won’t have to waste our time grading papers.
* I am, of course, not nearly well-read enough to pick that reference out of thin air. I got it from one of my favorite books of all time, Rebecca Solnit’s River of Shadows. Also, this post wouldn’t have been possible without the help of a slew of my tweeps, especially Cedar Riener, Mark Cheathem and Rohan Maitzen.