Automatic Grading Misses the Mark

This week I read a story about robo-readers – computer programs created to “read” final student essays and calculate scores automatically. The theory holds that these programs will free up instructor time, allowing teachers to assign more essays and giving students more opportunities to refine their writing skills, without burdening teachers with an overwhelming and unmanageable workload.

As just about anyone who’s ever mastered a skill, taught a course, or received a diploma can attest, practice, practice, practice is the key to real learning. In fact, making mistakes and figuring out how to resolve them is the most educational part of every student’s journey.

With that in mind, it seems to me that practice ought to be valued as much as any final project. The work students create and submit during the learning process, with all of its mistakes, explorations, and creativity, is key to understanding. A “robo-reader” is not equipped to decipher the subtle signals contained in evolving student work; it is able to evaluate general correctness, but not monitor comprehension and growth. So far, only a real human mentor has the capacity to comprehend individual student development and provide tailored guidance.

A computer program can read your answer and tell you whether it falls within the confines of what the program recognizes as a correct response to a given question. But the communication skills of a computer program are limited, even in the best of circumstances.

Consider that it is entirely possible to write a correctly spelled and grammatically correct sentence that makes no practical sense:

“Colorless green ideas sleep furiously.”

Noam Chomsky

Despite its perfectly acceptable structure, the sentence doesn’t work. Good grammar and spelling are necessary for clear communication, but we know there’s much more to it. At some point we want to go beyond plugging words into appropriate spaces and start talking about interesting ideas! Computers are cold, plain devices. Real, human teachers are able to interact, reason, and inspire students to think critically in ways that computers may never be able to do. (Still, we love our computers, even if they can’t reciprocate!)

I take great pride in knowing that O’Reilly School of Technology does not automate the grading of any student work. Instead, we utilize software to make the entire project submission and mentor assessment process easier. When grading a student project, O’Reilly School of Technology mentors have access to every incarnation of that project, from the student’s first submission to the last, and all of the practice steps in between. That’s how O’Reilly School of Technology mentors make sure that each student not only produces a solid final project, but also has a clear understanding of how they got there, from beginning to end.

Good mentors know that learning happens when students practice; the assignments that lead up to course completion are as important as the final project. That’s the reason we keep this question in mind when creating our courses: If student work isn’t worth a teacher’s time to review, then why would it be worth a student’s time to complete?

  • http://rationalmathed.blogspot.com Michael Paul Goldenberg

I’ve always considered that Chomsky line the most eloquent, poetic thing he ever wrote. That’s not a criticism of him, by the way.

For what it’s worth, I can’t imagine a computer being able to evaluate the aesthetic worth of a poem, play, novel, short story, movie, painting, sculpture, concerto, or pop song. I’m not thrilled with the fact that the Common Core assessment groups are pushing computer-based scoring of performance tasks, as well as computer-adaptive testing. The latter is anathema to me and should be to anyone who understands how much such testing disadvantages test-takers. Letting a computer control the assessment process is a very bad idea. Whether computers can meet or exceed the performance of human scorers on rubric-based performance tasks is an interesting question, but until I see strong evidence to support the affirmative, I remain highly skeptical. Given the enormously high stakes being played for, so should everyone, but I suspect lots of greased palms in the halls of various statehouses and the halls of Congress will see to it that business triumphs over kids and learning.

  • http://www.brokenairplane.com Phil

    Re-posted from my Google+ stream

    I have been thinking about my response to the topic of robograders and automatic assessment as I have heard my respected colleagues in CS, math, science, and just general all around great educators write on the subject.

Once again I find myself in the unpopular position of defending not the current implementation but the idea. In the article linked to below, the author makes the final point, “If it is worth the time for the student to do the work isn’t it worth our time to grade it?” This is a false comparison IMHO, as the student is grappling with the concept, learning, and playing with it; I am initially just verifying the correctness of the problem.

Of course, when necessary I dive further into the answer to see how and why they made certain choices, but I fail to see why I couldn’t have the computer make the first pass so I can focus quickly on those who need immediate help. A well designed question tests not just whether they got the answer, but how, and what misconceptions they may have, whether it be with fractions, units, definitions, etc. Is the technology there yet to do this effectively? On the whole no, but these things have to start somewhere, even more so with machine learning that depends on large data sets to improve.

I would hate for this to be considered an endorsement of those currently using automated grading, especially for high-stakes grading, or of those trying to diminish or eliminate the role of the educator; nothing is more abhorrent to me. But not spending my entire weekend grading, and instead opening a dashboard with a detailed diagnosis that I can then use to drive instruction, support, and further challenge my students, sounds empowering to me as an educator.
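The “first pass” Phil describes can be sketched in a few lines. This is purely illustrative: the question, the correct answer, and the map from common wrong answers to misconceptions are all invented for the example, not anyone’s actual grading system.

```python
# Hypothetical first-pass checker for "1/2 + 1/3 = ?".
# Instead of a bare right/wrong mark, common wrong answers are
# tagged with the misconception they usually signal, so the
# teacher can focus on students who need immediate help.

MISCONCEPTIONS = {
    "3/5": "added numerators and denominators separately",
    "2/6": "multiplied instead of adding",
}
CORRECT = "5/6"

def first_pass(answer: str) -> str:
    """Return a diagnostic tag for one student answer."""
    if answer == CORRECT:
        return "correct"
    # Unrecognized wrong answers are routed straight to the teacher.
    return MISCONCEPTIONS.get(answer, "needs human review")

# A dashboard would aggregate these tags across the whole class:
responses = {"Ada": "5/6", "Ben": "3/5", "Cam": "7/6"}
report = {name: first_pass(ans) for name, ans in responses.items()}
```

The point of the sketch is that the computer’s output is a diagnosis for the teacher, not a grade for the student.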

  • http://www.oreillyschool.com Josh Nuzman

    Thanks for your comments!

    I’d like to further explain how we do things at OST. Students are required to compile code and preview HTML/JavaScript before they hand in projects. This is a “first pass” check to make sure things have a minimum level of functionality.

    After projects are handed in, mentors grade projects using an integrated tool that makes it easy to compile and run programs, and to examine source code. In some cases we do use software to compare output or check to make sure data has been loaded correctly, but this is never used to automatically assign a grade. We may also use software to flag aspects of a project that might need attention.

We provide as much information to mentors as possible, and let them decide whether the project criteria have been met.
    To put it differently: we use technology to reduce the friction for students and mentors. It is never used to replace mentors.
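The output-comparison tooling Josh mentions can be illustrated with a minimal sketch. The function name and sample data here are assumptions, not OST’s actual code; the key property is that the software only flags differences for a mentor to review and never assigns a grade.

```python
# Minimal sketch: compare a student program's output against the
# expected output and produce human-readable diff lines for the
# mentor. An empty result means the outputs matched, but the mentor
# still reviews the source code before assigning any grade.
import difflib

def flag_for_mentor(student_output: str, expected_output: str) -> list[str]:
    """Return unified-diff lines highlighting output differences."""
    diff = difflib.unified_diff(
        expected_output.splitlines(),
        student_output.splitlines(),
        fromfile="expected",
        tofile="student",
        lineterm="",
    )
    return list(diff)

flags = flag_for_mentor("Hello, Wrld!", "Hello, World!")
# A non-empty diff tells the mentor where to take a closer look.
```

Using a standard unified diff keeps the flagging transparent: the mentor sees exactly what differed, rather than an opaque pass/fail verdict.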

  • http://medicalcodingandbillingcareer.net/ Barb Dyer

    Hi Josh,

Great post. By the way, have you heard of the Sikuli project? It’s scripting software that uses image recognition to test graphical user interfaces and make decisions when certain visual cues are present on screen. If you are grading student-built software/websites/apps, I think the tool could help you.

It’s an open source tool developed at MIT.

  • miketwo

    Great article. Informed and reasonable. I wonder how the author’s opinion might change if we restricted ourselves to talking about automatic math graders, which allow students to practice a certain kind of math problem many times.