High Stakes Corrupt

These two articles got me thinking about the relationship between grades and cheating:

“To Stop Cheating, Nuclear Officers Ditch The Grades” (NPR)
“Wrong Answer: In an era of high-stakes testing, a struggling school made a shocking choice” (The New Yorker)
Both articles are dramatic, and about more important issues than “just” grades, but that’s the connection that got me thinking. In an essay about assessment and accountability (one of my favorite articles), Lee Shulman wrote that “High stakes corrupt.” The NPR article about nuclear missile training is an obvious example of this. Their “grading reform” – getting rid of letter grades and their connection to competition and damaging perfectionism in favor of cooperation and pass/fail marks – makes a lot of sense and apparently works, which is a bonus in the context of folks who can fire a nuclear missile.
The New Yorker article is a heart-breaking, in-depth, moving story about one teacher’s encounter with a test that felt very high-stakes to students and teachers in their school. I’m sure that administrators and others in that district (and their state department of education) wouldn’t call it a high-stakes test: it’s a statewide achievement test without impact for individual students. But the article made it clear that the culture of the school, and possibly the district, created a context in which this test was definitely high-stakes for teachers and schools. I bet there’s a document floating around that district somewhere that says something like “this test is only one measure of achievement, and data from this test should be triangulated with other sources of achievement data before daring any conclusions…” blah blah blah. But what ends up happening is familiar to anyone teaching somewhere with a statewide test (which is, now, every public school, I suspect): the statewide test gradually dominates any conversation about student achievement, and doing “badly” on the statewide test feels like a big deal to everyone, even if official rhetoric claims otherwise.
The articles also differ in many ways, including one very important one: the conclusions. In the NPR story, someone decides to replace the old perfection/competition/ego-based A-F grading system with an evaluation system that communicates proficiency and acknowledges that people should probably work, you know, together to prevent nuclear accidents from happening. In the New Yorker article, the passionate teacher gets prosecuted, not the overall system that promotes the test craziness, and that school/district/students lose the chance to work with someone who sounds like a fascinating, dedicated teacher.
As I think about this, I can feel myself shrinking away from the implications. Traditional A-F grading isn’t going away any time soon in middle and high schools, and many good folks (like O’Connor, Guskey, and many others) put a lot of effort into “fixing” grading practices. I like reading their thinking and they are definitely trying to help teachers and students. But the contrast between these articles makes me wonder if all the fixes are ultimately very temporary patches on a tire that isn’t getting anyone where they really want to go.

Putting large-scale assessment in its place

Some writers I respect (like Diane Ravitch, Joe Bower, and many others) want large-scale assessment out of schools, period. None of it. Get it the heck out of this school. Thanks, so long, see you later, and let’s get down to the business of teaching and learning. I sympathize with that sentiment, but I don’t share it completely.

(Terminology note: I’m using the term “large-scale assessment” to refer to a standardized assessment administered to many students. Usually, these are state-developed or publisher-developed tests, like the SAT, ITBS, etc. Many folks call these “standardized tests,” but that term is a bit misleading, I think, because any test a teacher gives in the same way across classes is “standardized.” So I’ll call them large-scale.)

I sympathize because of the damage large-scale assessment has done and continues to do in schools. The demands of No Child Left Behind and Race to the Top pressure schools and districts to use large-scale tests, which in turn pressure districts and schools to focus more on what the large-scale tests measure, which pressures teachers and others to spend more time on what the tests measure, which reduces the amount of time students get to spend learning what the tests don’t measure, which narrows students’ ideas about what “important” learning is and how they relate to (and like) school, and in the end what the large-scale tests measure defines “learning.” The measurement ends up defining learning rather than describing it. (Others have described this process much more thoroughly than I have; see, for example, Diane Ravitch’s excellent book, The Death and Life of the Great American School System.)
But I don’t completely share the “throw the bums out!” sentiment toward large-scale assessments because I think districts and schools can USE them rather than getting used by them. Large-scale assessments can be good for very specific purposes, and, like Liam Neeson, they have a very “specific set of skills.” If a district/school wants to compare assessment data (achievement, ability, whatever) to a national sample, then it has to use a large-scale assessment. All the statistical trappings that go along with a large-scale assessment (like percentile ranks, stanines, etc.) result from the way they’re built and maintained. Districts can use these data for useful purposes (they often used to, in pre-NCLB days), and I think they can again.
But having acknowledged all these potential uses, I have to say that uses of large-scale assessment data are all kinds of out of whack right now. Let’s not even talk about the most egregious uses (like the really wacko teacher evaluation practices that rely on large-scale assessment results, and “merit pay” value-added schemes encouraged by the Race to the Top program). All large-scale assessments claim to be “reliable and valid,” but validity is a score-use issue, not an inevitable “trait” of a test that gets bestowed and never challenged. Large-scale tests are built for specific purposes, and using data from large-scale assessments in ways that they were never designed for (like teacher evaluation) is a serious threat to validity.
Maybe a ruthless “cost-benefit” analysis would be useful. How much value would a district/school have to get from a large-scale assessment in order for the assessment to justify its existence? How much “gain” would have to result to justify the expense, hassle, stress, and unintended consequences of a large-scale assessment? I think most (all?) state accountability tests would fail this (non-large-scale, non-standardized) test.

Not for Points blog

I decided to start this blog because I’d like to share some conversations about assessment. Many of these conversations orbit around the idea that assessment, lately, gets used as a weapon against teachers and students.

It doesn’t have to be that way. Classroom assessments, designed and controlled by teachers, can be a powerful part of learning, instead of getting in the way of learning (or, worse, intimidating people out of learning, or turning school into a “point game”). Assessments can be empowering, inspiring, exciting, and useful. Right now large-scale assessments (like state accountability tests) get more attention than they deserve. I hope we can put them in their rightful place, and reclaim assessments as tools teachers and students use together for learning.