How complicated CAN we make grades?

Had a long conversation with a curriculum specialist today about grading. I get a bit nervous now when someone asks me about grading in middle schools or high schools. Our department hasn’t been involved in grading conversations in secondary school for a while because, I think, our contributions eventually annoyed people.

(image from http://www.memegen.com/meme/blhwa9)

The curriculum specialist asked me about some questions he heard from teachers about some of the grading categories available to them in their online gradebook. They have “summative” and “formative” categories they can use, and the summative categories account for 80% of the grade, and the formative 20%. This curriculum specialist’s teachers weren’t sure how to use the formative 20% category, and the curriculum specialist asked me for advice.

What in the world can I say that’s useful about this? The phrase “graded formative category” gets thrown around – what the heck? How much time should we spend talking about “fixes” that will help teachers put anything useful into a category called “formative” that weighs in at 20% of a cumulative grade? I don’t know where to start.

Poor formative assessment. I feel bad for the term, and I wonder how we strayed so far. Originally, all it was supposed to mean was something close to “practice” – an opportunity for students and teachers to USE some assessment information to change something about their teaching or student learning. Teachers figure out what to work on next (or what other experiences or practice students need), and/or students figure out what they should study more or differently, or use feedback to improve.

How does that fairly simple idea of formative assessment combine with the idea of a category that “counts” for 20% of your grade? How should teachers decipher a requirement that they record “scores” for some assessments, put them in the 20% formative category, and then explain what that all means in terms of learning and the final cumulative grade?
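For what it’s worth, the arithmetic behind a weighted-category gradebook is simple. Here’s a minimal sketch – the scores, category names, and helper functions are hypothetical illustrations, not the internals of any actual gradebook software:

```python
# Sketch of an 80/20 weighted-category grade calculation.
CATEGORY_WEIGHTS = {"summative": 0.80, "formative": 0.20}

def category_average(scores):
    """Average a list of (earned, possible) pairs as a percentage."""
    earned = sum(e for e, _ in scores)
    possible = sum(p for _, p in scores)
    return 100 * earned / possible if possible else 0.0

def cumulative_grade(gradebook):
    """Weight each category's average by its share of the final grade."""
    return sum(
        CATEGORY_WEIGHTS[cat] * category_average(scores)
        for cat, scores in gradebook.items()
    )

gradebook = {
    "summative": [(85, 100), (45, 50)],  # tests, projects
    "formative": [(5, 10), (9, 10)],     # "practice" that still counts
}
print(cumulative_grade(gradebook))  # roughly 83.3
```

The practical upshot is visible in the numbers: every score recorded in the “formative” category permanently moves the cumulative grade, which is exactly why calling that category “practice” confuses everyone.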

I don’t even know where to start.

Reducing Diversity

Right now, a book called Classroom Instruction that Works by B.J. Stone is popular in my district. I’ve seen B.J. Stone present on concepts from the book, and she’s a very organized, poised speaker. Many of the ideas in the book are familiar to teachers, like reinforcing effort and cooperative learning. B.J. Stone (and the Robert Marzano “system” of books) is good at organizing ideas and research and presenting them in understandable, easily digestible chunks.
But one concept/suggestion stood out to me during the presentation. Stone suggests that buildings should strive to “reduce diversity” in instruction. The idea is that if all teachers in a building converge on a set of common teaching strategies, this consistency and unified effort will help students succeed.
I’m not sure why this is, apparently, an attractive idea, but I am sure that it’s a misguided one. The idea that a group of teachers (like a PLC) might choose to all use the same teaching strategy makes some sense (although there are downsides to that idea too), but the claim that reducing the diversity of teaching strategies in general is a GOOD thing for students – that idea needs a lot of support before I’d buy it. It’s a complicated and odd idea. It assumes that teaching strategies are distinct behaviors that can be cleanly labeled and categorized. Stone’s book depends on the idea that I can use the same “strategy” you are using. That’s true in a surface way – we can share ideas and both try them in our classrooms – but on a deeper level, we each incorporate them in unique and important ways in our own classes. Teaching and learning are deeply contextual, and I question whether it’s a good idea to even try to “reduce diversity” in instruction.

Put the Fun Back in It!

I think we have a bad habit of stripping the fun/soul/grit/funkiness out of cool ideas as we transform them into “education-ese.” For example: Many, many, many teachers, schools and districts use “performance labels” for rubrics and report cards. Even the term “performance labels” is darn unsexy and antiseptic, but I want to focus on the labels themselves.
In my district, 4-level rubrics are common, and the elementary school report card uses four performance levels (side note: I love our elementary school report card in general, and some day, we will wise up and implement a similar report card in middle and high school).
1 = “beginning” or “does not meet district standards”
2 = “emerging” or “approaches but does not yet meet district standards”
3 = “proficient” or “meets district standards”
4 = “advanced” or “exceeds district standards”
Can’t you just hear the students now? “Oh Hooray! I’m … proficient? Seriously? I worked THAT hard, and the best you can come up with to describe me is ‘proficient’?”
Compare those labels with this poster (from @Meffscience on Twitter):
You’re not “proficient” kids – you’re a Jedi!
(side note #2: I think I’d like to take an example like that and really go for the “medieval” theme – maybe “Novice, Apprentice, Journeyman, Teacher”?)
Why not use “fun” terms with students? When else can we do this? What other “edu-babble” terms could we throw out in favor of some terms with some more life in them?

Bell Curve Zombies

submitted as a letter to the editor, Lincoln Journal Star, Aug. 18 2014

The August 17th edition of the paper reprinted an editorial Catherine Rampell of the Washington Post wrote titled “A’s for Everyone: Grade Inflation Lives On.” Judging by her biography on the Washington Post website, Catherine Rampell writes about a variety of important topics, but this editorial demonstrates a limited understanding of how teachers currently talk about teaching, learning, and grading.
The premise of Rampell’s argument is that “a pandemic of meaninglessly high grades” swept through colleges, and only “brave Princeton” University tried to resist this tide of “grade inflation.” In her analysis of why this “pandemic” occurred, Rampell doesn’t address one important and basic question: what are grades supposed to measure and communicate, and what grading systems/philosophies best meet this goal?
Rampell supports Princeton’s efforts to resist grade inflation, which are based on the traditional idea of the “bell curve” of classroom grades: the assumption that the distribution of student grades in any class should align with the bell curve, with a pre-ordained proportion of students receiving certain grades. A few students get As, more get Bs, most get Cs, etc. Many human characteristics are distributed on the bell curve – height, weight, IQ scores, etc. – but there’s no good reason to assume that course grades should be distributed in this way. Ideally, course grades should accurately measure and communicate the knowledge and skills students acquire during a course. Why should anyone assume, before the class starts, that a specific percentage of students will end up learning at an “A” level? Why would anyone want a teacher to communicate to students that, no matter how much they achieve during a class, only a pre-specified number of students will receive an “A” and a few of them will definitely get an “F”?
Rampell’s nostalgia for bell curve grading systems seems to be based on an underlying theory of intelligence: she dismisses the idea that “modern students are uniformly smarter than their parents,” implying that since there is only a specific percentage of “smart” students in a class, only that proportion of students should receive an A. This is an odd theory of teaching and learning. If teachers could (should?) predict which students could learn the course material based on some initial impression or measurement of “smarts,” why bother to teach anyone else? Good teachers know that multiple factors (e.g. cognitive abilities, motivation, effort, stress, etc.) impact student learning. Rampell assumes that in any class there is a small percentage of “smart” students, that those students should receive As, and that any deviation from that bell curve scheme is evidence of grade inflation. This self-fulfilling grading scheme doesn’t leave room for teachers to help a majority of students achieve, and for grades to reflect that learning.
The bell curve theory of grading is probably familiar to many of us, but there was never any reason to assume it was, or is, a “good” or accurate grading system. Rampell could have written about the more important grading conversations going on among educators right now: how to more accurately assess and evaluate student knowledge and skills, and how to communicate information about learning through grading systems. Reviving the zombie of the “bell curve” grading philosophy doesn’t add to this conversation, and returning to old grading systems won’t help teachers or students do what they want to do: teach and learn.
(shorter version: The August 17th edition of the paper reprinted an editorial Catherine Rampell of the Washington Post wrote titled “A’s for Everyone: Grade Inflation Lives On.” Rampell’s editorial demonstrates an outdated understanding of how teachers currently talk about teaching, learning, and grading.
The premise of Rampell’s argument is that “a pandemic of meaninglessly high grades” swept through colleges. In her analysis of why this “pandemic” occurred, Rampell doesn’t address one important and basic question: what should grades measure and communicate, and what grading systems/philosophies best meet this goal?
Rampell supports the traditional idea of the “bell curve” for classroom grades: the assumption that grades in any class should be distributed along a bell curve, with a pre-ordained proportion of students receiving certain grades. A few students get As, more get Bs, most get Cs, etc. Many human characteristics are distributed on the bell curve – height, weight, IQ scores, etc. – but there’s no reason to assume that course grades should be distributed in this way. Ideally, course grades should accurately communicate the knowledge and skills students acquire. Why would anyone want a teacher to assume that, no matter how much students learn, only a pre-specified number will receive an “A” and a few of them will definitely get an “F”?
Rampell’s nostalgia for bell curve grading systems is based on an underlying theory of intelligence: she dismisses the idea that “modern students are uniformly smarter than their parents,” implying that since there is only a specific percentage of “smart” students in a class, only those students should get As. This is an odd, elitist theory of teaching and learning. Rampell assumes that in any class there is a small percentage of “smart” students, that only those students should receive As, and that any deviation from that bell curve scheme is evidence of grade inflation. This self-fulfilling grading scheme doesn’t leave room for teachers to help all students achieve, and for grades to reflect that learning.
The bell curve theory of grading was often used in the past, but there was never any reason to assume it was a “good” or accurate grading system. Rampell could have written about the more important grading conversations going on among educators right now: how to more accurately assess and evaluate student knowledge and skills, and how to communicate information about learning through grading systems. Reviving the zombie of the “bell curve” grading philosophy doesn’t add to this conversation, and returning to old grading systems won’t help teachers or students do what they want to do: teach and learn.)

True Grit

(Note: this blog post appears in a different form at http://pdfrontier.blogspot.com/2014/06/true-grit-is-duckworth-low-down-dirty.html)

The book How Children Succeed, by Paul Tough, is an amazing piece of journalism/research summary, and one of those books that captured my thinking for quite a while after I finished it.

Paul Tough spent a couple of years researching what he calls “social-emotional” factors that are associated with school success, and he was surprised to find that these factors are not only important, they seem to be MORE important than factors like IQ that get much more attention in and outside of our school system. Tough claims that our schools operate on what he calls the “cognitive hypothesis”: humans are born with a certain amount of intellectual potential, our environments influence how much of that potential we develop, and the job of the school is to help students use/develop this “cognitive potential” as much as possible. The cognitive hypothesis influences the ways we talk about students (“bright,” “A student,” “math/science kid,” etc.) and even the structures of school (IQ/ability testing for gifted/talented programs and special education services, etc.). The cognitive hypothesis predicts that high intellectual potential will help students succeed, and that schools need to be structured to help those students soar and to support students with “less” intellectual potential as much as they can.

Tough spends the rest of the book carefully describing compelling research and case studies that point to the conclusion that the cognitive hypothesis might be dead wrong. Study after study and story after story pile on top of one another showing how “non-cognitive” factors, like perseverance, “grit,” emotional intelligence, and curiosity, may be more important to student success than “smarts.” Tough builds a compelling case, I think, that if one of our goals in schools is student success (in college, careers, and/or life), then there is ample evidence to support the claim that we had better stop operating under the cognitive hypothesis and start paying attention to these social-emotional factors.

One of the researchers Tough often cites in the book is Angela Duckworth, a researcher at the University of Pennsylvania and a recent MacArthur “genius” grant recipient. Duckworth got interested in why some people persist and persevere in the face of obstacles while others give up, how important this trait is to our success, and whether this attribute of “grit” can be measured. My favorite research finding: Duckworth studied an incoming class at the West Point military academy. People who get admitted to West Point are already an impressive lot: the selection process is more stringent than any Ivy League school’s, and these are folks who have known much success in life. Duckworth’s team tested everything they could think of about these young people, including all sorts of intelligence and other “talent” measures, along with social-emotional factors like grit. When all the data came back, the only (only!) factor out of the hundreds of variables that actually predicted who would get through the first year at West Point and who wouldn’t was grit.

I asked some folks in my school district to read How Children Succeed and they were as impressed by it as I was, and we got more and more interested in how Duckworth measured grit. A couple of middle schools started to measure grit with students, and one even got involved with Duckworth’s research team to help them collect data. Everything was rolling along smoothly and happily, when suddenly I saw a blogger I like to listen to lash out hard against Duckworth’s grit research. His beef is that Duckworth’s grit claim is darn similar to the same story he’s heard for years about the kids he works with (kids who live in tough family circumstances, well below the poverty line): all they need to do is get “better attitudes,” like grit, and then they can pull themselves up by their own (cowboy) bootstraps.

I was shocked: I hadn’t thought of Duckworth’s research like this. I exchanged a few polite comments with some of the people in the anti-Duckworth posse, and one of them told me to read the book Scarcity. That’s another good one, and it develops a point that Tough only mentions briefly in How Children Succeed: if kids live in conditions of scarcity (high stress, low time, low resources), maybe those conditions get in the way of using any of their social-emotional capabilities, whether or not they are “high grit.” Like Tough, these authors carefully use and explain research findings and case studies to show that conditions of scarcity limit our abilities to think and to use our social-emotional resources.

My conclusion right now (but it might change tomorrow) is best represented by what my friend Tom said about the controversy. He works in an elementary school with students who go home to very tough circumstances, and his mind is always on the topic of how to help all the little knee-biters in his school succeed. He said, “This reminded me of Herb Kohl’s not-learning in his essay ‘I Won’t Learn from You.’ We need to have two eyes – one that understands the role of the individual and one that works collectively to change the contexts of privilege and oppression.” I think that’s a good summary. Duckworth isn’t wrong, and neither are some of her critics, but neither of them is exclusively right. Just like always in teaching and learning, we have a bunch of different things to keep in mind, and for now, I think grit and scarcity will stay close to the top of my list.

Let it go

One of the most important ideas in David Labaree’s book Someone Has to Fail (http://books.google.com/books/about/SOMEONE_HAS_TO_FAIL.html?id=_vXWKjJAImYC) is his claim that U.S. education has always been locked in a mutually exclusive trap of competing ideas: our country wants education to be an egalitarian force (helping “level the playing field” for all students, being a force for equal opportunity) and a force to separate more capable students from the crowd (helping students with more “merit,” talent, intelligence, or other capability rise to the top). Our country wants schools to do both these tasks well, and at the same time. Labaree points out that a school or education reform effort that champions one of these ideas at the expense of the other will get slapped back. A school that focuses too much on opportunity for all will be criticized for not letting the “best and brightest” go as far and fast as they can. A school that focuses on identifying exceptionally talented, etc. students and supporting their advancement will be criticized for elitism. Maybe this is one of the factors behind what many of us who have been in education for a long time experience as the “pendulum” of school reform: it goes back and forth, from one idea to another and often back to a new version of an older idea. What goes around comes around, because as you favor one “side,” you will experience a push back.
Neither of the “sides” is wrong, but there is inherent tension between the ideas of “opportunity for all” and “identifying and supporting merit,” and that tension should be recognized and admitted. We need to admit these kinds of conflicts, because if we don’t uncover them, we may be surprised later when the pendulum swings back and hits us. There are heaps and mounds of research evidence that tracking based on perceived ability (math ability, reading ability, etc.) may not be a good idea. Why is tracking still the norm? Any effort that tries to disrupt tracking (an egalitarian idea, helping all students) will run into concerns about merit, and about why students with perceived “higher” abilities shouldn’t be grouped together so that they can go “farther or faster.” Both sides of this discussion want what they see as “best” for kids, and any policy or practice change favoring one side or the other will get pushed back, so many schools and districts split the difference: rhetoric and some practices that de-track, while maintaining many tracked classes without fanfare.
Every educational reform should start with a clearly identified problem statement and end with an offered solution to that problem (I think I’m misquoting Ted Hamann, but I suspect he’ll forgive me). Somewhere in the middle, maybe it would be useful to acknowledge these tensions, these pressures between mutually exclusive but compelling ideas. Education is such a complex endeavor that a move one way means a move away from another attractive and compelling path. A move toward de-tracking classes is a move away from “advanced” or accelerated courses for students who may be “ready” for that environment. Acknowledging that “sacrifice” is not only honest, it’s needed. We need to name the path we aren’t taking and why – even what might be good about that path, and why we’re giving up those good things in favor of something else.
We may have to start giving some things up, letting them go, in education, in order to make more reasoned and lasting choices, in order to avoid getting smacked in the head by the pendulum. If we really believe in constructivism, what should we acknowledge about the strengths of input-output or transmission models? If we really believe in the importance of critical thinking, what should we acknowledge about the importance of recall of a “core” set of knowledge? What are we willing to let go? Why are we letting it go? What are we moving toward, and why is that path more important?

High Stakes Corrupt

These two articles got me thinking about the relationship between grades and cheating:

– “To Stop Cheating, Nuclear Officers Ditch The Grades” – http://www.npr.org/2014/07/28/334501037/to-stop-cheating-nuclear-officers-ditch-the-grades
– “Wrong Answer: In an era of high-stakes testing, a struggling school made a shocking choice.” – http://www.newyorker.com/magazine/2014/07/21/wrong-answer
Both articles are dramatic, and both are about more important issues than “just” grades, but that’s the connection that got me thinking: in an essay about assessment and accountability (one of my favorite articles – http://www.changemag.org/Archives/Back%20Issues/January-February%202007/full-counting-recounting.html), Lee Shulman wrote that “high stakes corrupt.” The NPR article about nuclear missile training is an obvious example of this, and their “grading reform” – getting rid of letter grades and their connection to competition and damaging perfectionism in favor of cooperation and pass/fail marks – makes a lot of sense, and apparently works, which is a bonus in the context of folks who can fire a nuclear missile.
The New Yorker article is a heart-breaking, in-depth, moving story about one teacher’s encounter with a test that felt very high stakes to students and teachers in their school. I’m sure that administrators and others in that district (and their state department of education) wouldn’t call it a high stakes test: it’s a statewide achievement test without impact for individual students. But the article made it clear that the culture of the school, and possibly the district, created the context that this test was definitely high stakes for teachers and schools. I bet there’s a document floating around that district somewhere that says something like “this test is only one measure of achievement, and data from this test should be triangulated with other sources of achievement data before drawing any conclusions…” blah blah blah. But what ends up happening is familiar to anyone teaching somewhere with a statewide test (which is, now, every public school, I suspect): the statewide test gradually dominates any conversation about student achievement, and doing “badly” on the statewide test feels like a big deal to everyone, even if official rhetoric claims otherwise.
The articles also differ in many ways, including one very important one: the conclusions. In the NPR story, someone decides to dramatically change their old perfection/competition/ego based A-F grading system in favor of an evaluation system that communicates proficiency and acknowledges that people should probably work, you know, together to prevent nuclear accidents from happening. In the New Yorker article, the passionate teacher gets prosecuted, not the overall system that promotes the test craziness, and that school/district/students lose the chance to work with someone who sounds like a fascinating, dedicated teacher.
As I think about this, I can feel myself shrinking away from the implications. Traditional A-F grading isn’t going away any time soon in middle and high schools, and many good folks (like O’Connor, Guskey, and many others) put a lot of effort into “fixing” grading practices. I like reading their thinking and they are definitely trying to help teachers and students. But the contrast between these articles makes me wonder if all the fixes are ultimately very temporary patches on a tire that isn’t getting anyone where they really want to go.

Putting large-scale assessment in its place

Some writers I respect (like Diane Ravitch, Joe Bower, and many others) want large-scale assessment out of schools, period. None of it. Get it the heck out of this school. Thanks, so long, see you later, and let’s get down to the business of teaching and learning. I sympathize with that sentiment, but I don’t share it completely.

(terminology note: I’m using the term “large-scale assessment” to refer to a standardized assessment administered to many students. Usually, these are state-developed or publisher-developed tests, like the SAT, ITBS, etc. Many folks call these “standardized tests,” but that term is a bit misleading, I think, because any test a teacher gives in the same way across classes is “standardized.” So I’ll call them “large-scale.”)

I sympathize because of the damage large-scale assessment has done and continues to do in schools. The demands of No Child Left Behind and Race to the Top pressure schools and districts to use large-scale tests, which in turn pressures districts and schools to focus more on what the large-scale tests measure, which pressures teachers and others to spend more time on what the tests measure, which reduces the amount of time students get to spend learning what the tests don’t measure, which narrows students’ ideas about what “important” learning is and how they relate to (and like) school, until in the end what the large-scale tests measure defines “learning.” The measurement ends up defining learning rather than describing it. (Others have described this process much more thoroughly than I have, such as Diane Ravitch in her excellent book The Death and Life of the Great American School System.)
But I don’t completely share the “throw the bums out!” sentiment toward large-scale assessment, because I think districts and schools can USE these tests rather than getting used by them. Large-scale assessments can be good for very specific purposes – like Liam Neeson, they have a very “particular set of skills.” If a district/school wants to compare assessment data (achievement, ability, whatever) to a national sample, then you’ve got to use a large-scale assessment. All the statistical trappings that go along with a large-scale assessment (like percentile ranks, stanines, etc.) result from the way these tests are built and maintained. Districts can use these data for useful purposes (they often did, in pre-NCLB days), and I think they can again.
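A side note on those statistical trappings: percentile ranks and stanines are simple to compute once a norm sample exists – the hard, expensive part is building and maintaining a representative national norming sample. Here’s a rough sketch, with an invented norm group (real publishers norm against large, carefully built samples) and the conventional stanine percentile boundaries:

```python
from bisect import bisect_left, bisect_right

# Hypothetical norm sample of raw scores, sorted ascending.
norm_sample = sorted([12, 15, 18, 18, 21, 24, 25, 27, 30, 33])

def percentile_rank(score, norms):
    """Percent of the norm group scoring below the given score,
    counting ties as half below (a common convention)."""
    below = bisect_left(norms, score)
    ties = bisect_right(norms, score) - below
    return 100 * (below + 0.5 * ties) / len(norms)

def stanine(pr):
    """Map a percentile rank onto the nine stanine bands."""
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]  # standard stanine boundaries
    return 1 + sum(pr > c for c in cutoffs)

pr = percentile_rank(24, norm_sample)
print(pr, stanine(pr))  # 55.0 5 – a middling score lands in stanine 5
```

The point of the sketch is that these statistics only mean anything relative to the norm group they came from – which is exactly why using them for purposes the test wasn’t designed for gets dicey.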
But having acknowledged all these potential uses, uses of large-scale assessment data are all kinds of out of whack right now. Let’s not even talk about the most egregious uses (like the really wacko teacher evaluation practices that rely on large-scale assessment results, and the “merit pay” value-added schemes encouraged by the Race to the Top program). All large-scale assessments claim to be “reliable and valid,” but validity is a score-use issue, not an inevitable “trait” of a test that gets bestowed once and never challenged. Large-scale tests are built for specific purposes, and using data from them in ways they were never designed for (like teacher evaluation) is a serious threat to validity.
Maybe a ruthless “cost-benefit” analysis would be useful. How much value would a district/school have to get from a large-scale assessment in order for the assessment to justify its existence? How much “gain” would have to result to justify the expense, hassle, stress, and unintended consequences of a large-scale assessment? I think most (all?) state accountability tests would fail this (non-large-scale, non-standardized) test.

Not for Points blog

I decided to start this blog because I’d like to share some conversations about assessment. Many of these conversations orbit around the idea that assessment, lately, gets used as a weapon against teachers and students.

It doesn’t have to be that way. Classroom assessments, designed and controlled by teachers, can be a powerful part of learning, instead of getting in the way of learning (or, worse, intimidating people out of learning, or turning school into a “point game”). Assessments can be empowering, inspiring, exciting, and useful. Right now, large-scale assessments (like state accountability tests) get more attention than they deserve. I hope we can put them in their rightful place, and reclaim assessment as a tool teachers and students use together for learning.