https://www.ajc.com/blog/get-schooled/opinion-maybe-time-say-nope-naep/wNNJdeiRz5FZHuJrmAKSII/

Opinion: Maybe, it’s time to say nope to NAEP

By Maureen Downey

I asked University of Georgia professor and frequent Get Schooled contributor Peter Smagorinsky to share his view on why 2019 NAEP scores declined in reading. He is a Distinguished Research Professor of English Education in the Department of Language and Literacy Education.

As usual, Smagorinsky doesn’t mince words, saying the test items on NAEP have questionable validity and the generalizations from the results have questionable reliability.

By Peter Smagorinsky

U.S. education achievement slides backwards: Substantial decrease in reading scores among the nation’s eighth graders

This latest sky-is-falling educational headline followed the recent release of scores from tests administered by the National Assessment of Educational Progress, known as the “Nation’s Report Card” and the bearer of the “gold standard” for educational assessment. Of the nation’s 50.8 million public school students, 585,000 fourth and eighth graders take the NAEP assessment.

The students are chosen through sampling procedures comparable to those used by opinion pollsters. They are assumed to be a representative sample of all schoolchildren. There is little acknowledgement of possible sampling error: generalizing from a sample that does not represent the whole population.

The performance on these tests by one out of every 87 students, just over 1% of the national public school population, thus produces the outcome by which all schools can be judged. They are chosen using sampling procedures similar to those behind forecasts that gave Hillary Clinton a 90% chance of defeating Donald Trump for the U.S. presidency in 2016, just before the polls opened.

These tests have become the ink blot into which many people have read their interpretations of the status of public school education. The 2015 scores raised alarms that the entire public school system was deteriorating. When the scores were announced, Thomas J. Kane of the Brookings Institution attributed declines in scores to the Common Core State Standards; just two years earlier, Arne Duncan had credited Common Core with increases in test scores, thus justifying his testing regime. Some attributed the 2015 drop to changing demographics that reduced the percentage of White students taking the tests.

The more things change, the more they remain the same. The 2019 scores declined, and the finger of blame is pointed in all directions. Education Secretary Betsy DeVos said, “The results are, frankly, devastating. This country is in a student achievement crisis, and over the past decade it has continued to worsen, especially for our most vulnerable students. . . . This must be America’s wake-up call. We cannot abide these poor results any longer. We can neither excuse them away nor simply throw more money at the problem.” Instead, DeVos believes we should throw money at private and charter schools.

Sarah D. Sparks writes in Education Week that we cannot overlook the inverse relationship between kids’ increased screen time and their declining test scores. Too many videos, too many cryptic and ungrammatical social media posts, too little reading of old-school books with complete sentences and complex chains of thought: together they produce shabby reading habits and abilities, as evidenced by the test scores.

To Thomas B. Fordham Institute President Michael Petrilli, the Great Recession that straddled the G. W. Bush and Obama administrations is to blame for creating developmental challenges for today’s school population. The recession produced cuts in public school funding, concurrent with increases in poverty affecting students’ families. The predictable result was lower literacy rates, now evidenced in NAEP scores. Adia Harvey Wingfield believes declines in school funding and the diversion of resources to private and charter schools are responsible for the slide in public school students’ test scores.

Matthew Ladner believes recent teacher strikes have had a negative impact on test scores because they reduce contact time between teachers and students. Some believe problems with the shift to digital testing have produced a “mode effect” that depresses scores for students accustomed to paper-and-pencil tests. To Carol Burris, the scores reflect the problems that follow from implementing corporate-style reforms in schools.

Many of these commentators, from left to right on the political spectrum, fit the decline in scores to theories they had developed long before the test results were released. Like a Rorschach test, the NAEP report provides contours that individuals can interpret according to what they already believed.

So, let me add to the interpretive morass: The test scores are declining because the tests themselves don’t measure much that is worth knowing or doing. NAEP’s reputation as a “gold standard” assessment appears to have obscured the substance of the exams themselves, which are granted status as wholly reliable and true indicators of something called student “achievement.” When people interpret the scores, they tend to look everywhere but at the tests themselves.

I have a personal reason for questioning these test results, that being my own dodgy history with standardized tests. I typically scored about 150 points higher on the math portions of tests than on the verbal component, on the SAT’s 800-point scale.

But the fact is that I’m terrible at math. In spite of my much lower verbal scores, I became an English major, then an English teacher, and now am a publish-or-perish professor in a Department of Language and Literacy Education. I’m much more verbal than mathematical. But the tests that measured my achievement found the opposite to be so, leading to my lifelong skepticism about standardized tests.

Do you find the NAEP test items to be valid indicators of your own educational achievement? NAEP provides sample items from prior test administrations and is transparent about how it measures proficiency in each of many areas. It offers a NAEP Questions Tool that enables visitors to try to answer retired test questions. There are 12 subject areas tested at grades 4, 8, and 12, and the tool provides Easy, Medium, and Hard questions for each.

Anyone with any opinion on the NAEP results should spend a little time with these questions and their multiple-choice answers. I have confined myself to the only area of the curriculum that I know reasonably well, what they call Reading and Writing. I have no idea of how to evaluate a math assessment, just as math assessments had no idea of how to evaluate me.

The items look like most tests of this type: students read a passage and answer a multiple-choice question about it. The assumption is that all readers have the same relationship to the text, that the item itself is neutral and agnostic. That assumption renders the reader’s active engagement—or lack thereof—moot. It makes reading an act of decoding, and no more.

But what makes reading interesting and useful is the constructive work a reader does to make sense of a text and make meaning through a reading experience. That essential dimension of reading for understanding gets flushed when the item requires one correct answer from among four text-based choices.

When I read the texts in the items from the NAEP Questions Tool, I struggled to stay focused on what the question asked, rather than reading for what I read for. I didn’t enjoy the task of scanning the text to see what I could rule out as a possible answer. My reading for the task was a labor, and my selection of the best of four choices was not at all productive to me as a reader.

I’m an adult interested in how schools function, including how teaching and learning are assessed, and I found the item tasks to be tedious and disengaging. I imagine that a kid subjected to tests frequently—most kids, that is—would not have my dedication to trying to study the problem and penetrate the thinking of the test designer to determine what that person would consider to be the best of the four choices, and how the other choices were constructed as decoys.

Rather, a kid might just think, crap, another stupid test, I’ll get through it and hope we do something interesting when we’re done. Maybe that’s why scores are declining. The test items have questionable validity, the generalizations from the results have questionable reliability, and the assumption that kids take these tests seriously, making the results so very important, has no merit whatsoever.