The new computer-based standardized tests that rolled out in the majority of states last school year largely align with Common Core standards – but developers still need to ensure that the tests are high quality, a new report has found.
Technical and editorial issues, like grammar mistakes and confusing multiple-choice answers, accounted for most of the problems found in tests created by PARCC and Smarter Balanced, two of the main contractors, according to a study released Thursday by the Thomas B. Fordham Institute, a conservative-leaning think tank.
Over the last two years, a panel of 32 reviewers, made up of educators and assessment experts, collected data on and rated the English Language Arts and math sections for grades 5 and 8 from four assessments: Partnership for Assessment of Readiness for College and Careers, or PARCC; Smarter Balanced; ACT Aspire; and a state assessment in Massachusetts. The reviewers evaluated whether the tests matched benchmarks established by the Council of Chief State School Officers’ “Criteria for Procuring and Evaluating High Quality Assessments.”
Researchers said that even the small percentage of quality issues, which typically affected one item in a test form of 40 to 50 sets of questions, was significant enough to affect the accuracy of the scores.
States are paying millions of dollars for the new tests, and many have been plagued by technical glitches ranging from slow Internet connections to cyber attacks. PARCC tests, which are being used in nine states, cost $23.97 per student, while Smarter Balanced tests, used in 17 states, cost $27.30 per student. Last spring, about 40 percent of students in grades 3 to 11 nationwide took one of the four tests.
Experts said the tests could be tightened up.
“Sometimes there might be a slight grammatical error,” said Nancy Doorey, project manager and co-author of the analysis. “There might be a word choice that left something slightly ambiguous, or could possibly be interpreted in a different way.”
For instance, for an item that was supposed to have only one correct answer, reviewers felt that multiple answers could be considered correct. Reviewers also found minor errors in punctuation and spelling.
In addition, technology-enhanced items, or TEIs, often did little to improve the quality of the tests. TEIs allow students to perform certain functions, like adding and changing text and navigating menu bars.
Reviewers reported that, in some cases, “TEIs were used seemingly to no advantage.” TEIs tend to cost more to develop than other test formats, which means a higher price overall for the assessments. Researchers suggested that developers use TEIs strategically.
Critics of high-stakes testing have ripped school leaders for focusing too much on testing rather than learning in the classroom. But reviewers found that the test developers managed to align the time spent taking tests with the complexity of the skill sets they assessed.
According to a study by the Council of the Great City Schools, students spent up to 25 hours per year on mandatory tests, many of which were not aligned to state standards. The Fordham Institute reviewers were more optimistic.
“We’ve dreamed of evaluating the tests that go along with those standards,” said Michael Petrilli, president of the Institute. “Many of us believe that [tests] do actually drive what happens in the classroom in terms of the curriculum instruction.”
Reach the reporter at firstname.lastname@example.org and follow her on Twitter @yizhuevy.