In Atlanta, 35 educators were indicted in the Atlanta Public Schools cheating scandal, all but three of whom had surrendered at the Fulton County jail by Wednesday morning. The cheating scandal was first reported by Atlanta Journal-Constitution reporters, who used statistical tests to document test score irregularities. John Perry, who did the statistical analysis for the stories, explains in the latest IRE Journal how they did it.
School test score data available at NICAR Database Library
The Atlanta Journal-Constitution, after releasing its “Cheating Our Children” series that identified suspicious test scores around the country, provided the NICAR Database Library with test score data gathered from state education departments.
From the AJC: “the data include state testing data paired in approximate cohorts by school, test subject and grade. An approximate cohort would pair, for example, average third-grade math scores at a school in year 1 with fourth-grade math scores at that school in year 2.”
The data are currently available for free to all IRE members.
For more information, please contact NICAR at firstname.lastname@example.org or 573-884-7711.
By John Perry
There are many ways that educators have devised to manipulate achievement test results. The most blatant, and probably the easiest to discover, is simply taking an eraser and correcting student answer sheets. But there are also ways that are subtle and more difficult to detect. Teachers have walked the room, signaling students who mark a wrong answer. Test administrators have gained early access to test questions and used them to make practice worksheets. They can seat struggling students next to proficient students, and as one principal told his staff, if students copy off each other, “there’s nothing we can do.”
Much of this happens behind the closed door of a school classroom. Without a whistleblower, this kind of cheating is hard to uncover. And as reporters at The Atlanta Journal-Constitution have learned in four years of covering cheating in Atlanta and nationwide, teachers who turn in their colleagues often risk their own careers.
But we’ve also learned that test scores for groups of students behave predictably, with little change between years or grades. If average scores show large jumps or dives, something other than education may be going on.
We began to suspect in 2008 that Georgia achievement test results might not be what they appeared. About 40 percent of eighth graders failed the spring math test that year, after a new curriculum was introduced. But in the fall, the state Education Department announced that more Georgia schools than ever before had met their Adequate Yearly Progress goals set by the No Child Left Behind Act.
AYP data showed that many schools had met their goals only because of extraordinary gains on math retests after a few weeks of summer school. Thus began our four-year adventure in the power, and the limitations, of using statistical analysis to uncover cheating by educators on state achievement tests.
For our first cheating story, we had the best possible information – student-level data with both the spring score and the summer retest score. Browsing through the data, it was easy for Heather Vogell, the education reporter on our investigative team, to find unlikely results. At a school where more than 30 kids failed the spring test, summer school brought all of them up to passing level, and half of them to the highest “exceeds expectation” level.
To take a broader view, we used a simple statistical technique. We converted the average score change at each school and grade to a z-score, which expresses that change in standard deviations, a measure of how far score changes typically stray from the average change.
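A minimal sketch of that z-score conversion, for readers who want to try it on their own data. The school names and score changes below are invented for illustration and are not AJC data, and the function name `z_scores` is an assumption, not the newsroom's actual code.

```python
from statistics import mean, pstdev


def z_scores(changes):
    """Express each change as a number of standard deviations from the mean change."""
    mu = mean(changes)
    sigma = pstdev(changes)  # population standard deviation of all the changes
    if sigma == 0:
        raise ValueError("all changes are identical; z-scores are undefined")
    return [(c - mu) / sigma for c in changes]


# Hypothetical average score changes (spring test to summer retest),
# one value per school and grade. Invented numbers, for illustration only.
changes_by_class = {
    "School A, grade 8": 4.0,
    "School B, grade 8": 6.5,
    "School C, grade 8": 5.0,
    "School D, grade 8": 31.0,
    "School E, grade 8": 3.5,
    "School F, grade 8": 5.5,
}

for name, z in zip(changes_by_class, z_scores(list(changes_by_class.values()))):
    print(f"{name}: z = {z:+.2f}")
```

One caution on scale: with only a handful of classes, no single z-score can get very large, because the outlier itself inflates the standard deviation. Values of 4 or more can only appear when many classes are compared at once.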
In our story, we focused on the most unbelievable gains, ranging from 4 to 9 standard deviations. That first story, suggesting that educators were cheating based only on a statistical analysis, was nerve-wracking, but it established a pattern we would repeat several times over the next four years. Statistical analysis cracked the door, allowing traditional reporting techniques to throw it open.
As a result of that first story, the state conducted its first erasure analysis, focusing on schools with large summer-school gains. The results confirmed our reports, and a principal and assistant principal at one school named in our story pleaded guilty to altering state documents.
The story also sparked calls from teachers who said they knew of cheating. None would go on the record and most remained anonymous. But it led us to suspect that Atlanta Public Schools might have a particular problem with cheating. Vogell followed up with a story showing that Atlanta treated cheating reports much differently from other districts, rarely finding substance in any allegation.
The next spring, we conducted a statewide analysis of test results. This time, we could not get data tracking of individual students from test to test. We were forced to use averages by school, grade and test subject. We used linear regression to calculate an expected average for each school, grade and subject – what we call a class – based on the result for the previous grade in the previous year.
In effect, we were using regression as a descriptive statistic to tell us the probability that a gain or loss was the result of random chance. If the change was improbable, it was likely something unique had occurred. We believed principals and the superintendent should be able to explain that unique event.
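The regression step can be sketched roughly as follows. This is an illustration under stated assumptions, not the AJC's actual model: it fits a single straight line predicting a class's current average from the previous grade's average the year before, then measures how far each class falls from its expected value. The function names, the threshold and the scores are all hypothetical, and the probability calculation assumes the residuals are roughly normally distributed.

```python
import math
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def standardized_residuals(prior, current):
    """Gap between actual and expected average, in residual standard deviations."""
    slope, intercept = fit_line(prior, current)
    residuals = [c - (intercept + slope * p) for p, c in zip(prior, current)]
    sd = pstdev(residuals)
    return [r / sd for r in residuals]


def two_sided_p(z):
    """Chance of a gap at least this large either way, ASSUMING normal residuals."""
    return math.erfc(abs(z) / math.sqrt(2))


def flag_classes(prior, current, threshold=3.0):
    """Indices of classes whose standardized residual is at least `threshold` in size."""
    zs = standardized_residuals(prior, current)
    return [i for i, z in enumerate(zs) if abs(z) >= threshold]


# Hypothetical class averages: previous grade in year 1, next grade in year 2.
prior = [float(s) for s in range(801, 811)]
current = list(prior)
current[4] = 815.0  # one class jumps 10 points while the rest hold steady

# The threshold is lowered here only because the toy data set is tiny.
print(flag_classes(prior, current, threshold=2.0))
```

On this toy data only the class with the 10-point jump is flagged. A real analysis would use far more classes and might fit separate models by grade, subject and year.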
The statewide analysis pointed dramatically at Atlanta, and that became our focus. Shortly after our report, the state released its first statewide erasure analysis, largely confirming our findings. Vogell convinced a few teachers to speak on the record. This all led to the governor’s appointment of a special investigator to look into Atlanta cheating. In 2011, that investigation implicated around 180 Atlanta teachers and administrators in cheating. The investigators with subpoena power were able to reveal details of the cheating, such as pizza parties where teachers and administrators changed test answers.
In the fall of 2011, we decided to answer the next logical question: was Atlanta unique? We had been asking if cheating occurred at individual schools. Now we were looking for districts where the pattern resembled Atlanta and suggested cheating was systemic. After talking with statistics and testing experts, we came up with a two-tiered method of identifying problem districts. First, we used linear regression as we had before to identify unusual score changes at the class level.
Then we looked at the distribution of flagged classes among districts. Given the percentage of classes flagged statewide in a given year, we calculated the probability that a district would have some number of classes flagged by random chance. When we found districts with a highly improbable concentration of flagged classes, we took that as an indication of a district that might have a cheating problem.
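One common way to put a number on that second tier is a binomial tail probability, sketched below. This is an assumption about the form of the calculation, not a description of the exact test we ran: it treats each class as an independent trial that is flagged with the statewide rate. The function name and all the figures are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) when X counts flagged classes among n, each flagged with probability p.

    ASSUMPTION: classes are flagged independently of one another (binomial model).
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical figures, for illustration only.
statewide_flag_rate = 0.05  # share of all classes flagged statewide that year
district_classes = 60       # classes tested in one district
district_flagged = 12       # classes flagged in that district

p_value = prob_at_least(district_flagged, district_classes, statewide_flag_rate)
print(f"Chance of {district_flagged} or more flagged classes: {p_value:.2e}")
```

At a 5 percent statewide rate, a district with 60 classes would expect about three flags, so 12 would be very hard to attribute to chance. Because classes within one school are probably not fully independent, a figure like this is better read as a screening signal than as proof.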
We knew that collecting data from 50 states would be a massive undertaking. It actually took seven months, and the efforts of a lawyer in several instances. So before we undertook that chore, we conducted two pilot studies to test whether our methodology would work.
We had Georgia testing data, as well as erasure analysis results and findings from cheating investigations. We also found that Texas posted extensive data online, and we had the results of The Dallas Morning News statewide cheating investigation. With this data, we could test whether our methodology could identify districts that we knew had cheated.
We then extended our pilot study, with the help of data that had been collected by the American Institutes for Research for their work equating state test score results to a national test. And again, our methodology was able to identify schools and districts where Nexis and Google searches had turned up confirmed cheating scandals.
These pilot studies gave us confidence that collecting the data would be worth the effort. Our first story, which ran March 20, 2012, identified about 200 districts nationwide where unusual concentrations of unlikely test results should warrant investigation. We followed up with open records requests to selected districts for complaints about cheating and found reluctance among district administrators to investigate these reports deeply.
We also found many National Blue Ribbon Award-winning schools with a series of unusual gains leading up to the award, followed by big score drops afterwards. And a survey of state education departments found that few states use statistical analysis of test results to find cheating, or even simple security measures such as barring teachers from testing their own students.
It’s too soon to know if our national analysis will have results. Historically, federal education officials and local districts have been reluctant to standardize best practices for testing or look for cheating. But with scandals spreading beyond Atlanta to other districts, we believe it is essential to return integrity to the process of school testing. As we’ve noted in our coverage: it is the kids who are cheated when testing integrity is not assured.
John Perry wrote his first major CAR project in 1995, a series using census data to show how downtown Oklahoma City had been encircled by a ring of increasingly concentrated poverty. He was database editor at The Oklahoman from 2000 to 2006. He was a senior CAR fellow at The Center for Public Integrity and since 2008 has been the data specialist on The Atlanta Journal-Constitution’s investigative team.