Why Gates’ Big Data Experiment Assessing Teacher Performance Failed

Cathy O’Neil, a mathematician, hedge-fund analyst, and data scientist, explains in a piece for Bloomberg yesterday why the Gates Foundation’s Intensive Partnerships for Effective Teaching experiment not only failed but did actual harm to public schools.

Gathering data from assessments, principal observations of teachers, and evaluations from students and teachers, the program used an algorithm to determine whether a teacher was adding value.
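For readers unfamiliar with the idea, here is a minimal sketch of one common flavor of “value-added” scoring, a simple growth/residual model. This is purely illustrative: the actual models used in the Gates program were proprietary black boxes, and every name and number below is made up.

```python
# Hypothetical sketch of a growth-based "value-added" score.
# Idea: predict each student's current test score from their prior-year score,
# then credit the teacher with the average amount their students beat
# (or fall short of) that prediction.

from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def value_added(students, rosters):
    """students: {student_id: (prior_score, current_score)}
    rosters: {teacher_name: [student_ids]}
    Returns each teacher's mean residual (observed minus predicted score)."""
    priors = [p for p, _ in students.values()]
    currents = [c for _, c in students.values()]
    a, b = fit_line(priors, currents)
    scores = {}
    for teacher, roster in rosters.items():
        residuals = [students[s][1] - (a + b * students[s][0]) for s in roster]
        scores[teacher] = mean(residuals)
    return scores


# Toy example: two teachers, four (fictional) students each.
students = {1: (60, 70), 2: (70, 72), 3: (80, 85), 4: (90, 88),
            5: (60, 62), 6: (70, 78), 7: (80, 90), 8: (90, 95)}
rosters = {"Teacher A": [1, 2, 3, 4], "Teacher B": [5, 6, 7, 8]}
print(value_added(students, rosters))
```

Even in this stripped-down form, the weaknesses O’Neil describes are visible: the score hinges entirely on how well the prediction model fits, and with small rosters a few noisy test results can swing a teacher’s rating dramatically.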

She writes that the goal was to reward the good teachers and root out the bad.

From her piece:

Laudable as the intention may have been, it didn’t work. As the independent assessment, produced by the Rand Corporation, put it: “The initiative did not achieve its goals for student achievement or graduation,” particularly for low-income minority students. The report, however, stops short of drawing what I see as the more important conclusion: The approach that the Gates program epitomizes has actually done damage. It has unfairly ruined careers, driving teachers out of the profession amid a nationwide shortage. And its flawed use of metrics has undermined science.

The program’s underlying assumption, common in the world of “big data,” is that data is good and more data is better. To that end, genuine efforts were made to gather as much potentially relevant information as possible. As such programs go, this was the best-case scenario.

Still, to a statistician, the problems are apparent. Principals tend to give almost all teachers great scores — a flaw that the Rand report found to be increasingly true in the latest observational frameworks, even though some teachers found them useful. The value-added models used to rate teachers — typically black boxes whose inner workings are kept secret — are known to be little better than random number generators, and the ones used in the Gates program were no exception. The models’ best defense was that the addition of other measures could mitigate their flaws — a terrible recommendation for a supposedly scientific instrument. Those other measures, such as parent and student surveys, are also biased: As every pollster knows, the answer depends on how you frame the question.

Read the whole thing.

It must be awesome to get to purchase education policy. How many schools, teachers, and students will they experiment on before they finally learn that this is not the way to go about determining education policy?

Lessons From French Schools

We should have learned by now that if you want to know how not to do something, you should learn from the French (I’m only joking – sort of). Bloomberg has an interesting article that discusses how French schools show the pitfalls for the Common Core State Standards here in the U.S.

An excerpt:

France’s excellent standards also come at a human cost. Its education system is plagued by a high failure rate and worsens social inequality. About 20 percent of pupils struggle with basic reading, writing and math throughout their school years, according to recent government and international reports. A decade ago, that figure was 15 percent. The vast majority of those in difficulty come from poor or disadvantaged families.

Most troubling, the proportion of students who complete high school is lower in France than in the U.S.; one-third drop out before getting to the baccalaureat, and most of these are working-class and first- or second-generation immigrant children. Comparative data compiled by the Organization for Economic Cooperation and Development’s PISA program show that the socioeconomic background of schoolchildren is as much a determinant of their performance in France as it is in the U.S.

Read the rest.