In Defense of Standardized Testing
It has become fashionable to argue that intelligence cannot really be measured and that available measures are discriminatory. That argument is wrong, and it harms the very equity it means to promote.
It is fashionable now to be against IQ. Against the tests, against the construct, against the idea that some children are, in a measurable and partly inborn way, smarter than others. A recent New York Magazine attack on gifted and talented programs is the latest offender. It dishonestly argues that the concept of measurable intelligence and meritocratic education standards are themselves lies. Sophistry at its finest.
Can intelligence be reliably measured?
The modern story begins in 1905, when the psychologist Alfred Binet, working with Théodore Simon, built the first practical intelligence test to identify French schoolchildren who needed extra help. He compared a child’s performance to age norms to produce a “mental age,” the forerunner of the intelligence quotient (IQ) that William Stern proposed in 1912. Around the same time, the British psychologist Charles Spearman noticed that performance on wildly different mental tasks (vocabulary, spatial puzzles, arithmetic) tended to rise and fall together. A person good at one was, on average, good at the others. Spearman called the shared factor behind this pattern g, for general intelligence, and to extract it he helped invent factor analysis, the statistical engine that still powers much of the social sciences.
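Spearman’s observation can be made concrete. The sketch below takes a correlation matrix for four hypothetical tests (the values are invented for illustration, not drawn from any dataset) and extracts its first principal component by power iteration. Principal components are a common shortcut rather than Spearman’s own common-factor method, but with a positive manifold like this one both approaches typically point to a single dominant dimension on which every test loads positively.

```python
import math

# Illustrative correlations among four hypothetical tests (invented values).
TESTS = ["vocabulary", "spatial", "arithmetic", "memory"]
R = [
    [1.00, 0.45, 0.50, 0.40],
    [0.45, 1.00, 0.48, 0.38],
    [0.50, 0.48, 1.00, 0.42],
    [0.40, 0.38, 0.42, 1.00],
]

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def first_component(m, iters=200):
    """Return (eigenvalue, loadings) of the dominant principal component.

    Uses power iteration. With all-positive correlations the dominant
    eigenvector has all-positive entries (Perron-Frobenius), so starting
    from a vector of ones converges to it.
    """
    v = [1.0] * len(m)
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    loadings = [math.sqrt(eigenvalue) * x for x in v]  # correlation of each test with the component
    return eigenvalue, loadings

eigenvalue, loadings = first_component(R)
share = eigenvalue / len(R)  # fraction of total standardized variance on the first component
print(f"first component explains {share:.0%} of the variance")
for name, loading in zip(TESTS, loadings):
    print(f"{name:>10}: {loading:.2f}")
```

With these made-up correlations every test loads positively and one component carries more than half of the total variance, where an even split across four components would give each only a quarter.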
The reliable measurement of IQ, a proxy for general intelligence, is among the most robust findings in all of psychology. Cognitive test scores predict school performance, job performance, income, physical health, and even how long a person lives, across decades and across cultures, with a consistency the rest of social science can only envy.
The machinery that makes cognitive measurement work (standardization, reliability, factor analysis, predictive validity) is the same machinery that underwrites survey research, personality assessment, educational testing, and political polling. Factor analysis was born in this corner of psychology. Declare that a well-validated cognitive test measures nothing real and you have not retired the IQ test alone. You have pulled the load-bearing wall out from under every instrument by which a society takes its own measurements. You do not get to keep the surveys you like and discredit the method that licenses them.
It is true that young children’s scores shift from year to year, and that coaching or a tired, anxious morning can move a result. That instability is a fair indictment of one-shot testing in early childhood, which is genuinely poor practice. But it is an argument for retesting, not for abandonment, and as children grow the longitudinal data run the opposite way. Far from dissolving, cognitive ability consolidates with age into one of the more stable traits we know how to measure. As for the worry that a score is just a snapshot of one day, a snapshot that predicts your income and your health forty years later is not merely a snapshot.
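The case for retesting can be put in numbers with the Spearman-Brown prophecy formula, which predicts the reliability of the average of k parallel test sessions from the reliability ρ of a single session. The sketch assumes the sessions are parallel and their errors independent, which holds only approximately for a young child whose ability is itself changing; the single-session reliability of 0.7 is a hypothetical figure, not a published estimate.

```python
def spearman_brown(rho: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel sessions.

    rho: reliability of one session (0 < rho < 1), assumed for illustration.
    k:   number of sessions averaged.
    """
    if not 0 < rho < 1 or k < 1:
        raise ValueError("need 0 < rho < 1 and k >= 1")
    return k * rho / (1 + (k - 1) * rho)

single_session = 0.7  # hypothetical reliability of one testing morning
for k in (1, 2, 3, 5):
    print(f"{k} session(s): reliability {spearman_brown(single_session, k):.2f}")
```

Averaging three sessions lifts the hypothetical 0.70 to 0.875, which is the statistical sense in which retesting, rather than abandonment, answers the problem of a bad morning.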
Test items can carry cultural baggage, like the duck that is supposed to say oink or the weathervane a city child has never seen. Bias toward the kinds of symbolic representation most common in Western schooling is a real limitation, which is exactly why psychometricians spent decades building less culturally loaded tests, such as Raven’s Progressive Matrices and other nonverbal reasoning measures. A biased item is an argument for a better item, not for the claim that the underlying ability does not exist.
And the history of intelligence testing includes genuinely ugly chapters, from the justification of racism to eugenics and the forced sterilization of people with intellectual disabilities. Binet himself objected to the way his tool was later used to stamp permanent verdicts on children. But the history of abuse is a charge against people, not against the instrument. Statistics, medicine, and technology have all been bent toward evil ends, and we do not therefore conclude that science is a myth.
Is intelligence genetic?
A fair critic might grant that the tests measure something real and stable, and still ask whether that something is inborn or merely the residue of inherited advantage: good schools and full bookshelves. Answering that requires separating the influence of genes from that of environment, and twin studies are the classic way to do it.
Identical twins share all of their genes; fraternal twins share, on average, about half of the genes that vary between people. If a trait is shaped by genes, identical twins should resemble each other more, and heritability can be estimated as roughly twice the size of that gap. For IQ the results have been replicated across dozens of studies, many countries, and many decades. Heritability of intelligence is high, and it climbs with age, from roughly 0.4 in childhood to about 0.6 in adolescence to somewhere near 0.7 to 0.8 in adulthood, meaning that by adulthood genetic differences account for up to 80 percent of the variation in IQ within the populations studied. Even identical twins separated at birth and raised in different homes grow up to resemble each other in IQ. Meanwhile the influence of the shared family environment (the bookshelves and the schools) fades toward zero by adulthood: unrelated children raised in the same household barely resemble each other in intelligence as adults. Molecular genetics, reading DNA directly, has since confirmed the basic picture the twins implied, that intelligence is substantially heritable and shaped by thousands of genetic variants of small effect, though DNA-based estimates still come in lower than twin estimates.
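The twin logic reduces to a back-of-the-envelope calculation known as Falconer’s formula: heritability is about twice the gap between the identical-twin and fraternal-twin correlations. The sketch below applies it to round, illustrative correlations for a childhood sample and an adult sample; these inputs are not figures from any particular study. It assumes additive genetic effects and equally similar environments for both kinds of twins, and modern studies fit fuller structural models, so treat it as intuition rather than method.

```python
from typing import NamedTuple

class ACE(NamedTuple):
    a2: float  # additive genetic share (heritability)
    c2: float  # shared-environment share
    e2: float  # non-shared environment plus measurement error

def falconer(r_mz: float, r_dz: float) -> ACE:
    """Decompose IQ variance from MZ and DZ twin correlations (Falconer's formula)."""
    a2 = 2 * (r_mz - r_dz)  # MZ twins share about twice the varying genes DZ twins do
    c2 = r_mz - a2          # equivalently 2 * r_dz - r_mz
    e2 = 1 - r_mz           # whatever makes identical twins differ
    return ACE(a2, c2, e2)

# Illustrative inputs chosen for round numbers, not taken from a specific study.
child = falconer(r_mz=0.80, r_dz=0.60)
adult = falconer(r_mz=0.85, r_dz=0.45)
print("child:", {k: round(v, 2) for k, v in child._asdict().items()})
print("adult:", {k: round(v, 2) for k, v in adult._asdict().items()})
```

With these inputs the shared-environment share falls from 0.40 to 0.05 while heritability doubles from 0.40 to 0.80, the same direction of change described above.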
Skeptics note, correctly, that environment matters most when a child is very young, and treat this as evidence that intelligence is soft and reshapeable. But the twin data say the reverse. As people age and begin selecting the environments their dispositions pull them toward, genetic influence rises and the family signal fades. The window in which enrichment has the most leverage is early childhood. That is an argument for finding and feeding ability early, not for pretending it is not there.
Inequality is the price of diversity
The motive behind attacks on intelligence research is usually equity. Activists are rightly concerned about disparate outcomes and the many students who lag behind. But inequality is not a problem unique to our education system, capitalism, or even to human societies. Inequality is the inevitable result of the wondrous genetic diversity produced by evolution.
Picture the fairest start imaginable. Every child raised in the same home, fed the same food, taught by the same teachers, loved in the same measure. They would still diverge. Unless the children were genetically identical (and a population of clones would be dangerously exposed, since a parasite or disease able to exploit one member could exploit them all), random variation would guarantee disparate outcomes. Some would be quicker, some calmer, some bolder, some more fragile. Chance further compounds inequality: an injury here, a lucky encounter there, a book that lands at the right age. By adulthood even the egalitarian nursery has produced unequal lives. This is not a flaw in the design of society. It is the design of life. Selection has nothing to work on but differences.
So the real question is never whether children differ in ability. They do, measurably, heritably, from very early on. The question is what we do once we admit it. Here the fashionable answer makes its revealing mistake. Confronted with an inequality it finds intolerable, it attacks the instrument that reveals the inequality rather than the inequality itself. The thermometer reports a fever, so we smash the thermometer. But the test did not create the differences between children. It only made them legible. Breaking it does not make children equal. It makes them illegible, which is a different thing, and usually a worse one.
What bad education policy looks like
The clearest place to watch this experiment run is not the elementary school but the university, where the war on measurement went furthest. In 2021, under legal pressure and in the name of equity, the University of California, the most prestigious public system in the country, went fully test-blind, refusing to look at SAT or ACT scores in admissions at all. Five years later the results are in, and they are brutal. At UC San Diego, the share of incoming students needing math remediation rose from roughly one in two hundred in 2020 to about one in eight in 2025. The number testing below a high school math level climbed nearly thirtyfold, and most of those students place below a middle school level. More than 1,400 UC faculty, many of them in math and the sciences, have signed an open letter pleading for the return of the SAT and ACT math requirement, because they are now reteaching middle school arithmetic to undergraduates in calculus lectures.
The system’s own faculty task force had concluded, before the policy changed, that test scores predicted first-year grades, retention, and graduation better than high school GPA did. The faculty voted to keep testing. The Board of Regents overruled them on political grounds. The warning was filed, ignored, and then vindicated, and only now, in 2026, is the system finally moving to reconsider.
The same lesson is being relearned across higher education: MIT, Dartmouth, Brown, and Harvard have all reinstated testing requirements after going test-optional. None of this was a surprise to any social scientist worth their salt.
Anyone can sit for the SAT, and it is standardized: every student faces comparable questions under comparable conditions. That matters because grading standards vary across schools, and extracurricular achievement is even more heavily confounded by socioeconomic status than test scores are, something critics of standardized tests rarely note. Yes, affluent students may pay for private SAT tutoring. But the internet, and now AI, have greatly democratized education. Talented low-income students can set themselves apart with good test scores. Under test-optional rules, when other evaluative criteria matter more, socioeconomic disparities widen. Not everyone can purchase a coached personal essay, a consultant-built activities list, and a transcript inflated by a forgiving private school.
Ironically, though progressives have led the recent crusade against standardized testing, the standardized test was originally a progressive reform meant to enhance equity.
In the 1930s, Harvard president James Bryant Conant promoted the SAT precisely to break the hold of the inherited Protestant aristocracy that stocked the Ivy League with the well-bred sons of the right prep schools. A common national test, he reasoned, could surface the brilliant farm kid or immigrant’s son whom a pedigree-based system would never see, replacing an aristocracy of birth with what Thomas Jefferson called a “natural aristocracy” grounded in “virtue and talents.” The subjective alternatives now praised as more humane carry an uglier lineage: as the sociologist Jerome Karabel documented, the holistic criteria of “character” and “personality” that elite colleges adopted in the 1920s were used, in part, to limit the number of high-scoring Jewish applicants the tests kept admitting. In a tragic repetition of history, in 2023 the Supreme Court struck down Harvard’s race-conscious admissions program in Students for Fair Admissions v. Harvard, with the majority observing that admissions are zero-sum: a racial preference for some applicants is necessarily a penalty for others, including the Asian American applicants with some of the highest test scores.
Objective measurement is the liberal instrument. Subjective judgment is the gatekeeper’s. The reformers dismantling the test today imagine they are striking a blow against racism and privilege, when in reality standardized tests are the best way to combat bias in education systems.
That last point matters more every year, because the alternatives to the test are collapsing. Grade inflation has drained high school GPA of meaning, and the personal essay, the supposedly humane substitute for a number, is now frequently written by a chatbot. Standardized testing is one of the last common, external, hard-to-fake signals left, and it is precisely the one the reformers threw overboard. Small wonder public confidence in higher education, as measured by Gallup, fell from 57 percent in 2015 to the mid-thirties by 2023. When institutions abandon their ability to tell who is prepared, the public notices.
The same error appears in miniature in the gifted debate. When New York City replaced gifted testing with parent applications and teacher nominations, early advantage did not disappear. It concentrated, flowing to the families who knew how to work the system and the children already fluent in the cultural cues that adults mistake for brilliance. Take away the objective measure in the name of fairness and you do not get fairness. You get a softer, more deniable form of the same sorting, run by connections instead of by ability.
The shameful New York Magazine article supports these changes, overlooks their limitations, and further distorts the research literature on long-term predictors of success. It cites a thirty-five-year study of 677 children enrolled in gifted and talented education programs and notes that 88 percent of them did not go on to reach “genuine eminence” in their careers, such as full professorships, Fortune 500 leadership, judgeships, or distinguished work in medicine and law. “Only” 12 percent did, as though that were a disappointment.
Any statistically literate reader would check those numbers against the base rates. Barely 2 percent of American adults hold a doctoral degree such as a PhD (professional degrees like the MD and JD aside), and the share of the general population that reaches the study’s definition of “eminence” sits well under one percent, plausibly in the low tenths of a percent. Against that floor, 12 percent is many times the base rate, and the same research program found its most able subjects earning doctorates at more than fifty times the population rate. The critics found one of the largest predictive signals of early intelligence and described it as a letdown.
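The base-rate comparison is simple division. Because the text gives only a range for the population rate of “eminence” (under one percent, plausibly a few tenths of a percent), the sketch below computes the enrichment ratio across that range instead of committing to one figure; the specific base rates in the loop are illustrative points within the stated range.

```python
def enrichment(observed_rate: float, base_rate: float) -> float:
    """How many times more often an outcome occurs in a group than in the population."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return observed_rate / base_rate

GIFTED_EMINENCE_RATE = 0.12  # the 12 percent figure cited above
# Illustrative population base rates, from one percent down to a tenth of a percent.
for base_rate in (0.01, 0.005, 0.002, 0.001):
    ratio = enrichment(GIFTED_EMINENCE_RATE, base_rate)
    print(f"base rate {base_rate:.1%}: {ratio:.0f}x the population rate")
```

Even at the generous one-percent ceiling the gifted cohort reaches eminence about 12 times as often as the general population; at a tenth of a percent the multiple is about 120.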
That is the through-line of bad education policy: it abolishes the measurement to spare itself the discomfort of what the measurement reveals, and the children it claims to protect are the ones who pay.
What good education policy looks like
Good policy starts by keeping the measure. A standardized test is cheap, hard to game, and blind to charm and connections, which is exactly why it remains the best friend of the talented child whom no one is watching for. The fix for an imperfect instrument is a better instrument, read over time and alongside other evidence, not a retreat to gatekeeping by impression.
There is still room for a liberal affirmative action that does not invite unconstitutional racial discrimination. If what schools really want to select for is general intelligence and competence, it is true that standardized test scores are an imperfect proxy. In my opinion, a 1500 out of 1600 on the SAT from an under-resourced student is more impressive than a perfect score from someone who had expensive private tutoring. But for such affirmative action to be effective, it has to be narrowly tailored and data-driven. What is the exact test score disparity across income groups? What is the conditional likelihood of success once that disparity is factored in? What objective data can we base these judgments on, so as not to incentivize victimhood narratives in application essays?
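One simple way to put a number on “more impressive” is a within-group percentile: where does a score fall relative to other applicants from the same income bracket? The sketch below models each bracket’s scores as a normal distribution using Python’s standard `statistics.NormalDist`. The means and standard deviations are placeholders invented for illustration, not real SAT statistics, and a real analysis would still have to answer the paragraph’s second question, whether a given within-group rank actually predicts college success.

```python
from statistics import NormalDist

# Placeholder distributions by income bracket: invented numbers, NOT real SAT data.
SCORE_MODELS = {
    "low_income": NormalDist(mu=950, sigma=180),
    "high_income": NormalDist(mu=1200, sigma=180),
}

def within_group_percentile(score: float, bracket: str) -> float:
    """Percentile (0-100) of `score` among applicants in the same bracket.

    A normal model ignores the 1600 ceiling, so treat extreme results as rough.
    """
    return 100 * SCORE_MODELS[bracket].cdf(score)

p_low = within_group_percentile(1500, "low_income")
p_high = within_group_percentile(1600, "high_income")
print(f"1500, low-income bracket:  {p_low:.2f}th percentile")
print(f"1600, high-income bracket: {p_high:.2f}th percentile")
```

Under these made-up parameters the 1500 ranks higher within its bracket than the perfect score does within its own, which is the intuition the paragraph expresses in quantitative form.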
In K-12 education, the deepest fix to the public school system would be free market competition. Fund students, not systems, and let providers compete to serve them.
When families can carry their funding to the school that fits, competition tends to raise outcomes, often at lower cost, and frequently most for the disadvantaged. More money poured into a single public monopoly, by contrast, does surprisingly little on its own. Education savings accounts, charters, open enrollment, course-level acceleration, and dual enrollment all do the same essential work. They give a bright child trapped in a weak district an exit, and they let the match between student and school improve without anyone in a central office deciding in advance what every child needs.
When one public system must produce one answer for everyone, gifted education becomes a zero-sum brawl over a handful of seats, and the scarcity is what makes it so bitter. Perhaps the better model for gifted K-12 education resembles our university system, which, when it operates as intended, offers tiers of selectivity into which students self-sort according to their best match. When the funding follows the student, a school’s main worry is not resource allocation but student retention. Loosen the supply, let many providers serve many kinds of learners, and the parental ambition the critics find unseemly turns positive-sum.
No central planner knows the single correct model of gifted education, which is the strongest reason to let families and localities experiment and let results, rather than ideology, sort the winners.
We will never engineer away the inequality that evolution wrote into us, and the regimes that tried compiled the worst record of the last century. What a free society can offer instead is not equal outcomes but equal standing: the same rights, the same protection of the law, and a fair chance to be seen. The eminent few that the skeptics wave away are the source of a wildly disproportionate share of the medicine, technology, and prosperity the rest of us depend on. Finding talent and feeding it is not hoarding. It is one of the highest-return investments a society can make, and most of the return is paid to strangers. Used well, standardized testing and meritocratic admissions pipelines are how we find the disadvantaged children whose gifts would otherwise go unseen.



