The intelligence test that started it all: Stanford-Binet's unlikely origin story

In 1904, a French psychologist was handed an impossible task: sort the children Paris couldn't teach. What he built instead changed how the world thinks about the mind.

17 May 2026

Sepia archival illustration of Alfred Binet and Théodore Simon collaborating at a Sorbonne laboratory desk in 1905.

Key takeaways

Alfred Binet designed the 1905 Binet-Simon scale as a practical diagnostic tool for Paris educators, not as a theory of fixed, heritable intelligence.
Lewis Terman's 1916 Stanford revision introduced the IQ score and transformed Binet's cautious diagnostic instrument into a broad claim about human potential.
The Stanford-Binet has been revised five times since 1916, each revision renorming the test against a new population sample to correct for score drift and cultural bias.
Binet explicitly argued that intelligence was not fixed and that the scale's purpose was to trigger better instruction, not to issue a final verdict on a child's capacity.
Online cognitive assessments share the empirical tradition of the Stanford-Binet but are not equivalent to a clinically administered and normed SB5, and results should be interpreted accordingly.

A commission nobody wanted

In the autumn of 1904, the French Ministry of Public Instruction convened a committee with a mandate that sounds, on its surface, almost administrative: figure out which children in the Paris school system were too cognitively limited to benefit from ordinary classroom instruction. The committee was not asked to understand those children, or to help them. It was asked to sort them.

Alfred Binet, the most productive experimental psychologist France had produced in a generation, was on that committee. He had spent the previous decade studying attention, memory, and reasoning in children, often using his own two daughters as subjects. He was skeptical of the grand theories of intelligence that were fashionable in European scientific circles, suspicious of skull measurements and the confident hereditarian claims that flowed from them. What he wanted was something more modest and more honest: a practical tool that could identify which children needed different instruction, without pretending to explain why.

The result, produced with his collaborator Théodore Simon in 1905, was the Binet-Simon scale. It was not the first attempt to measure intelligence. But it was the first that worked well enough to be used, revised, and ultimately exported around the world. Understanding how it came to exist, and what happened to it afterward, tells you most of what you need to know about the promise and the peril of the intelligence test as an institution.

What Binet actually built

The 1905 scale consisted of thirty tasks, arranged roughly in order of difficulty. Some were simple: follow a moving object with your eyes, identify common objects. Others were more demanding: define abstract words, explain the difference between two similar concepts, repeat back a sentence of increasing length. The tasks were not random. Binet and Simon had tested them on children of different ages and selected those that best distinguished children who were developing typically from those who were not.

This empirical approach was the scale’s most important feature, and also its most underappreciated one. Binet was not starting from a theory of what intelligence is and then building tests to measure it. He was starting from a practical question (which children are struggling, and how much?) and working backward to find tasks that could answer it. The distinction matters enormously. A theory-first approach tends to produce tests that measure whatever the theorist believed was important. An empirical approach produces tests that measure whatever actually predicts the outcome you care about, which may or may not correspond to anyone’s prior beliefs.

Binet revised the scale in 1908 and again in 1911, the year of his death. The 1908 revision introduced the concept of mental age: the idea that a child’s performance could be expressed not as a raw score but as the age at which typical children achieved the same score. A nine-year-old who performed like a typical seven-year-old had a mental age of seven. The concept was intuitive, communicable, and, as it turned out, deeply problematic in ways that would take decades to fully appreciate.

The Atlantic crossing

Binet died in 1911 without knowing what his scale would become. The American psychologist Henry H. Goddard had already translated it into English in 1908, and he used it in ways Binet would likely have found alarming. Goddard was a committed hereditarian who believed that low intelligence was a genetic defect, largely untreatable, and a threat to the social order. He used the Binet-Simon scale to classify immigrants arriving at Ellis Island, famously concluding that a startling proportion of them were “feeble-minded.” The methodology was catastrophically flawed. The tests were administered through interpreters, to people who had just crossed an ocean, in a language they did not speak, under conditions of exhaustion and anxiety. The results were treated as biological facts.

The more consequential American revision came from Lewis Terman at Stanford University. Terman standardized the scale on a large American sample, added new items, and made the intelligence quotient, or IQ, the primary unit of measurement, adapting a ratio first proposed by the German psychologist William Stern in 1912. The formula was simple: divide mental age by chronological age and multiply by one hundred. A child whose mental age matched their chronological age had an IQ of exactly one hundred. A child who performed above their age level had an IQ above one hundred, and a child who performed below it had an IQ below one hundred.
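The ratio formula described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `ratio_iq` is invented for this example and is not taken from Terman's manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times one hundred."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    # Multiply first so whole-number inputs stay exact where possible.
    return mental_age * 100 / chronological_age


# A nine-year-old who performs like a typical seven-year-old.
print(round(ratio_iq(7, 9)))  # 78

# A child whose mental age matches their chronological age.
print(round(ratio_iq(9, 9)))  # 100
```

The same arithmetic hints at why the ratio was awkward outside childhood: chronological age keeps rising in the denominator, while test performance does not keep rising in step with it.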

Terman published the Stanford Revision and Extension of the Binet-Simon Intelligence Scale in 1916. It became known, quickly and durably, as the Stanford-Binet. The name joined Binet’s empirical caution to Terman’s institutional ambition, though the two men’s visions of what the test should do were quite different. Binet had wanted a diagnostic tool for educators. Terman wanted a map of human potential, one that could guide decisions about who deserved opportunity and who did not.

The test goes to war

The First World War accelerated everything. When the United States entered the conflict in 1917, the Army needed to classify nearly two million recruits quickly. A committee of psychologists, including Terman, developed the Army Alpha (for literate recruits) and Army Beta (for illiterate and non-English-speaking recruits) tests, based directly on the Stanford-Binet model. For the first time, intelligence testing was applied not to individual children in a clinical setting but to masses of adults under institutional pressure.

The Army data, published after the war, were widely cited as proof that intelligence could be measured at scale and that the results were meaningful. They were also used to support immigration restrictions and eugenic policies, a chapter in the history of intelligence testing that is impossible to read without discomfort. The tests had been designed for one purpose and were being used for another, by people who were often more interested in confirming their prior beliefs than in understanding what the tests could and could not tell them.

This pattern, of a tool designed for a narrow practical purpose being stretched to cover questions it was never built to answer, is the central tension in the history of the intelligence test. It has not gone away.

What the revision history reveals

The Stanford-Binet has been revised five times since 1916, most recently in 2003. Each revision has involved restandardizing the test on a new, more representative sample of the population, updating the items to remove cultural and linguistic bias that accumulated over time, and refining the theoretical model of what the test is measuring. The fifth edition, known as the SB5, measures five cognitive factors: fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory.

The revision history is instructive in itself. The fact that the test has needed repeated, substantial revision is not a sign of failure. It is a sign of intellectual honesty. A test standardized on a particular population in a particular era will drift out of calibration as the population and the era change. The Flynn effect, the well-documented rise in average IQ scores across the twentieth century, means that a test normed in 1960 will systematically overestimate the intelligence of someone tested in 1990. Keeping a test calibrated requires ongoing empirical work, not just faith in the original design.
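To make the calibration drift concrete, here is a rough sketch of the arithmetic. It assumes a drift of about three IQ points per decade, a commonly cited approximation that is not stated in this article and that varies by country, era, and test. The function names are hypothetical.

```python
# Assumption: roughly 3 IQ points of drift per decade. This is a rule of
# thumb for illustration, not a figure taken from any test manual.
ASSUMED_POINTS_PER_DECADE = 3.0


def estimated_norm_inflation(norm_year: int, test_year: int,
                             points_per_decade: float = ASSUMED_POINTS_PER_DECADE) -> float:
    """Approximate number of points by which outdated norms overstate a score."""
    years_elapsed = test_year - norm_year
    return years_elapsed / 10 * points_per_decade


def adjust_for_outdated_norms(observed_score: float, norm_year: int,
                              test_year: int) -> float:
    """Subtract the estimated inflation from a score obtained on old norms."""
    return observed_score - estimated_norm_inflation(norm_year, test_year)


# A test normed in 1960 and taken in 1990, under the assumed drift rate.
print(estimated_norm_inflation(1960, 1990))        # 9.0
print(adjust_for_outdated_norms(109, 1960, 1990))  # 100.0
```

Under that assumption, a score of 109 on thirty-year-old norms corresponds to roughly an average performance against contemporaries. A linear correction like this is a crude stand-in for what renorming actually does, which is to collect fresh data from a new representative sample.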

This is one of the sharpest distinctions between a properly standardized clinical test and the many intelligence tests available online. Clinical tests are periodically renormed against representative samples. Most online tests are not. The score you receive from an online assessment reflects your performance relative to whoever happened to take that test, under whatever conditions they took it, without the controls that make a clinical result interpretable. That is not a reason to dismiss online assessments entirely. They can be genuinely informative as rough indicators of where you sit on certain cognitive dimensions. But it is a reason to hold the result lightly, and to understand what you are actually measuring. You can read more about what distinguishes a properly validated test from a consumer product in our overview of what makes an IQ test official.
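The dependence of a score on its reference sample can be illustrated with a small sketch. It assumes the common convention of reporting scores on a scale with a mean of 100 and a standard deviation of 15. Both samples below are invented numbers for illustration only, and `standard_score` is a hypothetical helper, not the scoring procedure of any real test.

```python
from statistics import mean, stdev


def standard_score(raw: float, norm_sample: list[float],
                   scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Place a raw score on a 100/15 scale relative to a reference sample."""
    z = (raw - mean(norm_sample)) / stdev(norm_sample)
    return scale_mean + scale_sd * z


# Hypothetical raw scores from a sample meant to mirror the population.
representative_sample = [18, 22, 25, 27, 30, 33, 35, 38, 42]

# Hypothetical raw scores from whoever chose to take an online test,
# assumed here to skew higher than the population as a whole.
self_selected_sample = [30, 33, 35, 37, 40, 43, 45, 47, 50]

raw = 35
print(standard_score(raw, representative_sample))  # above 100
print(standard_score(raw, self_selected_sample))   # below 100
```

The same raw performance lands above average against one sample and below average against the other, which is why the composition of the norm group matters as much as the test items themselves.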

Binet’s own caveat

Binet was, by the accounts of his contemporaries, a careful and somewhat melancholy man. He had watched his early work on graphology and hypnosis collapse under scrutiny, and the experience made him permanently skeptical of confident claims. He was explicit, in his writings, that the scale was not a measure of something fixed or innate. It was a measure of current performance, under specific conditions, on specific tasks. A low score was a reason to intervene, not a verdict.

He wrote, in a passage that has been quoted often enough to become almost a cliche but that still deserves to be taken seriously, that he did not believe intelligence was a fixed quantity, that it could be increased with practice and good instruction, and that the purpose of identifying a child’s current level was to help them rise above it. He called the program of instruction he imagined “mental orthopedics,” a phrase that captures something the IQ fundamentalists who came after him consistently missed: the test was meant to be the beginning of a conversation, not the end of one.

The gap between what Binet intended and what the Stanford-Binet became is one of the more instructive stories in the history of science. Tools acquire the values of the people who use them. A measuring instrument designed with humility can be wielded with arrogance. And an arrogant use of a measuring instrument tends, eventually, to discredit the instrument itself, even when the instrument, used carefully, still has something real to offer.

What it means to take the test today

More than a century after Binet and Simon sat across a table from Parisian schoolchildren, the intelligence test remains one of the most contested objects in psychology. Researchers argue about what it measures, how much of what it measures is heritable, whether it captures anything that matters beyond academic performance, and whether the concept of general intelligence is a useful scientific construct or an artifact of the statistical methods used to find it. These are genuine, unresolved questions, and anyone who tells you otherwise is selling something.

What is not in serious dispute is that performance on well-designed cognitive assessments predicts a meaningful range of outcomes, from academic achievement to professional performance, better than most alternatives. The prediction is imperfect. It is not destiny. It captures something real about how a person processes certain kinds of information at a particular moment in time. That is worth knowing, as long as you know what you are knowing.

If you are curious where you sit on the dimensions the Stanford-Binet tradition has been measuring for over a century, our free online cognitive assessment offers a starting point. It is not a clinical test, and it will not produce a result equivalent to a full SB5 administration by a trained psychologist. But it is built on the same empirical tradition that Binet started in a Paris laboratory in 1905, and it will give you something to think about. Which is, in the end, what Binet had in mind.

For a deeper look at what these assessments are actually designed to capture, the article on what intelligence tests actually measure is worth your time. And if you are weighing different options for testing, the guide on where to take an IQ test lays out the trade-offs between online, in-person, and clinical settings clearly.

Frequently asked questions

Who invented the first intelligence test?

Alfred Binet and Théodore Simon published the first practical intelligence scale in 1905, commissioned by the French Ministry of Public Instruction to identify children who needed different educational support. Earlier attempts to measure cognitive ability existed, but the Binet-Simon scale was the first to be empirically validated and widely adopted.

What is the difference between the Binet-Simon scale and the Stanford-Binet?

The Binet-Simon scale was the original 1905 French instrument, revised by Binet and Simon in 1908 and 1911. The Stanford-Binet is the American revision produced by Lewis Terman at Stanford University in 1916, which introduced the IQ score and standardized the test on a large American sample. The Stanford-Binet has been revised five times since and remains a clinical standard today.

Did Binet believe intelligence was fixed?

No. Binet explicitly argued that intelligence was not a fixed or innate quantity and could be improved through targeted instruction. He developed the concept of “mental orthopedics,” a program of cognitive training meant to help children identified by the scale improve their performance. This view was largely ignored by the American psychologists who popularized his test.

Are online IQ tests based on the Stanford-Binet?

Most online cognitive assessments draw on the same empirical tradition and task types as the Stanford-Binet, but they are not equivalent to a clinical administration. Clinical tests are periodically renormed against representative population samples and administered under controlled conditions by trained psychologists. Online tests typically lack these controls, so results should be treated as rough indicators rather than clinical measurements.

Sources

New Investigations upon the Measure of the Intellectual Level among School Children (1911 Binet-Simon scale). Alfred Binet and Théodore Simon (1911).
The Measurement of Intelligence. Lewis M. Terman (1916).
Terman's Kids: The Groundbreaking Study of How the Gifted Grow Up. Joel N. Shurkin (1992).
Stanford-Binet Intelligence Scales, Fifth Edition: Technical Manual. Gale H. Roid (2003).
The Mismeasure of Man. Stephen Jay Gould (1981).
