Why ‘official’ is doing a lot of work
Search for ‘official IQ test’ and you will find a small industry of websites that have simply declared themselves official. One site calls its product ‘certified.’ Another puts the word ‘official’ directly in its domain name. Neither explains what the certification body is, who did the norming, or what the standard error of measurement looks like. The label is being used as a marketing word, not a technical one.
That gap is worth closing. Psychometricians have fairly precise criteria for what makes a cognitive assessment valid and standardised. Those criteria are not secret. They are published in the Standards for Educational and Psychological Testing (AERA, APA, and NCME, 2014), a document whose successive editions have served as the field’s reference point for decades. Once you know what those criteria are, it becomes much easier to sort the real thing from the noise.
The four pillars of a legitimate IQ test
1. Standardisation on a representative sample
A standardised test is one whose norms were established by administering it to a large, carefully constructed reference population. The Stanford-Binet 5 (SB5), published by Riverside Publishing in 2003, was normed on 4,800 individuals stratified by age, sex, race/ethnicity, geographic region, and socioeconomic status to match U.S. Census data. The Wechsler Adult Intelligence Scale (WAIS-IV), published by Pearson in 2008, used a similarly stratified sample of 2,200 adults.
This matters because an IQ score is not an absolute number. It is a relative number: it tells you where a person falls in the distribution of scores produced by the reference population. If the reference population is a convenience sample of college students, or self-selected internet users, the score is meaningless as a measure of general cognitive ability. The norm is the test.
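To make the point concrete, here is a minimal sketch of how a raw score becomes a relative score. It assumes the common deviation-IQ convention (mean 100, standard deviation 15) and uses two tiny, made-up norm samples; the names `deviation_iq`, `general_sample`, and `student_sample` are illustrative and do not come from any test manual.

```python
from statistics import NormalDist, mean, stdev


def deviation_iq(raw_score, norm_scores, iq_mean=100.0, iq_sd=15.0):
    """Place a raw score on the IQ scale relative to a norm sample."""
    z = (raw_score - mean(norm_scores)) / stdev(norm_scores)
    return iq_mean + iq_sd * z


def percentile(iq, iq_mean=100.0, iq_sd=15.0):
    """Percentile rank of an IQ score, assuming a normal distribution."""
    return 100.0 * NormalDist(iq_mean, iq_sd).cdf(iq)


# Hypothetical raw scores (items correct) from two reference groups.
general_sample = [28, 31, 35, 38, 40, 42, 45, 48, 52, 55]
student_sample = [40, 43, 45, 47, 49, 51, 53, 55, 58, 61]

raw = 48  # the same performance, scored against each group
iq_general = deviation_iq(raw, general_sample)
iq_student = deviation_iq(raw, student_sample)

print(f"vs. general sample: IQ {iq_general:.0f}")
print(f"vs. student sample: IQ {iq_student:.0f}")
```

The same raw performance lands above 100 against one group and below 100 against the other. Real norming uses thousands of stratified participants and age-specific tables, but the dependence on the reference group is the same.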
2. Reliability: consistency across time and form
A test is reliable if it produces consistent results. A standard metric is the test-retest reliability coefficient, which measures how stable scores are when the same person takes the test twice. For the SB5, test-retest reliability across age groups ranges from roughly .90 to .95 (Roid, 2003). For the WAIS-IV, the full-scale IQ test-retest coefficient is approximately .96 (Wechsler, 2008).
Those numbers matter because a test with low reliability is measuring noise as much as signal. If your score would swing 20 points depending on the day, the result carries little information. A coefficient below .80 is generally considered inadequate for high-stakes decisions. Many online tests have never published a reliability coefficient at all.
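A test-retest coefficient is, at its core, a correlation between two sittings of the same test. The sketch below computes a plain Pearson correlation on invented scores for eight people; the data and the function name `pearson_r` are assumptions for illustration, and published manuals typically report coefficients with corrections this sketch omits.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same eight people, tested twice.
first_sitting = [92, 105, 118, 99, 110, 87, 123, 101]
second_sitting = [95, 103, 120, 97, 113, 90, 121, 104]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest r = {r:.2f}")
```

With scores that move only a few points between sittings, the coefficient comes out high. Larger day-to-day swings would pull it down toward the levels the paragraph above describes as inadequate.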
3. Validity: does it measure what it claims to measure?
Validity is the harder criterion. A test is valid to the extent that scores on it actually correspond to the construct being measured: in this case, general cognitive ability (the g factor, in psychometric terminology). Validity evidence comes from several sources: factor analyses showing that the test’s subtests load onto the expected latent factors; convergent validity studies showing that the test correlates with other established measures; and predictive validity studies showing that test scores predict outcomes like academic achievement and occupational performance.
The g factor is one of the most replicated findings in all of psychology. Scores on diverse cognitive tasks tend to correlate positively with each other, and that shared variance can be extracted as a general factor. A test that does not load substantially on g is not, in any meaningful sense, measuring intelligence. It may be measuring vocabulary, or pattern recognition in a narrow domain, or simply the ability to complete an online form quickly.
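The idea of shared variance can be illustrated with a small calculation. The sketch below takes a made-up correlation matrix for four subtests and extracts its first principal component by power iteration. This is a rough stand-in for a general factor, not the factor-analytic modelling a technical manual would report; the subtest names and correlations are hypothetical.

```python
from math import sqrt


def first_component(corr, iterations=200):
    """Largest eigenvalue and component loadings of a correlation matrix."""
    n = len(corr)
    v = [1.0 / sqrt(n)] * n  # start from equal weights
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    loadings = [sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


# Hypothetical correlations among four subtests (all positive).
subtests = ["vocabulary", "matrices", "digit span", "coding"]
corr = [
    [1.00, 0.55, 0.45, 0.35],
    [0.55, 1.00, 0.50, 0.40],
    [0.45, 0.50, 1.00, 0.38],
    [0.35, 0.40, 0.38, 1.00],
]

eigenvalue, loadings = first_component(corr)
share = eigenvalue / len(corr)  # proportion of total variance

for name, loading in zip(subtests, loadings):
    print(f"{name:>10}: loading {loading:.2f}")
print(f"first component explains {share:.0%} of the variance")
```

When every subtest correlates positively with every other, all loadings on the first component are positive and it accounts for a large share of the variance. A subtest with a weak loading would be contributing little to the general factor.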
4. Standardised administration conditions
This is the criterion that online tests, including this one, cannot fully satisfy. Clinical IQ tests are administered under controlled conditions by a trained examiner. The examiner monitors for fatigue, distraction, and anxiety. They can establish rapport with a child who is nervous. They can note whether a subject appeared to be putting in genuine effort. They can catch ceiling and floor effects in real time and adjust.
The Standards document is explicit: scores obtained under non-standard conditions should be interpreted with caution and flagged as such. When you take a test at home, on your phone, with the television on, the administration conditions are not standardised. The score you get is real data about how you performed that day, in those conditions. It is not equivalent to a clinically administered score.
What ‘official’ actually means in practice
In the United States, there is no government body that certifies IQ tests as official. The American Psychological Association does not issue a seal of approval. What the field has instead is a combination of peer review, published technical manuals, and professional consensus.
A test earns credibility by:
- Publishing a detailed technical manual with norming data, reliability coefficients, and validity studies
- Having its psychometric properties independently reviewed in journals like Psychological Assessment or Journal of Psychoeducational Assessment
- Being used by licensed psychologists in clinical and educational settings, which creates accountability because practitioners stake their professional reputation on the tools they use
- Disclosing the limitations of the instrument, including its standard error of measurement (roughly 2 to 3 IQ points for the full-scale score of a well-constructed test)
No online test, including the one on this site, meets all of these criteria in the same way that the SB5 or WAIS-IV does. That is not a reason to dismiss online assessments entirely. It is a reason to understand what they are: screening tools that can give you a reasonable estimate of where you might fall in the distribution, while acknowledging that the estimate carries more uncertainty than a clinically administered result.
The standard error of measurement: a number most sites hide
Every test score comes with a confidence interval. For the SB5, a full-scale IQ of 100 comes with a 95% confidence interval of roughly 96 to 104 (Roid, 2003). That means if you scored 100, the test is saying your true score is probably somewhere between 96 and 104, not that it is precisely 100.
For less carefully constructed tests, the confidence interval is wider, sometimes much wider. A test with a reliability coefficient of .80 and a standard deviation of 15 has a standard error of measurement of about 6.7 points (15 × √(1 − .80)). That gives a 95% confidence interval of roughly plus or minus 13 points (1.96 × 6.7). A score of 115 on such a test is consistent with a true score anywhere from 102 to 128, a range that runs from squarely ‘average’ to just short of the 130 mark commonly used as a threshold for ‘gifted.’ The single number is almost meaningless at that level of imprecision.
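The arithmetic above follows from the standard formula for the standard error of measurement: the score standard deviation multiplied by the square root of one minus the reliability. A minimal sketch follows, with function names chosen here for illustration. It centres the interval on the observed score, which is a simplification; some test manuals centre it on an estimated true score instead.

```python
from math import sqrt


def standard_error_of_measurement(reliability, sd=15.0):
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def confidence_interval(score, reliability, sd=15.0, z=1.96):
    """Approximate 95% interval around an observed score (z = 1.96)."""
    margin = z * standard_error_of_measurement(reliability, sd)
    return score - margin, score + margin


sem = standard_error_of_measurement(0.80)
low, high = confidence_interval(115, 0.80)

print(f"SEM: {sem:.1f} points")                 # SEM: 6.7 points
print(f"95% interval: {low:.0f} to {high:.0f}")  # 95% interval: 102 to 128
```

Plugging in a reliability of .96 instead gives an SEM of 3.0 points, which shows how quickly the interval tightens as reliability rises.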
When a site reports your IQ to two decimal places without mentioning confidence intervals, that precision is false. Real psychometrics is honest about uncertainty.
Why the question matters beyond test-shopping
Understanding what makes a test legitimate is not just useful for choosing where to take one. It is useful for interpreting any score you have ever received. If you took an online test in 2012 and got a score of 127, you should want to know: what was the norming sample? What is the test-retest reliability? Was the administration standardised?
Those questions are not pedantic. They are the difference between a number that tells you something real about your cognitive profile and a number that tells you how you performed on that particular set of items on that particular afternoon.
The same critical framework applies to high-stakes uses of IQ scores. Educational placement decisions, disability determinations, and forensic evaluations (where IQ scores can bear on eligibility for the death penalty under Atkins v. Virginia) all require clinically administered tests with documented psychometric properties. No court accepts an online IQ score as evidence of intellectual disability. No school district places a child in a special education program based on a website quiz. The stakes clarify the standards.
What to do with this information
If you need an IQ assessment for clinical, legal, or educational purposes, you need a licensed psychologist administering a validated instrument like the SB5 or WAIS-IV. That is the honest answer, and it is the one most sites with a commercial interest in your taking their test will not tell you.
If you are curious about your cognitive profile, want a rough benchmark, or are interested in the experience of taking a structured cognitive assessment, an online test can serve that purpose. The key is to hold the result with appropriate uncertainty: treat it as a plausible estimate, not a certified fact.
For a sense of what a structured online cognitive assessment looks like, you can explore the free assessment on this site and read the methodology notes alongside your result. The score you receive comes with the same caveat that honest psychometrics always carries: it is one data point, not a verdict.
For more on how the original Stanford-Binet scale was constructed and what it was designed to measure, see the history of Alfred Binet and his approach to measuring intelligence. And if you are weighing your options between online and in-person testing, a comparison of IQ testing venues covers the practical trade-offs in more detail.
Frequently asked questions
Is there an official government-certified IQ test?
No. In the United States, no government agency certifies IQ tests as official. Credibility in the field comes from published norming data, peer-reviewed validity studies, and adoption by licensed psychologists, not from a certification body.
What is the difference between the Stanford-Binet and an online IQ test?
The Stanford-Binet 5 is a clinically administered instrument normed on 4,800 stratified individuals, with published reliability and validity data reviewed by independent researchers. Online tests vary widely in quality; most have not published comparable technical documentation, and none can replicate standardised administration conditions.
Can an online IQ score be used for educational placement or legal purposes?
No. Courts and school districts require clinically administered tests with documented psychometric properties. An online score is not accepted as evidence in disability determinations, special education placements, or forensic proceedings such as Atkins hearings.
What is a standard error of measurement and why does it matter?
The standard error of measurement (SEM) quantifies how much a score is likely to vary due to test imprecision. For the Stanford-Binet 5, the SEM is roughly 2 to 3 points; for less reliable tests it can exceed 6 points, meaning that at 95% confidence a reported score of 115 might reflect a true score anywhere from 102 to 128.


