OpenAI o3 IQ score surpasses Mensa threshold, setting new AI intelligence benchmark

Khaled Darwish

OpenAI’s o3 model is making headlines after scoring 136 on Mensa Norway’s public IQ test.

That score places the model’s performance above 98% of the human population, based on the standard IQ distribution curve. The data comes from the independent platform TrackingAI.org and marks a significant leap in AI cognitive benchmarks.

The o3 model is part of OpenAI’s elite “o-series,” which has dominated recent intelligence testing. Its 136 score qualifies it for Mensa Norway, marking the first time an AI model meets that threshold under test conditions designed for humans.

The benchmark utilized two different evaluations — an Offline Test and the public Mensa Norway test. While o3 scored a modest 116 on the Offline evaluation, its Mensa score surged to 136, possibly due to its better alignment with human-oriented testing or some subtle overlap in prompt familiarity.

Proprietary edge: o3 outpaces GPT-4o and Llama 4

The OpenAI o3 IQ score clearly highlights the widening performance gap between proprietary and open-source AI models. While o3 led with 136, GPT-4o scored only 95 on the same Mensa test. Even Meta’s best open model, Llama 4 Maverick, reached only 106.

TrackingAI’s testing method includes a prompt with four Likert-style response options. Each language model must choose one and justify the answer in 2–5 sentences. The best of the seven latest completions is used for scoring, with refusal events logged separately.

Though the scores are compelling, some have noted the lack of confidence intervals and transparency in the prompting process. Without this, reproducibility and interpretation remain limited, even with structured evaluations.

OpenAI o3 IQ score bucks the multimodal underperformance trend

Another standout insight is how o3 defies the trend of underperformance in multimodal models. Previous models like o1 Pro saw a drop in IQ when vision was activated — from 122 to 86 on the Mensa test. But o3 maintains top-tier text comprehension while significantly improving image analysis.

This suggests OpenAI may have made a breakthrough in integrating multimodal data without sacrificing reasoning strength.

Still, even with an IQ of 136, critics argue that short-context reasoning — the kind tested by Mensa — doesn’t reflect real-world capabilities like long-term planning or contextual dialogue. The utility of these high scores remains debated.

Despite questions around methodology, the OpenAI o3 IQ score sets a new standard in AI cognitive benchmarking. As transparency in corporate model development remains elusive, third-party groups like LM-Eval and GPTZero are becoming essential.

More nuanced evaluations will be needed to measure deeper cognitive behaviors beyond IQ-style testing. Still, o3’s Mensa-level score confirms a clear evolution in the reasoning power of today’s best AI systems.

What does an IQ score of 136 mean for an AI like o3?

An IQ score of 136 places the OpenAI o3 model well within the top 2% of human intelligence levels. For an AI, this demonstrates advanced short-context reasoning and pattern recognition abilities. However, it’s important to note that IQ tests like the one used by Mensa measure a narrow form of intelligence. While the o3 model excels at interpreting and responding to structured prompts, real-world AI applications often require broader cognitive functions, including long-term memory, planning, and decision-making. Still, the high IQ score is a clear sign of progress in language model sophistication.
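
For readers who want to check the percentile claim themselves, the conversion from an IQ score to a population percentile can be sketched as below. This is a minimal illustration, not TrackingAI's method: it assumes scores follow a normal distribution with a mean of 100 and a standard deviation of 15 (an assumption about the scale, not something stated in the report), and the function name `iq_percentile` is hypothetical.

```python
import math


def iq_percentile(iq, mean=100.0, sd=15.0):
    """Return the fraction of a normal population scoring below `iq`.

    Assumes a normal distribution with the given mean and standard
    deviation (100 and 15 here are assumptions about the IQ scale).
    """
    z = (iq - mean) / sd  # distance from the mean in standard deviations
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    print(f"{iq_percentile(136):.4f}")  # prints 0.9918
```

Under those assumptions, a score of 136 sits 2.4 standard deviations above the mean, or above roughly 99% of the population, which is consistent with the article's "top 2%" description.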

Why did o3 outperform GPT-4o and other models so dramatically?

The o3 model likely benefits from advanced training methods and architectural improvements exclusive to OpenAI’s proprietary pipeline. These improvements may include better alignment with human reasoning patterns or refined prompt interpretation. GPT-4o, while still a powerful model, scored just 95 on the same Mensa test. This gap underscores the advantages held by closed-source models, which often receive more focused optimization and tuning for performance in intelligence benchmarking scenarios.

How reliable are IQ scores when applied to AI models?

IQ scores provide a limited snapshot of a model’s reasoning capabilities, especially in the context of pattern recognition and logic. These tests are designed for humans and are adapted for AI evaluation with constraints like standardized prompts and formatting rules. While they can highlight progress in a model’s problem-solving ability, they don’t capture broader elements like creativity, adaptability, or multi-turn conversation. Hence, IQ tests are useful but not definitive for assessing overall AI intelligence.

Do multimodal models generally perform worse on IQ tests?

Yes, until recently, multimodal models that process images and text together have underperformed their text-only versions in structured intelligence tests. The integration of visual input can introduce complexity that affects reasoning efficiency. However, OpenAI’s o3 breaks that trend by achieving high performance in both text and image reasoning, suggesting significant advancements in how multimodal data is handled. This could indicate a new era where visual and textual understanding work seamlessly together in high-performing models.
