Every Benchmark Launched 2023-2024 Has Fallen: The METR / SWE-Bench / CORE-Bench / MLE-Bench / PostTrainBench Sequence
TL;DR
Six benchmarks that measure AI research and development capabilities, launched in 2023 and 2024, have all saturated or are close to saturation, in each case within a matter of months. This pattern suggests a significant acceleration in AI progress, with implications for AI deployment and policy.
Every major AI research benchmark launched between 2023 and 2024 has reached saturation or is nearing it, according to recent analysis by Thorsten Meyer. This pattern indicates that AI capabilities are advancing faster than previously understood, with implications for industry, policy, and research.
Thorsten Meyer reports that six benchmarks designed to measure AI research and development capabilities have all either saturated or are on track to saturate within months. The six are SWE-Bench, METR time horizons, CORE-Bench, MLE-Bench, PostTrainBench, and CPU speedup. For example, SWE-Bench, which measures real-world software engineering skills, improved from 2% in late 2023 to 93.9% in May 2026, a roughly 47-fold increase over about 30 months, and has been declared saturated.
Similarly, the METR time horizon benchmark, which measures the length of tasks AI systems can complete, grew from 30 seconds to 12 hours over four years, a 1,440-fold improvement, and the trend suggests near saturation by late 2026. CORE-Bench, which assesses research paper reproduction, was declared solved by its authors in December 2025 after improving from 21.5% to 95.5% in 15 months. Other benchmarks, such as MLE-Bench and CPU speedup, are also approaching or have reached their performance ceilings.
These patterns span diverse facets of AI research (software engineering, task duration, research reproduction, and hardware acceleration) and point to a structural trend: rapid, near-complete saturation of capabilities that were once considered challenging or long-term goals. This suggests that AI systems are quickly closing gaps in research and development skills, with potential impacts on deployment timelines and industry expectations.
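The fold-increase figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the start and end values given in the article, and the implied doubling time assumes constant exponential growth over the whole period, which is a simplification (percentage scores are capped at 100%, so a doubling time is only a rough descriptor for them). The function names are hypothetical.

```python
import math

# Figures quoted in the article: (start value, end value, elapsed months).
# METR values are task lengths in seconds; the others are percentage scores.
QUOTED = {
    "SWE-Bench": (2.0, 93.9, 30),
    "METR time horizon": (30.0, 12 * 3600.0, 48),
    "CORE-Bench": (21.5, 95.5, 15),
}


def fold_increase(start, end):
    """Ratio of the end value to the start value."""
    return end / start


def doubling_time_months(start, end, months):
    """Implied doubling time, ASSUMING constant exponential growth."""
    return months / math.log2(end / start)


for name, (start, end, months) in QUOTED.items():
    fold = fold_increase(start, end)
    doubling = doubling_time_months(start, end, months)
    print(f"{name}: {fold:.1f}x over {months} months, "
          f"doubling roughly every {doubling:.1f} months")
```

Under these assumptions the METR figure works out to exactly 1,440-fold (43,200 seconds divided by 30 seconds), and the SWE-Bench figure to just under 47-fold.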
Implications of Rapid Benchmark Saturation for AI Development
With all six benchmarks saturated or nearing saturation within months, AI systems are rapidly reaching or surpassing the capabilities these tests measure. This suggests that further gains on these benchmarks may become incremental rather than transformative, which could affect investment, policy decisions, and workforce planning. It also raises questions about how novel future advances will be and about the risk of overestimating AI progress based solely on these benchmarks.
Historical Progress and Benchmark Selection
These six benchmarks were designed to challenge AI systems across different facets of research, including software engineering, task duration, research reproduction, and hardware acceleration. Launched between late 2023 and early 2024, they aimed to track the pace of AI capability growth. Previous trends indicated steady improvements, but the recent saturation or near saturation across all six suggests an unprecedented acceleration in AI research maturity. This pattern aligns with other indicators of rapid AI progress, such as hardware improvements and model scaling, but now with concrete benchmarks showing near-complete saturation.
Experts like Jack Clark have argued that such rapid saturation supports forecasts of AI reaching significant milestones by 2028, including near-automated AI research and development, with some models already demonstrating capabilities close to human-level performance in specific tasks.
“The pattern across these six benchmarks is the structural argument. Saturation in all of them within months signals an acceleration in AI capabilities that is more than noise—it’s a curve.”
— Thorsten Meyer
Uncertainties About Future AI Capabilities and Limits
While the benchmarks indicate rapid progress, it remains unclear whether saturation in these tests equates to comparable real-world capabilities. Some experts caution that saturation may reflect overfitting to the benchmarks or measurement noise rather than true capability limits. The long-term trajectory beyond 2026 is still uncertain, especially regarding whether further improvements will plateau or whether new challenges will emerge.
Additionally, the impact of saturation on AI safety, robustness, and generalization remains an open question. It is not yet confirmed whether these saturated benchmarks fully capture the breadth of AI research capabilities or if future breakthroughs will require new or more challenging benchmarks.
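The measurement-noise concern can be made concrete with a simple binomial estimate. The sketch below is a rough illustration, not an analysis of any specific benchmark: the task count of 500 is a hypothetical value chosen for the example, tasks are assumed to be independent, and the normal approximation used here is known to be crude for scores near 100%.

```python
import math


def binomial_standard_error(pass_rate, num_tasks):
    """Standard error of a pass rate measured on num_tasks independent tasks."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / num_tasks)


def approx_95_interval(pass_rate, num_tasks):
    """Normal-approximation 95% interval, clipped to the range [0, 1]."""
    half_width = 1.96 * binomial_standard_error(pass_rate, num_tasks)
    return max(0.0, pass_rate - half_width), min(1.0, pass_rate + half_width)


# HYPOTHETICAL task count, chosen only for illustration; not from the article.
NUM_TASKS = 500

for score in (0.939, 0.955):  # scores quoted in the article
    low, high = approx_95_interval(score, NUM_TASKS)
    print(f"score {score:.1%}: roughly {low:.1%} to {high:.1%} "
          f"with {NUM_TASKS} tasks")
```

Under these assumptions, the interval around a score in the mid-90s spans a few percentage points, so small differences between systems near the ceiling may not be distinguishable from noise.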
Next Steps for Monitoring AI Progress and Benchmark Development
Researchers and industry analysts will likely focus on developing new benchmarks that challenge AI in different or more complex domains, to assess whether saturation persists. Monitoring the pace of hardware improvements and the emergence of novel AI architectures will also be critical. Policymakers and investors should consider the implications of rapid capability saturation, including potential shifts in AI deployment and regulation timelines.
Additionally, further analysis is needed to understand whether these saturation points translate into tangible improvements in real-world applications or if they primarily reflect overfitting to benchmark tasks. Expect ongoing debate about the significance of these findings and the potential need for new standards to measure AI progress.

Key Questions
What does saturation of these benchmarks mean for AI development?
Saturation indicates that AI systems have achieved or exceeded the performance levels these benchmarks measure, which suggests rapid progress and possibly the approach of limits in current research areas.
Are these benchmarks representative of real-world AI capabilities?
While they are designed to challenge AI systems across different facets of research, it is still uncertain whether benchmark saturation translates directly into real-world performance or broader intelligence.
What are the risks of rapid saturation in AI benchmarks?
Rapid saturation might lead to overconfidence in AI capabilities, potentially delaying the recognition of limitations or emerging challenges in safety, robustness, and generalization.
Will new benchmarks be developed to continue measuring AI progress?
Yes, experts are expected to develop more complex or different benchmarks to assess whether AI systems can sustain improvements beyond current saturation points.
How might this saturation affect AI policy and regulation?
Policymakers may need to reconsider timelines for regulation, safety standards, and deployment, as rapid capability gains could accelerate the pace of AI adoption and associated risks.
Source: ThorstenMeyerAI.com