Open questions

Is AI about to keep racing forward, or level off?

Two trends look strong: the length of tasks AI can do keeps doubling, and the cost of running it has collapsed. But reasoning limits suggest a ceiling, and serious people disagree on which signal wins.

Last updated 2026-06-15 · Physea Labs

Will AI keep racing ahead or settle into a plateau? Smart, well-informed people disagree, and the honest answer is that the evidence pulls in two directions at once.

Two trends point up. The research group METR measured the length of tasks an AI can finish on its own, where a task's length is the time it takes a human to complete, and found that length doubling roughly every seven months over six years.[1] A January 2026 update found the pace had picked up since 2023, to a doubling time of about 131 days, or a little over four months.[2] At the same time, cost has fallen sharply. Stanford’s AI Index reports that the cost of running a model at the quality of 2022’s GPT-3.5 dropped “from $20.00 per million tokens in November 2022 to just $0.07 per million tokens by October 2024,” a more than 280-fold reduction in just under two years.[3] Capability up, price down.
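To make these figures easier to compare, the arithmetic behind them can be sketched in a few lines of Python. This is only an illustration, assuming each trend is a steady exponential (which neither source guarantees) and a 365-day year; the function names are made up for this example and do not come from METR or the AI Index.

```python
import math


def growth_factor(elapsed: float, doubling_time: float) -> float:
    """Multiplicative growth after `elapsed`, assuming a constant doubling time.

    Both arguments must use the same unit (months, days, ...).
    """
    return 2 ** (elapsed / doubling_time)


def fold_reduction(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def months_between(y0: int, m0: int, y1: int, m1: int) -> int:
    """Whole calendar months from (y0, m0) to (y1, m1)."""
    return (y1 - y0) * 12 + (m1 - m0)


def halving_time(elapsed: float, fold: float) -> float:
    """Implied halving time if a `fold`-times drop happened steadily over `elapsed`."""
    return elapsed / math.log2(fold)


if __name__ == "__main__":
    # Task-length trend: growth per year under each reported doubling time.
    print(f"7-month doubling:  {growth_factor(12, 7):.2f}x per year")
    print(f"131-day doubling:  {growth_factor(365, 131):.2f}x per year")

    # Cost trend: prices and dates taken from the AI Index quote.
    fold = fold_reduction(20.00, 0.07)
    months = months_between(2022, 11, 2024, 10)
    print(f"Price drop:        {fold:.0f}-fold over {months} months")
    print(f"Implied halving:   every {halving_time(months, fold):.1f} months")
```

Under these assumptions, seven-month doubling works out to roughly 3.3x per year and 131-day doubling to roughly 6.9x per year. The quoted prices give about a 286-fold drop across the 23 calendar months between the two quoted dates, which is a halving roughly every 2.8 months.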

But the same data carries a caution. The models METR tested succeed nearly all the time on tasks under about four minutes of human work, yet succeed less than 10% of the time on tasks taking more than around four hours.[1] Long, messy work is still where they break. And the reasoning limits found by Apple’s 2025 study suggest a ceiling that more compute alone may not lift: past a certain level of problem difficulty, the models’ accuracy there collapsed rather than improved.[4]

So the real debate is which signal wins. One camp reads the doubling curve and the price collapse as a runway with years left. Another reads the reliability and reasoning gaps as signs that the current recipe is nearing its limits and that a different idea is needed for the next leap. Both are looking at real numbers. Nobody has settled it, and that uncertainty is itself one of the open questions ahead.

References

  1. Measuring AI Ability to Complete Long Tasks — METR
  2. Time Horizon 1.1 — METR
  3. Artificial Intelligence Index Report 2025, Chapter 1 — Stanford HAI
  4. The Illusion of Thinking — Apple Machine Learning Research (Shojaee et al.)