Why AGI Progress Has Stalled + Introducing ARC-AGI
While LLM architecture has advanced rapidly, true AGI remains elusive. The ARC-AGI challenge pushes AI beyond its current limits in reasoning and generalization.

The Road to AGI — Why We’re Still Not There Yet
We’ve built AI models that can create hyper-realistic paintings, write stand-up comedy routines, and even code entire websites. But when it comes to developing a system that can genuinely think like a human? Well, we’re still waiting. And at this rate, it seems like we might be waiting for a while.
Large language models (LLMs) like OpenAI’s ChatGPT, Google’s Gemini, Meta’s LLaMA, and Anthropic’s Claude have made monumental leaps in natural language processing (NLP) and machine learning (ML). Compared to earlier ML models, today’s LLMs offer far more flexibility and capability.
However, despite these incredible advancements, AGI (Artificial General Intelligence) is still out of reach. Even with ambitious goals like OpenAI’s plan to achieve AGI by 2030, many experts remain skeptical about how soon we can realistically expect a system that rivals human-level reasoning.
What Exactly Is AGI? And Why Aren’t LLMs on Pace to Achieve It?
Artificial General Intelligence (AGI) refers to a machine’s ability to understand, learn, and apply intelligence across a wide range of tasks, matching or exceeding human reasoning.
But can’t LLMs already reason and solve problems?
Actually, no.
While LLMs can produce highly coherent responses, they don’t think like humans. Instead, they rely on a process called tokenization, where language is broken down into smaller parts (called tokens), which are then analyzed based on patterns from the data the models were trained on. When LLMs generate responses, they are essentially predicting the next token in a sequence, not applying any form of true reasoning or understanding.
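The next-token mechanism described above can be sketched with a toy bigram model: count which token follows which in the training text, then always emit the most frequent continuation. (This is a deliberately tiny illustration of the prediction principle, not how production LLMs are built; real models use learned neural networks over vastly larger corpora.)

```python
from collections import Counter, defaultdict

# Toy "training data". Real LLMs train on trillions of tokens, but the
# core idea is the same: predict the next token from observed patterns.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each token follows each other token (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most frequently observed continuation, or None."""
    counts = following[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often in the corpus
```

Note that the model "knows" nothing about cats or mats; it is pure pattern frequency, which is the heart of Chollet's memorization critique.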
As François Chollet, a prominent AI researcher and creator of the Keras library (an open-source software framework for deep learning), puts it: LLMs "memorize" rather than understand.
This is the main limitation of LLMs: their heavy reliance on the data they’ve been trained on. While they excel within the scope of that data, they struggle with tasks that require deeper abstract thinking, reasoning, or real-world knowledge that wasn’t directly embedded into their training.
As a result, many AI experts argue that the current trajectory of LLMs is more of a detour than a pathway to AGI. Put more bluntly: AGI progress has stalled.
Enter the ARC-AGI Challenge
In his influential paper, "On the Measure of Intelligence," Chollet introduces the Abstraction and Reasoning Corpus (ARC), arguing that traditional AI benchmarks are insufficient for assessing true intelligence. In their place, he proposes a framework that measures a model’s ability to adapt and generalize to tasks outside any familiar context.
This is where the ARC-AGI challenge comes in. Created by Chollet and Mike Knoop, co-founder of workflow automation software Zapier, the ARC-AGI challenge is an open-source competition that presents prospective AI models with a variety of tasks designed to test their problem-solving and reasoning capabilities, specifically measuring how well a given model can generalize beyond its training—an essential trait of true AGI.

Example problem from the ARC-AGI public dataset

Another example problem from the ARC-AGI public dataset
To participate, teams must access the ARC-AGI training and testing datasets, both of which are publicly available. However, submissions must be able to run offline—AI models competing in this challenge won’t have access to the internet during evaluation, making it a true test of their standalone capabilities.
Are There Prizes?
Yes, there are!
The challenge offers a $1 million prize pool, with a $500,000 grand prize divided among the top five teams that score at least 85% on the ARC-AGI benchmark. Additionally, a $50,000 prize will be awarded to the team that publishes the best research paper.
New Directions
The goal of the ARC-AGI challenge is to broaden AGI research and explore new paths beyond LLMs.
AI research currently tends toward a monoculture centered on LLMs, and the ARC-AGI challenge has the potential to diversify the machine learning techniques applied to AGI development.
However, it is important to note that solving the ARC-AGI benchmark and achieving an 85% raw accuracy does not necessarily mean a team has “reached AGI.” It is entirely possible that an AI model performing well on the ARC-AGI benchmark could perform poorly on another, even similar, test.
Regardless, we must walk before we run. By initiating the conversation and incentivizing researchers to participate, the ARC-AGI challenge is fostering innovation and progress, inspiring the public to explore new avenues in AGI beyond LLMs.