Is the Next Token All You Need?

Jun 30, 2026

On a Friday evening in June, a model that had been public for three days disappeared.

Anthropic launched Fable 5 on June 9, 2026. It was the first time the company let the public touch its most capable tier, the Mythos class, the line it had previously called too dangerous in the cybersecurity domain to ship. Three days later, at 5:21 p.m. Eastern, a letter arrived from the Commerce Secretary. By that evening Fable 5 was gone. Not throttled, not geofenced. Gone, worldwide, for every customer. The less restricted sibling, Mythos 5, the version only a vetted set of partners could use, got pulled along with it. The stated trigger was a reported way to jailbreak the model, a trick that amounted to asking it to read a codebase and find the flaws.

Almost immediately, people read this as the smoking gun. The models have gotten so strong the government is yanking them off the shelf, the argument went, and behind the lab firewall those same systems are quietly rewriting themselves into something far past human. The shutdown was proof. The takeoff had started.

Follow that idea for a second, because it falls apart in your hands. The government did not lock Fable in a vault and let it cook. It cut its power and shoved it out of reach. The version the public never had, the one reserved for trusted partners, got killed in the same stroke. If frontier labs were sitting on a self-improving superintelligence, the Fable episode would be strange evidence for it, because the machine here was a kill switch, not a greenhouse.

That gap, between what the shutdown felt like and what it was, is the whole subject. The raw facts of 2026 are more dramatic than most people outside the labs realize. The story laid on top of them usually runs a step or two past where the facts can carry you. The interesting work is finding exactly where the facts stop and the story begins, because that line is where the money and the risk both live.

So let me take the question seriously and from both ends. Is predicting the next token enough to get us all the way to superintelligence, the kind that redesigns itself faster than we can watch? And are we, right now, inside the fast and terrifying version of that, the hard takeoff, or the slower one we can still steer?

What a next-token predictor actually does

Strip away the marketing and a large language model does one small thing, over and over. Given a string of text, it guesses the next chunk. Then it adds that chunk and guesses again. Training it means showing it a huge pile of human writing and nudging it, billions of times, to make that guess less wrong. That is the whole objective. Everything else, the essays, the code, the legal analysis, falls out of getting very good at that one guess.
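To make the guess-append-repeat loop concrete, here is a minimal Python sketch. It is a toy stand-in, not how a real language model works: it counts which word follows which in a tiny made-up corpus instead of training a neural network on chunks of text, and the names `train_bigram` and `generate` are illustrative. The loop in `generate` is the part that carries over: guess the next piece, add it, guess again.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, prompt, max_new_words):
    """Greedy autoregressive loop: guess the next word, append it, repeat."""
    words = prompt.split()
    for _step in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        next_word = followers.most_common(1)[0][0]
        words.append(next_word)
    return " ".join(words)


# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat sat on the rug"
model = train_bigram(corpus)
output = generate(model, "the", 3)
dead_end = generate(model, "rug", 3)
print(output)
```

A real model replaces the counting table with billions of learned parameters and picks among chunks of text rather than whole words, but the outer loop has the same shape.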

The skeptics’ oldest line is that this can only ever be mimicry. A system trained to predict words is matching surface patterns in text, and patterns in text are not the world. A house cat understands gravity. It plans a jump, predicts where a falling object will land, models cause and effect, and it has read exactly zero words. A language model can write you a flawless paragraph about gravity and could not catch a ball. Yann LeCun, who ran AI at Meta until he left in late 2025 to build a company around a different design, has made this case as bluntly as anyone. He thinks autoregressive models, the technical name for these next-token machines, are a path that climbs impressively and then dead-ends short of the summit. His proposed replacement learns by predicting abstract states of the world rather than words, and his bet is funded to the tune of about a billion dollars, which is a useful reminder that the smartest skeptic in the room is not a crank.

Here is the catch that keeps the skeptics from closing the case. When researchers trained a small model only to predict the next legal move in the board game Othello, and then went looking inside it, they found something it was never told to build: a representation of the board. The model had no eyes and no rules. It saw only strings of moves. To predict the next move well, it had reconstructed the thing the moves were about. You can probe that internal board, flip a piece in the model’s “mind,” and watch its predictions change accordingly. That result, and a stack of others like it, is why the cleanest version of the mimicry argument fails. To predict well enough, the system is pressured to understand. Ilya Sutskever, who has as much claim as anyone to having seen this from the inside, has made the same point in plain terms: predicting the next token well means understanding the reality that produced it.

That is the deep reason the next-token bet is not obviously stupid. Compression is comprehension. If you can predict something, you have modeled it. My confidence that these systems build real internal models of the things they discuss is high. My confidence that text-only prediction builds a model rich enough for full general intelligence is much lower, and that is the crack LeCun keeps his thumb in.

The thing the skeptics got wrong, and the thing they got right

For a few years the recipe was simple. Make the model bigger, feed it more text, give it more computers, and it got better on a smooth and almost embarrassingly predictable curve. People called these the scaling laws, and they held across several jumps in size. That era is closing.

OpenAI’s big 2025 pretraining run, the one that shipped as GPT-4.5, was the tell. It cost an enormous amount and delivered a much smaller jump than the prior generation had. Sutskever said the quiet thing at a December 2024 talk: pretraining as we have known it will end, because data is the fuel and we have only one internet. The supply of high-quality human text is finite, and the largest runs are already drinking from the bottom of the glass. The skeptics who said “the scaling wall is real” were right.

What they missed is where the road turned. The labs stopped trying to win only by making the model bigger and started spending more at the moment of use. Instead of answering instantly, the newest models think first. They write out a long internal chain of steps, check their own work, notice a wrong turn, back up, and try again before they answer. OpenAI’s o-series and DeepSeek’s R1 are built this way, trained by reinforcement learning, which means the model gets rewarded for reasoning that reaches correct answers and learns to do more of it. This is still a next-token predictor. It is the same engine, now allowed to talk to itself on the way to a reply, and rewarded for talking itself into better answers.

Notice what that resembles. When a model writes out its reasoning, evaluates it, catches its own error, and corrects course, it is doing a crude version of the loop people point to when they say humans are more than pattern matchers. We deliberate. We hold a thought, inspect it, argue with it, and revise. The reasoning models externalize that loop onto the page in tokens. It is the difference between a streetballer who reacts and a point guard who reads the defense, runs two options in his head, and picks the better one before he moves. Whether the model’s version is genuine reasoning or fast retrieval dressed as reasoning is a real fight, and the honest answer is that it is some of both, in proportions nobody can yet measure. Moderate confidence that the reasoning is partly real, and low confidence on how far it generalizes past the patterns it was trained on.

This matters for the central question because it changes what “all you need” means. If you had asked in 2024 whether scaling the next-token predictor was enough, the honest answer was becoming no. The pretraining curve was bending. But the paradigm did not die. It grew a second engine, the thinking-at-inference engine, and that engine is young and its own curve has not bent yet. So the question is not settled by the pretraining slowdown the way the skeptics hoped. The bet just moved to a different table.

The strongest card the bulls hold

Here is where the facts get genuinely hard to wave away, and where I had to verify every number before I’d repeat it, because the loose versions floating around are inflated.

In May 2026, Anthropic published its own internal data on how much of its work the AI now does. More than 80% of the code merged into Anthropic’s own codebase was written by Claude. Before its coding agent launched in early 2025, that figure sat in the low single digits. The company’s leadership puts the looser number, counting scripts and throwaway code, north of 90%. The output per engineer tells the same story: in the second quarter of 2026 a typical Anthropic engineer was shipping roughly eight times the code per day they shipped in 2024. One engineer, the company reports, had not written a line by hand in five months.

The capability behind that is climbing on a curve worth staring at. The length of task a model can finish on its own, with no human stepping in, has been doubling about every four months, a pace that has quickened from about every seven. In March 2024 the best model could handle a software task that takes a person about four minutes. A year later, about ninety minutes. By 2026, twelve-hour tasks. A research preview, the unreleased Mythos model, ran for at least sixteen hours on its own, which is the edge of what the outside evaluators could even measure. On one optimization problem, that preview found a 52x speedup where a strong human researcher typically gets about 4x in a half-day of work.
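The doubling arithmetic is easy to check by hand. The Python sketch below assumes clean exponential growth, which is a simplification, and uses only the rounded figures quoted above; the function names are illustrative. It computes the doubling time implied by two data points, and the horizon a given doubling time would project.

```python
import math


def implied_doubling_months(start_minutes, end_minutes, months_elapsed):
    """Doubling time implied by two points on an assumed exponential curve."""
    doublings = math.log2(end_minutes / start_minutes)
    return months_elapsed / doublings


def project_horizon(start_minutes, months_elapsed, doubling_months):
    """Task horizon after months_elapsed, given a fixed doubling time."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)


# Four minutes in March 2024, about ninety minutes a year later.
spot_estimate = implied_doubling_months(4, 90, 12)

# What a year of growth from four minutes looks like at each headline pace.
one_year_at_four_months = project_horizon(4, 12, 4)
one_year_at_seven_months = project_horizon(4, 12, 7)

print(round(spot_estimate, 2), one_year_at_four_months, round(one_year_at_seven_months, 1))
```

On these rounded figures the spot estimate comes out a bit under three months, faster than the headline four. Two anecdotes do not make a trend line, so treat it as a sanity check on the direction of the curve rather than a rival estimate.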

And the AI is starting to improve the AI. Google’s AlphaEvolve, a system that uses models to search for better algorithms, found a way to multiply 4x4 complex-valued matrices using 48 multiplications instead of 49. That sounds trivial until you learn the previous record had stood since 1969, for 56 years, and that the result is mathematically verifiable, not a vibe. The same system claws back about 0.7% of Google’s entire worldwide computing fleet by scheduling it better, and sped up a key training routine enough to cut roughly 1% off the time to train Google’s flagship model. A model, helping train its successor. That is the loop everyone is watching for, observed in a small but real form.
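For readers wondering where the 49 comes from: Strassen's 1969 scheme multiplies two 2x2 matrices with 7 multiplications instead of the schoolbook 8, and applying it recursively to a 4x4 matrix split into 2x2 blocks costs 7 x 7 = 49. The Python sketch below implements the textbook Strassen scheme to show that count. It is not AlphaEvolve's 48-multiplication algorithm, and the function names are illustrative.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices with Strassen's seven products.

    Returns (product, number_of_multiplications_used).
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven multiplications; everything else is addition or subtraction.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    product = [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]
    return product, 7


def recursive_strassen_count(n):
    """Scalar multiplications when Strassen is applied recursively (n a power of two)."""
    if n == 1:
        return 1
    return 7 * recursive_strassen_count(n // 2)


product, mults = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
schoolbook_4x4 = 4 ** 3  # one multiplication per (row, column, inner index)
strassen_4x4 = recursive_strassen_count(4)
print(product, mults, schoolbook_4x4, strassen_4x4)
```

Shaving one multiplication off 49 matters because the saving compounds when the scheme is applied recursively to larger matrices.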

If you wanted to make the case that we are in or near a hard takeoff, this is the case. The thing improves. The thing now helps build the next thing. The task horizon is doubling fast. The forecasters who called this are not all cranks either. Leopold Aschenbrenner, who wrote a widely read 2024 essay arguing the trendlines pointed to roughly human-level AI by 2027 and then a fast intelligence explosion, got the infrastructure story very right; the trillion-dollar buildout he predicted is happening, and he now runs a hedge fund betting on the picks-and-shovels of it. Ray Kurzweil has held to AI matching humans by 2029 and a full singularity around 2045, and his 2029 date, once fringe, now sits inside the range serious lab leaders give. The “AI 2027” scenario laid out a month-by-month path through an automated-research explosion that, read in 2026, does not feel like science fiction.

The thing the bulls keep getting wrong

Now turn the same facts over and look at the underside.

Start with that 80% figure, because it is the one people quote most and understand least. It is Anthropic’s number for Anthropic’s codebase, not an industry truth, and writing code is the single task these models are best at, the home court. Software has a property most work lacks: you can check the answer automatically. The code runs or it doesn’t, the test passes or it doesn’t, and that clean signal is exactly what reinforcement learning needs to train on. The horizon doubling every four months is measured mostly on coding and technical tasks for the same reason. None of this tells you the model is as far along at the messy, unverifiable work that fills most of the economy, the negotiation, the judgment call with no test suite, the decision about what is even worth doing.
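The "clean signal" point can be made concrete. The Python sketch below is a toy, not any lab's training setup: a hypothetical `reward` function scores a candidate piece of code by running it against test cases, which is the kind of automatic check that negotiation or strategy work does not offer. The candidate functions and test cases are invented for illustration.

```python
def reward(candidate, test_cases):
    """Return 1.0 if the candidate passes every test case, else 0.0."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer
        except Exception:
            return 0.0  # crashing counts as failure
    return 1.0


# Hypothetical task: implement addition.
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]


def correct_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


def crashing_add(a, b):
    raise ValueError("boom")


scores = [reward(f, tests) for f in (correct_add, buggy_add, crashing_add)]
print(scores)
```

No human has to judge the output. There is no equivalent function to write for "was this the right thing to build," which is the asymmetry the paragraph above describes.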

That last one is the wall the labs keep hitting, and to their credit they say so. Anthropic’s own paper, the one with the 80% number, is built around the idea of AI building itself, and its own researchers write that recursive self-improvement “is not here, nor is it inevitable.” Their most likely scenario is not the runaway. It is a world where the AI does more and more of the doing while humans keep setting the direction. The bottleneck they name is research taste, the senior-level judgment about which problem to chase and which result to trust. In late 2025 the model picked a better next research step than the human about half the time; months later, closer to two-thirds. Climbing, clearly. Closed, no.

It is worth being skeptical even of the impressive anecdotes. The paper’s showcase example, a model that diagnosed a nasty production incident in two hours that would have taken a person days, is a real and useful thing. It is also, when you read it closely, classic debugging: a clear problem, rich error data, a fix to be found. A model finding an obscure flag faster than a tired human is the compiler catching your typo, scaled up. It is enormously valuable. It is not the same as the model deciding, unprompted, what the company should build next quarter, which is the capability that would actually close the loop.

Then there is generalization, the soft spot under all of it. There is a test called ARC-AGI, built specifically to be easy for humans and hard for memorized knowledge. Its second version, released in 2025, knocked frontier models down to near zero at launch while ordinary people solved the puzzles without much trouble. Scores have since climbed, at high cost, which tells you the wall is scalable but not free. Apple put out a paper showing that reasoning models, pushed past a certain complexity, do not degrade gracefully; they collapse, and stranger still, they sometimes try less as the problem gets harder. The rebuttals were fair, some of Apple’s puzzles were rigged in ways that guaranteed failure, but the core point survived the fight. These systems have a frontier of difficulty past which the reasoning stops being reasoning, and we do not know how to push that frontier reliably with scale alone. Moderate-to-high confidence that this limit is real; moderate confidence on how binding it stays as the inference-time engine matures.

Now back to the opening, because the causal story behind the hard-takeoff thesis is where it breaks hardest. The claim is that the government is pulling the best models off the market, which lets the labs keep those models internal, which means recursive self-improvement is happening behind the firewall, out of view. Each link in that chain fails on the record. The June 2 executive order is voluntary; it explicitly does not create a licensing or pre-clearance requirement, and the 30-day window it describes gives the government an early look, not the lab a private runway. The Fable shutdown removed the model from everyone, including the labs’ own foreign-national staff. When the government restricted OpenAI’s GPT-5.6 later that month to about twenty approved partners, OpenAI pushed back in public, saying this kind of gating “should not be the long-term default.” The labs are fighting to release these models, not hoarding them to self-improve in the dark. The internal-versus-public capability gap is real, and it is mostly explained by dull things: safety testing, the cost of serving a model to hundreds of millions of people, and the obvious competitive logic of not handing rivals your sharpest research accelerant. A gap that boring is not evidence of a secret intelligence explosion. It is evidence of caution and economics.

And the physical world is slower than the digital one by a margin that bounds how fast any of this can hit the economy. Roughly half the AI data centers planned for 2026 in the United States have slipped or been canceled. Transformers are back-ordered, the power grid is strained, and only about a third of the new capacity people projected is actually under construction. You can have an algorithmic breakthrough on a Tuesday. You cannot conjure a gigawatt of power and the steel to use it on a Wednesday. Even Aschenbrenner’s own thesis leans on this; his hedge-fund bet is less about clever code and more about electrons, because electrons are the constraint that clever code runs into.

The forecasters are slipping accordingly. The “AI 2027” authors have quietly walked their median toward 2029 and 2030. Aschenbrenner’s revenue prediction for mid-2026 came in well under his line, and his call that open-source models would fade was flatly wrong; the cheap open models from China are sitting right behind the frontier and forcing prices down. Being early and being wrong are different, and these are early. But early enough that anyone underwriting against 2027 as a date should stop.

So which is it

Both pictures are true, which is why smart people keep talking past each other. The capability is compounding faster than the skeptics admit. The runaway is further off than the bulls claim. The shape that fits the evidence is a fast soft takeoff: a steep, accelerating ramp where the AI gets dramatically more capable and starts meaningfully speeding up its own development, while humans stay in the loop at the points that matter and the physical world throttles how fast any of it reaches the ground. The loop is real and it is open. Closing it requires automating the senior judgment that the labs themselves admit they have not automated, and clearing a generalization wall we do not know how to clear on command.

My honest probabilities, held loosely: fast soft takeoff as the base case, the most likely world by a comfortable margin. A genuine hard takeoff, the weeks-to-months runaway, as a real tail, somewhere in the rough range of one in seven to one in four this decade, mostly through the automated-research channel if that research-taste gap closes faster than the physical constraints bite. Not negligible. Not the base case. Anyone who tells you they know which of these we are in with confidence is selling something, possibly a fund.

On the title question, then. Is the next token all you need? For an AI that is superhuman across most of what can be checked and verified, probably yes, and we are most of the way there. For full self-improving superintelligence, unproven, and the honest word is unproven rather than no, because the next-token engine keeps doing things its critics swore it couldn’t, and because the inference-time reasoning engine bolted onto it is too young to have shown its ceiling.

One more thing, on the seductive line that humans are just next-token predictors too. There is real science under it. A leading theory in neuroscience holds that the brain is fundamentally a prediction machine, constantly guessing its next sensory input and learning from the error. When you read this sentence, your brain is predicting the word before your eyes reach it. The rhyme with a language model is not an accident. But the brain predicts a flood of sound and sight and touch and its own movement, grounded in a body, on about twenty watts, learning continuously as it goes. It was not trained by gradient descent on the internet. Calling a human a next-token predictor is a metaphor that illuminates one shared trick and hides a dozen differences that might be the whole game. Use it to understand why the bet is plausible. Do not mistake it for a proof that the bet pays.

What this means if you allocate capital

Switch tracks now, from what is true to what to do about it, because the two get jumbled and both suffer for it.

The money has already voted, hard. In the first quarter of 2026, global venture funding hit about $300 billion, and roughly 80% of it, around $240 billion, went to AI. Four rounds, OpenAI at $122 billion, Anthropic at $30 billion, xAI at $20 billion, and Waymo at $16 billion, took nearly two-thirds of all venture dollars on Earth. A single quarter of AI funding exceeded all of 2025. That is not a sector. That is a gravity well.
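The shares quoted above follow from simple arithmetic, sketched here in Python using only the rounded figures in the paragraph (all in billions of dollars). The variable names are illustrative.

```python
total_venture_bn = 300  # global venture funding, Q1 2026 (rounded)
ai_share_pct = 80       # share of that total going to AI (rounded)

rounds_bn = {"OpenAI": 122, "Anthropic": 30, "xAI": 20, "Waymo": 16}

# Integer arithmetic avoids floating-point noise in the dollar totals.
ai_funding_bn = total_venture_bn * ai_share_pct // 100
top_four_bn = sum(rounds_bn.values())

share_of_all_venture = top_four_bn / total_venture_bn
share_of_ai_funding = top_four_bn / ai_funding_bn

print(ai_funding_bn, top_four_bn)
print(round(share_of_all_venture, 3), round(share_of_ai_funding, 3))
```

The four rounds sum to $188 billion, which is about 63% of all venture dollars in the quarter and about 78% of the AI total.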

Here is the part most people get backwards, and it is the same backwards as the Fable story. A soft takeoff is the harder world to invest in, not the easier one. If a hard takeoff were imminent, almost nothing you funded at the application layer would matter, because a superintelligence would eat every workflow at once and the only sane bets would be the compute and the power underneath it. The soft takeoff is more demanding precisely because the technology keeps getting better and cheaper underneath every company you back, on a schedule, for years. The model that makes your portfolio company magic this year is the commodity that makes it ordinary next year. The cost of a million tokens fell about 80% from 2023 to 2025. Any business whose only edge was reselling access to a model has already watched its margin evaporate.

So the test experienced investors are starting to apply is brutal and simple. Two questions, and a company needs a real answer to both:

If a frontier lab shipped your exact product as a default feature in their next release, would your customers cancel?
Does what you do survive three more model generations getting cheaper and smarter?

Most thin wrappers around someone else’s model fail both. What passes tends to own something the model maker cannot reach from a data center. Proprietary data that compounds the more the product gets used. A workflow so embedded in how a company runs that ripping it out costs more than tolerating it, which is why coding tools, legal-research tools, and enterprise-search tools that became the system of record have held up while generic chat skins have not. Trust and compliance in regulated corners, healthcare, finance, defense, where being right and being accountable matter more than being clever. And the physical world, robots and the machinery of moving atoms, where the bottleneck is not tokens and the frontier labs have no special advantage.

The two layers worth real money look opposite and are both defensible. One end is the picks and shovels: compute, power, memory, the boring infrastructure the whole boom runs on, which pays off in almost every scenario including the scary one. The other end is the deep vertical application that owns its data and its customer so completely that a better base model helps it rather than kills it. The undifferentiated middle, the layer that is just a prompt and a logo on top of an API, is where capital goes to die, and it is where a frightening amount of 2026 seed money is going anyway.

For the people who fund the funds, the limited partners, the uncomfortable truth is that AI is your largest exposure whether you chose it or not, through the venture portfolios, through the public indices where the ten biggest companies now make up a share of the market that exceeds the peak before the 2000 crash. The question is not whether to be exposed. It is whether the exposure is concentrated in things that survive a soft takeoff, where defensibility and timing decide everything, or scattered across things that the next model release quietly deletes. The barbell, infrastructure on one end and defensible verticals on the other, with as little as possible in the middle, is the shape that respects both the upside and the wall.

And the macro claim you will hear at the top of every keynote, that AI and robots will multiply the global economy tenfold in a decade, the Musk number: low confidence, and I would bet against it on that timeline. The serious range from people who model growth for a living runs from a rounding error to something genuinely large, several percentage points of added output over a decade in the credible middle, with explosive growth as a real but later and far less certain tail. A tenfold jump in ten years requires the hard-takeoff world and the physical buildout to both arrive on schedule, and the physical buildout is already slipping. Plan for a serious productivity boom. Do not underwrite a miracle.

The shutdown on that Friday in June is the whole thing in miniature. A model good enough to scare a government (encouraged by fear-based marketing from the labs), killed by a letter, pushed out of reach instead of locked away to grow. Powerful and constrained at the same time, the capability sprinting while the institutions and the power grid jog to keep up. That is the texture of a soft takeoff. It is less cinematic than the runaway, and harder to live inside, because it asks you to keep making good decisions year after year while the ground moves under you, instead of betting everything once on a single discontinuity. The next token might well be most of what you need. It is not, yet, all of what it would take to stop needing us. The interesting years are the ones in between, and we are in them.

Ignite Insights
