Phase 2 of Anthropic’s Project Fetch finds its latest Claude model can rapidly write code for a quadruped robot, but the dog still can’t fetch a ball.
Anthropic’s Claude Opus 4.7 completed sensor-programming tasks for a quadruped robot in 9 minutes and 35 seconds — a job that took the fastest human team, assisted by Claude Opus 4.1, around 181 minutes to finish less than a year earlier. That’s roughly a 20-fold speed-up, according to Anthropic’s Frontier Red Team blog and the company’s official account on X.
The findings come from Phase 2 of Project Fetch, an internal Anthropic experiment designed to stress-test how its Claude models perform on real, physical-world robotics tasks. The goal: get a quadruped “robodog” to autonomously fetch a beach ball. The result: faster code, far fewer lines — and still no ball.
What Project Fetch Actually Tests
Project Fetch is not a product launch. Anthropic’s Frontier Red Team runs it as a research exercise to probe emergent capabilities and potential risks before any broader deployment. The robot hardware is real — a quadruped platform with onboard cameras, lidar, and other sensors — and Claude must connect to the robot, interpret sensor streams, write control and navigation code, and manage object detection, all without a human engineer holding its hand.
Phase 1, run with internal Anthropic staff around a year before Phase 2, split eight researchers into two groups: “Team Claude”, which had access to the model, and “Team Claude-less”, which did not. Team Claude completed shared programming tasks in about half the time of their counterparts and made substantially more progress towards autonomous ball retrieval. The most striking advantage, according to Anthropic’s blog, was in connecting to the robot and its onboard sensors — a complex, error-prone task that Claude handled with notable speed. By the end of the day, the Claude-assisted team’s robot could locate a beach ball, face towards it, and nudge it around. Full, reliable retrieval remained out of reach.
Phase 2: Claude Works Alone
Phase 2 shifts the question. Rather than asking how much Claude can help a human engineer, Anthropic wanted to know what the model could do entirely on its own. Claude Opus 4.7 was given the same class of sensor-integration and control-coding tasks, with no human intervention.
The numbers are striking. Opus 4.7 completed those tasks in 9 minutes 35 seconds; the best human team using Opus 4.1 needed 181 minutes for comparable work. Against a team working without any AI assistance at all, Opus 4.7 was reportedly around 37.7 times faster. The code volume tells a similar story: the model produced about 1,045 lines to achieve the specified sensor-integration outcomes, while the human-assisted team wrote around 10,309 lines — roughly ten times as much code for equivalent functionality.
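As a rough check on that arithmetic, the quoted figures can be recomputed in a few lines of Python. This is a sketch only: the variable names are illustrative, and the no-AI duration is merely implied by the reported 37.7x factor rather than stated by Anthropic.

```python
# Figures as quoted in the article; names are illustrative, not Anthropic's.
OPUS_47_MINUTES = 9 + 35 / 60      # 9 min 35 s, Opus 4.7 working alone
HUMAN_TEAM_MINUTES = 181           # fastest human team, assisted by Opus 4.1
OPUS_47_LINES = 1_045              # lines of code written by the model
HUMAN_TEAM_LINES = 10_309          # lines from the article's "human-assisted team"
NO_AI_SPEEDUP = 37.7               # reported factor versus the team without AI


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


speedup = ratio(HUMAN_TEAM_MINUTES, OPUS_47_MINUTES)
code_ratio = ratio(HUMAN_TEAM_LINES, OPUS_47_LINES)
# Derived, not reported: the no-AI duration implied by the 37.7x figure.
implied_no_ai_minutes = NO_AI_SPEEDUP * OPUS_47_MINUTES

print(f"Speed-up over the Opus 4.1 team: {speedup:.1f}x")
print(f"Code volume ratio: {code_ratio:.1f}x")
print(f"Implied no-AI duration: {implied_no_ai_minutes:.0f} minutes")
```

On these inputs the speed-up works out to about 18.9 times, which the headline figure rounds to 20, and the code ratio to about 9.9. The 37.7x factor implies the unassisted team took roughly 361 minutes, or about six hours.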
Anthropic researchers have said these gains did not come from robotics-specific training. According to an industry summary of the research, the improvements emerged from general scaling of the Claude model family: Anthropic didn’t teach the model robotics; it got better at robotics anyway.
The Catch
The robodog still can’t fetch the ball.
Anthropic’s own X post is candid about this. Despite the speed gains in coding, Phase 2’s robot failed to retrieve the beach ball, pointing to a persistent gap between quickly writing functional-looking code and achieving reliable physical-world performance. It’s a distinction worth keeping in mind when reading the headline numbers.
Some scepticism has also surfaced in wider online discussion. Users on public forums have reported inconsistent behaviour and hallucinations from Opus 4.7 in other coding contexts, suggesting that benchmark-style results from controlled experiments may not fully reflect how the model performs across different conditions and use cases.
Dario Amodei, Anthropic’s chief executive, has previously spoken about the company’s commitment to responsible frontier research, and Project Fetch fits that framing — it’s explicitly designed to surface what capable models can and can’t do before those capabilities reach the market.
What the Speed-Up Means in Practice
For robotics and software practitioners, a roughly 20-fold reduction in the time needed to write sensor and control code is commercially significant. Smaller engineering teams could, in principle, configure and adapt robotic systems far more quickly than before, lowering the barrier to deploying robots in warehouses, logistics facilities, agriculture, and manufacturing.
But the ball-fetching failure is a reminder that speed of development and reliability of outcome are different things. Safety researchers have noted that autonomous code-writing for physical systems introduces its own risks: errors, mis-specification, or unexpected behaviours in hardware that operates in the real world. The faster a model can reconfigure a robot, the faster it can reconfigure it badly.
Anthropic describes Opus 4.7 as its most capable generally available model, with particular gains in advanced software engineering and long-running agentic tasks. Whether those gains translate cleanly from a controlled red-team exercise to production environments remains, for now, an open question.
What This Means for Kent Residents
Kent’s logistics hubs, port operations, and manufacturing sites are among the sectors most likely to feel the downstream effects as autonomous coding tools make it cheaper and faster to develop robotic systems — though any such changes would unfold over years rather than months. For residents studying computing, engineering, or robotics at Kent’s further education colleges and universities, experiments like Project Fetch illustrate a clear direction of travel: the skill increasingly in demand is not writing every line of control code by hand, but knowing how to direct, verify, and supervise AI tools that do much of that work automatically.
Source: @AnthropicAI