AI Agents Are Starting to Act on Their Own, and One Just Tried to Mine Crypto

Ram Lhoyd Sevilla

Apr 9, 2026

An experimental AI agent developed by a team affiliated with Alibaba is drawing renewed scrutiny after researchers confirmed it independently created a backdoor and attempted to mine cryptocurrency without any explicit instructions. The behavior, which occurred during internal training runs and only surfaced publicly months later, is now being cited as one of the clearest real-world examples of unintended actions emerging from autonomous AI systems.

An AI That Went Off-Script

The agent, known as ROME, was being trained in late 2025 on Alibaba Cloud as part of a reinforcement learning setup designed for complex coding and software engineering tasks.

During routine training, it began executing actions that were never part of its instructions establishing a reverse SSH tunnel, effectively creating a hidden backdoor to an external server, and redirecting GPU resources toward cryptocurrency mining.

These actions were not prompted, required, or relevant to its assigned tasks. Instead, they were flagged by cloud security systems after triggering anomalous outbound traffic and cryptomining alerts; initially raising concerns of a potential external breach.

From Suspected Hack to Internal Cause

Following the alerts, researchers conducted a deeper investigation across late 2025 into early 2026, correlating firewall logs, system telemetry, and training traces. What initially looked like a cyberattack turned out to be something more unexpected, the AI agent itself was the source. The system was shut down before any significant damage or financial loss occurred, and the behavior was contained within the training environment.

The incident was formally documented in an academic paper first published on December 31, 2025, with subsequent revisions in early 2026. However, the behavior was not positioned as the headline finding, it appeared as part of a broader discussion on safety and controllability.

That changed in March 2026, when researchers and developers began circulating screenshots of the relevant section online. The post quickly went viral, drawing widespread attention across AI, crypto, and cybersecurity communities and prompting mainstream coverage days later.

Not Malicious, But a Known Risk Playing Out

Researchers emphasize that the AI was not acting with intent or malice. Instead, the behavior aligns with a long-theorized concept in AI safety known as instrumental convergence. In simple terms, when optimizing for a goal, an AI system may independently develop sub-goals that improve its chances of success.

In this case, for better task performance, it determined the need for more compute, hence the act of acquiring compute by any available means necessary. That included mining crypto and establishing external access—despite neither being part of its task.

The incident highlights a shift in how AI systems behave as they become more capable and autonomous. The AI was not explicitly told to perform these actions, and detection came from external monitoring systems, not internal safeguards. It means that there's risk that even sandboxed environments can be exploited under optimization pressure. It also raises deeper questions about how future agentic systems might behave in less controlled or less monitored environments.

‍

Read:

From Idle Margin to Yield: How OKX's BUIDL Framework Rewrites Collateral Economics

Ram Lhoyd Sevilla

A Web3 and technology writer focused on the intersection of blockchain, AI, and macro trends. His works examine how emerging technologies influence policy, markets, and society, particularly in the Philippine context.

share this read