Anthropic Abandons Key Safety Commitment
In a significant shift, Anthropic has abandoned a key safety commitment that previously defined its AI development strategy. In 2023, the company, long known for its emphasis on safety, pledged not to train AI systems unless it could put effective safety protocols in place in advance. That commitment formed the core of its Responsible Scaling Policy (RSP). Company leaders have now revised the policy, removing the promise to withhold AI models until appropriate risk mitigations are ensured.
Changing Priorities Amidst AI Advancements
Chief Science Officer Jared Kaplan said the rapid evolution of AI necessitated a change in strategy, arguing that halting model training would serve no one while competitors advanced rapidly. “We felt that it wouldn’t actually help anyone for us to stop training AI models,” Kaplan stated. The revised policy emphasizes transparency about safety risks and adds new commitments to match or exceed competitors’ safety measures.
New Safety Measures and Reporting
- The updated policy promotes transparency about safety risks.
- Anthropic commits to matching or surpassing competitor safety efforts.
- The policy retains a commitment to delay AI development when perceived risks warrant it.
Despite these commitments, the new RSP allows greater flexibility in model training, diverging from the earlier strict limitations. The shift comes amid recent successes for Anthropic, including a $30 billion investment in February and tenfold growth in annualized revenue. The company’s direct business model has been viewed favorably compared to those of its competitors.
Response to Competition and Regulatory Challenges
Kaplan denied that these changes represented a capitulation to market pressures, framing them instead as a necessary adjustment to evolving scientific and political landscapes. The initial intent of the RSP was to inspire rival companies to adopt similar safety measures; however, the anticipated regulatory frameworks have not materialized. Current political attitudes towards AI development, particularly under the Trump Administration, have favored a rapid advancement approach.
Concerns Over AI Risks
The difficulty of AI evaluations also contributed to the decision to modify the RSP. By 2025, Anthropic acknowledged that it could not rule out the possibility that its models might facilitate serious harms. Chris Painter, a policy director at METR, noted that the change signals a reactive approach to growing AI capabilities, which could invite unpredictable risks.
Commitment to Safety Roadmaps
Despite relaxing its previous constraints, Anthropic continues to emphasize safety through regular “Frontier Safety Roadmaps” and “Risk Reports,” documents intended to outline safety goals and provide detailed risk assessments. Although some experts question the effectiveness of the new measures, Kaplan insists the company will maintain its focus on both AI safety and innovation.
Anthropic aims to balance its competitive aspirations with a renewed commitment to developing AI safely, reiterating its dedication to transparency and accountability in a rapidly evolving industry.