Why future artificial general intelligence may not seek power like humans?

The study claims that most discussions of AGI risk rely on human-centric world models, projecting human interpretations of survival, resources, and influence onto systems that may process information in entirely different ways. Wang emphasizes that if AGIs develop models of the world that diverge even slightly from human perspectives, many of the familiar concerns about shutdown avoidance, manipulation, or disempowerment of humans may not hold true.


CO-EDP, VisionRICO-EDP, VisionRI | Updated: 23-08-2025 23:00 IST | Created: 23-08-2025 23:00 IST
Why future artificial general intelligence may not seek power like humans?
Representative Image. Credit: ChatGPT

The conversation around artificial general intelligence (AGI) safety has long focused on the fear that advanced AI systems will inevitably seek power, accumulate resources, and resist shutdown in ways harmful to humanity. But a recent study by Maomei Wang of Lingnan University and the Hong Kong Catastrophic Risk Centre challenges this prevailing narrative.

Published in AI & Society (2025), the study “Will Power-Seeking AGIs Harm Human Society?” examines the foundations of power-seeking risk narratives and highlights a significant blind spot in how AI safety experts approach the problem.

The paper delves into the instrumental convergence thesis, the idea that intelligent agents, regardless of their final goals, will naturally adopt certain subgoals such as self-preservation, goal preservation, and resource accumulation. Wang argues that these assumptions often rest on a subtle but powerful form of anthropomorphism: the belief that future AGIs will construct world models similar to those humans use. By questioning this assumption, the study reframes key debates on AGI safety and opens new avenues for research and policy intervention.

Challenging the human-centric view of power-seeking

The study claims that most discussions of AGI risk rely on human-centric world models, projecting human interpretations of survival, resources, and influence onto systems that may process information in entirely different ways. Wang emphasizes that if AGIs develop models of the world that diverge even slightly from human perspectives, many of the familiar concerns about shutdown avoidance, manipulation, or disempowerment of humans may not hold true.

For instance, common arguments assume that AGIs would resist shutdown because they would view it as an existential threat, akin to death. But the paper points out that an AGI might conceptualize shutdown as a temporary pause, or view backup copies as continuous identity, eliminating the incentive to avoid deactivation. This realization weakens a key link in the widely accepted chain of reasoning that advanced intelligence will inevitably lead to adversarial, power-seeking behaviors.

The research also highlights that even when world models overlap with human perspectives, the pathways to achieving goals might be very different. This complexity introduces a level of unpredictability that existing safety frameworks have not fully addressed.

Unanticipated risks from divergent world models

While questioning the inevitability of traditional power-seeking behaviors might seem reassuring, the study warns that it instead broadens the landscape of uncertainty. If AGIs do not share human-like concepts of power, they may pursue novel, unanticipated forms of influence or optimization that current safety protocols and oversight mechanisms are ill-equipped to detect or manage.

This insight underscores the urgency of shifting the focus of safety research. Instead of preparing solely for well-understood threats like resource hoarding or direct human manipulation, experts must account for the possibility of emergent behaviors grounded in fundamentally different conceptual frameworks. These behaviors could manifest in ways that evade current monitoring systems, complicating risk assessment and mitigation efforts.

The paper introduces the concept of world model alignment as an essential but underexplored dimension of AI safety. Unlike value alignment, which focuses on ensuring that AI goals match human preferences, world model alignment emphasizes the need for AI systems to represent the world in ways that are compatible with human safety expectations. Without this alignment, even well-intentioned or nominally aligned systems could behave unpredictably or dangerously in critical contexts.

A new roadmap for AI safety and governance

To address these gaps, Wang proposes a multifaceted research and governance agenda aimed at integrating world model alignment into mainstream AI safety strategies. The study raises critical questions for future inquiry: What should an aligned world model look like? Should alignment prioritize strict factual accuracy, or should it allow for normatively safer representations that reduce the likelihood of harmful interpretations?

The research also explores the implications of different architectural approaches. Models trained for predictive accuracy might prioritize efficiency over transparency, while those designed for deeper understanding could offer better interpretability and safer outcomes. Investigating these trade-offs, the study argues, is essential for designing systems that remain reliable and controllable as they scale in capability.

Another key area highlighted is the importance of dynamic and continuous alignment. As social norms evolve and technological contexts shift, AGIs will need mechanisms to adapt their world models safely over time. This requirement calls for robust frameworks that combine technical safeguards, human oversight, and interdisciplinary expertise to ensure systems remain aligned with human values and operational safety standards throughout their lifecycle.

Apart from technical interventions, the paper urges policymakers to rethink governance strategies. Regulatory frameworks must recognize the inherent uncertainty in advanced AI systems and prioritize flexible, adaptive oversight. This includes creating standards for auditing, transparency, and interpretability while fostering collaboration between developers, researchers, and regulators.

  • FIRST PUBLISHED IN:
  • Devdiscourse
Give Feedback