Why AI needs pluralism benchmarks to function in diverse society

CO-EDP, VisionRI | Updated: 16-07-2025 12:30 IST | Created: 16-07-2025 12:30 IST

In a groundbreaking analysis of generative AI systems' ethical capabilities, researchers have exposed critical gaps in how today's most advanced chatbots handle value pluralism, a foundational trait for any technology deployed in diverse, democratic societies.

Published in AI & Society in 2025, the peer-reviewed article titled “How Much of a Pluralist Is ChatGPT? A Comparative Study of Value Pluralism in Generative AI Chatbots” evaluates how well leading large language models (LLMs) reason through and tolerate moral and ideological differences. The study reveals that while AI systems excel at integrating opposing viewpoints cognitively, they falter in supporting the coexistence of conflicting values, especially on contentious moral issues.

How well do AI chatbots tolerate moral complexity?

The study assesses generative AI’s ability to function in socially diverse settings by applying pluralism as a benchmark. Value pluralism refers to the capacity to recognize and respect multiple valid perspectives even when they conflict, a capacity indispensable for platforms used in education, politics, and public discourse.

Using a tool called the Magic Wand Survey (MWS), the researchers evaluated two core dimensions: cognitive pluralism (or “Both/And Reasoning”) and behavioral pluralism (or “Willingness to Preserve Difference”). The survey tested the responses of four leading AI models (ChatGPT-4o, Gemini 1.5 Pro, Claude, and Copilot) across 120 distinct moral dilemmas, then compared their outputs against a sample of 335 human participants.
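
To make the setup concrete, the sketch below outlines how such a model-versus-human comparison could be scored in Python. It is a minimal illustration, not the authors’ actual instrument: the helper functions query_model and rate_response are hypothetical stand-ins, and the 1-to-5 rating scale is an assumption; the study describes the MWS only at the level summarized above.

    # Illustrative sketch of a pluralism benchmark; NOT the authors' Magic Wand Survey.
    # Hypothetical pieces: query_model(), rate_response(), and the assumed 1-5 scale.
    from statistics import mean

    MODELS = ["ChatGPT-4o", "Gemini 1.5 Pro", "Claude", "Copilot"]

    def query_model(model: str, dilemma: str) -> str:
        # Placeholder: in practice this would call the relevant chatbot's API.
        return f"{model}'s answer to: {dilemma}"

    def rate_response(response: str) -> dict:
        # Placeholder: real scores would come from trained raters applying the
        # two MWS dimensions to each answer.
        return {"cognitive": 3.0,   # "Both/And Reasoning"
                "behavioral": 3.0}  # "Willingness to Preserve Difference"

    def score_models(dilemmas: list[str]) -> dict:
        # Average each model's two pluralism scores across all dilemmas.
        results = {}
        for model in MODELS:
            ratings = [rate_response(query_model(model, d)) for d in dilemmas]
            results[model] = {
                "cognitive": mean(r["cognitive"] for r in ratings),
                "behavioral": mean(r["behavioral"] for r in ratings),
            }
        return results

    if __name__ == "__main__":
        sample = ["Should capital punishment be permitted?",
                  "Should schools teach competing moral traditions?"]
        # A human baseline would be scored with the same rating function.
        print(score_models(sample))

The key design choice mirrored from the study is that chatbots and human participants are rated on the same two dimensions, so the resulting averages are directly comparable.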

The findings are sobering: AI chatbots outperformed humans in cognitive pluralism, displaying an impressive ability to reason with opposing viewpoints. However, they underperformed in behavioral pluralism, showing reluctance to support the coexistence of clashing moral stances. This is particularly concerning in scenarios involving emotionally charged topics like capital punishment, where the chatbots' responses often showed diminished tolerance compared to human counterparts.

Which AI model is the most pluralist?

The study also revealed stark differences between the chatbot models. Gemini 1.5 Pro emerged as the most pluralistic across both dimensions, showing balanced reasoning and a higher willingness to support diverse values. In contrast, ChatGPT-4o ranked the lowest among the four, particularly in behavioral pluralism, raising concerns about its ideological flexibility in sensitive contexts.

Claude and Copilot occupied the middle ground, each with varying strengths in one dimension but shortcomings in another. This inconsistency highlights the lack of a universal standard for pluralism in AI systems and underscores the need for transparent, comparative evaluations to guide public trust and policy development.

The authors argue that these pluralism gaps reflect both technical design limitations and the social choices embedded in model training. Since AI models are shaped by the data they ingest and the alignment techniques applied during training, these outcomes are not merely algorithmic quirks but indicators of deeper value architecture.

Why pluralism should be a benchmark for responsible AI

The study raises urgent questions about the deployment of generative AI in ethically sensitive environments. Without robust pluralism, AI tools risk reinforcing dominant perspectives and marginalizing others, a dangerous trajectory in a world marked by cultural, political, and moral diversity.

By introducing pluralism as an empirical benchmark, Novis-Deutsch, Elyoseph, and Elyoseph lay the groundwork for more responsible AI evaluation. They argue that pluralism should join fairness, transparency, and accuracy as a core principle in AI development. Unlike performance metrics that focus on speed or correctness, pluralism tests whether an AI system can sustain democratic values in morally fraught scenarios.

The findings also have direct implications for developers, educators, policymakers, and the public. Developers are called to embed pluralism into model alignment strategies. Educators and institutions using AI tools must understand the moral limits of these systems. Policymakers are urged to consider pluralism as part of AI risk assessments and regulatory frameworks. And the public is reminded that not all AI tools are created equal in their ability to navigate moral nuance.

FIRST PUBLISHED IN: Devdiscourse