AI’s financial judgments reflect cultural bias, not global consensus

CO-EDP, VisionRI | Updated: 17-07-2025 17:23 IST | Created: 17-07-2025 17:23 IST

A new international study has uncovered unexpected cultural parallels between large language models (LLMs) and human financial behaviors. The study, “Artificial Finance: How AI Thinks About Money,” published on arXiv, explores how seven leading LLMs navigate financial decisions and compares their outputs to human responses gathered from 53 countries.

The research challenges common assumptions about AI's neutrality in financial reasoning, revealing that these advanced models tend to mimic specific national profiles, particularly that of Tanzanian respondents. It also raises concerns about the rational coherence and cultural origins of AI-based decision-making in high-stakes financial environments.

Do LLMs think like rational economists or human investors?

Do LLMs exhibit human-like behavior in financial decision-making? To answer this, the researchers submitted 14 behavioral economics questions to seven LLMs, including GPT-4.0, GPT-4.5, GPT-o1, GPT-o3-mini, Gemini 2.0 Flash, and DeepSeek R1, and compared their responses to human data from a comprehensive global dataset.

The findings showed that LLMs consistently favor options that maximize expected value, suggesting a distinctly risk-neutral decision-making profile. In contrast to human populations, who typically display risk aversion or loss aversion depending on contextual framing, the LLMs responded with calculated consistency. When faced with lottery-based scenarios involving gains and losses, the models did not exhibit the behavioral hesitations or emotional weighting typical of human investors.
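The contrast is easiest to see with a toy lottery. The Python sketch below is a minimal illustration with made-up amounts and a log-utility function, not the study's actual protocol; it simply shows how a risk-neutral expected-value maximizer and a risk-averse decision-maker can answer the same question differently.

```python
import math

# Hypothetical lottery: 50% chance of winning 1,000, 50% chance of 0,
# offered against a sure payment of 450 (illustrative numbers only).
p_win, prize, sure_amount = 0.5, 1_000, 450

# Risk-neutral agent: compares expected values directly.
ev_lottery = p_win * prize  # 500
risk_neutral_choice = "lottery" if ev_lottery > sure_amount else "sure amount"

# Risk-averse agent: concave (log) utility over wealth, starting from a small base amount.
base_wealth = 100

def log_utility(wealth):
    return math.log(wealth)

eu_lottery = p_win * log_utility(base_wealth + prize) + (1 - p_win) * log_utility(base_wealth)
eu_sure = log_utility(base_wealth + sure_amount)
risk_averse_choice = "lottery" if eu_lottery > eu_sure else "sure amount"

print(risk_neutral_choice)  # lottery      (expected value 500 beats the sure 450)
print(risk_averse_choice)   # sure amount  (log utility penalizes the 50% chance of winning nothing)
```

The study's finding is that the LLMs tend to behave like the first agent, while human respondents more often behave like the second.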

Despite this appearance of rationality, the study uncovered lapses in logic when the models encountered questions requiring temporal trade-offs. Several LLMs produced responses with economic discounting values exceeding logical bounds, indicating inconsistency with established behavioral economic models. These anomalies suggest that while LLMs may produce output resembling normative reasoning on the surface, their internal reasoning mechanisms are not always aligned with the foundational principles of economic decision-making.

Which human populations do AI models resemble?

The study also assessed whether LLM decision-making reflects specific cultural or demographic influences. This question stems from a growing body of literature suggesting that AI systems, trained on vast amounts of human-generated text, may unintentionally inherit the cultural biases embedded in those data.

Using statistical clustering techniques, Erdem and Ashok found that the LLMs formed a distinct cluster, separate from most human respondents across the 53-country dataset. The sole exception was Tanzania. The models’ aggregated financial decisions aligned most closely with Tanzanian participants, contradicting earlier claims that LLMs mirror the preferences of populations from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies.
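The article does not name the specific clustering technique, so the sketch below illustrates the general approach with standard hierarchical (Ward-linkage) clustering from SciPy; the response matrices are randomly generated placeholders, not the study's data, and the deliberate offset between the two groups stands in for whatever systematic difference the researchers measured.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical data: each row is one respondent group (a country average or an LLM),
# each column is the numeric answer to one of the 14 behavioral-economics questions.
rng = np.random.default_rng(0)
country_means = rng.normal(loc=0.5, scale=0.15, size=(53, 14))  # 53 country profiles
llm_means = rng.normal(loc=0.9, scale=0.05, size=(7, 14))       # 7 LLM profiles, deliberately offset

profiles = np.vstack([country_means, llm_means])

# Ward-linkage hierarchical clustering on Euclidean distances between profiles.
distances = pdist(profiles, metric="euclidean")
tree = linkage(distances, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")

# With offsets like the ones above, the LLM rows tend to land in their own cluster,
# mirroring the paper's finding that the models sit apart from most human samples.
print(labels[:53])  # cluster labels for the country profiles
print(labels[53:])  # cluster labels for the LLM profiles
```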

The researchers proposed a compelling explanation rooted in the labor dynamics behind AI development. A significant proportion of reinforcement learning and moderation tasks used to train and refine LLMs are outsourced to East African countries, including Kenya and Tanzania. These human annotators provide critical feedback that shapes model behavior, potentially embedding regional linguistic patterns and decision-making frameworks into the models’ outputs. This influence may explain why Tanzanian financial behavior emerged as the closest match to LLM outputs across a diverse international sample.

Are AI financial judgments reliable for real-world use?

The study scrutinized whether LLMs provide economically coherent financial judgments. Two parameters were extracted from the models’ responses: present bias (β), which measures the tendency to prioritize immediate rewards, and impatience (δ), which gauges how sharply future rewards are discounted.
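These two parameters come from the quasi-hyperbolic (beta-delta) discounting framework that behavioral economists commonly use for such estimates; assuming that is the specification behind the reported values, the weight placed on a reward received t periods in the future is:

```latex
D(t) =
\begin{cases}
1, & t = 0 \\
\beta\,\delta^{t}, & t \ge 1
\end{cases}
\qquad \text{with } 0 < \beta \le 1,\ 0 < \delta \le 1 \text{ in standard formulations.}
```

Under this convention, a β below one captures present bias (immediate rewards loom disproportionately large) and a δ below one captures ordinary impatience, which is why estimates above one are flagged as anomalies below.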

Ideally, these values should fall within a defined logical range. However, several LLMs exhibited δ values greater than one - a mathematical anomaly suggesting they irrationally valued future outcomes more than immediate ones. Similarly, the β value produced by Gemini exceeded unity, indicating a bias toward delayed rewards - the reverse of the present bias typically seen in people - that is rarely observed in actual human behavior and unsupported by empirical evidence.
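To see why values above one are anomalous, the short calculation below uses made-up parameter values (not the ones reported in the paper) to show that a δ greater than one makes the discount weight grow with the delay, so waiting longer makes a reward look more attractive rather than less.

```python
def discount_weight(beta, delta, t):
    """Quasi-hyperbolic discount weight for a reward delayed by t periods."""
    return 1.0 if t == 0 else beta * delta ** t

# Illustrative parameters only: a typical human-like estimate vs. an anomalous one.
for beta, delta in [(0.9, 0.95), (0.9, 1.10)]:
    weights = [round(discount_weight(beta, delta, t), 3) for t in range(4)]
    print(beta, delta, weights)

# beta=0.9, delta=0.95 -> [1.0, 0.855, 0.812, 0.772]  (later rewards worth less, as expected)
# beta=0.9, delta=1.10 -> [1.0, 0.99, 1.089, 1.198]   (weights rise with delay, so future
#                                                      outcomes end up valued above immediate ones)
```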

These irregularities signal limitations in the decision-making logic of current LLMs. While they are often capable of mimicking economic rationality in controlled scenarios, their behavior under more complex, temporally framed choices reveals a superficial form of reasoning. This reinforces recent claims in AI ethics research that LLMs rely on pattern prediction rather than genuine understanding, especially when faced with multifaceted human dilemmas.

The implications are far-reaching for sectors like robo-advisory services, algorithmic trading, and automated financial counseling, where LLMs are increasingly being deployed. If these models exhibit systematic biases or logical incoherence in high-stakes scenarios, users could be misled by seemingly rational advice that lacks behavioral depth or contextual relevance.

The researchers advocate for continued scrutiny of LLMs before they are entrusted with sensitive financial decisions. They recommend that future research explore how prompt engineering, persona framing, and alternative sampling configurations influence AI responses. They also emphasize the need to examine training datasets and annotation labor sources to understand the value systems embedded in AI outputs.

  • FIRST PUBLISHED IN: Devdiscourse