Generative AI adoption in finance may concentrate risk, not diversify it

A new risk assessment from Miranda McClellan of Schwarzman College, Tsinghua University, argues that large language models (LLMs) used for stock picking could synchronize investor behavior and amplify market shocks, creating vulnerabilities that traditional model-by-model checks miss.
The paper, “AI and Financial Fragility: A Framework for Measuring Systemic Risk in Deployment of Generative AI for Stock Price Predictions,” is published in the Journal of Risk and Financial Management. It proposes a quantitative way to gauge exogenous, market-level risk from LLM-driven trading and outlines technical, cultural, and regulatory levers to contain it.
What new risk does AI introduce into markets?
The study distinguishes underexplored exogenous risk (system-level instability triggered by external factors and coordinated behavior) from the endogenous risks usually tracked in AI evaluation (accuracy, bias, model failure). The author argues that when many firms deploy similar LLMs, their outputs can converge, producing simultaneous buy or sell signals that inflate bubbles or accelerate crashes across sectors and borders.
To capture this, McClellan develops a covariance-and-correlation metric that measures how closely different LLMs' stock-price predictions move together. The experiment spans eight general-purpose LLMs (GPT, Gemini, Claude, DeepSeek, Qwen, Doubao, Cohere, and Mistral) applied to 11 stocks across three sectors, technology, automobiles, and communications, and multiple time horizons. The goal is not to score accuracy but to quantify the relationship between models' predictions as a proxy for market-level coordination risk.
The finding is stark: all eight models were positively correlated. In other words, the systems tended to point in the same direction, raising the likelihood of synchronized trades that can magnify volatility, strain liquidity, and undermine resilience when conditions change. The paper warns that this kind of homogeneity converts LLM adoption into a potential amplifier of fragility, not just a faster decision engine.
The framework also clarifies why technical fixes alone may disappoint. Classic ensemble tactics, combining models to hedge errors, do not neutralize correlated signals; they can still push markets in lockstep. As models become more capable and easier to deploy, correlation pressures may rise, increasing the need for policy intervention that complements engineering controls.
How does the framework measure systemic risk?
Borrowing from modern portfolio theory, the study applies covariance and correlation coefficients to LLM outputs, treating each model like an asset whose risk can be diversified away only if the models' predictions are not all moving in tandem. The pipeline selects models and stocks, constructs prompts using financial indicators, gathers outputs over five timeframes, and then computes pairwise relationships among the predictions. The dataset spans U.S., European, and Chinese equities to reflect real-world exposure to geopolitical competition and cross-market contagion.
By focusing on the relationship between models rather than their absolute accuracy, the method surfaces systemic dynamics that typical AI benchmarks ignore. It directly tests whether a “portfolio” of trading LLMs offers diversification benefits or, as the results show, concentrates risk by moving together. That shift from model-centric metrics to market-centric diagnostics is the paper’s central contribution.
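The portfolio-theory intuition can be made concrete with the standard equal-weight variance formula. A sketch with illustrative numbers (not taken from the paper), assuming each model's signal has the same volatility `sigma` and an average pairwise correlation `rho`:

```python
import math

def portfolio_vol(n_models: int, sigma: float, rho: float) -> float:
    """Volatility of an equal-weight combination of n correlated signals:
    sigma_p^2 = sigma^2 / n + ((n - 1) / n) * rho * sigma^2."""
    var = sigma**2 / n_models + (n_models - 1) / n_models * rho * sigma**2
    return math.sqrt(var)

sigma = 0.10  # assumed per-model signal volatility
# Uncorrelated signals: combining 8 models cuts risk sharply.
print(portfolio_vol(8, sigma, 0.0))
# Strongly correlated signals: risk barely falls below a single model's.
print(portfolio_vol(8, sigma, 0.9))
```

With `rho = 0` the combined volatility shrinks roughly as 1/sqrt(n); with `rho` near 1 it stays close to `sigma`, which is why a portfolio of positively correlated LLMs concentrates rather than diversifies risk.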
The author positions this as a practical tool for firms and supervisors to monitor correlated behavior as GenAI proliferates in trading workflows. The metric, the paper argues, can inform cross-border coordination on AI governance to reduce manipulation risk and maintain stability during stress.
What policy guardrails does the study recommend?
Because correlated outputs make coordinated action more likely, the paper urges multi-level policy that brings together industry bodies, firms, and regulators rather than relying on self-regulation or single-jurisdiction rules. It proposes data-driven governance that requires transparency on model use, mandates routine systemic-risk analysis for high-impact AI trading tools, and empowers local enforcement to audit and sanction misuse.
The analysis includes jurisdiction-specific readings. For example, the paper suggests that China’s centralized financial oversight and existing AI registries could enable stricter pre-deployment controls and even exchange-level exposure caps on LLM-driven strategies, while still supporting innovation. By contrast, the U.S. and EU are encouraged to align transparency, audit capacity, and crisis-coordination playbooks across financial supervisors to avoid regulatory fragmentation.
Most importantly, the study asserts that regulatory attention must shift beyond consumer harm to include market structure risks. Without enforceable standards that map and mitigate correlated AI behavior, the sector could face algorithmic bandwagon effects: rapid, correlated trades that move prices irrespective of fundamentals.
Limitations and next steps
The author notes two boundaries to the evidence. First, proprietary hedge-fund models, often trained on richer data, were not included; the experiment used widely available general-purpose LLMs to reflect what many firms can access today. Second, the framework examines price-prediction use cases; other LLM deployments in finance, such as execution or risk operations, warrant separate testing. Even so, the results point to a material systemic-risk signal that should be measured and governed as adoption scales.
To support replication, the paper provides a public repository containing prompts, indicators, model responses, and the correlation calculations. The author calls for extending the analysis to additional sectors, such as healthcare or shipping, to map where correlated AI behavior might pose the greatest macro-financial risk.
FIRST PUBLISHED IN: Devdiscourse