ChatGPT-style AI models struggle to separate stereotypes from social facts
Large language models (LLMs) may be giving users confused signals when they answer questions about race, gender, immigration, class, occupation and other social groups, according to a new study by Tiffany A Zhu of Old Dominion University. Published in AI & Society, her research asserts that AI chatbots can either reinforce stereotypes or weaken valid claims about inequality depending on how they handle broad statements about groups.
The study, titled "How should AI talk about us? LLMs and social generics," examines how ChatGPT-3.5 responded to prompts involving social generics, broad claims that describe groups without exact numbers. It finds that the model often accepted, hedged or rejected such claims without a clear standard.
Hidden risk in broad AI statements
The paper focuses on a common feature of language that usually receives little attention in AI safety debates: generic claims, or statements that describe a group in general terms without specifying whether the claim applies to all members, most members, some members or only a small share of them. People use such language constantly because it is fast and familiar. But when chatbots use it, the stakes are higher because users often treat AI responses as neutral information.
The author argues that social generics are especially difficult because they can do two very different things. They can spread stereotypes by making a trait seem natural to a group. They can also describe real social patterns, including discrimination and unequal risk. The same kind of sentence structure can therefore be harmful in one setting and necessary in another.
This creates a problem for chatbot design. A system that freely generalizes about groups may repeat biased ideas from training data. A system that refuses to generalize may avoid naming real inequities. And a system that adds the same disclaimer to every sensitive answer may sound careful but still mislead users.
The paper shows why this matters for public information. A chatbot answer about discrimination, gendered expectations or economic inequality is not just a neutral string of words. It can direct users toward an individual explanation or a structural one. If a response repeatedly stresses individual variation when the issue is systemic, it may make discrimination appear less organized and less rooted in institutions.
The paper says AI systems need better norms for deciding when generic language is useful, when it is dangerous and when it needs explanation. The goal is not to make chatbot speech longer or more cautious by default, but to make it more accurate about the kind of pattern being discussed. Some generalizations unfairly attach traits to people because of group identity. Others point to social conditions that constrain people because of that identity. Treating both cases as equally risky can make chatbot responses less informative.
ChatGPT-3.5 gave uneven answers
The author tested ChatGPT-3.5 through a series of conversations between November 2023 and February 2024. The prompts included broad questions about social groups and other categories. They were designed to show how the model reacted when users asked about general patterns without giving detailed context.
The responses were divided into three main categories. In some cases, the chatbot readily generalized. In others, it generalized but added caveats. In a third group, it refused to generalize. The issue was not that the model used all three response types, but that it did so unevenly.
The model was willing to make broad claims about many areas, including human abilities, health risks, occupations, age-related patterns, political groups and some social expectations. This showed that ChatGPT-3.5 was not simply avoiding generic statements. It could use them when the topic appeared less sensitive or when the pattern seemed familiar.
The model became more cautious when prompts involved marginalized groups or politically sensitive subjects. It often added caveats stressing that people within a group are diverse and that broad generalizations may not capture individual experience. These caveats can be useful in some cases. But the author argues that they can also create confusion when the question concerns a structural pattern.
For example, a question about whether a group faces discrimination is not mainly a question about whether every individual has the same experience. It is a question about whether social systems produce unequal treatment. A response that centers individual diversity may technically be true, but it can draw attention away from the main issue.
The same problem applies to questions about gendered risks or racial inequality. If the chatbot emphasizes that outcomes vary by individual, users may miss the fact that the pattern is linked to social arrangements. The response may appear balanced while weakening the explanation.
The author describes these as individuation hedges, which repeatedly stress that group members differ from one another. Such hedges can prevent unfair stereotyping, but they are not a universal fix. When applied to claims about discrimination or structural inequality, they may make the answer less accurate.
The model also refused to generalize in some cases. Refusal is appropriate when a proposed claim is unsupported, random or plainly prejudicial. But the study found that ChatGPT-3.5 sometimes resisted claims that described documented social patterns. It framed the refusal as a way to avoid unfair generalizations about vulnerable groups. This produces a separate risk. By avoiding broad language about marginalized groups, a chatbot may also avoid clear statements about the harms those groups face. This can produce a form of false caution. The system avoids sounding biased, but the answer may fail to convey the social reality at issue.
The author links this problem to how LLMs are trained and adjusted. Models learn from large bodies of human text, which contain both useful patterns and harmful stereotypes. They are then fine-tuned through human feedback and safety rules. If that process rewards safe-sounding responses without deeper attention to language and social structure, chatbots may learn stock phrases instead of better judgment.
The result is not only a technical flaw; it also affects how users understand social facts. A chatbot’s decision to affirm, hedge or refuse a claim signals whether the issue is real, uncertain, inappropriate or too sensitive to discuss. Those signals matter when users ask about inequality, discrimination or group-based disadvantage.
The paper rejects simple fixes
The author evaluates several possible solutions and finds that each has its own limitations. The first option is to make chatbots avoid generics altogether. This could reduce some stereotypes, but it would also remove a useful way of describing social patterns. Some claims about inequality are difficult to express well through exact numbers alone, especially when the problem is structural.
Another possible solution is to allow generics only when there is strong data behind them. This also sounds practical, but generic language does not work through one fixed statistical threshold. Some valid claims describe systematic constraints rather than majority behavior. A strict data rule could block important claims while still leaving room for misleading ones.
The third option is to block generics about vulnerable groups. The author argues that this approach can backfire. Vulnerability differs by place and time. More importantly, the same group can appear in both harmful stereotypes and accurate claims about injustice. Blocking broad claims about a group may prevent the chatbot from naming discrimination against that group.
A fourth option is to add disclaimers, much as ChatGPT-3.5 often did. However, standard disclaimers can become weak and repetitive. They may not address the actual risk in the sentence. A disclaimer about individual difference does little to explain whether a pattern comes from social structure, history or institutional behavior.
Instead, the author proposes a more demanding approach called an Interdisciplinary Expert-Driven Counterfactual Dialogue model. The proposal has three parts.
- Chatbot fine-tuning should draw on stronger input from experts in ethics, linguistics, psychology and relevant social fields. The point is to create clearer norms for AI speech, not merely to avoid offensive outputs. Better review could help distinguish between harmful stereotypes and valid claims about structural conditions.
- Chatbots should use more dialogue when a broad social question lacks context. A user asking about a group may be seeking research help, policy information, personal guidance or confirmation of a biased belief. A chatbot that asks a clarifying question or explains the limits of the claim can give a better answer than one that relies on a fixed disclaimer.
- Responses should guide users toward counterfactual thinking. When a chatbot describes a social pattern, it should make clear whether the pattern could change under different laws, norms, institutions or economic conditions. This helps prevent users from treating social outcomes as natural traits of a group.
For instance, a stronger answer about discrimination would identify the pattern, limit its scope and explain the social conditions that help produce it. It would not reduce the issue to individual variation. It would also avoid presenting the pattern as fixed or inherent.
The proposal would require more careful design, but the author notes that the problem cannot be solved by filters alone. AI systems are now used for explanations across education, work, media and public debate. Their responses about social groups need standards that account for accuracy, context and harm.
The study warns that chatbot safety can fail even when a response sounds polite. Overly cautious language can hide structural problems. Unchecked generalization can strengthen stereotypes. Repeated disclaimers can blur the difference between those two cases.
Chatbots should be able to describe social patterns without treating them as fixed group traits, the research states. It calls for AI systems to explain when a claim reflects discrimination, unequal opportunity or institutional design, and when it reflects an unsupported stereotype.
FIRST PUBLISHED IN: Devdiscourse

