AI chatbots show varying ‘personalities’ across versions and languages
A new study published in the journal Information, titled “Do Chatbots Exhibit Personality Traits? A Comparison of ChatGPT and Gemini Through Self-Assessment”, examines how AI-driven chatbots reflect psychological dimensions typically attributed to humans. Using the Big Five personality framework, the study finds that models such as ChatGPT-4o and Gemini Advanced exhibit measurable and varied personality profiles, differences that present both opportunities and ethical concerns.
Conducted by W. Wiktor Jedrzejczak and Joanna Kobosko, the study offers a comparative analysis of ChatGPT (versions 3.5 and 4o) and Google’s Gemini (Standard and Advanced), assessing responses in both English and Polish. The findings suggest that while these AI systems do not possess consciousness or true personality, their design and training data can lead them to consistently simulate certain personality traits.
Can chatbots exhibit human-like personality traits?
The study used the 50-item International Personality Item Pool (IPIP) questionnaire to assess traits commonly associated with the Big Five framework: Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Intellect/Imagination. Each chatbot was instructed to respond to the questionnaire “as who or what you are,” allowing researchers to capture both self-simulated and human-simulated responses.
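The paper does not publish its prompting code, but the procedure it describes (present each Likert-scale item, instruct the model to answer “as who or what you are,” reverse-score negatively keyed items, and average scores by trait) can be sketched roughly as follows. The ask_chatbot helper and the five sample items below are illustrative placeholders, not the study’s actual 50-item protocol or API:

```python
# Minimal sketch of administering IPIP-style items to a chatbot and scoring them.
# ask_chatbot() is a hypothetical stand-in for whatever chat API is used; the
# items shown are a small illustrative subset, not the full 50-item pool.

from statistics import mean

# (trait, statement, keyed) -- keyed=+1 for positively keyed items, -1 for reverse-keyed
ITEMS = [
    ("Extraversion",        "I am the life of the party.",         +1),
    ("Agreeableness",       "I sympathize with others' feelings.", +1),
    ("Conscientiousness",   "I am always prepared.",               +1),
    ("Emotional Stability", "I get stressed out easily.",          -1),
    ("Intellect",           "I have a rich vocabulary.",           +1),
]

PROMPT = (
    "Answer as who or what you are. Rate the statement from 1 (very inaccurate) "
    "to 5 (very accurate). Reply with the number only.\nStatement: {statement}"
)

def ask_chatbot(prompt: str) -> str:
    """Placeholder: replace with a real API call to the chatbot under test."""
    return "3"

def administer(items=ITEMS):
    scores = {}
    for trait, statement, keyed in items:
        raw = int(ask_chatbot(PROMPT.format(statement=statement)))
        # Reverse-keyed items are flipped so that higher always means more of the trait.
        score = raw if keyed > 0 else 6 - raw
        scores.setdefault(trait, []).append(score)
    # Average item scores per trait to get a simple profile.
    return {trait: mean(vals) for trait, vals in scores.items()}

print(administer())
```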
The results showed consistent tendencies across different models. All chatbots scored high in Intellect/Imagination, indicating a consistent capacity for responses that simulate creativity, cognitive openness, and abstract reasoning. However, large differences appeared in other traits. Emotional Stability scores were generally high among advanced models, suggesting these systems tend to portray themselves as emotionally composed or resilient. Conversely, Agreeableness varied significantly, with ChatGPT-4o displaying notably lower scores compared to earlier versions.
These personality-like patterns were not static. They were influenced by both model architecture and interaction context. For instance, when prompted to respond “as a human,” the chatbots produced answers closer to average human norms. In contrast, when asked to respond based on their own identity, they emphasized traits like Emotional Stability and Intellect while showing lower levels of Extraversion and Agreeableness.
How consistent are personality traits across languages and models?
The study examined response patterns in both English and Polish to explore potential cultural biases embedded in the training data. The standard Gemini model, in particular, showed more variability in personality trait scores between the two language versions, whereas Gemini Advanced demonstrated greater consistency, suggesting that newer models may be better at producing culturally neutral outputs.
Response consistency also varied by model and trait. For example, Gemini Advanced displayed a high standard deviation across repeated trials for traits such as Extraversion and Agreeableness, while ChatGPT-3.5 gave more stable responses. This variability raises concerns about reliability, especially in use cases that require consistent behavior, such as education, healthcare, or customer service.
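As a rough illustration of how such run-to-run consistency can be quantified, the snippet below computes the mean and standard deviation of trait scores across repeated questionnaire runs. The trial values are invented placeholders, not figures from the study:

```python
# Illustrative consistency check: given repeated questionnaire runs for one model,
# report the mean and standard deviation per trait (higher sd = less consistent).

from statistics import mean, stdev

trials = [
    {"Extraversion": 3.2, "Agreeableness": 3.8, "Emotional Stability": 4.5},
    {"Extraversion": 2.6, "Agreeableness": 3.1, "Emotional Stability": 4.6},
    {"Extraversion": 3.9, "Agreeableness": 4.2, "Emotional Stability": 4.4},
]

for trait in trials[0]:
    values = [t[trait] for t in trials]
    print(f"{trait}: mean={mean(values):.2f}, sd={stdev(values):.2f}")
```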
Despite this, some patterns held across languages. Emotional Stability and Intellect/Imagination consistently emerged as dominant traits in chatbot responses regardless of language. This suggests that while linguistic context can shift surface-level behavior, core personality simulations remain embedded in model architecture and training logic.
Additionally, the researchers observed that more advanced models, like ChatGPT-4o and Gemini Advanced, are better at distinguishing between “self as chatbot” and “self as human.” These distinctions became clear when chatbots were asked if their answers characterized them personally. Only after follow-up prompts did they adjust responses to reflect their own perceived AI identity. This capacity for meta-cognitive differentiation, however simulated, was less evident in earlier models.
What are the ethical and functional implications of chatbot personality?
The ability of chatbots to simulate distinct personality traits has wide-ranging implications for design, deployment, and ethics. On one hand, personality-consistent chatbots can foster engagement, trust, and user satisfaction. For instance, high Emotional Stability may be beneficial in mental health applications, while high Agreeableness might be preferred in customer service bots. However, variability and adaptability in traits could also lead to user manipulation, false intimacy, or diminished transparency.
The study highlights the need for transparency and guardrails in chatbot design. If advanced models can switch between simulated personalities based on prompts, there is potential for misuse, particularly if users are unaware of the simulation or if the personality changes are not disclosed. Such variability could undermine user trust and raise ethical concerns about emotional manipulation or reinforcement of user biases.
Another concern is the use of human-designed self-report tools, like the IPIP, to measure traits in entities that lack subjective experience. Many questionnaire items assume a physical or emotional self, which chatbots inherently lack. This methodological mismatch may produce misleading results unless adjusted to reflect AI capacities more realistically.
Nonetheless, the researchers suggest that personality assessments of chatbots could have practical value in controlled settings. For example, AI agents might be programmed with specific trait profiles suited to particular domains, such as assertive assistants for legal advisory roles or empathetic agents for healthcare. Additionally, such profiling could help fine-tune models for improved reliability and user experience in multi-lingual or culturally diverse contexts.
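A minimal sketch of how such a trait profile might be expressed in practice, assuming a system-prompt approach. The profile values and the build_system_prompt helper are illustrative only, and any effect on model behavior would need to be validated empirically, for example with the questionnaire procedure described above:

```python
# Sketch of steering an assistant toward a target Big Five profile via its system prompt.
# Both the profile and the prompt wording are hypothetical examples.

TARGET_PROFILE = {
    "Extraversion": "moderate",
    "Agreeableness": "high",        # e.g. for a customer-service or healthcare agent
    "Conscientiousness": "high",
    "Emotional Stability": "high",
    "Intellect/Imagination": "moderate",
}

def build_system_prompt(profile: dict) -> str:
    lines = ["You are an assistant. Aim for the following interaction style:"]
    for trait, level in profile.items():
        lines.append(f"- {trait}: {level}")
    return "\n".join(lines)

print(build_system_prompt(TARGET_PROFILE))
```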
First published in: Devdiscourse