How LLMs can add second lens to qualitative health research


COE-EDP, VisionRI | Updated: 30-05-2026 22:12 IST | Created: 30-05-2026 22:12 IST

Large language models (LLMs) could help health researchers analyse large qualitative datasets more efficiently, but they should be treated as a secondary analytic lens rather than a substitute for human interpretation, according to a new study by Callum Hill, Jacob Keast, Arun Dahil and Hajira Dambha-Miller of the University of Southampton.

Their study, titled "Multimorbidity and AI-enabled health and social care: A methodological illustration of integrating large language models into qualitative analytic workflows", was published in the Journal of Multimorbidity and Comorbidity. It used interviews with people living with multimorbidity, carers and health and social care professionals to examine both perceptions of AI-supported social care and the role of large language models in qualitative research.

The study addresses two fast-moving issues in health research:

  1. The first is the growing challenge of multimorbidity, commonly defined as the co-occurrence of two or more chronic conditions. People living with multimorbidity often face unmet social care needs related to housing, food, mobility, everyday functioning and support navigation. These gaps can reduce wellbeing, increase hospital use and add pressure to already strained health and social care systems.
  2. The second issue is the rise of AI in health research workflows. Large language models such as Claude can summarise text, identify possible themes and cluster concepts across large amounts of interview data. However, their use in qualitative research remains contested because qualitative interpretation depends on context, emotional nuance, power dynamics, participant meaning and researcher reflexivity.

The study thus set out not to replace human analysis, but to show how LLM-assisted outputs can be integrated into a structured, transparent and human-reviewed analytic process.

The researchers conducted a secondary thematic analysis of 75 interview transcripts. The dataset included 40 people living with multimorbidity and 35 informal carers or health and social care professionals, including general practitioners, social prescribers, community support workers and wellbeing coaches. Participants had previously been interviewed about daily challenges, social care needs and reactions to a hypothetical AI-supported tool for care planning.

Patients saw promise in AI but feared impersonal care

The substantive findings showed that participants recognised possible benefits from AI-enabled tools in social care, particularly if those tools could support more personalised, coordinated and proactive help. People living with multimorbidity often described care as fragmented and exhausting. Many felt forced to manage appointments, medications, forms, referrals and communication across disconnected services by themselves.

Participants described the work of navigating health and social care as a form of constant coordination. The study found that patients and carers often have to link up different services, repeat their histories and chase support across systems that do not communicate well. The researchers describe this as part of the wider challenge of multimorbidity care, where people with complex needs are frequently left to manage complexity that should be shared by institutions.

Participants also raised concerns about trust, privacy and transparency. Some were uncertain about what AI tools might do with their information, whether data would be used to support them or profile them, and whether automated systems would be reliable in real care settings. These concerns were especially important because people with multimorbidity may already feel exposed, dependent on services and vulnerable to being misunderstood.

Digital access emerged as another barrier, with some participants lacking reliable devices, confidence or capacity to use online systems. Digital exclusion was not only about internet access. It also involved cognitive load, affordability, confidence and the practical difficulty of managing digital tasks while dealing with multiple long-term conditions.

The study found that participants wanted technology to be personal, not generic. Many were interested in tools that could understand individual circumstances, support long-term care planning and help connect health and social care options. But that interest was conditional. Participants were more willing to consider AI if it actually reduced burden, improved coordination and respected their needs. Prior negative experiences with services made some cautious about new promises.

Across the interviews, participants repeatedly stressed the importance of empathy and human connection. They wanted more than information or administrative outputs: they wanted to feel heard, understood and recognised as whole people. Some described existing care systems as transactional, rushed or dismissive. That finding is central to the study’s message: AI may support coordination, but it cannot solve the relational gaps in care if systems continue to neglect human connection.

Health and social care professionals were generally more optimistic than patients about AI’s potential to support coordination. Professionals often focused on structural barriers and the possibility that AI could help make services more joined up. Patients were more likely to describe emotional burden, past disappointment and concerns about whether AI tools would worsen impersonal care. This difference does not mean patients rejected AI. It shows that acceptability depends on whether tools are built around lived experience rather than administrative convenience alone.

Claude surfaced useful patterns but also overstated meaning

The researchers compared human reflexive thematic analysis with LLM-assisted qualitative analysis using Claude Sonnet 4. They processed anonymised transcripts through staged prompts designed to move through three layers of meaning: exploratory, interpretive and integrative.

The exploratory layer focused on broad themes and recurring concerns. The interpretive layer examined emotional tone, implicit values and latent meaning. The integrative layer compared cross-cutting patterns across stakeholder groups, including patients, carers and professionals. Human analysis and LLM-assisted analysis were then compared through convergence-divergence mapping, a method used to assess where the outputs aligned, where they differed and where model interpretations required correction or rejection.
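The comparison step can be illustrated with a deliberately simplified sketch. The article does not say how the researchers implemented convergence-divergence mapping, so everything below is an assumption: the class and function names are hypothetical, theme labels are matched by normalised text, and the example labels are illustrative rather than the study's actual coding. In practice, deciding whether two themes converge is an interpretive judgement made by researchers, not a string match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThemeMap:
    """Hypothetical container for a convergence-divergence comparison."""
    convergent: frozenset  # labels proposed by both analyses
    human_only: frozenset  # labels found only in the human analysis
    llm_only: frozenset    # model-only labels: candidates needing human review


def normalise(label: str) -> str:
    # Lower-case and collapse whitespace so trivial differences do not matter.
    return " ".join(label.lower().split())


def map_themes(human_labels, llm_labels) -> ThemeMap:
    human = frozenset(normalise(label) for label in human_labels)
    llm = frozenset(normalise(label) for label in llm_labels)
    return ThemeMap(
        convergent=human & llm,
        human_only=human - llm,
        llm_only=llm - human,
    )


# Illustrative labels only; the last human label is a placeholder.
human_themes = [
    "Care fragmentation",
    "Digital access barriers",
    "Hypothetical human-only theme",
]
llm_themes = [
    "care fragmentation",
    "Digital  access barriers",
    "Cognitive overload",
    "Invisible labour",
]

result = map_themes(human_themes, llm_themes)
print("Convergent:", sorted(result.convergent))
print("Human only:", sorted(result.human_only))
print("LLM only (verify against transcripts):", sorted(result.llm_only))
```

Keeping model-only labels in their own bucket mirrors the idea that such framings are candidates to be accepted, corrected or rejected after checking the transcripts.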

The researchers found substantial convergence between human and LLM-assisted analysis. Both approaches identified care fragmentation, perceived lack of empathy, digital access barriers, uncertainty about AI, desire for personalised support, and the emotional and administrative burden of managing multimorbidity. Both also found that participants wanted joined-up care anchored in ongoing human relationships.

The LLM-assisted analysis also introduced useful alternative framings. One example was the concept of cognitive overload. Manual coding had identified repetition, fragmentation and administrative burden as separate concerns. Claude helped frame these experiences as a broader pattern of cognitive overload, capturing the mental strain of repeated appointments, forms, online systems and care coordination.

Another useful model-generated framing was invisible labour. Manual analysis had identified the tasks patients and carers performed, such as booking appointments, following up referrals and translating between services. The LLM reframed these activities as essential but unrecognised work carried out by patients and carers inside fragmented systems. The researchers found this framing useful when verified against transcript evidence.

Claude also highlighted power dynamics in patient-professional relationships. It interpreted descriptions of scripted conversations, being passed around or going in circles as signs of asymmetry between patients and health professionals. This helped surface issues of agency, status and recognition that were present in the data but less emphasised in the manual analysis.

The model also identified figurative language as analytically important. It treated metaphors such as black box, maze and conveyor belt as markers of alienation, opacity and mechanisation in care systems. This allowed the analysis to examine how participants used language to express distrust or exhaustion, rather than focusing only on explicit statements.

However, the study also found clear risks. Claude sometimes overstated emotional tone, misattributed meaning or simplified participant accounts. In one example, the model described a participant’s statement as quiet resignation, but human review found that this interpretation went beyond the wider transcript context. In another case, the model reduced a complex account about post-hospital medication confusion to a simpler summary about patients being confused after discharge. The researchers rejected these interpretations because they weakened or distorted the meaning of the original account.

These examples support the study’s key methodological warning. LLM outputs can be useful, but they are provisional. They must be checked against transcripts, researcher notes and human interpretation. In qualitative research, the question is not only whether a model can identify patterns, but also whether those patterns are grounded in the data and interpreted with enough contextual care.

Transparent LLM workflows in health research

The researchers argue that LLM-assisted analysis may be most useful when applied to large qualitative datasets where scale creates practical barriers. In this study, Claude helped rapidly process and synthesise 75 transcripts, propose candidate themes, surface alternative framings and identify cross-cutting patterns. That could help researchers work more efficiently without abandoning reflexive standards.

The study recommends treating LLM outputs as candidate interpretations rather than definitive findings. Model-generated summaries, themes and labels can support researcher reflection, but final conclusions must come from human judgement. This is especially important in health and social care research, where participants’ accounts often involve vulnerability, trauma, institutional mistrust and unequal power relationships.

The researchers also recommend using LLMs to surface alternative integrative findings. The model was useful in identifying broader relationships across interviews, including links between care fragmentation, emotional burden, loss of trust and conditional attitudes toward AI. These connections can help researchers think across large datasets, but they still require verification.

Human oversight, the study points out, remains the most important safeguard. It stresses that qualitative analysis is not a mechanical process of extracting topics. It involves interpretation, reflexivity and attention to context. LLMs can assist with organisation and pattern recognition, but they do not replace the responsibility of researchers to assess meaning, evidence and ethical implications.

The study also points to reproducibility challenges. LLMs are probabilistic systems, meaning they may generate different outputs across runs. Claude is a general-purpose model, not a system trained specifically for qualitative health research. Its outputs may reflect biases in training data, and its contextual sensitivity may be limited. The researchers therefore documented prompts, model outputs and coding decisions to strengthen transparency.
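One way to picture this kind of documentation is an audit record written for every model call. The sketch below is an assumption, not the researchers' tooling: the `AuditRecord` fields, the `to_jsonl` helper, the transcript identifier and the prompt text are all hypothetical, and only the "quiet resignation" example and the model name come from the article. Hashing the prompt and output makes it easy to notice when a rerun produced different text.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

ALLOWED_DECISIONS = {"accepted", "revised", "rejected"}


@dataclass
class AuditRecord:
    """Hypothetical log entry for one LLM-assisted analysis step."""
    transcript_id: str
    layer: str      # "exploratory", "interpretive" or "integrative"
    model: str
    prompt: str
    output: str
    decision: str   # the human reviewer's verdict on the model output
    rationale: str  # why the reviewer accepted, revised or rejected it


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def to_jsonl(record: AuditRecord) -> str:
    """Serialise a record as one JSON line, adding content hashes."""
    if record.decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {record.decision!r}")
    row = asdict(record)
    row["prompt_sha256"] = sha256_text(record.prompt)
    row["output_sha256"] = sha256_text(record.output)
    return json.dumps(row, sort_keys=True)


record = AuditRecord(
    transcript_id="T-000",                 # placeholder identifier
    layer="interpretive",
    model="Claude Sonnet 4",
    prompt="(hypothetical prompt text)",   # the real prompts are not quoted in the article
    output="quiet resignation",
    decision="rejected",
    rationale="Interpretation went beyond the wider transcript context.",
)

line = to_jsonl(record)
print(line)
```

Appending one such line per call to a file would give a plain-text trail of prompts, outputs and coding decisions that can be shared alongside the analysis.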

The study was conducted within the English health system, which may limit transferability to other countries. The dataset was large for qualitative research, but recruitment relied on voluntary and network-based methods, which may introduce participant selection bias. The study also did not use quantitative agreement measures to compare human and model performance, something future research could add.
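As an illustration of what a quantitative agreement measure could look like, the sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic. The choice of kappa is an assumption made here for illustration; the article does not say which measure future work should use. The function name and the toy data are hypothetical: each position stands for one transcript segment, with 1 meaning a given theme was coded as present and 0 meaning absent.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement by chance, from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single code")
    return (observed - expected) / (1 - expected)


# Toy data, not from the study: 1 = theme coded as present, 0 = absent.
human_codes = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
llm_codes = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

kappa = cohens_kappa(human_codes, llm_codes)
print(f"Cohen's kappa: {kappa:.2f}")  # about 0.60 for this toy data
```

Here the two coders agree on 8 of 10 segments, while chance agreement is 0.5, giving a kappa of about 0.6. A statistic like this would complement, not replace, the qualitative review of where and why the human and model readings differ.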

Implications for health systems, researchers and patients

For health and social care, the findings suggest that AI-enabled tools may help people with multimorbidity by improving coordination, personalisation and proactive support, but they must be designed with attention to trust, digital access, emotional support and the risk of impersonal care. Patients and carers want systems that reduce burden, not tools that add another layer of complexity.

For researchers, LLMs can speed up parts of qualitative analysis and help reveal overlooked patterns, but they cannot replace critical, reflexive judgement. Their best role is as a structured assistant that expands the analytic field, not as an authority that decides what participants mean.

  • FIRST PUBLISHED IN: Devdiscourse