AI health apps face transparency and usability challenges despite growing popularity

CO-EDP, VisionRI | Updated: 28-07-2025 08:41 IST | Created: 28-07-2025 08:41 IST

In a groundbreaking evaluation of artificial intelligence-powered mobile health applications, researchers have found significant usability and transparency gaps in leading mHealth tools, highlighting the limitations of current AI-driven healthcare apps and calling for urgent design improvements to strengthen trust and patient safety.

The study, titled "A Comprehensive Comparison and Evaluation of AI-Powered Healthcare Mobile Applications’ Usability" and published in the peer-reviewed journal Healthcare, assesses three widely used AI-enabled apps, ADA, Mediktor, and WebMD, and reveals that even the most popular tools fall short of providing explainable and user-friendly experiences.

How usable are AI-driven healthcare apps?

AI-enabled mobile health applications are rapidly becoming central to patient care, diagnostics, and self-assessment. These apps promise improved decision support, personalized recommendations, and efficient navigation of healthcare systems. However, usability remains a critical determinant of whether patients embrace such tools in their daily lives. The research team adopted a triangulated evaluation approach that combined expert heuristic analysis, user testing with thirty participants, and automated technical assessments to scrutinize the apps’ performance.

The usability evaluation focused on three key metrics: effectiveness, efficiency, and user satisfaction. Experts applied a 13-item AI-specific heuristic checklist, while participants aged 18 to 65 performed five health-related tasks using each app. Data collected on task success rates, completion times, errors, and satisfaction scores were analyzed using advanced statistical methods to confirm significant differences between the applications.
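For context, the System Usability Scale cited in the results below is scored with a fixed formula: each of ten Likert items contributes either (response − 1) or (5 − response) depending on whether the item is odd- or even-numbered, and the sum is multiplied by 2.5 to give a 0–100 score. The sketch below illustrates that scoring and one plausible way to test for group differences; the response data and the choice of a one-way ANOVA are assumptions for illustration only, since the article does not report raw data or name the specific statistical test used.

```python
# Minimal sketch: standard SUS scoring plus an illustrative group comparison.
# The raw responses and the use of one-way ANOVA are assumptions; the study
# does not publish its data or specify its exact statistical method.
from scipy.stats import f_oneway

def sus_score(responses):
    """Convert ten 1-5 Likert responses into a 0-100 SUS score.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the total is scaled by 2.5.
    """
    assert len(responses) == 10
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd item
        for i, r in enumerate(responses)
    )
    return total * 2.5

# Hypothetical per-participant questionnaires for each app (one row per user).
ada_raw      = [[5, 2, 5, 1, 4, 2, 5, 1, 4, 2], [4, 1, 5, 2, 5, 1, 4, 2, 5, 1]]
mediktor_raw = [[4, 2, 4, 2, 4, 2, 4, 2, 4, 2], [4, 3, 4, 2, 3, 2, 4, 2, 4, 3]]
webmd_raw    = [[3, 3, 4, 3, 3, 3, 4, 2, 3, 3], [4, 3, 3, 3, 3, 2, 3, 3, 4, 3]]

ada      = [sus_score(r) for r in ada_raw]
mediktor = [sus_score(r) for r in mediktor_raw]
webmd    = [sus_score(r) for r in webmd_raw]

# One-way ANOVA as one example of testing whether mean SUS scores differ.
stat, p = f_oneway(ada, mediktor, webmd)
print(f"ADA mean SUS: {sum(ada) / len(ada):.1f}, p-value: {p:.3f}")
```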

Results show that ADA outperformed its competitors, achieving the highest average System Usability Scale (SUS) score of 81.3. Mediktor followed with a score of 76.5, while WebMD lagged behind at 70.6. Despite ADA’s relatively strong performance, the study notes that all three apps exhibited persistent flaws, particularly in error handling, transparency, and communication of AI-driven decisions.

Why do users struggle with trusting these apps?

One of the most striking revelations of the research is the pervasive lack of explainability across all evaluated applications. None of the systems provided confidence levels, rationale for diagnoses, or links to clinical guidelines that could help users understand AI-generated recommendations. Users frequently hesitated when interpreting outputs and expressed concerns over the absence of clear explanations. These transparency gaps were also flagged by expert evaluators, who identified failures in explainability heuristics as high-severity usability issues.

Trust in AI is especially crucial in healthcare contexts, where ambiguous recommendations can have serious consequences. The absence of confidence indicators, traceable decision logic, and contextual cues undermines user confidence and can discourage long-term engagement with these tools. The authors stress that explainable AI (XAI) features should not be optional but integral to the design of mHealth applications to ensure reliability and user safety.

The study also highlights problems with navigation and input validation. Users encountered vague error messages, inconsistent designs, and limited guidance when they made mistakes. Experts noted that the apps often lacked undo options, clear feedback mechanisms, and personalization features to accommodate users with varying levels of digital literacy. For WebMD, cluttered interface design and weak interactive feedback further contributed to low usability scores. Mediktor, although better than WebMD, struggled with branching question logic that confused many participants.

What can developers learn from this evaluation?

While ADA demonstrated superior usability compared to its peers, the study makes clear that even the best-performing app failed to meet critical expectations for transparency and user guidance. The authors argue that integrating user-centered design principles alongside explainable AI elements is essential to improve adoption and trust.

The study recommends several design enhancements. Developers should implement confidence indicators that show how certain the AI is about its recommendations. They should also provide optional explanations that clarify how user inputs influence the system’s outputs. Progressive disclosure, where users can toggle between summary and detailed views, is proposed as a solution to balance simplicity with transparency. Additionally, improvements in error prevention, personalization, and multilingual support are emphasized to broaden accessibility.
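To illustrate what these recommendations could look like in practice, the sketch below models a recommendation object carrying a confidence indicator, an optional rationale, a link to clinical guidance, and a summary/detail toggle for progressive disclosure. All class names, fields, and example data are hypothetical and are not drawn from ADA, Mediktor, or WebMD.

```python
# Illustrative sketch of a confidence indicator plus progressive disclosure.
# All names (Recommendation, render, the example condition and URL) are
# hypothetical; none of the evaluated apps expose such an interface.
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    condition: str
    confidence: float                                     # model confidence in [0, 1]
    rationale: list[str] = field(default_factory=list)    # inputs that drove the result
    guideline_url: str | None = None                      # link to clinical guidance

    def render(self, detailed: bool = False) -> str:
        """Summary view by default; a toggle reveals rationale and sources."""
        summary = f"{self.condition} (confidence: {self.confidence:.0%})"
        if not detailed:
            return summary
        lines = [summary, "Why this result:"]
        lines += [f"  - {reason}" for reason in self.rationale]
        if self.guideline_url:
            lines.append(f"  See guideline: {self.guideline_url}")
        return "\n".join(lines)

rec = Recommendation(
    condition="Tension headache",
    confidence=0.72,
    rationale=["Reported gradual onset", "No fever or neurological symptoms"],
    guideline_url="https://example.org/headache-guideline",  # placeholder URL
)
print(rec.render())                # summary view shown by default
print(rec.render(detailed=True))   # expanded view a user can toggle open
```

In this kind of design, the summary view keeps the interface simple for low-literacy users, while the expanded view supplies the explanation and traceability that both the expert reviewers and study participants found missing.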

The research also underscores the value of combining expert and user evaluations in assessing usability. While heuristic reviews identified many interface-level problems, real-user testing revealed deeper challenges tied to user comprehension and decision-making. The authors advocate for ongoing evaluations that capture the dynamic interactions between users and AI systems, especially as digital health tools continue to evolve.
