How safe is AI? New research reveals why current metrics may not be enough

From healthcare and finance to government and public safety, artificial intelligence is proliferating across every key sector. However, a new study finds that the trustworthiness of AI systems remains deeply complex and inconsistently measured, underscoring the urgent need for standardized, transparent, and accountable evaluation frameworks.
The study, titled “Evaluating Trustworthiness in AI: Risks, Metrics, and Applications Across Industries” and published in the journal Electronics, assesses major AI governance models and measurement tools while identifying persistent trade-offs, regulatory shortcomings, and practical challenges in implementing trustworthy AI.
What does it mean to trust an AI system?
The study breaks down AI trustworthiness into a constellation of interdependent principles: fairness, transparency, privacy, accountability, security, robustness, and explainability. Each of these attributes poses unique challenges and often conflicts with others. For instance, enhancing transparency might compromise privacy, while achieving fairness may reduce model efficiency.
The paper stresses that trustworthiness cannot be established through performance metrics such as accuracy alone. Instead, a more holistic approach is needed, incorporating both technical safeguards and ethical design considerations. The researchers analyzed widely adopted frameworks such as the NIST AI Risk Management Framework, the AI Trust Framework and Maturity Model (AI-TMM), and the ISO/IEC standards. These were compared for their ability to assess risks across the full AI lifecycle, from design and development to deployment and post-deployment monitoring.
The team found that although these frameworks provide valuable structure, they are often applied inconsistently across industries and geographies. Many organizations still lack internal capacity or standardized benchmarks to measure attributes such as explainability or data fairness. The absence of universal evaluation tools hinders accountability and increases risks of unintended harm or bias.
How can trust be measured and what tools exist?
To evaluate trustworthiness in AI systems, the authors examined a range of quantitative and qualitative metrics. Techniques such as Shapley-value attribution and LIME (Local Interpretable Model-Agnostic Explanations) help explain model predictions by attributing decision outcomes to specific input features, increasing interpretability and stakeholder confidence. Other techniques, such as federated learning, differential privacy, and robustness testing with adversarial examples, are highlighted as critical for improving data protection, model integrity, and system reliability.
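As a rough illustration of the attribution idea behind Shapley-based explanations (this is not code from the study), the sketch below computes exact Shapley values for a tiny, hypothetical linear scoring model; the weights, instance, and baseline input are placeholders chosen for clarity.

```python
# Minimal sketch of Shapley-value attribution: a feature's contribution is its
# average marginal effect on the prediction over all orderings of the features.
# Exact enumeration is exponential, so this only scales to a handful of features.
from itertools import permutations
from math import factorial

import numpy as np

def shapley_attributions(predict, x, x_baseline):
    n = len(x)
    contrib = np.zeros(n)
    for order in permutations(range(n)):
        current = np.array(x_baseline, dtype=float)
        prev = predict(current)
        for i in order:
            current[i] = x[i]                # reveal feature i
            new = predict(current)
            contrib[i] += new - prev         # marginal contribution in this ordering
            prev = new
    return contrib / factorial(n)            # average over all orderings

# Hypothetical linear scoring model standing in for a deployed predictor.
weights = np.array([0.8, -0.3, 0.5])
score = lambda v: float(weights @ v)

x = np.array([1.0, 2.0, 0.5])                # instance to explain
baseline = np.zeros(3)                       # reference input
print(shapley_attributions(score, x, baseline))   # sums to score(x) - score(baseline)
```

In practice, libraries such as SHAP and LIME approximate these attributions for large models, which is what makes the approach usable on real systems.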
The study identifies robustness as a particularly important yet under-assessed quality. It emphasizes metrics that evaluate out-of-distribution detection, noise tolerance, and anomaly handling, especially in dynamic or adversarial environments. These techniques are vital for systems operating in high-stakes domains like autonomous vehicles and emergency medical services.
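As a hedged sketch of what a noise-tolerance metric can look like in practice (not taken from the paper), the snippet below measures how a classifier's accuracy degrades as Gaussian noise of increasing strength is added to test inputs; the synthetic data, scikit-learn model, and noise levels are all placeholder choices.

```python
# Simple noise-tolerance check: track accuracy as Gaussian perturbations grow.
# Synthetic data and a plain scikit-learn classifier stand in for a real system.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)           # synthetic labels

model = LogisticRegression().fit(X[:800], y[:800])
X_test, y_test = X[800:], y[800:]

for sigma in (0.0, 0.1, 0.5, 1.0):                # increasing perturbation strength
    X_noisy = X_test + rng.normal(scale=sigma, size=X_test.shape)
    acc = accuracy_score(y_test, model.predict(X_noisy))
    print(f"noise sigma={sigma:.1f}  accuracy={acc:.3f}")
```

The same pattern extends to out-of-distribution checks, for example by scoring how confidently a model predicts on inputs drawn from a distribution that differs from its training data.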
Despite the sophistication of these tools, their adoption is still largely siloed. Many organizations deploy explainability techniques in isolation without integrating them into a broader governance framework. Furthermore, technical explanations often lack clarity for non-expert stakeholders, limiting their impact on public trust and regulatory compliance.
The research calls for standardized benchmarks and cross-industry harmonization of trust metrics to enable better comparisons and more effective regulation. It also stresses the need for continuous updates to these frameworks as technologies evolve, particularly in light of new AI paradigms such as large language models (LLMs) and generative AI, which introduce new layers of complexity and ethical risk.
Where are the gaps in practice, and what can be done?
The study goes beyond theory, illustrating real-world applications and pitfalls across sectors including healthcare, finance, and public administration. In Copenhagen, for instance, an AI system designed to detect cardiac arrest during emergency calls showed promise but faced limitations due to language bias and lack of transparency. In Sweden, an employment support system powered by neural networks raised concerns over fairness and explainability, particularly for marginalized subgroups like young jobseekers and those with disabilities.
In the financial sector, case studies showed that metric frameworks such as SAFE (Sustainability, Accuracy, Fairness, and Explainability) are being used to evaluate AI-based credit scoring models, offering insight into their alignment with ethical and regulatory standards. However, these tools are not yet widely implemented, and questions remain about their adaptability across diverse data environments.
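The article does not spell out SAFE's formulas, but the fairness component of such scorecards typically reduces to comparing model outcomes across groups. As a generic, hypothetical illustration (not the SAFE metric itself), the snippet below computes a demographic parity difference for a credit-approval model; the approval decisions and group labels are made-up placeholders.

```python
# Hypothetical fairness check for a credit-approval model: demographic parity
# difference, i.e., the gap in approval rates between two groups. This is a
# generic illustration, not the SAFE framework's own metric.
import numpy as np

def demographic_parity_difference(approved, group):
    """Absolute gap in approval rates between group 0 and group 1."""
    approved, group = np.asarray(approved), np.asarray(group)
    rate_0 = approved[group == 0].mean()
    rate_1 = approved[group == 1].mean()
    return abs(rate_0 - rate_1)

# Placeholder model outputs: 1 = approved, 0 = declined, with a group attribute.
approved = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 1])
group    = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
print(demographic_parity_difference(approved, group))   # 0.2 in this toy example
```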
Regulatory frameworks are catching up, but progress is uneven. The EU AI Act, New York City’s Int. 1894, California’s AB 13, and the U.S. Algorithmic Accountability Act of 2022 all attempt to impose impact assessments, transparency mandates, and bias audits. Yet, enforcement mechanisms remain fragmented, and many AI systems still operate in opaque “black-box” conditions, leaving users and regulators with little recourse in cases of harm or malfunction.
The researchers argue that interdisciplinary collaboration among data scientists, ethicists, domain experts, and legal authorities is essential to bridge the gap between abstract principles and practical safeguards. They advocate for a proactive governance approach that includes stakeholder engagement, transparent documentation, and continuous auditing of AI systems.
- FIRST PUBLISHED IN: Devdiscourse