Advanced Hybrid Deep Learning Model Enhances Reliability of Automobile Fraud Detection Systems
Researchers from Universiti Tun Hussein Onn Malaysia and Chettinad Institute of Technology developed a Golden Eagle-Assisted Hybrid BERT-LSTM model that detects automobile insurance fraud with over 99% accuracy. The study marks a major breakthrough in applying bio-inspired optimization and deep learning to enhance financial fraud detection efficiency and reliability.

Researchers from the Universiti Tun Hussein Onn Malaysia (UTHM) and the Chettinad Institute of Technology in India have unveiled a groundbreaking artificial intelligence model that promises to transform automobile insurance fraud detection. Their study, published in the Decision Analytics Journal (2025), introduces a Golden Eagle-Assisted Optimization (GEAO) system combined with a hybrid deep learning model (BERT-LSTM) to detect fraudulent claims with exceptional accuracy. This international collaboration brings together mathematical innovation and computational intelligence to address one of the insurance industry’s most persistent challenges: false claims that cost billions annually. The authors highlight that in 2017 alone, fraudulent insurance activity accounted for losses of USD 221 million in Brazil and nearly USD 80 billion in the United States, underscoring the urgent need for smarter, automated fraud detection systems that can adapt to evolving digital deception.
Why Traditional Systems Fall Short
Insurance fraud has evolved beyond human intuition and manual verification. Traditional fraud detection systems rely heavily on expert knowledge and statistical models that struggle with large, complex, and imbalanced datasets. The study points out that fraudulent claims make up only a small fraction of total insurance data, creating a class imbalance that causes conventional models to misclassify or overlook deceptive cases. Moreover, the abundance of irrelevant features in raw claim data can lead to overfitting, reducing model efficiency and interpretability. Previous research using machine learning techniques such as Random Forest, Naïve Bayes, and Support Vector Machines achieved moderate success but required extensive manual feature engineering. Even deep learning models like CNN-LSTM or autoencoders, though powerful, failed to incorporate advanced feature optimization, leaving room for improvement. This gap inspired the researchers to design a hybrid system that could automatically select the most relevant features and learn from complex data relationships.
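The class imbalance described here is the problem the study later counters with oversampling. As a rough illustration of the idea, not the paper's implementation, a synthetic minority sample can be generated by interpolating between two existing minority samples (real SMOTE interpolates toward k-nearest neighbours; the pairing here is simplified):

```python
import random

def smote_like_oversample(minority, n_new, seed=0):
    """Generate synthetic minority samples by linear interpolation
    between randomly paired existing minority samples (a simplified
    sketch of the SMOTE idea; real SMOTE uses k-nearest neighbours)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # pick two distinct minority samples
        gap = rng.random()              # interpolation factor in [0, 1)
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# Toy minority class: four fraudulent-claim feature vectors (invented values)
fraud = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.85, 0.15]]
new_samples = smote_like_oversample(fraud, n_new=6)
print(len(fraud) + len(new_samples))  # 10 minority samples after oversampling
```

Because each synthetic point lies on the line segment between two real minority samples, the oversampled class stays inside the region the minority class already occupies, rather than introducing arbitrary noise.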
The Golden Eagle Approach: Nature-Inspired Intelligence
The heart of this research lies in the innovative Golden Eagle-Assisted Optimization (GEAO) algorithm, a metaheuristic model inspired by the hunting behavior of golden eagles. Just as an eagle alternates between scanning vast landscapes and diving precisely at prey, the algorithm shifts between exploration (searching for diverse feature combinations) and exploitation (focusing on the most promising ones). Each eagle represents a candidate solution, evaluating and refining its position based on the accuracy of feature subsets. This biologically inspired process allows the algorithm to escape local optima and efficiently converge on the globally optimal feature set. By integrating GEAO into the preprocessing phase, the researchers dramatically reduced computational complexity while improving feature relevance, ensuring that only the most informative data points were passed to the deep learning classifier. The carclaims.txt dataset, containing 15,420 insurance claim records (923 fraudulent and 14,497 legitimate), served as the testing ground. The researchers used the Synthetic Minority Oversampling Technique (SMOTE) to balance the data, ensuring that fraudulent claims comprised around 30–40 percent of the dataset.
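The paper's exact GEAO equations are not reproduced here, so the following pure-Python sketch only mimics the exploration/exploitation shift described above; the eagle count, iteration budget, and `toy_fitness` function are all invented for illustration:

```python
import random

def eagle_feature_search(n_features, fitness, n_eagles=6, n_iters=30, seed=1):
    """Toy metaheuristic in the spirit of the golden-eagle idea (not the
    paper's exact algorithm): each 'eagle' holds a binary feature mask;
    early iterations favour exploration (random candidate masks), later
    ones favour exploitation (small mutations of the best mask so far)."""
    rng = random.Random(seed)
    eagles = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(n_eagles)]
    best = max(eagles, key=fitness)
    for t in range(n_iters):
        explore_prob = 1.0 - t / n_iters  # shift from exploration to exploitation
        for i, mask in enumerate(eagles):
            if rng.random() < explore_prob:   # exploration: fresh random mask
                cand = [rng.randint(0, 1) for _ in range(n_features)]
            else:                             # exploitation: flip one bit of best
                cand = best[:]
                cand[rng.randrange(n_features)] ^= 1
            if fitness(cand) >= fitness(mask):
                eagles[i] = cand
        best = max(eagles + [best], key=fitness)
    return best

# Toy fitness: features 0 and 2 are informative, every selected feature
# carries a small cost (rewards compact, relevant subsets)
def toy_fitness(mask):
    return 2 * mask[0] + 2 * mask[2] - 0.5 * sum(mask)

best_mask = eagle_feature_search(5, toy_fitness)
print(best_mask)  # best feature mask found by the search
```

In the study itself the fitness of a subset is tied to classification accuracy; the toy fitness above simply stands in for that signal so the search dynamics can be seen in isolation.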
The Power of Hybrid Deep Learning: BERT Meets LSTM
Once the optimized features were selected, they were processed through a hybrid BERT-LSTM deep learning framework. The BERT (Bidirectional Encoder Representations from Transformers) component captures contextual and semantic relationships between features, mimicking how humans understand patterns in text and numerical data. The LSTM (Long Short-Term Memory) component, on the other hand, learns sequential dependencies and long-term relationships, identifying hidden temporal connections in the dataset. The synergy of these two networks allows the model to recognize intricate fraud patterns that often mimic genuine claims. The structure, illustrated in the paper’s Figure 5, combines transformer-based contextual encoding with memory-driven classification, producing a system capable of interpreting both textual and numerical features. Batch normalization and dropout layers were employed to improve generalization, while a feed-forward layer classified claims as either fraudulent or non-fraudulent.
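The LSTM half of the hybrid rests on gated memory cells that decide what to retain from earlier inputs. As a self-contained illustration of one such step, with scalar, made-up weights rather than anything from the paper's trained model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a 1-dimensional LSTM cell. w maps each gate name to
    (input weight, hidden weight, bias); values here are illustrative."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    c = f * c_prev + i * g   # memory cell mixes old state with new input
    h = o * math.tanh(c)     # hidden state passed to the next layer/step
    return h, c

# Run a short (invented) claim-feature sequence through the cell
weights = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "g", "o")}
h, c = 0.0, 0.0
for x in [0.2, 0.8, 0.5]:
    h, c = lstm_step(x, h, c, weights)
print(round(h, 3))  # final hidden state after the sequence
```

In the full architecture, vectors rather than scalars flow through these gates, BERT's contextual embeddings supply the inputs, and the final hidden state feeds the feed-forward classification layer described above.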
When tested across three cases with varying dataset sizes (10,000, 12,000, and 14,000 claims), the hybrid model achieved stunning results: accuracy of 99.02 percent, recall of 99.1 percent, and F-score near 98.5 percent. It consistently outperformed other models, including RNN, standalone BERT, and Bi-LSTM, by margins of up to six percent. The Receiver Operating Characteristic (ROC) curve yielded an AUC of 0.99, signaling near-perfect classification. In the confusion matrix, only six legitimate claims were misclassified, an astonishingly low error rate for such a complex dataset.
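All of the reported figures derive from a confusion matrix. The counts below are illustrative only (923 frauds with 6 legitimate claims misclassified echoes the article's numbers, but the split sizes are invented), and show how accuracy, recall, and F-score follow from the four cells:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall (sensitivity), precision, and F-score
    computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f_score

# Illustrative counts, not the paper's confusion matrix:
# 920 frauds caught, 3 frauds missed, 6 legitimate claims flagged
acc, rec, prec, f1 = classification_metrics(tp=920, fp=6, fn=3, tn=2071)
print(f"accuracy={acc:.4f} recall={rec:.4f} f-score={f1:.4f}")
```

Recall is the metric fraud teams usually watch most closely, since a missed fraudulent claim (a false negative) is typically far more costly than a flagged legitimate one.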
Implications, Comparisons, and the Road Ahead
The study compared its results with leading existing approaches such as CNN-LSTM, Fuzzy C-Means Genetic Algorithm (FCM-GA), Support Vector Machines (SVM), Random Forest, and XGBoost. While models like XGBoost achieved high accuracy (around 96 percent), none matched the precision and robustness of the hybrid BERT-LSTM system. The authors attribute this success to three crucial components: balanced data using SMOTE, intelligent feature selection through GEAO, and contextual learning via BERT-LSTM. This tri-layered approach not only boosts accuracy but also enhances interpretability and scalability, making it suitable for real-world insurance applications where large, noisy datasets are the norm.
However, the researchers also acknowledge limitations. The model was tested only on one dataset, which, while comprehensive, does not capture the full range of fraud scenarios seen in diverse markets like health or property insurance. Real-world data are often incomplete, inconsistent, and unstructured, factors that may challenge the model’s current performance. To address these gaps, future research will extend testing across multiple datasets, assess cross-domain applicability, and strengthen the model’s resistance to noisy and missing data.
Despite these challenges, the collaboration between UTHM and Chettinad Institute has delivered a pioneering framework that fuses nature-inspired optimization with deep contextual intelligence. By emulating the precision of a golden eagle’s hunt, the model achieves what few before it have: identifying fraudulent automobile claims with almost total accuracy. The study stands as a testament to how bio-inspired algorithms and advanced neural architectures can converge to redefine fraud analytics, offering a new benchmark for transparency and efficiency in the global insurance industry.
FIRST PUBLISHED IN: Devdiscourse