Adaptive Cross-Attention Fusion of Spatial–Frequency Features with Hierarchical Transformers for Cervical Cancer Classification
1
Research Scholar, Department of Computer and Information Science, Annamalai University, Annamalainagar, Tamil Nadu, India.
2
Assistant Professor/Programmer, Department of Computer and Information Science, Annamalai University, Annamalainagar, Tamil Nadu, India.
Received: 2025-09-18
Revised: 2025-10-08
Accepted: 2025-10-15
Published: 2025-10-30
Cervical cancer is a global health issue that requires precise and timely detection to support efficient clinical decision-making. In this paper, we introduce a deep learning framework that classifies cervical cancer images into six classes using both spatial and frequency-domain features. The input images are first enhanced through preprocessing operations: CLAHE, normalization, resizing, augmentation, and class balancing. The Discrete Wavelet Transform (DWT) is then applied to obtain frequency sub-bands that supplement the spatial-domain information. Next, a dual-stream convolutional neural network (CNN) extracts structural and textural features, which are combined through an adaptive cross-attention block with learnable fusion weights. This allows spatial-frequency dependencies to be represented dynamically. The fused representation is further refined by a hierarchical transformer encoder designed to learn both local cellular patterns and global tissue-level dependencies. Lastly, a dense classification head with dropout predicts the stage of cervical cancer. The model is trained on an 80:20 train-test split using the AdamW optimizer and a composite loss function that combines cross-entropy, focal loss, and an attention consistency loss. Experimental results show that the proposed approach achieves higher accuracy, precision, recall, F1-score, and AUC than traditional baselines. In addition, interpretability is supported through Grad-CAM visualization of the CNN streams, which improves clinical explainability. The framework offers a robust and interpretable method for automatic cervical cancer diagnosis.
Cervical Cancer Classification, Dual-Stream CNN, Discrete Wavelet Transform (DWT), Adaptive Cross-Attention Fusion, Hierarchical Transformer Encoder, Explainable Deep Learning, Attention Consistency Loss, Spatial-Frequency Feature Fusion.
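Two steps named in the abstract, the DWT sub-band extraction and the classification part of the composite loss, can be illustrated with a minimal NumPy sketch. Several details here are assumptions and not taken from the paper: the wavelet is a single-level Haar wavelet (the abstract does not name the wavelet family or decomposition level), the focal-loss exponent `gamma` and the loss weights are placeholder values, and the attention consistency term is passed in as a precomputed scalar because its definition is not given in the abstract. The function names `haar_dwt2`, `cross_entropy`, `focal_loss`, and `composite_loss` are hypothetical.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2D Haar DWT (assumed wavelet; the paper may use another).

    Returns four sub-bands, each half the height and width of the input:
    the approximation band plus three detail bands.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    s = np.sqrt(2.0)
    # Low-pass and high-pass filtering across adjacent column pairs.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # The same filtering across adjacent row pairs.
    approx = (lo[0::2, :] + lo[1::2, :]) / s      # low-pass in both directions
    detail_rows = (lo[0::2, :] - lo[1::2, :]) / s  # differences between rows
    detail_cols = (hi[0::2, :] + hi[1::2, :]) / s  # differences between columns
    detail_diag = (hi[0::2, :] - hi[1::2, :]) / s  # differences in both
    return approx, detail_rows, detail_cols, detail_diag


def _true_class_probs(logits, labels):
    """Softmax probability assigned to the true class of each sample."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return p[np.arange(len(labels)), labels]


def cross_entropy(logits, labels):
    """Mean cross-entropy over the batch."""
    return float(-np.log(_true_class_probs(logits, labels) + 1e-12).mean())


def focal_loss(logits, labels, gamma=2.0):
    """Mean focal loss; gamma=2.0 is a placeholder, not the paper's value."""
    p_t = _true_class_probs(logits, labels)
    return float((-((1.0 - p_t) ** gamma) * np.log(p_t + 1e-12)).mean())


def composite_loss(logits, labels, attn_consistency=0.0,
                   w_focal=1.0, w_attn=1.0, gamma=2.0):
    """Weighted sum of the three terms; the weights are assumed placeholders.

    attn_consistency is supplied by the caller because the abstract does not
    define how the attention consistency loss is computed.
    """
    return (cross_entropy(logits, labels)
            + w_focal * focal_loss(logits, labels, gamma)
            + w_attn * attn_consistency)
```

In this sketch a 224 x 224 grayscale image would yield four 112 x 112 sub-bands that could be stacked as input channels for the frequency stream. With `gamma=0` the focal term reduces to plain cross-entropy, which is a convenient sanity check when tuning the weights.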