QLViT: A Lightweight Cell Classification Method for Microscope Images Based on MViTv2 and Linear Attention
DOI: https://doi.org/10.37256/cm.7120267713
Keywords: cell classification, linear attention, quantitative methodology, Kolmogorov-Arnold network
Abstract
Accurate cell classification plays a vital role in the diagnosis and treatment of diseases. However, existing methods suffer from limited feature learning and excessive computational complexity, which lead to low classification accuracy, long training times, and slow inference. We propose a novel lightweight method, the Quantized Linear Vision Transformer (QLViT), based on Multiscale Vision Transformers (MViTv2) and linear attention, for cell classification from microscope images. Specifically, QLViT extracts features with a large-kernel convolutional layer and a dedicated feature extraction module called Conv-Linear Attention (CLA). It optimizes self-attention with an activation function and uses a residual structure to promote feature reuse and mitigate gradient problems. CLA learns local information efficiently through dynamic convolution and uses linear attention to capture global features, keeping it lighter than standard self-attention. By incorporating a Kolmogorov-Arnold Network (KAN) structure, CLA further reduces computational complexity and parameter count. Extensive experiments on four public datasets demonstrate the effectiveness of QLViT. We achieve accuracies of 97.19% on the BioMediTech dataset, 97.35% on the ICPR-HEp-2 dataset, 90.45% on the six-category classification task of the blood malignancy bone marrow cytology expert-annotated dataset, and 99.84% on the white blood cell dataset. Furthermore, our method requires 1.95 Giga Floating-point Operations (GFLOPs) and has 9.07 million parameters. Our results show that QLViT outperforms current state-of-the-art methods across multiple datasets, demonstrating fast inference, a lightweight design, strong feature extraction, and good generalizability. The proposed method offers a promising solution for medical image classification.
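The abstract states that CLA uses linear attention to capture global features at lower cost than standard self-attention. The sketch below illustrates the general idea of kernelized linear attention in plain Python; it is not the authors' CLA implementation. The feature map phi(x) = ELU(x) + 1 is an assumption borrowed from common linear-attention formulations (the abstract only says that an activation function is used), and all function names are hypothetical. Because phi(Q)(phi(K)^T V) can be computed by summarizing keys and values once, the cost grows linearly with the number of tokens N rather than quadratically; a reference function that builds the full N x N weight matrix confirms both orderings give the same output.

```python
import math
from typing import List

Matrix = List[List[float]]


def elu_plus_one(x: float) -> float:
    """Assumed positive feature map: phi(x) = ELU(x) + 1 (always > 0)."""
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(m: Matrix) -> Matrix:
    """Apply phi element-wise to every token vector."""
    return [[elu_plus_one(val) for val in row] for row in m]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Hypothetical linear attention: cost O(N * d * d_v), no N x N matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    d, d_v = len(phi_k[0]), len(v[0])
    # kv[a][b] = sum_j phi(k_j)[a] * v_j[b]: a d x d_v summary independent of N.
    kv = [[0.0] * d_v for _ in range(d)]
    # z[a] = sum_j phi(k_j)[a]: shared normalizer for every query.
    z = [0.0] * d
    for pk, vj in zip(phi_k, v):
        for a in range(d):
            z[a] += pk[a]
            for b in range(d_v):
                kv[a][b] += pk[a] * vj[b]
    out = []
    for pq in phi_q:
        denom = sum(pq[a] * z[a] for a in range(d))
        out.append([sum(pq[a] * kv[a][b] for a in range(d)) / denom
                    for b in range(d_v)])
    return out


def quadratic_kernel_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference O(N^2) form: explicit query-key weights with the same kernel."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    d_v = len(v[0])
    out = []
    for pq in phi_q:
        w = [sum(x * y for x, y in zip(pq, pk)) for pk in phi_k]
        s = sum(w)
        out.append([sum(wj * vj[b] for wj, vj in zip(w, v)) / s
                    for b in range(d_v)])
    return out


# Toy example: 3 tokens, 2-dimensional queries/keys/values.
q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.1]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = linear_attention(q, k, v)
ref = quadratic_kernel_attention(q, k, v)
print(all(abs(a - b) < 1e-9 for ra, rb in zip(fast, ref) for a, b in zip(ra, rb)))
```

Reordering the product from (phi(Q) phi(K)^T) V to phi(Q) (phi(K)^T V) is what removes the N x N matrix; the trade-off is that the kernel only approximates softmax attention rather than reproducing it exactly.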
License
Copyright (c) 2026 Ziping Zhao, et al.

This work is licensed under a Creative Commons Attribution 4.0 International License.
