SRE-Ret: An Object Detection Method Based on Sparse Region Extraction

Authors

  • Yanming Ye, School of Computer Science, Hangzhou Dianzi University, Hangzhou, 310018, China, https://orcid.org/0009-0001-8461-9956
  • Kailong Cheng, School of Computer Science, Hangzhou Dianzi University, Hangzhou, 310018, China
  • Qiang Sun, School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
  • Xingfa Shen, School of Computer Science, Hangzhou Dianzi University, Hangzhou, 310018, China
  • Xinyi Chang, School of Data Science and Intelligent Media, Communication University of China, Beijing, 100024, China

DOI:

https://doi.org/10.37256/cm.6520257227

Keywords:

object detection, transformer, attention mechanism, sparse region extraction, small object detection, embedded devices

Abstract

The continuous advancement of image acquisition technology and the resulting proliferation of high-resolution images have introduced significant challenges to conventional object detection methods. High-resolution feature maps retain detailed information and therefore offer a distinct advantage in detecting small objects, but the accompanying increase in candidate regions and computational complexity substantially impedes real-time performance. Conversely, low-resolution feature maps are computationally efficient but often lack the precision needed for effective small object detection, failing to satisfy practical application demands. Consequently, optimizing the allocation of computational resources within high-resolution feature maps while preserving the accuracy of small object detection has become a critical focus and an ongoing challenge in contemporary research. To address these limitations, this paper introduces an object detection method based on Sparse Region Extraction (SRE), termed SRE-Ret. The method builds on the window-based and shifted-window self-attention mechanisms of the Swin Transformer architecture. By applying sparse region selection to high-resolution feature layers, it retains only the feature windows likely to contain objects, thereby substantially reducing the number of candidate regions and redundant computations. Furthermore, a dedicated small object detection head is integrated into the high-resolution feature layers for precise prediction, while an efficient convolutional detection head is used on the low-resolution feature layers for rapid inference. The novelty of this approach lies in achieving sparse processing of feature regions via the SRE module, effectively balancing precision and efficiency in multi-scale feature detection.
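
The sparse region selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how windows are scored or how many are kept, so the choices below are assumptions made for illustration only: each non-overlapping window is scored by the mean of a per-location objectness map, and a fixed fraction of the highest-scoring windows is kept. The names `window_scores`, `select_sparse_windows`, and `keep_ratio` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # (window row index, window column index)


def window_scores(objectness: List[List[float]], window: int) -> Dict[Window, float]:
    """Score each non-overlapping window by its mean objectness.

    Assumption: mean objectness stands in for whatever scoring the
    SRE module actually uses; the abstract does not specify it.
    """
    rows, cols = len(objectness), len(objectness[0])
    if rows % window or cols % window:
        raise ValueError("feature map size must be divisible by the window size")
    scores: Dict[Window, float] = {}
    for wr in range(rows // window):
        for wc in range(cols // window):
            total = 0.0
            for r in range(wr * window, (wr + 1) * window):
                for c in range(wc * window, (wc + 1) * window):
                    total += objectness[r][c]
            scores[(wr, wc)] = total / (window * window)
    return scores


def select_sparse_windows(
    objectness: List[List[float]], window: int, keep_ratio: float
) -> List[Window]:
    """Return the indices of the highest-scoring windows, sorted by position.

    Assumption: a fixed keep_ratio (top-k) selection rule; at least one
    window is always kept.
    """
    scores = window_scores(objectness, window)
    keep = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(scores, key=lambda idx: scores[idx], reverse=True)
    return sorted(ranked[:keep])
```

Under these assumptions, only the returned windows would be passed on to window attention and the small object detection head, so the cost of the high-resolution branch scales with the number of kept windows rather than with the full feature map.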

Published

2025-10-10