GCI-ViTAL: Gradual Confidence Improvement with Vision Transformers for Active Learning on Label Noise

Authors

  • Moseli Mots'oehli Department of Information and Computer Sciences, University of Hawai'i at Manoa, Honolulu, HI, 96822, United States https://orcid.org/0000-0002-9191-0565
  • Kyungim Baek Department of Information and Computer Sciences, University of Hawai'i at Manoa, Honolulu, HI, 96822, United States

DOI:

https://doi.org/10.37256/ccds.6220257319

Keywords:

deep active learning, vision transformer, label noise, image classification

Abstract

Active Learning (AL) aims to train accurate classifiers while minimizing labeling costs by strategically selecting informative samples for annotation. This study focuses on image classification, comparing AL methods on the CIFAR10, CIFAR100, Food101, and Chest X-ray datasets under varying label noise rates. We investigate the impact of model architecture by comparing Convolutional Neural Network (CNN)- and Vision Transformer (ViT)-based models. We propose a novel deep AL algorithm, Gradual Confidence Improvement with Vision Transformers for Active Learning (GCI-ViTAL), designed to be robust to label noise. GCI-ViTAL combines prediction entropy with the Frobenius norm of the difference between a candidate's last-layer attention vectors and class-centric attention vectors computed on a clean labeled set. This criterion identifies samples that are both uncertain and semantically divergent from typical images of their assigned class, allowing GCI-ViTAL to select informative data points even in the presence of label noise while flagging potentially mislabeled candidates. Label smoothing is applied during training so that the model does not become overly confident about potentially noisy labels. We evaluate GCI-ViTAL under varying levels of symmetric label noise and compare it to five other AL strategies. Our results show that ViTs outperform CNNs across all AL strategies, particularly in noisy-label settings, and that grounding labels in the semantic content of images yields a more robust model under label noise. Notably, we skip extensive hyperparameter tuning, providing an out-of-the-box comparison that helps practitioners select AL models and strategies without an exhaustive literature review on tuning vision models for real-world tasks.
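The acquisition criterion described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the weighting `alpha`, the min-max normalization, the use of predicted labels to pick a class centroid, and all array shapes are assumptions not specified in the abstract.

```python
import numpy as np

def gci_vital_scores(probs, attn, class_centroids, pred_labels, alpha=0.5):
    """Hypothetical sketch of a GCI-ViTAL-style acquisition score.

    probs:           (N, C) softmax predictions for N unlabeled samples
    attn:            (N, D) flattened last-layer attention vectors
    class_centroids: (C, D) mean attention vectors per class from a clean set
    pred_labels:     (N,)   predicted class index per sample
    alpha:           assumed trade-off between uncertainty and divergence
    """
    eps = 1e-12
    # Predictive uncertainty: Shannon entropy of the softmax output.
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Semantic divergence: Frobenius norm of the difference between each
    # sample's attention vector and its class-centric clean-set centroid.
    divergence = np.linalg.norm(attn - class_centroids[pred_labels], axis=1)

    def minmax(x):
        # Rescale to [0, 1] so the two terms are comparable (assumption).
        return (x - x.min()) / (x.max() - x.min() + eps)

    return alpha * minmax(entropy) + (1 - alpha) * minmax(divergence)
```

Samples with the highest scores would be sent for annotation; a near-uniform prediction whose attention pattern sits far from its class centroid scores highest, matching the abstract's notion of uncertain, semantically divergent, and possibly mislabeled candidates.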

Published

2025-06-24

How to Cite

Mots'oehli M, Baek K. GCI-ViTAL: Gradual Confidence Improvement with Vision Transformers for Active Learning on Label Noise. Cloud Computing and Data Science [Internet]. 2025 Jun 24 [cited 2025 Dec 6];6(2):263-91. Available from: https://ojs.wiserpub.com/index.php/CCDS/article/view/7319