Related Papers

Learning Visual-Semantic Hierarchical Attribute Space for Interpretable Open-Set Recognition

Zhuo Xu, Xiang Xiang

WACV-2025broader adjacent4

open-set

PDF

SAND: Enhancing Open-Set Neuron Descriptions through Spatial Awareness

Anvita Agarwal Srinivas, Tuomas Oikarinen, Divyansh Srivastava +2

WACV-2025broader adjacent4

open-set

CLIP

PDF

VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer

Yanning Hou, Peiyuan Li, Zirui Liu +4

Zero-shot anomaly detection (ZSAD) requires detecting and localizing anomalies without access to target-class anomaly samples. Mainstream methods rely on vision-language models (VLMs) such as CLIP: they build hand-crafted or learned prompt sets for normal and abnormal semantics, then compute image-text similarities for open-set discrimination. While effective, this paradigm depends on a text encoder and cross-modal alignment, which can lead to training instability and parameter redundancy. This work revisits the necessity of the text branch in ZSAD and presents VisualAD, a purely visual framew

CVPR-2026direct anomaly20

anomaly detectionopen-setarxivcvpr 2026

CLIPVision TransformerZero-shot

PDFarXiv

Related Papers

Learning Visual-Semantic Hierarchical Attribute Space for Interpretable Open-Set Recognition

SAND: Enhancing Open-Set Neuron Descriptions through Spatial Awareness

VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer

Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection

Multi-subject Open-set Personalization in Video Generation

CLIP-driven Coarse-to-fine Semantic Guidance for Fine-grained Open-set Semi-supervised Learning

Related Papers

Learning Visual-Semantic Hierarchical Attribute Space for Interpretable Open-Set Recognition

SAND: Enhancing Open-Set Neuron Descriptions through Spatial Awareness

VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer

Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection

Multi-subject Open-set Personalization in Video Generation

CLIP-driven Coarse-to-fine Semantic Guidance for Fine-grained Open-set Semi-supervised Learning