publications
You can also find my articles on my Google Scholar profile.
2024
- DrivingVQA: Benchmarking Visual Chain-of-Thought Reasoning with Driving Theory Tests. Charles Corbière*, Simon Roburin*, Syrielle Montariol*, Antoine Bosselut, and Alexandre Alahi. 2024. Under submission at CVPR 2025.
Modern Vision-Language Models (VLMs) equip Large Language Models with image understanding, enabling multimodal reasoning. Due to the domain gap between vision and textual data, these systems often face significant challenges: an over-reliance on text priors, hallucinations, and difficulty with complex visual reasoning. While recent research efforts attempt to study VLMs’ ability to perform visual reasoning, existing benchmarks typically rely on synthetic images with human explanations or real-world images paired with generated explanations. To pave the way towards complex visual reasoning in real-world scenarios, we introduce DrivingVQA, a new benchmark based on driving theory tests. It offers approximately 4K expert-crafted multiple-choice questions along with multimodal explanations centered on relevant entities crucial to the reasoning process. Through extensive experimentation, we explore diverse strategies to effectively leverage relevant entities to enhance visual reasoning and identify a bottleneck in handling localization information in current VLM architectures.
- Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation. Mehdi Zayene, Jannik Endres, Albias Havolli, Charles Corbière, Salim Cherkaoui, Alexandre Kontouli, and Alexandre Alahi. 2024. Under submission at CVPR 2025.
Despite considerable progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data. We introduce HELVIPAD, a real-world dataset for omnidirectional stereo depth estimation, consisting of 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes with diverse lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with a significantly increased label density by using depth completion. We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform decently, a significant challenge persists in accurately estimating depth in omnidirectional imaging. To address this, we introduce necessary adaptations to stereo models, achieving improved performance.
2022
- Take One Gram of Neural Features, Get Enhanced Group Robustness. Simon Roburin*, Charles Corbière*, Gilles Puy, Nicolas Thome, Mathieu Aubry, Renaud Marlet, and Patrick Pérez. In ECCV Workshop on Out-Of-Distribution Generalization in Computer Vision, 2022.
Predictive performance of machine learning models trained with empirical risk minimization (ERM) can degrade considerably under distribution shifts. In particular, the presence of spurious correlations in training datasets leads ERM-trained models to display high loss when evaluated on minority groups not presenting such correlations in test sets. Extensive attempts have been made to develop methods improving worst-group robustness. However, they require group information for each training input or, at least, a validation set with group labels to tune their hyperparameters, which may be expensive to get or unknown a priori. In this paper, we address the challenge of improving group robustness without group annotations during training. To this end, we propose to automatically partition the training dataset into groups based on Gram matrices of features extracted from an identification model and to apply robust optimization based on these pseudo-groups. In the realistic context where no group labels are available, our experiments show that our approach not only improves group robustness over ERM but also outperforms all recent baselines.
2021
- Beyond First-Order Uncertainty Estimation with Evidential Models for Open-World Recognition. Charles Corbière, Marc Lafon, Nicolas Thome, Matthieu Cord, and Patrick Pérez. In ICML Workshop on Uncertainty and Robustness in Deep Learning, 2021.
In this paper, we tackle the challenge of jointly quantifying in-distribution and out-of-distribution (OOD) uncertainties. We introduce KLoS, a KL-divergence measure defined on the class-probability simplex. By leveraging the second-order uncertainty representation provided by evidential models, KLoS captures more than existing first-order uncertainty measures such as predictive entropy. We design an auxiliary neural network, KLoSNet, to learn a refined measure directly aligned with the evidential training objective. Experiments show that KLoSNet acts as a class-wise density estimator and outperforms current uncertainty measures in the realistic context where no OOD data is available during training. We also report comparisons in the presence of OOD training samples, which shed new light on the impact of the vicinity of this data with OOD test data.
- Confidence Estimation via Auxiliary Models. Charles Corbière, Nicolas Thome, Antoine Saporta, Tuan-Hung Vu, Matthieu Cord, and Patrick Pérez. In TPAMI, 2021.
Reliably quantifying the confidence of deep neural classifiers is a challenging yet fundamental requirement for deploying such models in safety-critical applications. In this paper, we introduce a novel target criterion for model confidence, namely the true class probability (TCP). We show that TCP offers better properties for confidence estimation than standard maximum class probability (MCP). Since the true class is by nature unknown at test time, we propose to learn the TCP criterion from data with an auxiliary model, introducing a specific learning scheme adapted to this context. We evaluate our approach on the tasks of failure prediction and self-training with pseudo-labels for domain adaptation, both of which require effective confidence estimates. Extensive experiments are conducted to validate the relevance of the proposed approach in each task. We study various network architectures and experiment with small and large datasets for image classification and semantic segmentation. In every tested benchmark, our approach outperforms strong baselines.
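As an illustration of the two criteria named in the abstract above, the following minimal sketch computes MCP and TCP from a softmax output. It is not the authors' implementation: the function names and the example logits are assumptions chosen for illustration, and the auxiliary model that estimates TCP at test time is not shown.

```python
import math


def softmax(logits):
    """Convert raw scores into class probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mcp(probs):
    """Maximum class probability: the probability of the predicted class."""
    return max(probs)


def tcp(probs, true_class):
    """True class probability: the probability assigned to the ground-truth class."""
    return probs[true_class]


# Hypothetical three-class example in which the classifier predicts class 0.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# If the true class is 0, the prediction is correct and TCP equals MCP.
tcp_correct = tcp(probs, true_class=0)

# If the true class is 1, the prediction is wrong: MCP stays high, TCP drops.
tcp_wrong = tcp(probs, true_class=1)

print(f"MCP = {mcp(probs):.3f}")
print(f"TCP (true class 0) = {tcp_correct:.3f}")
print(f"TCP (true class 1) = {tcp_wrong:.3f}")
```

TCP needs the ground-truth label, so it can only be computed directly on labeled data. This is why the abstract proposes learning it with an auxiliary model.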
2019
- Addressing Failure Prediction by Learning Model Confidence. Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, and Patrick Pérez. In NeurIPS, 2019.
Reliably assessing the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show that TCP is better suited to this task than the classic Maximum Class Probability (MCP). We also provide theoretical guarantees for TCP in the context of failure prediction. Since the true class is by nature unknown at test time, we propose to learn the TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted to validate the relevance of the proposed approach. We study various network architectures and both small- and large-scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
2017
- Leveraging Weakly Annotated Data for Fashion Image Retrieval and Label Prediction. Charles Corbière, Hedi Ben-Younes, Alexandre Rame, and Charles Ollion. In ICCV Fashion Workshop, 2017.
In this paper, we present a method to learn a visual representation adapted for e-commerce products. Based on weakly supervised learning, our model learns from noisy datasets crawled from e-commerce website catalogs and does not require any manual labeling. We show that our representation can be used for downward classification tasks over clothing categories with different levels of granularity. We also demonstrate that the learnt representation is suitable for image retrieval. We achieve nearly state-of-the-art results on the DeepFashion In-Shop Clothes Retrieval and Categories Attributes Prediction tasks, without using the provided training set.