Explaining Monocular Depth Estimation - Diving into Regional Differences in the Prediction
DOI:
https://doi.org/10.5324/3kmsk121
Keywords:
Computer Vision, Monocular Depth Estimation, Deep Learning, Explainable AI
Abstract
Recent foundational depth estimation models achieve impressive accuracy on various scenes. However, due to their black box nature, we lack knowledge about how they utilize the input for their predictions, and hence about their applicability to domains sparsely covered by labeled datasets. This paper, therefore, applies occlusion to investigate the influence of the input on different parts of the predicted depth maps for underwater scenes. Our results show that foundational depth estimation models combine global and local features to estimate relative distance, and that their influence differs significantly between the background and the rest of the scene. This suggests that extending post-hoc explanations to consider relationships between multiple input and output features can enrich our understanding of monocular depth estimation models and potentially help to gauge their applicability to new domains.
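As an illustration of the kind of occlusion analysis the abstract describes, the following minimal sketch occludes one input patch at a time and records how much the prediction changes inside each output region. It is not the authors' implementation: the function names, the patch size, the fill value, the region names, and the stand-in `toy_predict` model are all assumptions made for this example.

```python
import numpy as np

def occlusion_influence(predict, image, region_masks, patch=8, fill=0.0):
    """Occlude one input patch at a time and measure the mean absolute
    change of the predicted map inside each output region.

    predict      -- callable mapping an HxWxC image to an HxW map (assumed)
    region_masks -- dict of name -> boolean HxW mask over the output (assumed)
    Returns a dict of name -> (H // patch, W // patch) influence map.
    """
    base = predict(image)
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    maps = {name: np.zeros((rows, cols)) for name in region_masks}
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            occluded[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = fill
            diff = np.abs(predict(occluded) - base)
            for name, mask in region_masks.items():
                maps[name][i, j] = diff[mask].mean()
    return maps

def toy_predict(image):
    # Hypothetical stand-in for a depth model: a local term (pixel
    # intensity) plus a global term (mean intensity of the whole image).
    gray = image.mean(axis=-1)
    return gray + gray.mean()
```

With a real model, `predict` would wrap the network's forward pass, and the masks would separate, for example, the background from the rest of the scene. Comparing the resulting maps shows whether an input patch affects only the output region it lies in or also regions elsewhere in the image.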
License
Copyright (c) 2025 Sabine Fischer, Øyvind Ødegård, Frank Lindseth, Steven Yves Le Moan, Gabriel Hanssen Kiss

This work is licensed under a Creative Commons Attribution 4.0 International License.