Explaining Monocular Depth Estimation - Diving into Regional Differences in the Prediction

Authors

DOI:

https://doi.org/10.5324/3kmsk121

Keywords:

Computer Vision, Monocular Depth Estimation, Deep Learning, Explainable AI

Abstract

Recent foundational depth estimation models achieve impressive accuracy across a wide variety of scenes. However, because they are black boxes, we know little about how they use the input to form their predictions, and hence about their applicability to domains that are sparsely covered by labeled datasets. This paper therefore applies occlusion to investigate how the input influences different parts of the predicted depth maps for underwater scenes. Our results show that foundational depth estimation models combine global and local features to estimate relative distance, and that the influence of these features differs significantly between the background and the rest of the scene. This suggests that extending post-hoc explanations to consider relationships between multiple input and output features can enrich our understanding of monocular depth estimation models and may help gauge their applicability to new domains.
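
To make the idea of region-wise occlusion analysis concrete, the following is a minimal sketch and not the authors' implementation. It occludes one input patch at a time and records the mean absolute change in predicted depth separately for each output region. The abstract does not specify the patch size, fill value, or region definitions, so all of them are assumptions here, and `toy_depth_model` is a hypothetical stand-in (a local term plus a global term) for a real depth estimation network.

```python
import numpy as np


def occlusion_influence(image, predict_depth, region_masks, patch=8, fill=0.0):
    """Return one occlusion heatmap per output region.

    image:         float array of shape (H, W) or (H, W, C).
    predict_depth: callable mapping an image to an (H, W) depth map
                   (stand-in for a real model; an assumption of this sketch).
    region_masks:  dict of name -> boolean (H, W) mask over the depth map.
    patch, fill:   occluder size and value (illustrative defaults).
    """
    h, w = image.shape[:2]
    baseline = predict_depth(image)
    rows, cols = -(-h // patch), -(-w // patch)  # ceiling division
    heatmaps = {name: np.zeros((rows, cols)) for name in region_masks}
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            occluded[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = fill
            # Absolute change in the prediction caused by hiding this patch.
            delta = np.abs(predict_depth(occluded) - baseline)
            for name, mask in region_masks.items():
                # Average the change over the pixels of this output region only.
                heatmaps[name][i, j] = delta[mask].mean()
    return heatmaps


def toy_depth_model(image):
    # Hypothetical model: per-pixel (local) term plus image-wide (global) term.
    return image + image.mean()


rng = np.random.default_rng(0)
img = rng.random((16, 16)) + 0.5  # strictly positive, so zero-fill changes it
background = np.zeros((16, 16), dtype=bool)
background[:8] = True  # assumed region split: top half is "background"
masks = {"background": background, "foreground": ~background}
maps = occlusion_influence(img, toy_depth_model, masks, patch=8)
```

In this toy setting, occluding a patch in the top half changes the "background" region strongly through the local term, while the "foreground" region still changes slightly through the global term. Comparing the per-region heatmaps in this way is one simple means of relating input areas to specific parts of the output.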

Published

2025-11-24

How to Cite

[1] “Explaining Monocular Depth Estimation - Diving into Regional Differences in the Prediction”, NIKT, vol. 37, no. 1, Nov. 2025, doi: 10.5324/3kmsk121.