Empirical Study on Performance–Perception Discrepancy in RGB–Thermal Monocular Depth Estimation under Varying Illumination
Keywords:
Depth Estimation, Multimodal, Illumination Robustness, Quantitative Evaluation, Visual Consistency

Abstract
With the advancement of multimodal perception technologies, integrating visible (RGB) and thermal infrared (THR) information has become a key approach to enhancing the robustness of visual systems under complex illumination conditions. While existing studies primarily focus on improving quantitative accuracy through multimodal fusion, less attention has been paid to the perceptual differences and consistency between modalities. This study investigates the performance–perception discrepancy in multimodal depth estimation under varying illumination scenarios. Through comparative experiments between the RGB and THR modalities, the analysis reveals that THR achieves superior numerical performance (e.g., lower RMSE and AbsRel) in low-light and nighttime conditions, yet suffers from perceptual degradation such as over-smoothing and structural blurring. Moreover, by referencing findings in multimodal object detection, this phenomenon is shown to be task-general, arising from the distinct spatial frequency responses of the two modalities. The presented results provide empirical evidence and theoretical insight for future research on multimodal feature fusion and perceptual consistency optimization.
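The abstract's quantitative comparison rests on two standard monocular depth metrics, RMSE and AbsRel. As a minimal sketch (a hypothetical helper, not code from the paper), they can be computed over the valid ground-truth region as follows:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Compute RMSE and AbsRel between predicted and ground-truth depth.

    RMSE   = sqrt(mean((pred - gt)^2))
    AbsRel = mean(|pred - gt| / gt)

    Hypothetical illustration; masking convention (gt > eps) is an assumption.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > eps  # evaluate only where ground truth is valid
    diff = pred[mask] - gt[mask]
    rmse = np.sqrt(np.mean(diff ** 2))
    abs_rel = np.mean(np.abs(diff) / gt[mask])
    return rmse, abs_rel
```

Note that both metrics average per-pixel errors, so they reward globally accurate but over-smoothed predictions; neither penalizes the loss of fine structure that the study identifies perceptually in the THR modality.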
License
Copyright (c) 2025 Mingji Kang

This work is licensed under a Creative Commons Attribution 4.0 International License.