Abstract
Single-image depth estimation methods often fail to separate foreground elements because these can easily be confounded with the background. To alleviate this problem, we propose using a semantic segmentation procedure that adds information to a depth estimator, in this case a 3D Convolutional Neural Network (CNN); the segmentation is encoded as one-hot planes, one per object category. We explore 2D and 3D models. In particular, we propose a hybrid 2D–3D CNN architecture capable of obtaining semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained σ3 = 0.95, an improvement of 0.14 points over the state of the art (σ3 = 0.81) when using manual segmentation, and σ3 = 0.89 when using automatic semantic segmentation, showing that depth estimation improves when the shape and position of objects in a scene are known.
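The abstract describes encoding the semantic segmentation as one-hot planes, one per object category, that are supplied to the depth estimator alongside the image. The following is a minimal sketch of that encoding step; the function name, array shapes, and the channel-wise concatenation with an RGB image are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def segmentation_to_onehot_planes(seg_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Encode an integer segmentation map (H, W) as one-hot planes (num_classes, H, W)."""
    planes = np.zeros((num_classes, *seg_map.shape), dtype=np.float32)
    for c in range(num_classes):
        planes[c] = (seg_map == c).astype(np.float32)
    return planes

# Hypothetical example: a 4-class segmentation of a 2x3 image.
seg = np.array([[0, 1, 1],
                [2, 3, 0]])
onehot = segmentation_to_onehot_planes(seg, num_classes=4)

# One possible way to feed the planes to a depth CNN: stack them with the
# RGB channels to form the input volume (assumed layout, channels first).
rgb = np.random.rand(3, 2, 3).astype(np.float32)   # placeholder image (C, H, W)
cnn_input = np.concatenate([rgb, onehot], axis=0)  # shape: (3 + num_classes, H, W)
print(cnn_input.shape)  # (7, 2, 3)
```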