Thesis Defense: Algorithms for Single-View Depth Image Estimation

Friday, May 10, 2019 - 3:00pm to Saturday, May 11, 2019 - 3:55pm

Event Calendar Category

LIDS Thesis Defense

Speaker Name

Fangchang Ma

Affiliation

LIDS

Building and Room number

32-G449 (Kiva Conference Room)

Abstract

Thesis Supervisor: Sertac Karaman

Depth sensing is fundamental to autonomous navigation, localization, and mapping. However, existing depth sensors suffer from significant shortcomings, most notably low effective spatial resolution. To attain enhanced resolution with existing hardware, this dissertation studies the single-view depth estimation problem: given only sparse depth measurements, the goal is to reconstruct the dense and complete 3D structure of the scene. To this end, this thesis proposes three algorithms for depth estimation.

The first contribution is an algorithm for efficient reconstruction of 3D planar surfaces. This algorithm assumes that the 3D structure is piecewise planar, so the second-order derivatives of the depth image are sparse. We formulate a linear program that recovers the 3D surfaces under this assumption and provide conditions under which the reconstruction is exact. This method requires no learning, yet outperforms deep-learning-based methods under certain conditions.
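As an illustration of this idea, the sketch below recovers a piecewise-linear 1D signal from sparse samples by minimizing the L1 norm of its second differences with `scipy.optimize.linprog`. This is a toy stand-in for the thesis's 2D formulation; the signal, sample locations, and sizes are all illustrative, not from the thesis.

```python
import numpy as np
from scipy.optimize import linprog

n = 20
# Ground truth: a piecewise-linear signal (the 1D analogue of piecewise-planar)
truth = np.concatenate([np.linspace(0.0, 1.0, 10), np.linspace(1.0, 0.1, 10)])
obs_idx = np.array([0, 5, 9, 10, 14, 19])  # sparse depth samples

# Second-difference operator D2 (shape (n-2, n)); D2 @ truth is sparse
m = n - 2
D2 = np.zeros((m, n))
for i in range(m):
    D2[i, i:i + 3] = [1.0, -2.0, 1.0]

# Variables z = [x (n), t (m)]: minimize sum(t) s.t. |D2 x| <= t, x[obs] = y.
# The auxiliary t turns the L1 objective into a linear program.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[D2, -np.eye(m)], [-D2, -np.eye(m)]])
b_ub = np.zeros(2 * m)
A_eq = np.zeros((len(obs_idx), n + m))
A_eq[np.arange(len(obs_idx)), obs_idx] = 1.0
b_eq = truth[obs_idx]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * n + [(0, None)] * m)
x_hat = res.x[:n]
print(np.max(np.abs(x_hat - truth)))  # near zero: exact recovery
```

Here the samples pin down the slope on each planar piece, so the L1-minimal interpolant coincides with the ground truth, mirroring the exact-recovery conditions studied in the thesis.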

The second contribution is a deep regression network and a self-supervised learning framework. We formulate depth completion as a pixel-level regression problem and solve it by training a neural network. Additionally, to address the difficulty of gathering ground-truth annotations for depth data, we develop a self-supervised framework that trains the regression network by enforcing temporal photometric consistency, using only raw RGB images and sparse depth data. The supervised method achieves state-of-the-art accuracy, and the self-supervised approach attains lower but comparable accuracy.
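The core of the photometric-consistency signal can be sketched as an inverse warp: back-project each pixel of the target frame with the predicted depth, transform by the relative camera pose, project into a nearby frame, and penalize intensity differences. The minimal NumPy sketch below uses nearest-neighbor sampling and ignores occlusion; the function name, arguments, and simplifications are illustrative and not the thesis's actual loss (which must be differentiable, e.g. via bilinear sampling).

```python
import numpy as np

def photometric_loss(I_t, I_s, depth, K, R, t):
    """Warp source frame I_s into the target view using depth and relative
    pose (R, t), then average |intensity difference| over valid pixels.
    Illustrative sketch only: nearest-neighbor sampling, no occlusion handling."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.ravel()              # back-project to 3D
    proj = K @ (R @ cam + t[:, None])                         # project into source view
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = np.full(h * w, np.nan)
    warped[valid] = I_s[v[valid], u[valid]]
    return np.nanmean(np.abs(warped - I_t.ravel()))
```

With an identity pose and identical frames the warp is the identity and the loss is zero; incorrect depth predictions misalign the warp and raise the loss, which is the gradient signal the self-supervised framework exploits.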

The third contribution is a two-stage algorithm for a broad class of inverse problems (e.g., depth completion and image inpainting). We assume that the target image is the output of a generative neural network and that only a subset of the output pixels is observed. The goal is to reconstruct the unseen pixels from the partial samples. The proposed algorithm first recovers the corresponding low-dimensional latent input vector using gradient descent and then reconstructs the entire output with a single forward pass. We provide conditions under which the algorithm achieves perfect reconstruction, and empirically demonstrate its effectiveness on real data.
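Under the simplifying assumption of a linear "generator" (a stand-in for a trained generative network, so that the sketch stays self-contained), the two stages can be illustrated as follows; all dimensions, seeds, and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 5, 100                       # latent dimension, image dimension
W = rng.standard_normal((n, k))     # stand-in generator: G(z) = W z
z_true = rng.standard_normal(k)
image = W @ z_true                  # full target image (flattened)

obs = rng.choice(n, size=30, replace=False)  # observed pixel subset
y = image[obs]

# Stage 1: recover the latent vector by gradient descent
# on the squared error over the observed pixels only.
z = np.zeros(k)
for _ in range(500):
    grad = W[obs].T @ (W[obs] @ z - y)
    z -= 0.01 * grad

# Stage 2: a single forward pass reconstructs every pixel, seen and unseen.
recon = W @ z
print(np.max(np.abs(recon - image)))  # near zero: perfect reconstruction
```

Because the random observation matrix here has full column rank, the latent vector is identifiable from the partial samples and gradient descent converges to it, which is the flavor of condition under which the thesis proves perfect reconstruction for the nonlinear case.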