[ML] Pyramid Stereo Matching Network Paper One-page Summary

1. Abstract

I think it is quite insightful to introduce the spatial pyramid pooling concept to the stereo matching field. By doing so, the paper achieves a SOTA depth estimation result! I will briefly cover the overall network structure and then look more closely at the SPP module and the 3D CNN, because they are the main contributions of the paper.

2. Overall architecture

Roughly, we can divide PSMNet into three parts. First, a CNN built from small (3x3) filters extracts features and gradually downsamples the input to a feature map at 1/4 x 1/4 of the original resolution. Second, the spatial pyramid pooling (SPP) module hierarchically incorporates context information. Finally, a 3D CNN made up of multiple stacked hourglass networks regularizes the cost volume (W x H x Disparity x Feature). Let's look in detail at the two main contributions of the paper: the SPP module and the 3D CNN.
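To make the cost volume less abstract, here is a minimal NumPy sketch of how such a 4D volume can be built by concatenating left and right features at every disparity level. The function name `build_cost_volume` and the (C, H, W) array layout are my own assumptions for illustration, not the authors' code.

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    """Concatenate left and right features at every disparity level.

    left_feat, right_feat: arrays of shape (C, H, W).
    Returns an array of shape (2C, max_disp, H, W).
    (Hypothetical helper; the layout is an assumption.)
    """
    c, h, w = left_feat.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        # A left pixel at column x is paired with the right pixel at column x - d.
        # Columns x < d have no partner in the right image and stay zero.
        volume[:c, d, :, d:] = left_feat[:, :, d:]
        volume[c:, d, :, d:] = right_feat[:, :, : w - d]
    return volume
```

At the true disparity of a pixel, the left half and the right half of the feature axis should look alike, and that is the signal the 3D CNN can learn to pick up.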

3. Contribution

a) Spatial Pyramid Pooling (SPP) Module

As the figure shows, the SPP module incorporates context information by using average pooling. Why can average pooling capture contextual information?

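Using four different sizes of average pooling, it aggregates features at four scales, upsamples them back to the feature-map resolution, and concatenates them, so that each pixel's descriptor mixes local detail with region-level context. The left and right feature maps are then concatenated across each disparity level to form a 4D cost volume with W x H x Disparity x Feature dimensions. In this way, the network can make use of several aspects of the image features.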
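The pooling idea can be sketched in a few lines of NumPy. This is a simplified illustration under my own assumptions: it uses nearest-neighbor upsampling and skips the convolution applied in each branch, the function names are hypothetical, and the feature map height and width must be divisible by every pool size.

```python
import numpy as np

def avg_pool(feat, k):
    """Non-overlapping k x k average pooling on a (C, H, W) array."""
    c, h, w = feat.shape
    return feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample_nearest(feat, k):
    """Repeat every pooled value k times along height and width."""
    return feat.repeat(k, axis=1).repeat(k, axis=2)

def spp(feat, pool_sizes=(64, 32, 16, 8)):
    """Concatenate the input with its pooled-and-upsampled copies.

    Output has C * (1 + len(pool_sizes)) channels at the input resolution.
    """
    branches = [feat]
    for k in pool_sizes:
        branches.append(upsample_nearest(avg_pool(feat, k), k))
    return np.concatenate(branches, axis=0)
```

For a (C, 64, 128) feature map this returns a (5C, 64, 128) array: the original features plus four increasingly fine summaries of the surrounding region.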

b) 3D CNN

In the 3D CNN part, the network takes the cost volume as input. It then uses a stacked hourglass to learn more context information and regularize the cost volume, from which the depth is estimated.
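One common way to turn a regularized cost volume into a disparity map is disparity regression (a soft argmin): take a softmax over the negated costs and compute the expected disparity. The sketch below assumes a (D, H, W) cost array where lower cost means a better match; the function name is hypothetical.

```python
import numpy as np

def disparity_regression(cost):
    """Expected disparity from a (D, H, W) cost volume (lower cost = better)."""
    logits = -cost
    # Subtract the per-pixel maximum for numerical stability of the softmax.
    logits = logits - logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    # Weight every candidate disparity by its probability and sum.
    disparities = np.arange(cost.shape[0]).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)
```

Unlike a hard argmin, this output is differentiable and can take sub-pixel values, which suits end-to-end training.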

Let's think about why it should be a stacked hourglass structure. As seen in the figure, the word “hourglass” comes from the encoder-decoder-like architecture: it compresses the input into an encoded, lower-resolution representation and then expands it back to the original size. In this process, it is easier to learn more integrated context information. Finally, three losses, one per hourglass output, are combined as a weighted sum for training.
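The weighted sum of three losses can be written down directly. In this sketch I assume a smooth L1 loss per output, and the default weights (0.5, 0.7, 1.0) are what I recall from the paper's training setup, so treat both as assumptions; the function names are hypothetical.

```python
import numpy as np

def smooth_l1(pred, target):
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()

def stacked_loss(preds, target, weights=(0.5, 0.7, 1.0)):
    """Weighted sum of the losses of the three hourglass outputs.

    preds: three disparity maps, ordered from the first to the last hourglass.
    weights: assumed values; the last (most refined) output counts most.
    """
    return sum(w * smooth_l1(p, target) for w, p in zip(weights, preds))
```

Supervising the intermediate outputs as well gives the earlier hourglasses a direct training signal, while the largest weight keeps the focus on the final prediction.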

4. Result

As expected, PSMNet achieves a SOTA result on the KITTI 2015 dataset with a runtime comparable to other methods. It is also trained end to end. The significant idea it introduces is the use of the SPP module and the stacked hourglass structure in a stereo matching network!

All credit to Chang et al., “Pyramid Stereo Matching Network”, CVPR 2018.
