3D Reconstruction From Multiple Images Using SFM


This paper presents the three-dimensional (3D) reconstruction of an object from multiple images using the structure-from-motion (SFM) method. First, we survey popular methods for 3D reconstruction. Second, the SFM method is illustrated in detail. Third, we reconstruct a stereoscopic ‘dinosaur’ from multiple images by means of SFM and then apply bundle adjustment to decrease the reprojection error. Fourth, the performance of our 3D reconstruction is evaluated. Finally, we discuss the challenges and future of 3D reconstruction technology.


3D reconstruction is the process of recovering the 3D geometric structure of objects shown in 2D images, that is, determining the 3D locations of points on the object surfaces. It is a core technology in various fields, including computer animation, computer-aided geometric design, medical imaging, virtual reality (VR), and augmented reality (AR).

The methods of 3D reconstruction differ between single-view and multi-view images. For monocular cues, structured-light methods can be used to rebuild 3D objects, such as the binary-coded, k-ary, intensity-ratio, and phase-shift structured-light methods. For binocular stereo, multi-view stereo and structure from motion are the most popular methods; both approaches are discussed in [1]. Snavely et al. perform 3D reconstruction from unstructured photos taken with different cameras via SFM [2] and create a novel approach for browsing photo collections. In our project, the SFM method is implemented to conduct the 3D reconstruction.

Type             | Method                 | Key Points
Monocular view   | Structured light       | Uses a projector
Binocular stereo | Multi-view stereo      | Uses a voxel volume
Binocular stereo | Structure from motion  | Computes structure and motion at the same time

Table 1. Popular Methods for 3D reconstruction

Our implementation consists of three parts. The first part is evaluating datasets: we tested several datasets, looking for good feature-extraction performance. The dataset we used includes 36 images of a toy dinosaur taken from different perspectives, together with the camera matrices. Second, we implemented the SFM algorithm in C++ to extract the depth of features and then calculated their 3D coordinates in the world frame. The last part uses the PCL library [8] to visualize the camera frames and the reconstructed 3D object. The following sections describe the main implementation steps of our project.

I. Feature detection and matching

First of all, we use the SURF algorithm from OpenCV as the feature detector. The surface of the object in the images is richly textured while the background is monotonous, which helped us detect many features. Then we use the brute-force descriptor matcher to find candidate matches between pairs of images. Combining the “Ratio Test” with the k-nearest-neighbor matches (k = 2), we keep only good matches; the ratio-test threshold we used is 0.6. Finally, we obtain a set of 2D point correspondences.
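A minimal sketch of the ratio-test filter described above. `MatchCandidate` is a hypothetical stand-in for the (best, second-best) distance pair that OpenCV's `knnMatch` with k = 2 returns for each query feature:

```cpp
#include <vector>

// Hypothetical stand-in for the pair of nearest-neighbor matches
// returned per query feature by a k-NN descriptor matcher (k = 2).
struct MatchCandidate {
    int   queryIdx;    // feature index in the first image
    int   trainIdx;    // feature index in the second image
    float bestDist;    // descriptor distance to the nearest neighbor
    float secondDist;  // descriptor distance to the second-nearest neighbor
};

// Lowe's ratio test: keep a match only when the best distance is
// clearly smaller than the second-best one (threshold 0.6 as in the text).
std::vector<MatchCandidate> ratioTest(const std::vector<MatchCandidate>& knn,
                                      float ratio = 0.6f) {
    std::vector<MatchCandidate> good;
    for (const MatchCandidate& m : knn) {
        if (m.secondDist > 0.0f && m.bestDist < ratio * m.secondDist)
            good.push_back(m);
    }
    return good;
}
```

A low ratio like 0.6 trades match count for reliability: ambiguous features, whose two nearest descriptors are similar, are discarded.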

Afterward, we discarded image pairs with few good matches and stored the remaining good pairs, which are used to compute fundamental matrices in the next step.

II. Compute the Fundamental Matrix 𝐹 and camera matrix

To exclude outliers, we used RANSAC to refine the matches of each image pair, and used the inliers to compute the fundamental matrix and the essential matrix. The essential matrix encodes the relative pose between the paired camera frames: by SVD factorization of the essential matrix, we extracted the rotation R and translation t. In the end, we obtained the spatial relationships between all images. From these relationships, we can calculate the transform from each camera frame to the world frame we assigned. In this way, we recovered all camera frames, which constitute the motion in SFM, in world coordinates.

III. Triangulation and initial 3D cloud points

With the intrinsic matrices provided and the extrinsic matrices we calculated, we can compute the 3D coordinates of the feature points. We used the built-in OpenCV function triangulatePoints() to obtain the matched points in 3D space. Then we stored the coordinates of all matched points in a PCD file; the PCD format describes point cloud data, and we display this file in the next step.

IV. Implement the PCL Visualizer

We implement the PCD reader and visualizer using the PCL library, which is designed to display point clouds and other kinds of 3D data. We read in the PCD file created in the last step, which contains the color and spatial coordinates of the point cloud. We also place the camera frames in the visualizer so that we can inspect the relationship between structure and motion.
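For reference, a minimal writer for an ASCII PCD file might look like the sketch below (XYZ fields only; our actual files also carry color, and in practice PCL's pcl::io::savePCDFileASCII produces this for you):

```cpp
#include <array>
#include <sstream>
#include <string>
#include <vector>

// Serialize XYZ points into a minimal ASCII PCD (v0.7) string, using the
// header layout PCL's reader expects for an unorganized point cloud.
std::string toAsciiPcd(const std::vector<std::array<float, 3>>& pts) {
    std::ostringstream out;
    out << "# .PCD v0.7 - Point Cloud Data file format\n"
        << "VERSION 0.7\n"
        << "FIELDS x y z\n"
        << "SIZE 4 4 4\n"          // bytes per field
        << "TYPE F F F\n"          // F = float
        << "COUNT 1 1 1\n"
        << "WIDTH " << pts.size() << "\n"
        << "HEIGHT 1\n"            // 1 => unorganized cloud
        << "VIEWPOINT 0 0 0 1 0 0 0\n"
        << "POINTS " << pts.size() << "\n"
        << "DATA ascii\n";
    for (const auto& p : pts)
        out << p[0] << " " << p[1] << " " << p[2] << "\n";
    return out.str();
}
```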

2 Results and Discussions

I. Structure from Motion using the dataset from Visual Geometry Group

We chose the dinosaur dataset [3] as our primary dataset. It contains 36 high-resolution images taken with a camera moving 360 degrees around the dinosaur model. The dinosaur has rich color and texture detail while the background is single-colored and smooth, making the feature detection and matching process relatively easy.


Figure 1. Dinosaur feature detection and matching results (without RANSAC)

After using RANSAC to eliminate outliers, we follow the calculation procedure to obtain the 3D location of each feature point and use bundle adjustment to reduce the reprojection error. The final results show a good reconstruction of the dinosaur; the step-by-step generated structure (3D location points) and motion (camera coordinates) are shown below.


Figure 2. SFM results and final 3D reconstructed dinosaur model.

As the results show, the camera coordinates circle 360 degrees around the dinosaur model as expected, and the model is well reconstructed despite some unwanted points extracted from the purple background. However, when zooming in on the 3D model, the points become sparse, so the shape and color of the model become somewhat indecipherable. Although detecting more feature points would make the cloud denser, such sparsity is an inherent limitation of the SFM method.
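For reference, the bundle adjustment used above minimizes the sum of squared reprojection errors, i.e. the pixel distance between each observed feature and the projection of its triangulated 3D point. A minimal sketch of that per-point error under a pinhole model (point already expressed in the camera frame; fx, fy, cx, cy are the camera intrinsics):

```cpp
#include <cmath>

// Project a 3D point (X, Y, Z), already in the camera frame, through a
// pinhole model and return the pixel distance to the observation (u, v).
double reprojectionError(double X, double Y, double Z,
                         double fx, double fy, double cx, double cy,
                         double u, double v) {
    double uProj = fx * X / Z + cx;  // projected pixel column
    double vProj = fy * Y / Z + cy;  // projected pixel row
    return std::hypot(uProj - u, vProj - v);
}
```

Bundle adjustment jointly perturbs the 3D points and camera poses to drive the sum of these squared errors down, which is why it improves both structure and motion at once.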

II. Other Dataset

We have tried other datasets provided on the website; they all achieve similar results as long as the model has enough feature points and is separable from the background. A reconstructed 3D model of a rabbit is shown below.


Figure 3. 3D reconstructed rabbit model.

Future Work

Our project successfully reconstructs different 3D models using the SFM algorithm. Future work may include reconstructing the surface of the model to make it more realistic.


[1] S. McCann. 3D Reconstruction from Multiple Images.

[2] N. Snavely, S. Seitz, R. Szeliski. Photo Tourism: Exploring Photo Collections in 3D. ACM Transactions on Graphics, 2006.

[3] Niem, Wolfgang. “Dinosaur.” Visual Geometry Group Home Page, University of Hannover, http://www.robots.ox.ac.uk/~vgg/data/data-mview.html.

[4] C. Strecha, W. von Hansen, L. Van Gool, P. Fua, U. Thoennessen. On Benchmarking Camera Calibration and Multi-View Stereo for High Resolution Imagery. Computer Vision and Pattern Recognition, 2008.

[5] Introduction to SURF (Speeded-Up Robust Features) — OpenCV 3.0.0-dev Documentation, docs.opencv.org/3.0beta/doc/py_tutorials/py_feature2d/py_surf_intro/py_surf_intro.html.

[6] Jang, Woo-Seok, and Yo-Sung Ho. “3-D Object Reconstruction from Multiple 2-D Images.” 3D Research, vol. 1, no. 2, 2010, doi:10.1007/3dres.02(2010)1.  

[7] The Stanford 3D Scanning Repository, graphics.stanford.edu/data/3Dscanrep/.

[8] “PCL – Point Cloud Library (PCL).” PCL – Point Cloud Library (PCL), pointclouds.org/.


You can access this project's code on my GitHub: https://github.com/stytim/3D_Reconstruction_SFM