Introduction
3D reconstruction is the process of recovering the 3D geometric structure of objects shown in 2D images, i.e., determining the 3D locations of points on the objects' surfaces. It is a core technology in many fields, including computer animation, computer-aided geometric design, medical imaging, virtual reality (VR), and augmented reality (AR).
The methods used for 3D reconstruction differ between single-view and multi-view images. For monocular cues, structured light methods can be used to rebuild 3D objects, such as binary-coded, k-ary, intensity-ratio, and phase-shift structured light. For binocular stereo, Multi-View Stereo and Structure from Motion (SFM) are the most popular approaches; both are discussed in [1]. Snavely et al. [2] perform 3D reconstruction from unstructured photos taken with different cameras via SFM and propose a novel way to browse the photo collection. In this project, I implement SFM to perform the 3D reconstruction.
| Type | Method | Key Points |
| --- | --- | --- |
| Monocular view | Structured light | Uses a projector |
| Binocular stereo | Multi-View Stereo | Uses a voxel volume |
| Binocular stereo | Structure from Motion | Computes structure and motion at the same time |

Table 1. Popular methods for 3D reconstruction
My implementation consists of three parts. The first part is evaluating datasets: I tested several datasets and looked for the one with the best feature-extraction performance. The dataset I used contains 36 images of a toy dinosaur taken from different perspectives, together with the camera matrices. Second, I implemented the SFM algorithm in C++ to recover the depth of matched features and compute their 3D coordinates in the world frame. The last part uses the PCL library [8] to visualize the camera frames and the reconstructed 3D object. The following sections describe the main implementation steps.
I. Feature detection and matching
First of all, I use the SURF algorithm from OpenCV as the feature detector to extract candidate keypoints. The surface of the object in the images is irregular while the background is monotonous, so many features are detected on the object. Then I use a brute-force descriptor matcher to match the features between pairs of images. Combining the ratio test with a k-nearest-neighbor search, I keep only good matches; the ratio-test threshold I used is 0.6. Finally, I obtain a set of 2D point correspondences.
Afterward, I discard image pairs with too few good matches, and store the remaining good pairs to compute fundamental matrices in the next step.
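As an illustration of this step (a minimal sketch, not necessarily the exact code in my repository), the OpenCV calls might look roughly like this; the 0.6 ratio-test threshold is the value mentioned above, and the image file names are placeholders:

```cpp
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/xfeatures2d.hpp>   // SURF lives in the contrib module

#include <vector>

int main() {
    // Placeholder file names; the dinosaur dataset uses its own naming scheme.
    cv::Mat img1 = cv::imread("view_000.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("view_001.png", cv::IMREAD_GRAYSCALE);

    // 1. Detect SURF keypoints and compute descriptors in both images.
    auto surf = cv::xfeatures2d::SURF::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    surf->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    surf->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // 2. Brute-force matching with k = 2 nearest neighbors.
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<std::vector<cv::DMatch>> knnMatches;
    matcher.knnMatch(desc1, desc2, knnMatches, 2);

    // 3. Ratio test with the 0.6 threshold used in this project.
    std::vector<cv::DMatch> goodMatches;
    for (const auto& m : knnMatches) {
        if (m.size() == 2 && m[0].distance < 0.6f * m[1].distance)
            goodMatches.push_back(m[0]);
    }
    // A pair is kept only if it has enough good matches.
    return 0;
}
```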
II. Compute the Fundamental Matrix 𝐹 and camera matrix
In order to exclude outliers, I use RANSAC to refine the matches of each image pair, and use the inliers to compute the Fundamental Matrix and the Essential Matrix. The Essential Matrix encodes the relative pose between the two camera frames: from it, I use SVD factorization to extract the rotation R and translation t. In the end, I have the spatial relationships between all the images. From these pairwise relationships, I can compute each camera frame's transform to the world frame I assigned. In this way, I obtain all the camera poses, which constitute the motion in SFM, expressed in world coordinates.
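A hedged sketch of this step with OpenCV, assuming a single intrinsics matrix K for the pair (the repository code may organize this differently): F is estimated with RANSAC, the essential matrix is formed as E = KᵀFK, and cv::recoverPose performs the SVD-based decomposition into R and t.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

#include <vector>

// pts1/pts2: matched pixel coordinates from the previous step.
// K: camera intrinsics (assumed CV_64F so its type matches F).
// Outputs the relative pose (R, t) of the second camera w.r.t. the first.
void relativePose(const std::vector<cv::Point2f>& pts1,
                  const std::vector<cv::Point2f>& pts2,
                  const cv::Mat& K, cv::Mat& R, cv::Mat& t) {
    // Fundamental matrix with RANSAC; the mask marks inlier matches.
    cv::Mat mask;
    cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99, mask);

    // Essential matrix from the fundamental matrix and the intrinsics: E = K^T F K.
    cv::Mat E = K.t() * F * K;

    // recoverPose decomposes E (via SVD) and resolves the fourfold ambiguity
    // by keeping the solution that places the points in front of both cameras.
    cv::recoverPose(E, pts1, pts2, K, R, t, mask);
}
```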
III. Triangulation and initial 3D cloud points
With the intrinsic matrices provided and the extrinsic matrices calculated, I can compute the 3D coordinates of the matched feature points. I use OpenCV's built-in function triangulatePoints() to get the matched points in 3D space. Then I store the coordinates of all the matched points in a PCD file. The PCD format describes point cloud data, and I can display this file in the next step.
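A minimal sketch of the triangulation and PCD export, assuming the two 3x4 projection matrices P1 and P2 have already been assembled from the intrinsics and recovered poses; color is omitted here for brevity (the actual pipeline also stores each point's pixel color), and the output file name is a placeholder:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>

#include <vector>

// P1, P2: 3x4 projection matrices; pts1, pts2: matched pixel coordinates.
void triangulateToPCD(const cv::Mat& P1, const cv::Mat& P2,
                      const std::vector<cv::Point2f>& pts1,
                      const std::vector<cv::Point2f>& pts2) {
    // Triangulate; the result is a 4xN matrix in homogeneous coordinates.
    cv::Mat points4D;
    cv::triangulatePoints(P1, P2, pts1, pts2, points4D);
    points4D.convertTo(points4D, CV_32F);   // ensure a known element type

    // Convert to Euclidean coordinates and fill a PCL cloud.
    pcl::PointCloud<pcl::PointXYZ> cloud;
    for (int i = 0; i < points4D.cols; ++i) {
        cv::Mat col = points4D.col(i);
        float w = col.at<float>(3);
        cloud.push_back(pcl::PointXYZ(col.at<float>(0) / w,
                                      col.at<float>(1) / w,
                                      col.at<float>(2) / w));
    }

    // Write the cloud to disk in PCD format ("cloud.pcd" is a placeholder name).
    pcl::io::savePCDFileBinary("cloud.pcd", cloud);
}
```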
IV. Implement the PCL Visualizer
I implement the PCD reader and visualizer using the PCL library, which is designed to display point clouds and other kinds of 3D data. I read in the PCD file created in the last step, which contains the color and spatial coordinates of the point cloud. I also add the camera frames to the visualizer, so that I can inspect the relationship between structure and motion.
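A small sketch of the viewer, assuming the PCD file written in the previous step; the identity camera pose and the file name are placeholders for the poses recovered earlier:

```cpp
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/visualization/pcl_visualizer.h>

#include <Eigen/Geometry>

int main() {
    // Load the point cloud written in the triangulation step.
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
    pcl::io::loadPCDFile<pcl::PointXYZRGB>("cloud.pcd", *cloud);

    pcl::visualization::PCLVisualizer viewer("SFM result");
    viewer.setBackgroundColor(0, 0, 0);

    // Use the RGB field stored in the cloud to color the points (the structure).
    pcl::visualization::PointCloudColorHandlerRGBField<pcl::PointXYZRGB> rgb(cloud);
    viewer.addPointCloud<pcl::PointXYZRGB>(cloud, rgb, "structure");

    // Draw one coordinate system per estimated camera pose (the motion).
    // camPose would come from the recovered R and t; identity is a placeholder.
    Eigen::Affine3f camPose = Eigen::Affine3f::Identity();
    viewer.addCoordinateSystem(0.1, camPose, "camera_0");

    while (!viewer.wasStopped())
        viewer.spinOnce(100);
    return 0;
}
```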
2 Results and Discussion
I. Structure from Motion using the dataset from Visual Geometry Group
I chose the dinosaur dataset as the primary dataset. It contains 36 high-resolution images taken by a camera moving 360 degrees around the dinosaur model. The dinosaur has rich color and texture details while the background is single-colored and smooth, which makes feature detection and matching relatively easy.
Figure 1. Dinosaur feature detection and matching results (without RANSAC)
After using RANSAC to eliminate outliers, I follow the calculation procedure above to get the 3D location of each feature point, and use bundle adjustment to reduce the reprojection error. The final results show a fairly good reconstruction of the dinosaur; the step-by-step generated structure (3D points) and motion (camera coordinates) are shown below.
Figure 2. SFM results and final 3D reconstructed dinosaur model.
As the results show, the camera coordinates go 360 degrees around the dinosaur model as expected, and the model is well reconstructed despite some unwanted points picked up from the purple background. However, when zooming in on the 3D model, the points become sparse, so the shape and color of the model become somewhat hard to make out. Although extracting more feature points would make the point cloud denser, this sparsity is an inherent limitation of the SFM method.
II. Other Dataset
I have also tried other datasets provided on the website; they all achieve similar results as long as the model has enough feature points and is separable from the background. A reconstructed 3D model of a rabbit is shown below.
Figure 3. 3D reconstructed rabbit model.
Where is the code?
As always, check out my GitHub:
https://github.com/stytim/3D_Reconstruction_SFM
If this post helped you, please consider supporting me.
