Preface
3D object detection is a core task for autonomous driving. This article summarizes a recent survey on image-based 3D object detection.
Abstract
Image-based 3D object detection is a fundamental and challenging problem in autonomous driving that has received growing attention from industry and academia. Thanks to rapid advances in deep learning, image-based 3D detection has made notable progress. In particular, from 2015 to 2021 more than 200 works addressed this problem, covering a wide range of theory, algorithms, and applications. However, until recently there had been no up-to-date survey compiling and organizing this body of work. The reviewed paper fills this gap by providing the first comprehensive survey of this emerging and fast-growing research area. It summarizes the most commonly used image-based 3D detection pipeline and analyzes each component in depth. The authors also propose two new taxonomies to organize state-of-the-art methods into different categories, aiming to enable a more systematic review and fair comparison for future work. While reviewing past achievements, the survey analyzes current challenges and discusses future directions for image-based 3D detection research.
Datasets
The survey summarizes the datasets available for image-based 3D object detection in autonomous driving scenarios. Some datasets cover multiple tasks, and only their 3D detection benchmarks are reported. For example, KITTI released over 40K images in total, of which only about 15K belong to the 3D detection benchmark.
Taxonomies
To facilitate systematic analysis of existing methods and fair performance comparison in future work, the authors propose two taxonomies that classify existing methods according to the adopted framework and the input data used.
The paper includes an overview figure of the image-based 3D detection pipeline, in which the data flows of data-level, feature-level, and result-level augmentation are indicated by blue, green, and red arrows, respectively.
The survey also gives a chronological overview of the most relevant image-based 3D detection methods and benchmarks.
The authors tabulate the auxiliary data requirements and results of image-based 3D detectors published at major conferences and in major journals. Results are reported on the KITTI test set for the Car category using 3D and BEV AP|R40, that is, average precision sampled at 40 recall positions. Methods are grouped by the auxiliary data they use, and within each group they are sorted by 3D AP|R40 under the moderate difficulty setting. The table also marks methods that use pretrained weights other than ImageNet pretraining.
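To make the AP|R40 metric concrete, here is a minimal Python sketch of the 40-point interpolation rule. It assumes the precision-recall points have already been computed (box matching at the KITTI IoU threshold, difficulty filtering, and score sorting are all omitted), and the function name `ap_r40` is hypothetical. It is a simplified illustration, not the official KITTI devkit code.

```python
from typing import Sequence


def ap_r40(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Hypothetical helper: KITTI-style AP|R40 from precision-recall points.

    Recall is sampled at the 40 positions 1/40, 2/40, ..., 40/40 (recall 0
    is excluded, unlike the older 11-point AP|R11). At each position the
    interpolated precision is the maximum precision among curve points whose
    recall is at least that position, or 0 if no such point exists.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have equal length")
    total = 0.0
    for i in range(1, 41):
        r = i / 40
        # Interpolated precision: best precision achievable at recall >= r.
        total += max((p for rc, p in zip(recalls, precisions) if rc >= r), default=0.0)
    return total / 40


# A detector that reaches 50% recall at perfect precision and never more:
print(ap_r40([0.5], [1.0]))  # 0.5
```

Sampling from 1/40 rather than from 0 avoids the inflated score the 11-point variant awards just for the recall-0 position.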
Future Directions
The authors identify two main challenges for future 3D object detection:
- Perceiving a 3D world from 2D images is an ill-posed problem.
- Annotating 3D data is extremely expensive, and existing datasets remain limited in scale.
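The first challenge can be seen directly in the pinhole camera model: a point (X, Y, Z) in camera coordinates projects to pixel u = fx * X / Z + cx, v = fy * Y / Z + cy, so every point along the same viewing ray lands on the same pixel. The sketch below illustrates this. The intrinsic values are made-up placeholders, not those of any particular dataset.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def project(point: Point3D, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


# Placeholder intrinsics (illustrative only).
FX = FY = 700.0
CX, CY = 600.0, 180.0

near = (2.0, 1.0, 10.0)
far = tuple(2 * c for c in near)  # same viewing ray, twice as far away

print(project(near, FX, FY, CX, CY))  # (740.0, 250.0)
print(project(far, FX, FY, CX, CY))   # (740.0, 250.0)
```

Because both points map to the same pixel, a single image cannot tell them apart, and any method that lifts 2D evidence to 3D must supply Z from some other cue, such as a learned depth estimate.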
To address the first challenge, the community can develop better depth estimation methods and exploit multimodal and temporal information. For the second, learning paradigms beyond full supervision, such as semi-supervised, weakly supervised, or self-supervised learning, deserve attention, and transfer learning from large-scale pretrained models is another promising direction. Finally, because 3D detectors are deployed in real driving scenes that may differ from their training data, model generalization should be treated as an explicit design goal.
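One common way a per-pixel depth estimate is used is to invert the pinhole projection and lift pixels into a 3D point cloud, the idea behind pseudo-LiDAR-style detectors: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy. The sketch below assumes a dense depth map given as nested lists in meters, with values of 0 or less marking invalid pixels. The function name and this convention are illustrative assumptions, not taken from the survey.

```python
from typing import List, Tuple


def depth_map_to_points(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: back-project a depth map to camera-frame 3D points.

    Rows are indexed by v and columns by u; pixels with depth <= 0 are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth estimate for this pixel
            # Inverse of the pinhole model u = fx * X / Z + cx, v = fy * Y / Z + cy.
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 depth map with unit focal length and principal point at (0, 0).
toy_depth = [[0.0, 2.0],
             [4.0, 0.0]]
print(depth_map_to_points(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
# [(2.0, 0.0, 2.0), (0.0, 4.0, 4.0)]
```

The accuracy of the lifted points, and of any detector built on them, is bounded by the quality of the depth estimate, which is why better depth estimation is listed as a key direction.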
Conclusion
The reviewed paper provides a comprehensive survey of recent advances in image-based 3D detection for autonomous driving. The authors categorize existing methods, present detailed comparisons, and discuss each necessary component of 3D detection, including feature extraction, loss modeling, and post-processing. The paper also examines the role of auxiliary data in the field and highlights open challenges and potential research directions.