To make the literature easier to organize and read, I am compiling a list of papers related to 3D object detection. It covers deep learning-based algorithms and multimodal fusion algorithms.
(It's mainly because my PhD supervisor told me to organize it; otherwise, I'd be too lazy to do it, haha.)
- Survey
- Object detection without fusion
- Multimodal object detection
- Self-supervised Learning
- Unsupervised Learning
- Downsampling in point clouds
- Point Cloud Local Feature Description
- Cooperative Driving Automation
- Dataset
- Collaborative Dataset

Method | Title | Author |
---|---|---|
object detection | Foreground-Background Imbalance Problem in Deep Object Detectors: A Review | Joya Chen, Tong Xu |
object detection | A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving | Di Feng, Ali Harakeh, Steven Waslander |
object detection | An Overview Of 3D Object Detection | Yilin Wang, Jiayi Ye |
object detection | 3D Object Detection for Autonomous Driving: A Survey | Rui Qian, Xin Lai |
Multimodal | Multi-Modal 3D Object Detection in Autonomous Driving: a Survey | Yingjie Wang, Qiuyu Mao |
Multimodal | Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges | Di Feng, Christian Haase-Schütz |
Multimodal | Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review | Yaodong Cui |

Title | Pub. | Author |
---|---|---|
Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | 2022 | Meta AI |
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding | CVPR 2022 | Mohamed Afham |

Title | Pub. | Author |
---|---|---|
Unsupervised Learning of Depth from Monocular Videos Using 3D-2D Corresponding Constraints | Remote Sensing 2021 | Jin et al. |
ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection | CVPR 2021 | Yang et al. |

Title | Pub. | Author |
---|---|---|
2D Shape Context: Shape Context: A new descriptor for shape matching and object recognition | NeurIPS 2000 | Serge Belongie et al. |
3D Shape Context: Recognizing Objects in Range Data Using Regional Point Descriptors | ECCV 2004 | Frome et al. |
Shape Matching and Object Recognition Using Shape Contexts | 2002 | Belongie et al. |
3D Shape Descriptor for Objects Recognition | LARS and SBR 2017 | Sales et al. |
ROI-cloud: A Key Region Extraction Method for LiDAR Odometry and Localization | ICRA 2020 | Zhou et al. |
PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation | CVPR 2018 | Jiang et al. |

Dataset | Size | Categories / Remarks | Sensing Modalities |
---|---|---|---|
ScanNet | 1513 scans, 2.5M frames | floor, wall, chair, cabinet, bed, sofa, table, door, window, bookshelf, picture, counter, desk, curtain, refrigerator, shower curtain, toilet, sink, bathtub, other furniture | 3D camera, depth sensors |
SUN RGB-D | | | |
SUN3D | | | |
KITTI | 7481 frames (training), 80,256 objects | Car, Van, Truck, Pedestrian, Person (sitting), Cyclist, Tram, Misc | Visual (Stereo) camera, 3D LiDAR, GNSS, and inertial sensors |
nuScenes | 1000 scenes, 1.4M frames (camera, Radar), 390k frames (3D LiDAR) | 25 object classes, such as Car / Van / SUV, different Trucks, Buses, Persons, Animal, Traffic Cone, Temporary Traffic Barrier, Debris, etc. | Visual cameras (6), 3D LiDAR, and Radars (5) |
BLVD | 120k frames, 249,129 objects | Vehicle, Pedestrian, Rider during day and night | Visual (Stereo) camera, 3D LiDAR |
Waymo Open Dataset | 200k frames, 12M objects (3D LiDAR), 1.2M objects (2D camera) | Vehicles, Pedestrians, Cyclists, Signs | 3D LiDAR (5), Visual cameras (5) |
H3D | 27,721 frames, 1,071,302 objects | Car, Pedestrian, Cyclist, Truck, Misc, Animals, Motorcyclist, Bus | Visual cameras (3), 3D LiDAR |
Lyft-L5 AV dataset | 55k frames | Semantic HD map included | 3D LiDAR (5), Visual cameras (6) |
A2D2 | 40k frames (semantics), 12k frames (3D objects), 390k frames unlabeled | Car, Bicycle, Pedestrian, Truck, Small vehicles, Traffic signal, Utility vehicle, Sidebars, Speed bumper, Curbstone, Solid line, Irrelevant signs, Road blocks, Tractor, Non-drivable street, Zebra crossing, Obstacles / trash, Poles, RD restricted area, Animals, Grid structure, Signal corpus, Drivable cobblestone, Electronic traffic, Slow drive area, Nature object, Parking area, Sidewalk, Ego car, Painted driv. instr., Traffic guide obj., Dashed line, RD normal street, Sky, Buildings, Blurred area, Rain dirt | Visual cameras (6); 3D LiDAR (5); Bus data |
ApolloScape | 143,906 image frames, 89,430 objects | Rover, Sky, Car, Motorbicycle, Bicycle, Person, Rider, Truck, Bus, Tricycle, Road, Sidewalk, Traffic Cone, Road Pile, Fence, Traffic Light, Pole, Traffic Sign, Wall, Dustbin, Billboard, Building, Bridge, Tunnel, Overpass, Vegetation | Visual (Stereo) camera, 3D LiDAR, GNSS, and inertial sensors |
A3D Dataset | 39k frames, 230k objects | Car, Van, Bus, Truck, Pedestrians, Cyclists, and Motorcyclists; afternoon and night, wet and dry | Visual cameras (2); 3D LiDAR |
DBNet Dataset | Over 10k frames | Seven datasets in total with different test scenarios, such as seaside roads, school areas, and mountain roads. | 3D LiDAR, Dashboard visual camera, GNSS |
KAIST multispectral dataset | 7,512 frames, 308,913 objects | Person, Cyclist, Car during day and night, fine time slots (sunrise, afternoon, ...) | |
PandaSet | | | |

Dataset | Simulation |
---|---|
OPV2V | Yes |
V2V4Real | No |
V2XSet | Yes |
V2X-Sim | Yes |
DAIR-V2X | No |