- 标题:Self6D:自监督单目6D物体姿态估计
- 作者团队:清华大学 & 慕尼黑工业大学 & 谷歌
6D目标姿态估计是计算机视觉中的一个基本问题。卷积神经网络(CNN)最近被证明可以从单目获取的图像中预测物体可靠的6D姿态。尽管如此,CNN仍然被认为是高度依赖数据驱动的 (data-driven),但是,获取足够的标注通常非常耗时且劳动强度大。为克服此缺点,我们提出了通过自我监督学习进行单目6D姿态估计的想法,该想法消除了对带有标注的真实数据需求。在用合成RGB数据完全监督了我们提议的网络之后,我们利用神经渲染的最新进展对未标注的真实RGB-D数据进行进一步的自我监督,以寻求视觉和几何上的最佳对齐方式。广泛的评估表明,我们提出的自我监督能够显着提高模型的原始性能,胜过所有其他依赖于合成数据或采用领域自适应技术的方法。
- 标题:使用少量杂波图像进行6D姿势估计的神经目标学习
- 作者团队:维也纳工业大学 (TU Wien)
用于目标6D姿态估计的最新方法,比如假定纹理3D模型或覆盖目标姿态整个范围的真实图像。 但是,在真实场景中很难获得带纹理的3D模型和带有标注对象姿态的数据。 本文提出了一种方法,即神经目标学习(NOL),该方法可以通过仅合并来自混合图像的一些观察结果,以任意姿态生成目标的合成图像。 提出了一种新颖的细化步骤,以对齐源图像中对象的不准确姿态,从而获得质量更好的图像。 对两个公共数据集的评估表明,与使用真实图像数量10倍的方法相比,由NOL创建的渲染图像达到SOTA的性能。 对我们新数据集的评估表明,可以使用一系列固定场景同时训练和识别多个目标。
- 标题:先验形状变形用于分类6D对象姿态和大小估计
- 作者团队:新加坡国立大学
我们提出了一种新颖的学习方法,可以从RGB-D图像中恢复未见对象实例的6D姿态和大小。 为了处理类内形状变化,我们提出了一个深层网络,可通过对来自预先学习的分类形状的变形进行显式建模来重建3D对象模型。 此外,我们的网络可以推断对象实例的深度,用于观察与重建3D模型之间的紧密对应关系,以达到共同估算6D对象姿态和大小的目的。 我们设计了一个自动编码器,该编码器可以在一组对象模型上进行训练,并计算每个类别的平均潜在嵌入量,以学习分类先验形状。 在合成数据集和真实数据集上进行的大量实验表明,我们的方法大大优于现有技术。
Dense RepPoints: Representing Visual Objects with Dense Point Sets
Corner Proposal Network for Anchor-free, Two-stage Object Detection
BorderDet: Border Feature for Dense Object Detection
Multi-Scale Positive Sample Refinement for Few-Shot Object Detection
PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments
- 论文:https://arxiv.org/abs/2007.09584
- 代码:https://github.com/clobotics/piou
- 数据集:https://github.com/clobotics/piou
Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer
Probabilistic Anchor Assignment with IoU Prediction for Object Detection
HoughNet: Integrating near and long-range evidence for bottom-up object detection
OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
End-to-End Object Detection with Transformers
- 论文:https://ai.facebook.com/research/publications/end-to-end-object-detection-with-transformers
- 代码:https://github.com/facebookresearch/detr
Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training
Arbitrary-Oriented Object Detection with Circular Smooth Label
Pillar-based Object Detection for Autonomous Driving
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection
Learning Where to Focus for Efficient Video Object Detection
Tensor Low-Rank Reconstruction for Semantic Segmentation
Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation
- 论文:https://arxiv.org/abs/2007.09183
- 代码:https://github.com/charlesCXK/RGBD_Semantic_Segmentation_PyTorch
GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild
- 主页:https://lttm.dei.unipd.it/paper_data/GMNet/
- 论文:https://arxiv.org/abs/2007.09073
- 代码:https://github.com/LTTM/GMNet
SegFix: Model-Agnostic Boundary Refinement for Segmentation
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation(Oral)
Improving Semantic Segmentation via Decoupled Body and Edge Supervision
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation
Boundary-preserving Mask R-CNN
Conditional Convolutions for Instance Segmentation (Oral)
- 论文:https://arxiv.org/abs/2003.05664
- 代码:https://github.com/aim-uofa/AdelaiDet/blob/master/configs/CondInst/README.md
SOLO: Segmenting Objects by Locations
Collaborative Video Object Segmentation by Foreground-Background Integration
Video Object Segmentation with Episodic Graph Memory Networks
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency
CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations
Edge-aware Graph Representation Learning and Reasoning for Face Parsing
Ocean: Object-aware Anchor-Free Tracking
Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking
Ocean: Object-aware Anchor-Free Tracking
TAO: A Large-Scale Benchmark for Tracking Any Object
Segment as Points for Efficient Online Multi-Object Tracking and Segmentation(Oral)
- 论文:https://arxiv.org/abs/2007.01550
- 代码:https://github.com/detectRecog/PointTrack
- 数据集:https://github.com/detectRecog/PointTrack
AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds
A Closer Look at Local Aggregation Operators in Point Cloud Analysis
Multimodal Shape Completion via Conditional Generative Adversarial Networks
GRNet: Gridding Residual Network for Dense Point Cloud Completion
Progressive Point Cloud Deconvolution Generation Network
Learning to Learn Parameterized Classification Networks for Scalable Input Images
Learning To Classify Images Without Labels
Context-Aware RCNN: A Baseline for Action Detection in Videos
Actions as Moving Points
SF-Net: Single-Frame Supervision for Temporal Action Localization
Asynchronous Interaction Aggregation for Action Detection
Rewriting a Deep Generative Model
Contrastive Learning for Unpaired Image-to-Image Translation
XingGAN for Person Image Generation
Temporal Complementary Learning for Video Person Re-Identification
论文:https://arxiv.org/abs/2007.09357 代码:https://github.com/blue-blue272/VideoReID-TCLNet
Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification
Robust Re-Identification by Multiple Views Knowledge Distillation
Multiple Expert Brainstorming for Domain Adaptive Person Re-identification
Quaternion Equivariant Capsule Networks for 3D Point Clouds Oral
DeepFit: 3D Surface Fitting by Neural Network Weighted Least Squares Oral
MoSaNAS: Multi-Objective Surrogate-Assisted Neural Architecture Search Oral
Describing Textures using Natural Language Oral
Empowering Relational Network by Self-Attention Augmented Conditional Random Fields for Group Activity Recognition Oral
AiR: Attention with Reasoning Capability Oral
Self6D: Self-Supervised Monocular 6D Object Pose Estimation Oral
Invertible Image Rescaling Oral
Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation Oral
House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation Oral
Crowdsampling the Plenoptic Function Oral
End-to-End Estimation of Multi-Person 3D Poses from Multiple Cameras Oral
End-to-End Object Detection with Transformers Oral
DeepSFM: Structure From Motion Via Deep Bundle Adjustment Oral
Ladybird: Deep Implicit Field Based 3D Reconstruction with Sampling and Symmetry Oral
Segment as Points for Efficient Online Multi-Object Tracking and Segmentation Oral
Conditional Convolutions for Instance Segmentation Oral
MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution Oral
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset Oral
Privacy Preserving Structure-from-Motion Oral
Rewriting a Deep Generative Model Oral
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets Oral
Long-term Human Motion Prediction with Scene Context Oral
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis Oral
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes Oral
MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images Oral
Learning and aggregating deep local descriptors for instance-level recognition Oral
A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem Oral
Learn to Recover Visible Color for Video Surveillance in a Day Oral
Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single-view Images Oral
Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation Oral
BorderDet: Border Feature for Dense Object Detection Oral
Regularization with Latent Space Virtual Adversarial Training Oral
Du$^2$Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels Oral
Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot learning Oral
Targeted Attack for Deep Hashing based Retrieval Oral
Gradient Centralization: A New Optimization Technique for Deep Neural Networks Oral
Content-Aware Unsupervised Deep Homography Estimation Oral
Multi-View Optimization of Local Feature Geometry Oral
Efficient Model Fitting by Combining Lifted Optimization with Phong Surface Models Oral
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video Oral
Learning Stereo from Single Images Oral
Prototype Rectification for Few-Shot Learning Oral
Learning Feature Descriptors using Camera Pose Supervision Oral
Semantic Flow for Fast and Accurate Scene Parsing Oral
Appearance Consensus Driven Self-Supervised Human Mesh Recovery Oral
Diffraction Line Imaging Oral
Aligning and Projecting Images to Class-conditional Generative Networks Oral
Suppress and Balance: A Simple Gated Network for Salient Object Detection Oral
Visual Memorability for Robotic Interestingness Prediction via Unsupervised Online Learning Oral
Post-Training Piecewise Linear Quantization for Deep Neural Networks Oral
Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification Oral
In-Home Daily-Life Captioning Using Radio Signals Oral
Self-Challenging Improves Cross-Domain Generalization Oral
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering Oral
Multi-task Learning Increases Adversarial Robustness Oral
S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search Oral
Improving Deep Video Compression by Resolution-adaptive Flow Coding Oral
Motion Capture from Internet Videos Oral
Appearance-Preserving 3D Convolution for Video-based Person Re-identification Oral
Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization Oral
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation Oral
Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures Oral
Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling Oral
Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction Oral
Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network Oral
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation Oral
Coherent full scene 3D reconstruction from a single RGB image Oral
Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs Oral
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow Oral
Domain-invariant Stereo Matching Networks Oral
DeepHandMesh: Weakly-supervised Deep Encoder-Decoder Framework for High-fidelity Hand Mesh Modeling from a Single RGB Image Oral
Content Adaptive and Error Propagation Aware Deep Video Compression Oral
Towards Streaming Image Understanding Oral
Towards Automated Testing and Robustification by Semantic Adversarial Data Generation Oral
Adversarial Generative Grammars for Human Activity Prediction Oral
Greedy Sampler and Dumb Learner: A Surprisingly Effective Approach for Continual Learning Oral
Learning Lane Graph Representations for Motion Forecasting Oral
What Matters in Unsupervised Optical Flow Oral
Synthesis and Completion of Facades from Satellite Imagery Oral
Mapillary Planet-Scale Depth Dataset Oral
V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction Oral
Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters Oral
EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning Oral
Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation Oral
Cross-Domain Cascaded Deep Translation Oral
"Look Ma, no landmarks!" - Unsupervised, model-based dense face alignment Oral
Online Invariance Selection for Local Feature Descriptors Oral
Rethinking image inpainting via a mutual encoder-decoder with feature equalization Oral
TextCaps: a Dataset for Image Captioning with Reading Comprehension Oral
It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction Oral
Learning What to Learn for Video Object Segmentation Oral
SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing Oral
LIMP: Learning Latent Shape Representations with Metric Preservation Priors Oral
Unsupervised Sketch-to-Photo Synthesis Oral
A simple way to make neural networks robust against diverse image corruptions Oral
SoftpoolNet: Shape Descriptor for Point Cloud Completion and Classification Oral
Hierarchical Face Aging through Disentangled Latent Characteristics Oral
Hybrid Models for Open Set Recognition Oral
TopoGAN: A Topology-Aware Generative Adversarial Network Oral
Learning to Localize Actions from Moments Oral
ForkGAN: Seeing into the Rainy Night Oral
TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning Oral
ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval Oral