
⚔🛡 Awesome Backdoor Attacks and Defenses

This repository contains a collection of papers and resources on backdoor attacks and backdoor defenses in deep learning.


📃Survey

| Year | Publication | Paper |
|------|-------------|-------|
| 2023 | arXiv | Adversarial Machine Learning: A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Example |
| 2022 | TPAMI | Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses |
| 2022 | TNNLS | Backdoor Learning: A Survey |
| 2022 | IEEE Wireless Communications | Backdoor Attacks and Defenses in Federated Learning: State-of-the-art, Taxonomy, and Future Directions |
| 2021 | Neurocomputing | Defense against Neural Trojan Attacks: A Survey |
| 2020 | ISQED | A Survey on Neural Trojans |

Tutorial & Workshop

| Venue | Title |
|-------|-------|
| ICCV 2023 | Backdoor Learning: Recent Advances and Future Trends |
| NeurIPS 2023 | Backdoors in Deep Learning |

⚔Backdoor Attacks

Supervised learning (Image classification)

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | Label Poisoning is All You Need | |
| 2023 | ICCV 2023 | Computation and Data Efficient Backdoor Attacks | |
| 2023 | arXiv | Boosting Backdoor Attack with a Learnable Poisoning Sample Selection Strategy | |
| 2023 | arXiv | Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers | |
| 2023 | CVPR 2023 | Architectural Backdoors in Neural Networks | |
| 2023 | CVPR 2023 | Color Backdoor: A Robust Poisoning Attack in Color Space | |
| 2023 | CVPR 2023 | You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks? | |
| 2023 | CVPR 2023 | The Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection | :octocat: |
| 2023 | CVPR 2023 | Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger | |
| 2023 | ICLR 2023 | Revisiting the Assumption of Latent Separability for Backdoor Defenses | :octocat: |
| 2023 | ICLR 2023 | Few-shot Backdoor Attacks via Neural Tangent Kernels | |
| 2023 | ICLR 2023 | The Dark Side of AutoML: Towards Architectural Backdoor Search | :octocat: |
| 2023 | ICLR 2023 | Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only | |
| 2022 | AAAI 2022 | Backdoor Attacks on the DNN Interpretation System | |
| 2022 | AAAI 2022 | Faster Algorithms for Weak Backdoors | |
| 2022 | AAAI 2022 | Finding Backdoors to Integer Programs: A Monte Carlo Tree Search Framework | |
| 2022 | AAAI 2022 | Hibernated Backdoor: A Mutual Information Empowered Backdoor Attack to Deep Neural Networks | |
| 2022 | AAAI 2022 | On Probabilistic Generalization of Backdoors in Boolean Satisfiability | |
| 2022 | CCS 2022 | Backdoor Attacks on Spiking NNs and Neuromorphic Datasets | :octocat: |
| 2022 | CCS 2022 | LoneNeuron: A Highly-Effective Feature-Domain Neural Trojan Using Invisible and Polymorphic Watermarks | |
| 2022 | CVPR 2022 | BppAttack: Stealthy and Efficient Trojan Attacks against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning | |
| 2022 | CVPR 2022 | DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints | |
| 2022 | CVPR 2022 | FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis | :octocat: |
| 2022 | CVPR 2022 | Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks | |
| 2022 | ECCV 2022 | An Invisible Black-Box Backdoor Attack Through Frequency Domain | :octocat: |
| 2022 | ECCV 2022 | RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN | :octocat: |
| 2022 | EuroS&P 2022 | Dynamic Backdoor Attacks Against Machine Learning Models | |
| 2022 | ICASSP 2022 | Invisible and Efficient Backdoor Attacks for Compressed Deep Neural Networks | |
| 2022 | ICASSP 2022 | Stealthy Backdoor Attack with Adversarial Training | |
| 2022 | ICASSP 2022 | When Does Backdoor Attack Succeed in Image Reconstruction? A Study of Heuristics vs. Bi-Level Solution | |
| 2022 | ICLR 2022 | How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data | |
| 2022 | IJCAI 2022 | Data-Efficient Backdoor Attacks | :octocat: |
| 2022 | IJCAI 2022 | Imperceptible Backdoor Attack: From Input Space to Feature Representation | :octocat: |
| 2022 | MM 2022 | BadHash: Invisible Backdoor Attacks against Deep Hashing with Clean Label | |
| 2022 | NeurIPS 2022 | Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class | |
| 2022 | NeurIPS 2022 | Handcrafted Backdoors in Deep Neural Networks | |
| 2022 | NeurIPS 2022 | Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch | :octocat: |
| 2022 | TDSC 2022 | One-to-N & N-to-One: Two Advanced Backdoor Attacks Against Deep Learning Models | |
| 2022 | TIFS 2022 | Dispersed Pixel Perturbation-Based Imperceptible Backdoor Trigger for Image Classifier Models | |
| 2022 | TIFS 2022 | Stealthy Backdoors as Compression Artifacts | :octocat: |
| 2022 | TIP 2022 | Poison Ink: Robust and Invisible Backdoor Attack | |
| 2021 | AAAI 2021 | Backdoor Decomposable Monotone Circuits and Propagation Complete Encodings | |
| 2021 | AAAI 2021 | Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification | :octocat: |
| 2021 | CVPR 2021 | Backdoor Attacks Against Deep Learning Systems in the Physical World | |
| 2021 | ICCV 2021 | CLEAR: Clean-up Sample-Targeted Backdoor in Neural Networks | |
| 2021 | ICCV 2021 | Invisible Backdoor Attack with Sample-Specific Triggers | :octocat: |
| 2021 | ICCV 2021 | LIRA: Learnable, Imperceptible and Robust Backdoor Attacks | :octocat: |
| 2021 | ICCV 2021 | Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective | :octocat: |
| 2021 | ICLR 2021 | WaNet - Imperceptible Warping-based Backdoor Attack | :octocat: |
| 2021 | ICML 2021 | Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks | :octocat: |
| 2021 | IJCAI 2021 | Backdoor DNFs | |
| 2021 | NeurIPS 2021 | Backdoor Attack with Imperceptible Input and Latent Modification | |
| 2021 | NeurIPS 2021 | Excess Capacity and Backdoor Poisoning | :octocat: |
| 2021 | TDSC 2021 | Invisible Backdoor Attacks on Deep Neural Networks Via Steganography and Regularization | |
| 2021 | USENIX Security 2021 | Blind Backdoors in Deep Learning Models | :octocat: |
| 2020 | AAAI 2020 | Hidden Trigger Backdoor Attacks | :octocat: |
| 2020 | CCS 2020 | Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features | |
| 2020 | CIKM 2020 | Can Adversarial Weight Perturbations Inject Neural Backdoors? | :octocat: |
| 2020 | CVPR 2020 | Clean-Label Backdoor Attacks on Video Recognition Models | :octocat: |
| 2020 | ECCV 2020 | Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks | :octocat: |
| 2020 | KDD 2020 | An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks | :octocat: |
| 2020 | MM 2020 | GangSweep: Sweep out Neural Backdoors by GAN | |
| 2020 | NeurIPS 2020 | Input-Aware Dynamic Backdoor Attack | :octocat: |
| 2020 | NeurIPS 2020 | On the Trade-off between Adversarial and Backdoor Robustness | :octocat: |
| 2019 | CCS 2019 | Latent Backdoor Attacks on Deep Neural Networks | :octocat: |
| 2018 | NDSS 2018 | Trojaning Attack on Neural Networks | :octocat: |

Semi-supervised learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | ICCV 2023 | The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning | |
| 2023 | ICML 2023 | Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning | |
| 2021 | AAAI 2021 | DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation | |
| 2021 | TIFS 2021 | Deep Neural Backdoor in Semi-Supervised Learning: Threats and Countermeasures | |

Self-supervised learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | ICCV 2023 | An Embarrassingly Simple Backdoor Attack on Self-supervised Learning | :octocat: |
| 2022 | CVPR 2022 | Backdoor Attacks on Self-Supervised Learning | :octocat: |

Federated learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | IBA: Towards Irreversible Backdoor Attacks in Federated Learning | |
| 2023 | NeurIPS 2023 | A3FL: Adversarially Adaptive Backdoor Attacks to Federated Learning | |
| 2023 | SIGIR 2023 | Manipulating Federated Recommender Systems: Poisoning with Synthetic Users and Its Countermeasures | |
| 2022 | ICML 2022 | Neurotoxin: Durable Backdoors in Federated Learning | |
| 2020 | AISTATS 2020 | How To Backdoor Federated Learning | :octocat: |
| 2020 | ICLR 2020 | DBA: Distributed Backdoor Attacks against Federated Learning | :octocat: |
| 2020 | NeurIPS 2020 | Attack of the Tails: Yes, You Really Can Backdoor Federated Learning | :octocat: |
| 2022 | USENIX Security 2022 | FLAME: Taming Backdoors in Federated Learning | |

Reinforcement Learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2021 | IJCAI 2021 | BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning | |

Other CV tasks (Object detection, segmentation, point cloud)

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | BadTrack: A Poison-Only Backdoor Attack on Visual Object Tracking | |
| 2022 | ICLR 2022 | Few-Shot Backdoor Attacks on Visual Object Tracking | :octocat: |
| 2022 | MM 2022 | Backdoor Attacks on Crowd Counting | :octocat: |
| 2021 | ICCV 2021 | A Backdoor Attack against 3D Point Cloud Classifiers | :octocat: |
| 2021 | ICCV 2021 | PointBA: Towards Backdoor Attacks in 3D Point Cloud | :octocat: |

Multimodal models (Visual and Language)

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2024 | IEEE S&P 2024 | Backdooring Multimodal Learning | |
| 2022 | CVPR 2022 | Dual-Key Multimodal Backdoors for Visual Question Answering | :octocat: |
| 2022 | ICASSP 2022 | Object-Oriented Backdoor Attack Against Image Captioning | |
| 2022 | ICLR 2022 | Poisoning and Backdooring Contrastive Learning | |

Diffusion model

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models | |
| 2023 | ICCV 2023 | Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis | :octocat: |
| 2023 | CVPR 2023 | How to Backdoor Diffusion Models? | :octocat: |
| 2023 | CVPR 2023 | TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets | :octocat: |

Large language model & other NLP tasks

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models | |
| 2023 | ICML 2023 | Poisoning Language Models During Instruction Tuning | :octocat: |
| 2023 | ICLR 2023 | TrojText: Test-time Invisible Textual Trojan Insertion | :octocat: |
| 2023 | NDSS 2023 | BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT | |
| 2022 | ICLR 2022 | BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models | :octocat: |
| 2022 | IJCAI 2022 | PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning | |
| 2022 | MM 2022 | Opportunistic Backdoor Attacks: Exploring Human-imperceptible Vulnerabilities on Speech Recognition Systems | |
| 2022 | NeurIPS 2022 | BadPrompt: Backdoor Attacks on Continuous Prompts | :octocat: |
| 2022 | USENIX Security 2022 | Hidden Trigger Backdoor Attack on NLP Models via Linguistic Style Manipulation | |
| 2021 | ACL 2021 | Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger | :octocat: |
| 2021 | ACL 2021 | Rethinking Stealthiness of Backdoor Attack against NLP Models | :octocat: |
| 2021 | ACL 2021 | Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution | :octocat: |
| 2021 | CCS 2021 | Backdoor Pre-trained Models Can Transfer to All | :octocat: |
| 2021 | CCS 2021 | Hidden Backdoors in Human-Centric Language Models | :octocat: |
| 2021 | EMNLP 2021 | Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning | |
| 2021 | EMNLP 2021 | Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer | :octocat: |
| 2021 | EuroS&P 2021 | Trojaning Language Models for Fun and Profit | :octocat: |
| 2021 | ICASSP 2021 | Backdoor Attack Against Speaker Verification | :octocat: |

Graph Neural Networks

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2022 | CCS 2022 | Clean-label Backdoor Attack on Graph Neural Networks | |
| 2022 | ICMR 2022 | Camouflaged Poisoning Attack on Graph Neural Networks | :octocat: |
| 2022 | RAID 2022 | Transferable Graph Backdoor Attack | |
| 2021 | SACMAT 2021 | Backdoor Attacks to Graph Neural Networks | :octocat: |
| 2021 | USENIX Security 2021 | Graph Backdoor | :octocat: |
| 2021 | WiseML 2021 | Explainability-based Backdoor Attacks Against Graph Neural Network | :octocat: |

Theoretical analysis

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2020 | NeurIPS 2020 | On the Trade-off between Adversarial and Backdoor Robustness | :octocat: |

🛡Backdoor Defenses

Defense for supervised learning (Image classification)

Before-training (Preprocessing) stage

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | ICCV 2023 | Beating Backdoor Attack at Its Own Game | :octocat: |
| 2023 | USENIX Security 2023 | Towards A Proactive ML Approach for Detecting Backdoor Poison Samples | |
| 2023 | USENIX Security 2023 | ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms | :octocat: |
| 2023 | USENIX Security 2023 | How to Sift Out a Clean Data Subset in the Presence of Data Poisoning? | :octocat: |
| 2023 | ICLR 2023 | Towards Robustness Certification Against Universal Perturbations | :octocat: |
| 2021 | ICML 2021 | SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics | :octocat: |
| 2021 | USENIX Security 2021 | Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection | :octocat: |
| 2020 | ICLR 2020 | Robust Anomaly Detection and Backdoor Attack Detection via Differential Privacy | |
| 2019 | IEEE S&P 2019 | Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks | :octocat: |
| 2018 | | Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering | |
| 2018 | NeurIPS 2018 | Spectral Signatures in Backdoor Attacks | :octocat: |

In-training stage

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | CVPR 2023 | Backdoor Defense via Adaptively Splitting Poisoned Dataset | :octocat: |
| 2023 | CVPR 2023 | Backdoor Defense via Deconfounded Representation Learning | :octocat: |
| 2023 | IEEE S&P 2023 | RAB: Provable Robustness Against Backdoor Attacks | :octocat: |
| 2023 | ICLR 2023 | Towards Robustness Certification Against Universal Perturbations | :octocat: |
| 2022 | ICLR 2022 | Backdoor Defense via Decoupling the Training Process | :octocat: |
| 2022 | NeurIPS 2022 | Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples | :octocat: |
| 2022 | AAAI 2022 | Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks | |
| 2021 | NeurIPS 2021 | Anti-Backdoor Learning: Training Clean Models on Poisoned Data | :octocat: |
| 2021 | AAAI 2021 | Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks | :octocat: |
| 2022 | NeurIPS 2022 | BagFlip: A Certified Defense against Data Poisoning | :octocat: |

Post-training stage

| Year | Publication | Paper | Category | Code |
|------|-------------|-------|----------|------|
| 2024 | IEEE S&P 2024 | MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic | Detection | |
| 2023 | NeurIPS 2023 | Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features | Mitigation | |
| 2023 | NeurIPS 2023 | Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples | Mitigation | |
| 2023 | NeurIPS 2023 | Stable Backdoor Purification with Feature Shift Tuning | Mitigation | :octocat: |
| 2023 | NeurIPS 2023 | CBD: A Certified Backdoor Detector Based on Local Dominant Probability | | |
| 2023 | ICCV 2023 | Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization | Mitigation | |
| 2023 | CVPR 2023 | Backdoor Cleansing with Unlabeled Data | Mitigation | :octocat: |
| 2023 | CVPR 2023 | MEDIC: Remove Model Backdoors via Importance Driven Cloning | Mitigation | |
| 2023 | CVPR 2023 | Progressive Backdoor Erasing via Connecting Backdoor and Adversarial Attacks | Mitigation | |
| 2023 | CVPR 2023 | Single Image Backdoor Inversion via Robust Smoothed Classifiers | Inversion | :octocat: |
| 2023 | ICLR 2023 | UNICORN: A Unified Backdoor Trigger Inversion Framework | Inversion | :octocat: |
| 2023 | ICLR 2023 | Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks | Mitigation | |
| 2023 | ICASSP 2023 | Backdoor Defense via Suppressing Model Shortcuts | Mitigation | :octocat: |
| 2022 | ICLR 2022 | Adversarial Unlearning of Backdoors via Implicit Hypergradient | Mitigation | :octocat: |
| 2022 | ICLR 2022 | Trigger Hunting with a Topological Prior for Trojan Detection | Detection | :octocat: |
| 2022 | ICLR 2022 | AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis | Detection | :octocat: |
| 2022 | ICLR 2022 | Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios | Detection | :octocat: |
| 2022 | NeurIPS 2022 | Pre-activation Distributions Expose Backdoor Neurons | Mitigation | :octocat: |
| 2022 | NeurIPS 2022 | One-shot Neural Backdoor Erasing via Adversarial Weight Masking | Mitigation | :octocat: |
| 2022 | NeurIPS 2022 | Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets | Detection | :octocat: |
| 2022 | MM 2022 | Purifier: Plug-and-play Backdoor Mitigation for Pre-trained Models Via Anomaly Activation Suppression | Mitigation | :octocat: |
| 2022 | IJCAI 2022 | Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation | Mitigation | |
| 2022 | ECCV 2022 | Data-Free Backdoor Removal Based on Channel Lipschitzness | Mitigation | :octocat: |
| 2022 | CVPR 2022 | Few-Shot Backdoor Defense Using Shapley Estimation | Mitigation | |
| 2022 | CVPR 2022 | Better Trigger Inversion Optimization in Backdoor Scanning | Inversion | :octocat: |
| 2022 | CVPR 2022 | Complex Backdoor Detection by Symmetric Feature Differencing | Detection | :octocat: |
| 2022 | INFOCOM 2022 | Backdoor Defense with Machine Unlearning | Mitigation | :octocat: |
| 2022 | TNNLS 2022 | Critical Path-Based Backdoor Detection for Deep Neural Networks | Detection | |
| 2021 | IEEE S&P 2021 | Detecting AI Trojans Using Meta Neural Analysis | Detection | :octocat: |
| 2021 | NeurIPS 2021 | Adversarial Neuron Pruning Purifies Backdoored Deep Models | Mitigation | :octocat: |
| 2021 | NeurIPS 2021 | Topological Detection of Trojaned Neural Networks | Detection | :octocat: |
| 2021 | ICLR 2021 | Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks | Mitigation | :octocat: |
| 2021 | TIFS 2021 | Odyssey: Creation, Analysis and Detection of Trojan Models | Detection | :octocat: |
| 2020 | ICLR 2020 | Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness | Mitigation | :octocat: |
| 2020 | MM 2020 | GangSweep: Sweep out Neural Backdoors by GAN | Mitigation | :octocat: |
| 2020 | ICDM 2020 | Towards Inspecting and Eliminating Trojan Backdoors in Deep Neural Networks | Mitigation | |
| 2019 | NeurIPS 2019 | Defending Neural Backdoors via Generative Distribution Modeling | Mitigation | :octocat: |
| 2019 | CCS 2019 | ABS: Scanning Neural Networks for Backdoors by Artificial Brain Stimulation | Detection | :octocat: |
| 2019 | IEEE S&P 2019 | Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks | Mitigation | :octocat: |
| 2019 | IJCAI 2019 | DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks | Detection+Mitigation | |
| 2018 | RAID 2018 | Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks | Mitigation | :octocat: |

Inference stage

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2024 | IEEE S&P 2024 | Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics | :octocat: |
| 2023 | arXiv | VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency | |
| 2023 | NeurIPS 2023 | Black-box Backdoor Defense via Zero-shot Image Purification | |
| 2023 | NeurIPS 2023 | A Unified Framework for Inference-Stage Backdoor Defenses | |
| 2023 | ICLR 2023 | SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency | :octocat: |
| 2023 | CVPR 2023 | Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency | |
| 2023 | CVPR 2023 | Don’t FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs | |
| 2023 | NDSS 2023 | The “Beatrix” Resurrections: Robust Backdoor Detection via Gram Matrices | :octocat: |
| 2023 | IJCAI 2023 | Orion: Online Backdoor Sample Detection via Evolution Deviance | |
| 2021 | ICCV 2021 | Rethinking the Backdoor Attacks’ Triggers: A Frequency Perspective | :octocat: |
| 2020 | IEEE SPW 2020 | SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems | |
| 2019 | ACSAC 2019 | STRIP: A Defence Against Trojan Attacks on Deep Neural Networks | :octocat: |

Defense for semi-supervised learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|

Defense for self-supervised learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | CVPR 2023 | Detecting Backdoors in Pre-trained Encoders | :octocat: |
| 2023 | CVPR 2023 | Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning | :octocat: |

Defense for reinforcement learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning | |
| 2023 | ICCV 2023 | PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning | |

Defense for federated learning

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | Theoretically Modeling Client Data Divergence for Federated Natural Language Backdoor Defense | |
| 2023 | NeurIPS 2023 | FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning | |
| 2023 | NeurIPS 2023 | Lockdown: Backdoor Defense for Federated Learning with Isolated Subspace Training | |
| 2023 | ICCV 2023 | Multi-Metrics Adaptively Identifies Backdoors in Federated Learning | |
| 2023 | ICLR 2023 | FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning | :octocat: |

Defense for other CV tasks (Object detection, segmentation)

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | Django: Detecting Trojans in Object Detection Models via Gaussian Focus Calibration | |

Defense for multimodal models (Visual and Language)

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | Robust Contrastive Language-Image Pretraining against Data Poisoning and Backdoor Attacks | |
| 2023 | ICCV 2023 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning | :octocat: |
| 2023 | ICCV 2023 | TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models | :octocat: |
| 2023 | CVPR 2023 | Detecting Backdoors in Pre-trained Encoders | :octocat: |

Defense for Large Language model & other NLP tasks

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2023 | NeurIPS 2023 | Setting the Trap: Capturing and Defeating Backdoor Threats in PLMs through Honeypots | |
| 2023 | NeurIPS 2023 | Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks | |
| 2023 | NeurIPS 2023 | Theoretically Modeling Client Data Divergence for Federated Natural Language Backdoor Defense | |
| 2023 | ACL 2023 | Defending against Insertion-based Textual Backdoor Attacks via Attribution | |
| 2023 | ACL 2023 | Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias | |

Defense for diffusion models

| Year | Publication | Paper | Code |
|------|-------------|-------|------|

Defense for Graph Neural Networks

| Year | Publication | Paper | Code |
|------|-------------|-------|------|

Backdoor for social good

Watermarking

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2022 | IJCAI 2022 | Membership Inference via Backdooring | :octocat: |
| 2022 | NeurIPS 2022 | Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection | :octocat: |
| 2018 | USENIX Security 2018 | Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring | :octocat: |

Explainable AI

| Year | Publication | Paper | Code |
|------|-------------|-------|------|
| 2021 | KDD 2021 | What Do You See?: Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors | |

⚙Benchmark and toolboxes

| Name | Publication | Paper | Code |
|------|-------------|-------|------|
| BackdoorBench | NeurIPS 2022 | BackdoorBench: A Comprehensive Benchmark of Backdoor Learning | :octocat: |
| OpenBackdoor | NeurIPS 2022 | A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks | :octocat: |
| TrojanZoo | EuroS&P 2022 | TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors | :octocat: |
| BackdoorBox | | BackdoorBox: An Open-sourced Python Toolbox for Backdoor Attacks and Defenses | :octocat: |
| BackdoorToolbox | | | :octocat: |