A paper list of federated learning, focused on system design. It currently covers distributed computing frameworks and communication & computation efficiency for federated learning (cross-silo & cross-device).
Chinese blog: Neth-Lab, which includes study notes, tutorials, and development documents.
Last update: Apr 12th, 2022.
For some papers, we provide accompanying Chinese blog posts.
- 2.1 Survey
- 2.2 Optimization in algorithm perspective
- 2.3 Optimization in framework perspective
- 2.4 Optimization in communication perspective
- 2.5 Optimization for Memory
- 2.6 Optimization for Homomorphic Encryption
- Understand the types of federated learning. Sep 2020
  - A brief introduction to the terminology and classification of federated learning
- Federated Machine Learning: Concept and Applications. Qiang Yang et al. 2019. ACM TIST
  - Chinese blog: Overview of Federated Learning. Section 1
- Practical Secure Aggregation for Privacy-Preserving Machine Learning. 2017. CCS
  - Secure aggregation protocol for horizontally partitioned data; see the sketch below.
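A minimal sketch of the pairwise-masking idea at the core of this protocol: each pair of clients derives the same random vector, which one adds and the other subtracts, so all masks cancel in the server-side sum. The real protocol agrees on seeds via Diffie-Hellman key exchange and adds secret-sharing-based dropout recovery, both omitted here; `seed_fn` is a hypothetical stand-in.

```python
import numpy as np

def pairwise_mask(client_id, all_ids, dim, seed_fn):
    """Mask that a client adds to its update before uploading."""
    mask = np.zeros(dim)
    for other in all_ids:
        if other == client_id:
            continue
        # Both clients of a pair seed the same PRNG, so they generate the
        # same vector; the lower-id client adds it, the other subtracts it.
        rng = np.random.default_rng(seed_fn(min(client_id, other),
                                            max(client_id, other)))
        r = rng.standard_normal(dim)
        mask += r if client_id < other else -r
    return mask

ids, dim = [0, 1, 2], 4
seed_fn = lambda i, j: 1000 * i + j                     # stand-in for a DH-agreed seed
updates = {i: np.full(dim, float(i + 1)) for i in ids}  # toy local updates
masked = {i: updates[i] + pairwise_mask(i, ids, dim, seed_fn) for i in ids}

# The server sees only masked updates, yet the masks cancel in the sum.
assert np.allclose(sum(masked.values()), sum(updates.values()))
```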
- Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. 2017. arXiv
  - Vertical logistic regression algorithm.
  - Chinese blog: Machine Learning & Federated Learning. Section 5
- SecureBoost: A Lossless Federated Learning Framework. 2021. IEEE Intelligent Systems
  - Vertical secure boosting algorithm.
  - Chinese blog: Machine Learning & Federated Learning. Section 6
- A Comprehensive Survey of Privacy-preserving Federated Learning: A Taxonomy, Review, and Future Directions. 2021. ACM Computing Surveys
- A Quantitative Survey of Communication Optimizations in Distributed Deep Learning. 2021. IEEE Network
- A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection. 2021. TKDE
  - Covers system challenges for federated learning.
  - Chinese blog: Survey of System Design for Distributed ML & FL. Section 2.2
- System Optimization in Synchronous Federated Training: A Survey. 2021. arXiv
  - Focuses on time-to-accuracy optimization for FL systems.
  - Chinese blog: Survey of System Design for Distributed ML & FL. Section 2.3
- A Survey on Distributed Machine Learning. 2020. ACM Computing Surveys
  - Covers system challenges for distributed machine learning.
  - Chinese blog: Survey of System Design for Distributed ML & FL. Section 2.1
- Communication-Efficient Distributed Deep Learning: A Comprehensive Survey. 2020. arXiv
- SAFELearn: Secure Aggregation for private FEderated Learning. 2021. S&P workshop
- Secure Bilevel Asynchronous Vertical Federated Learning with Backward Updating. 2021. AAAI
- VF2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning. 2021. SIGMOD
  - System optimization for SecureBoost that overlaps encryption, cross-network communication, and histogram construction (a toy sketch of the overlap follows this item). It also describes several practical implementation tricks.
  - Chinese blog: Summary of VF2Boost
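The overlap can be pictured with a thread pool: encryption of later gradient batches proceeds while earlier batches are transferred and aggregated into histograms. This is only a toy illustration of the pipelining idea; the stage functions below are placeholders, not VF2Boost's API.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def encrypt(batch):            # stand-in for Paillier encryption (the slow stage)
    time.sleep(0.05)
    return batch

def transfer(batch):           # stand-in for cross-network communication
    time.sleep(0.02)
    return batch

def add_to_hist(hist, batch):  # stand-in for histogram construction
    for g in batch:
        hist[g % 4] = hist.get(g % 4, 0) + 1

batches = [list(range(i, i + 8)) for i in range(0, 64, 8)]
hist = {}
with ThreadPoolExecutor(max_workers=2) as pool:
    # Encryption of later batches runs in the pool while the main thread
    # transfers and aggregates earlier ones, so the three stages overlap.
    pending = [pool.submit(encrypt, b) for b in batches]
    for fut in pending:
        add_to_hist(hist, transfer(fut.result()))
print(hist)
```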
- FedML: A Research Library and Benchmark for Federated Machine Learning. 2020. arXiv
  - A library and system architecture for FL.
- FDML: A Collaborative Machine Learning Framework for Distributed Features. 2019. KDD
This section collects papers on both distributed frameworks for general computation (e.g., DNN training and batch processing) and distributed systems for FL.
- Sphinx: Enabling Privacy-Preserving Online Learning over the Cloud. 2022. S&P
- Cerebro: A Platform for Multi-Party Cryptographic Collaborative Learning. 2021. USENIX Security
- Citadel: Protecting Data Privacy and Model Confidentiality for Collaborative Learning. 2021. SoCC
- SecureBoost+: A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning. 2021. arXiv
- FedAT: A Communication-Efficient Federated Learning Method with Asynchronous Tiers under Non-IID Data. 2020. arXiv
- Throughput-Optimal Topology Design for Cross-Silo Federated Learning. 2020. arXiv
- Towards Federated Learning at Scale: System Design. 2019. MLSys
  - A framework for scaling horizontal FL.
  - Chinese blog: Survey of Distributed Framework in Federated Learning. Section 3
- Oort: Efficient Federated Learning via Guided Participant Selection. 2021. OSDI
- TiFL: A Tier-based Federated Learning System. 2020. HPDC
Since there are currently only a few research papers on distributed frameworks for FL, we also provide related work on general machine learning frameworks for reference.
- Gradient Compression Supercharged High-Performance Data Parallel DNN Training. 2021. SOSP
  - A novel GPU-based method that pipelines compression operations with computation for compression-enabled distributed DNN training; see the sketch below.
  - Chinese blog: Summary of HiPress
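A sketch of the overlap idea under simplified assumptions: while one layer's gradient is being compressed (on a separate CUDA stream in the real GPU-based system, a worker thread here), backpropagation proceeds to the next layer. The function bodies are illustrative placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def backward_layer(k):        # stand-in for computing layer k's gradient
    time.sleep(0.02)
    return f"grad_{k}"

def compress(grad):           # stand-in for top-k / quantization compression
    time.sleep(0.02)
    return f"{grad}_compressed"

layers = list(range(5, 0, -1))   # backprop walks layers in reverse order
compressed = []
with ThreadPoolExecutor(max_workers=1) as compressor:
    pending = None
    for k in layers:
        grad = backward_layer(k)                   # compute current layer
        if pending is not None:
            compressed.append(pending.result())    # collect previous layer
        pending = compressor.submit(compress, grad)  # overlaps with next layer
    compressed.append(pending.result())
print(compressed)
```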
- DAPPLE: A Pipelined Data Parallel Approach for Training Large Models. 2021. PPoPP
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections. 2021. OSDI
- P3: Distributed Deep Graph Learning at Scale. 2021. OSDI
- PipeDream: Generalized Pipeline Parallelism for DNN Training. 2019. SOSP
- Ray: A Distributed Framework for Emerging AI Applications. 2018. OSDI
  - Actor-based framework and parallelism methods for reinforcement learning; a minimal actor example follows this item.
  - Chinese blog: Summary of Ray
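For reference, a minimal example of Ray's actor model: a stateful worker addressed by a handle, with asynchronous method calls that return futures.

```python
import ray

ray.init()

@ray.remote
class ParameterHolder:
    """A toy stateful actor holding a single scalar 'model'."""
    def __init__(self):
        self.weights = 0.0

    def apply_gradient(self, g):
        self.weights -= 0.1 * g
        return self.weights

holder = ParameterHolder.remote()
# .remote() calls return futures immediately; ray.get blocks for results.
futures = [holder.apply_gradient.remote(g) for g in [1.0, 2.0, 3.0]]
print(ray.get(futures))   # approximately [-0.1, -0.3, -0.6]
ray.shutdown()
```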
- TensorFlow: A System for Large-Scale Machine Learning. 2016. OSDI
  - Chinese blog: Summary of TensorFlow
- Spark SQL: Relational Data Processing in Spark. 2015. SIGMOD
- Scaling Distributed Machine Learning with the Parameter Server. 2014. OSDI
  - Chinese blog: Summary of Parameter Server
- Large Scale Distributed Deep Networks. 2012. NeurIPS
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. 2012. OSDI
  - Chinese blog: Summary of Apache Spark
- MapReduce: Simplified Data Processing on Large Clusters. 2004. OSDI
  - Chinese blog: Summary of MapReduce
- CrystalPerf: Learning to Characterize the Performance of Dataflow Computation through Code Analysis. 2021. ATC
- Scaling Large Production Clusters with Partitioned Synchronization. 2021. ATC
  - A distributed resource-scheduler architecture. It uses partitioned synchronization to reduce both contention for high-quality resources and the staleness of local states, the two main sources of high scheduling latency; a toy illustration follows this item.
  - Chinese blog: Survey of Framework-based Optimization for Federated Learning. Section 4
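A heavily simplified illustration of the idea: each scheduler synchronizes only one partition of the cluster state per round (staggered across schedulers) and schedules onto the partition it just synced, where its view is freshest. All names and numbers here are made up for the example.

```python
NUM_PARTITIONS, NUM_SCHEDULERS = 4, 2
free_slots = {p: 10 for p in range(NUM_PARTITIONS)}                    # ground truth
views = [{p: 10 for p in range(NUM_PARTITIONS)} for _ in range(NUM_SCHEDULERS)]

for round_id in range(8):
    for s in range(NUM_SCHEDULERS):
        # Sync a single partition instead of the whole cluster state;
        # the offset staggers schedulers so they rarely touch the same one.
        p = (round_id + s) % NUM_PARTITIONS
        views[s][p] = free_slots[p]
        # Schedule onto the freshest partition to cut conflicts and staleness.
        if free_slots[p] > 0:
            free_slots[p] -= 1
            views[s][p] -= 1

print("remaining slots:", free_slots)
```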
- Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications. 2021. SOSP
- Advanced Synchronization Techniques for Task-based Runtime Systems. 2021. PPoPP
- Ownership: A Distributed Futures System for Fine-Grained Tasks. 2021. NSDI
- FLASHE: Additively Symmetric Homomorphic Encryption for Cross-Silo Federated Learning. 2021. arXiv
- Efficient Batch Homomorphic Encryption for Vertically Federated XGBoost. 2021. arXiv
- Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference. 2021. HPCA
- Communication-Efficient Federated Learning with Adaptive Parameter Freezing. 2021. ICDCS
- RC-SSFL: Towards Robust and Communication-efficient Semi-supervised Federated Learning System. 2020. arXiv
- BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning. 2020. ATC
  - Quantizes gradients and batches many of them into a single ciphertext, reducing both communication and computation costs; see the sketch below.
  - Chinese blog: Summary of BatchCrypt
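A toy version of the quantize-then-batch step: gradients are quantized to fixed-width signed integers and packed into one large integer, so a single expensive homomorphic encryption would cover many values at once. The real scheme uses a custom encoding with padding bits so that ciphertext additions don't overflow across lanes; the encryption itself is omitted here.

```python
import numpy as np

BITS = 16
SCALE = 1 << (BITS - 3)   # leave headroom so a few additions per lane fit

def quantize(grads, clip=1.0):
    q = np.clip(grads, -clip, clip) / clip * SCALE
    return q.astype(np.int64)

def pack(qs):
    # Two's-complement encode each lane, then concatenate the lanes.
    out = 0
    for q in qs:
        out = (out << BITS) | (int(q) & ((1 << BITS) - 1))
    return out   # this single integer is what would be encrypted

def unpack(packed, n):
    qs = []
    for _ in range(n):
        lane = packed & ((1 << BITS) - 1)
        qs.append(lane - (1 << BITS) if lane >= (1 << (BITS - 1)) else lane)
        packed >>= BITS
    return np.array(qs[::-1], dtype=np.int64)

g = np.array([0.5, -0.25, 0.125])
assert np.allclose(unpack(pack(quantize(g)), len(g)) / SCALE, g)
```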
- FetchSGD: Communication-Efficient Federated Learning with Sketching. 2020. ICML
  - Compresses gradients with sketching so that only a single round of communication between server and clients is needed; see the sketch below.
  - Chinese blog: Summary of Sketching. Section 3
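A minimal Count Sketch, the compression primitive FetchSGD builds on: each client would send only the small sketch, and because sketches are linear the server can sum them before recovering heavy coordinates. The momentum and error accumulation that the paper performs on the server-side sketch are omitted.

```python
import numpy as np

ROWS, COLS, DIM = 5, 20, 1000
rng = np.random.default_rng(0)
bucket = rng.integers(0, COLS, size=(ROWS, DIM))   # hash: coordinate -> column
sign = rng.choice([-1, 1], size=(ROWS, DIM))       # sign hash

def sketch(g):
    """Compress a length-DIM gradient into a ROWS x COLS table."""
    S = np.zeros((ROWS, COLS))
    for r in range(ROWS):
        np.add.at(S[r], bucket[r], sign[r] * g)
    return S

def estimate(S, i):
    # Median across rows gives a collision-robust estimate of coordinate i.
    return np.median(sign[:, i] * S[np.arange(ROWS), bucket[:, i]])

g = np.zeros(DIM); g[7] = 5.0; g[42] = -3.0        # a sparse toy "gradient"
S = sketch(g) + sketch(g)                          # linearity: sum of sketches
print(estimate(S, 7), estimate(S, 42))             # ~10.0 and ~-6.0
```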
- CMFL: Mitigating Communication Overhead for Federated Learning. 2019. ICDCS
  - Reduces communication costs by cutting the number of transfers between edge devices and the central server. Similar to Gaia, it measures the relevance between local and global updates to decide whether a local update is worth uploading; see the sketch below.
  - Chinese blog: Summary of CMFL
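A toy version of the relevance check, assuming relevance is measured as the fraction of parameters whose update signs agree with the previous global update, with made-up numbers and threshold:

```python
import numpy as np

def relevance(local_update, global_update):
    """Fraction of coordinates whose signs agree with the global update."""
    return np.mean(np.sign(local_update) == np.sign(global_update))

rng = np.random.default_rng(1)
global_up = rng.standard_normal(100)
local_up = global_up + 0.5 * rng.standard_normal(100)   # mostly aligned
THRESHOLD = 0.7

if relevance(local_up, global_up) >= THRESHOLD:
    print("upload local update")    # relevant: transfer to the server
else:
    print("skip this round")        # irrelevant: save the communication
```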
This section introduces research on traditional machine learning that is related to federated learning.
- An INT8 quantization model used for tiny on-device learning; a generic INT8 round-trip example follows this item.
  - Chinese blog: Survey of Communication-based Optimization for Federated Learning. Section 4
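A generic symmetric INT8 round trip, to illustrate the kind of low-precision representation involved; this is not the paper's exact scheme.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric quantization: map [-max|x|, +max|x|] onto [-127, 127].
    scale = float(np.abs(x).max()) / 127.0 or 1.0   # avoid /0 for all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

x = np.array([0.8, -1.2, 0.05, 0.0], dtype=np.float32)
q, s = quantize_int8(x)
print(q)                       # int8 values, 4x smaller than float32
print(dequantize_int8(q, s))   # close to x, with small rounding error
```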
- Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems. 2021. SIGCOMM
  - Brings collective communication to task-based distributed frameworks (e.g., Ray, Dask, Hydro).
  - Chinese blog: Summary of Hoplite
- waveSZ: A Hardware-Algorithm Co-Design of Efficient Lossy Compression for Scientific Data. 2020. PPoPP
- Communication-Efficient Distributed SGD with Sketching. 2019. NeurIPS
  - Uses sketching to find the top-k gradient elements so that workers only transfer the top-k updates, reducing communication cost; see the sketch below.
  - Chinese blog: Summary of Sketching. Section 2
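The transfer step in its simplest form: send only the k largest-magnitude coordinates as (index, value) pairs instead of the dense vector. Here the top-k selection is done exactly for clarity; the paper's contribution is finding these heavy coordinates via sketches without shipping full gradients.

```python
import numpy as np

def topk_sparsify(g, k):
    """Return the k largest-magnitude coordinates as (indices, values)."""
    idx = np.argpartition(np.abs(g), -k)[-k:]
    return idx, g[idx]          # only these k pairs are transferred

rng = np.random.default_rng(2)
g = rng.standard_normal(10_000)
idx, vals = topk_sparsify(g, k=100)
print(f"sent {len(idx)} of {g.size} values "
      f"({100 * len(idx) / g.size:.1f}% of the coordinates)")
```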
- A Generic Communication Scheduler for Distributed DNN Training Acceleration. 2019. SOSP
- SketchML: Accelerating Distributed Machine Learning with Data Sketches. 2018. SIGMOD
- Gradient Sparsification for Communication-Efficient Distributed Optimization. 2018. NeurIPS
- Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. 2018. arXiv
  - Chinese blog: Summary of Hoplite. Section 3
- QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. 2017. NeurIPS
- Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds. 2017. NSDI
  - Uses a significance function to measure the importance of each update; updates below a threshold are not transferred, mitigating WAN bandwidth overhead (see the sketch below). Also introduces a new synchronization model, ASP (Approximate Synchronous Parallel), which is proven to preserve the convergence guarantees.
  - Chinese blog: Summary of Gaia
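A toy version of the significance filter, assuming significance is an update's magnitude relative to the current parameter value, with made-up numbers and threshold:

```python
import numpy as np

THRESHOLD = 0.01   # a 1% relative change counts as significant

def significant(update, weights, eps=1e-8):
    return np.abs(update) / (np.abs(weights) + eps) > THRESHOLD

weights = np.array([1.0, 10.0, 0.5])
pending = np.array([0.005, 0.5, 0.001])    # accumulated local updates
mask = significant(pending, weights)
print("send over WAN:", pending[mask])     # only 0.5 (5% of 10.0) qualifies
# Insignificant updates keep accumulating locally: pending[~mask]
```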
- GAIA: A System for Interactive Analysis on Distributed Graphs Using a High-Level Language. 2021. NSDI
  - A memory-management system for interactive graph computation at the distributed-infrastructure layer.
  - Chinese blog: Survey of Framework-based Optimization for Federated Learning. Section 2
- A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression. 2021. PPoPP
- Zico: Efficient GPU Memory Sharing for Concurrent DNN Training. 2021. ATC
- Are Dynamic Memory Managers on GPUs Slow? A Survey and Benchmarks. 2021. PPoPP
- Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning. 2021. HPCA
- F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption. 2021. MICRO
- EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation. 2020. PLDI
- CHET: An Optimizing Compiler for Fully-Homomorphic Neural-Network Inferencing. 2019. PLDI
- FATE: An industrial-grade framework for FL, from WeBank. Chinese blog: Architecture of FATE
- PyTorch Implementation: An FL implementation based on PyTorch, from shaoxiongji.
- Microsoft/SEAL: An easy-to-use and powerful homomorphic encryption library.
- Microsoft/EVA: A compiler for the SEAL homomorphic encryption library.