With the advent of cloud computing, machine learning as a service (MLaaS) has become a growing phenomenon with the potential to address many real-world problems. In an untrusted cloud environment, the privacy concerns of users are a major impediment to the adoption of MLaaS. To alleviate these privacy issues and preserve data confidentiality, several private inference (PI) protocols have been proposed in recent years based on cryptographic tools such as Fully Homomorphic Encryption (FHE) and Secure Multiparty Computation (MPC). Deep neural networks (DNNs) have been the architecture of choice in most MLaaS deployments. One of the core challenges in developing PI protocols for DNN inference is the substantial cost of implementing non-linear activation layers such as the Rectified Linear Unit (ReLU). This has spawned research into accurate but efficient approximations of the ReLU function and into neural architectures that operate on a stringent ReLU budget. While these methods improve efficiency and ensure data confidentiality, they often come at a significant cost to prediction accuracy. In this work, we propose a DNN architecture based on polynomial kervolution called *PolyKervNet* (PKN), which completely eliminates the need for non-linear activation and max pooling layers. PolyKervNets are both FHE- and MPC-friendly - they enable FHE-based encrypted inference without any approximations and improve the latency of MPC-based PI protocols without any use of garbled circuits. We demonstrate that it is possible to redesign standard convolutional neural network (CNN) architectures such as ResNet-18 and VGG-16 with polynomial kervolution and achieve approximately
Check Paper: IEEE, OpenReview, ResearchGate
Update: A new version of PolyKervNets is available on the Cryptology ePrint Archive and arXiv.
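The core building block described in the abstract is polynomial kervolution, which replaces the usual convolution + ReLU (+ max pooling) pattern with a polynomial kernel response of the form (x·w + cp)^dp, where dp is the polynomial degree and cp the balance factor. Below is a minimal PyTorch sketch of such a layer; the class name `PolyKerv2d`, the weight initialization, and the default hyperparameters are illustrative assumptions, not the exact implementation used in this repository.

```python
# Minimal sketch of a polynomial kervolution layer, assuming the
# (x.w + cp)^dp kernel form; names and defaults are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolyKerv2d(nn.Module):
    """2D polynomial kervolution: (conv(x, w) + cp) ** dp.

    The non-linearity comes from the polynomial kernel itself, so no
    ReLU (and no max pooling) is needed after this layer, which is what
    makes the resulting network FHE/MPC-friendly.
    """

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, dp=2, cp=1.0, learnable_cp=True):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.05)
        self.stride, self.padding, self.dp = stride, padding, dp
        # Balance factor cp; optionally a learnable parameter.
        self.cp = nn.Parameter(torch.tensor(cp)) if learnable_cp else cp

    def forward(self, x):
        # Linear part: ordinary cross-correlation of the input with the filters.
        linear = F.conv2d(x, self.weight, stride=self.stride, padding=self.padding)
        # Polynomial kernel response supplies the non-linearity.
        return (linear + self.cp) ** self.dp


# Example: drop-in replacement for a Conv2d + ReLU pair.
layer = PolyKerv2d(3, 16, kernel_size=3, padding=1, dp=2, cp=1.0)
out = layer(torch.randn(1, 3, 32, 32))  # -> shape (1, 16, 32, 32)
```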
- Clone this repository.
- Create a conda environment and install these main packages:
- To run, go through the `train.py` file and make the desired changes for the experiment you want to run. Note that this code is not production-ready, but we have made it easy to navigate.
Our main observation was how unstable PKNs are during training. We were unable to train PKN-50 (based on ResNet-50), although the R-PKN variant was successful. PKNs are sensitive to factors such as architecture size and complexity, learning rate, polynomial degree (dp), balance factor (cp), and the dataset, so careful hyperparameter tuning is required to get them right.
Update: We were able to build a more stable version of PKNs. New future directions are listed below:
- Investigating the potential benefits of combining R-PKNs with gradient clipping to determine whether this yields comparable or superior stability and overall performance (a sketch of this, together with layer-wise learning rates, appears after this list).
- Exploring layer-wise learning rate initialization, where deeper layers are assigned different learning rates than earlier layers, to further optimize training for polynomial-based networks. A quick experiment with this gave R-PKN-50 an accuracy of 87.9% without requiring tuning or knowledge distillation.
- Exploring alternative optimization techniques, such as Quasi-Newton based approaches, to determine if certain types of optimizers exhibit superior performance and convergence properties when applied to polynomial-based networks.
- Extending the scope of our conclusions to assess whether they apply to other polynomial-based approaches, beyond R-PKNs, in various deep learning scenarios.
- Evaluating the generalizability of our approach to different datasets and model architectures, such as Vision Transformers (ViTs), to determine its effectiveness in a broader context.
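To make the first two directions concrete, here is a minimal PyTorch sketch of a single training step that combines gradient clipping with layer-wise learning rates via optimizer parameter groups. The stand-in model, the layer grouping, and the learning-rate values are illustrative assumptions and are not taken from `train.py`.

```python
# Sketch of one training step with gradient clipping and layer-wise
# learning rates; the model and LR values are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(                      # stand-in for an R-PKN backbone
    nn.Conv2d(3, 16, 3, padding=1),         # "early" layers
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),            # "deeper" layers / head
)

# Layer-wise learning rates: give the deeper layers a smaller LR than the
# early layers, since polynomial activations can amplify gradients with depth.
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-3},   # early layers
    {"params": model[2].parameters(), "lr": 1e-4},   # deeper layers / head
])

criterion = nn.CrossEntropyLoss()
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
# Gradient clipping keeps the polynomial layers from blowing up the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```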