ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse


A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

[Paper] [Code] [Docs]

English | 中文

ZhiJian (执简驭繁) is a comprehensive and user-friendly PyTorch-based Model Reuse toolbox for leveraging foundation pre-trained models and their fine-tuned counterparts to extract knowledge and expedite learning in real-world tasks.

The rapid progress of deep learning has produced numerous open-source Pre-Trained Models (PTMs) on platforms like PyTorch, TensorFlow, and HuggingFace Transformers, creating valuable resources for the machine-learning community. Reusing these PTMs is vital for enhancing target models' capabilities and efficiency, whether by adapting the model architecture, customizing learning on the target data, or devising optimized inference strategies that leverage PTM knowledge.

[Overview figure]

🔥 To facilitate a holistic consideration of various model reuse strategies, ZhiJian categorizes model reuse methods into three sequential modules: Architect, Tuner, and Merger, aligning with the stages of model preparation, model learning, and model inference on the target task, respectively. The provided interface methods include:

Architect Module

The Architect module modifies the pre-trained model to fit the target task, reusing certain parts of the pre-trained model while introducing new learnable parameters with specialized structures (a LoRA-style sketch follows the list below).

    Linear Probing & Partial-k, How transferable are features in deep neural networks? In: NeurIPS'14. [Paper] [Code]
    Adapter, Parameter-Efficient Transfer Learning for NLP. In: ICML'19. [Paper] [Code]
    Diff Pruning, Parameter-Efficient Transfer Learning with Diff Pruning. In: ACL'21. [Paper] [Code]
    LoRA, LoRA: Low-Rank Adaptation of Large Language Models. In: ICLR'22. [Paper] [Code]
    Visual Prompt Tuning / Prefix, Visual Prompt Tuning. In: ECCV'22. [Paper] [Code]
    Scaling & Shifting, Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning. In: NeurIPS'22. [Paper] [Code]
    AdaptFormer, AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition. In: NeurIPS'22. [Paper] [Code]
    BitFit, BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. In: ACL'22. [Paper] [Code]
    Convpass, Convolutional Bypasses Are Better Vision Transformer Adapters. In: Tech Report 07-2022. [Paper] [Code]
    Fact-Tuning, FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer. In: AAAI'23. [Paper] [Code]
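
To make the Architect idea concrete, here is a minimal PyTorch sketch of a LoRA-style bypass around a frozen linear layer. It illustrates the general technique only; the names LoRALinear, rank, and alpha are chosen for this example and are not ZhiJian's internal API.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Freeze a pre-trained linear layer and learn a low-rank bypass.
        A sketch of the LoRA idea (Hu et al., ICLR'22), not ZhiJian's API."""
        def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():  # reuse: keep pre-trained weights frozen
                p.requires_grad = False
            # new learnable parameters with a specialized (low-rank) structure
            self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scaling = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # frozen path plus low-rank update: W x + scaling * (B A) x
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    # usage: wrap the qkv projection of a ViT block
    qkv = nn.Linear(768, 768 * 3)
    adapted = LoRALinear(qkv, rank=4)
    out = adapted(torch.randn(2, 197, 768))  # (batch, tokens, 3 * dim)

Because lora_B starts at zero, the adapted layer initially reproduces the pre-trained layer exactly; only the small A and B matrices are trained.
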
Tuner Module

The Tuner module focuses on training the target model with guidance from pre-trained model knowledge to expedite the optimization process, e.g., via adjusting objectives, optimizers, or regularizers (a distillation-loss sketch follows the list below).

    Knowledge Transfer, NeC4.5: neural ensemble based C4.5. In: IEEE Trans. Knowl. Data Eng. 2004. [Paper] [Code]
    FitNet, FitNets: Hints for Thin Deep Nets. In: ICLR'15. [Paper] [Code]
    LwF, Learning without Forgetting. In: ECCV'16. [Paper] [Code]
    FSP, A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In: CVPR'17. [Paper] [Code]
    NST, Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. In: CVPR'17. [Paper] [Code]
    RKD, Relational Knowledge Distillation. In: CVPR'19. [Paper] [Code]
    SPKD, Similarity-Preserving Knowledge Distillation. In: ICCV'19. [Paper] [Code]
    CRD, Contrastive Representation Distillation. In: ICLR'20. [Paper] [Code]
    REFILLED, Distilling Cross-Task Knowledge via Relationship Matching. In: CVPR'20. [Paper] [Code]
    WiSE-FT, Robust fine-tuning of zero-shot models. In: CVPR'22. [Paper] [Code]
    L2 penalty / L2-SP, Explicit Inductive Bias for Transfer Learning with Convolutional Networks. In: ICML'18. [Paper] [Code]
    Spectral Norm, Spectral Normalization for Generative Adversarial Networks. In: ICLR'18. [Paper] [Code]
    BSS, Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning. In: NeurIPS'19. [Paper] [Code]
    DELTA, DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks. In: ICLR'19. [Paper] [Code]
    DeiT, Training data-efficient image transformers & distillation through attention. In: ICML'21. [Paper] [Code]
    DIST, Knowledge Distillation from A Stronger Teacher. In: NeurIPS'22. [Paper] [Code]
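
As a concrete instance of Tuner-style guidance, the sketch below implements the classic temperature-scaled distillation objective: cross-entropy on the hard labels plus KL divergence to a frozen teacher's softened predictions. It is a generic illustration, not ZhiJian's interface; temperature and kd_weight are example hyperparameters.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets,
                          temperature=4.0, kd_weight=0.5):
        # hard-label term: ordinary cross-entropy on the target data
        ce = F.cross_entropy(student_logits, targets)
        # soft-label term: match the teacher's softened distribution
        kd = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction='batchmean',
        ) * temperature ** 2  # rescale so gradients match the CE term
        return (1 - kd_weight) * ce + kd_weight * kd

    # usage: the pre-trained model guides the target model's optimization
    student_logits = torch.randn(8, 100, requires_grad=True)
    with torch.no_grad():
        teacher_logits = torch.randn(8, 100)
    targets = torch.randint(0, 100, (8,))
    distillation_loss(student_logits, teacher_logits, targets).backward()

Methods like FitNet, RKD, and CRD replace the soft-label term with hints, relations, or contrastive objectives over intermediate features.
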
Merger Module

The Merger module influences the inference phase by either reusing pre-trained features or incorporating adapted logits from the pre-trained model (a weight-averaging sketch follows the list below).

    Nearest Class Mean, Generalizing to new classes at near-zero cost. In: TPAMI'13. [Paper] [Code]
    SimpleShot, SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning. In: CVPR'19. [Paper] [Code]
    Head2Toe, Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning. In: ICML'22. [Paper] [Code]
    VQT, Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning. In: CVPR'23. [Paper] [Code]
    via Optimal Transport, Model Fusion via Optimal Transport. In: NeurIPS'20. [Paper] [Code]
    Model Soup, Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In: ICML'22. [Paper] [Code]
    Fisher Merging, Merging Models with Fisher-Weighted Averaging. In: NeurIPS'22. [Paper] [Code]
    Deep Model Reassembly, Deep Model Reassembly. In: NeurIPS'22. [Paper] [Code]
    REPAIR, REPAIR: REnormalizing Permuted Activations for Interpolation Repair. In: ICLR'23. [Paper] [Code]
    Git Re-Basin, Git Re-Basin: Merging Models modulo Permutation Symmetries. In: ICLR'23. [Paper] [Code]
    ZipIt, ZipIt! Merging Models from Different Tasks without Training. In: ICLR'23. [Paper] [Code]
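
On the Merger side, the simplest strategy is uniform weight averaging of several fine-tuned checkpoints, the "uniform soup" of the Model Soup paper above. The sketch below averages state dicts parameter-by-parameter; it assumes all checkpoints share one architecture and is not ZhiJian's exact merging interface.

    import torch
    from torch import nn

    def uniform_soup(state_dicts):
        # average every parameter across checkpoints (assumes identical keys/shapes)
        return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
                for k in state_dicts[0]}

    # usage: merge three fine-tuned copies of the same backbone
    models = [nn.Linear(10, 2) for _ in range(3)]
    merged = nn.Linear(10, 2)
    merged.load_state_dict(uniform_soup([m.state_dict() for m in models]))

Methods such as Fisher Merging, Git Re-Basin, and ZipIt refine this idea with per-parameter weighting or permutation alignment before averaging.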

💡 ZhiJian also has the following highlights:

  • Support for reusing various pre-trained model zoos
  • Extremely easy to get started and customize
    • Get started with a 10-minute blitz Open In Colab
    • Customize datasets and pre-trained models with step-by-step instructions Open In Colab
    • Feel free to create a novel approach for reusing pre-trained models Open In Colab
  • Concise code that does a lot
    • Only ~5,000 lines of base code, with methods incorporated like building LEGO blocks
    • State-of-the-art results on the VTAB-M Multi-Reuse-Tasks challenge, backed by approximately 10k experiments [here]
    • Friendly guidelines and comprehensive documentation for customizing datasets and pre-trained models [here]

"ZhiJian" in Chinese means handling complexity with concise and efficient methods. Given the variations in pre-trained models and the deployment overhead of full parameter fine-tuning, ZhiJian represents a solution that is easily reusable, maintains high accuracy, and maximizes the potential of pre-trained models.

"执简驭繁" means mastering the complex with the simple: "繁" (the complex) stands for the great variety, large differences, and deployment difficulty of existing pre-trained models and reuse methods, while "执简" (grasping the simple) expresses that with this toolbox, model reuse methods become easy to pick up, quick to apply, and stable in accuracy, awakening the knowledge of pre-trained models to the fullest extent.

 

🕹️ Quick Start

  1. An environment with Python 3.7+ from conda, venv, or virtualenv.

  2. Install ZhiJian using pip:

    $ pip install zhijian
    • [Optional] Install the newest version from GitHub:
      $ pip install git+https://github.com/ZhangYikaii/LAMDA-ZhiJian.git@main --upgrade
  3. Open your Python console and type:

    import zhijian
    print(zhijian.__version__)

    If no error occurs, you have successfully installed ZhiJian.

  4. Try a demo that reuses a pre-trained ViT-B/16 on the target CIFAR-100 dataset with LoRA:

    from zhijian.trainers.base import get_args, prepare_trainer
    
    args = get_args(
        dataset='VTAB-1k.CIFAR-100',  # dataset
        dataset_dir='your/dataset/directory',  # dataset directory
        model='timm.vit_base_patch16_224_in21k',  # backbone network
        config_blitz='(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...',  # addin blitz configuration
        training_mode='finetune',  # training mode
        optimizer='adam',  # optimizer
        lr=1e-2,  # learning rate
        wd=1e-5,  # weight decay
        gpu='0',  # gpu id
        verbose=True  # control the verbosity of the output
    )
    
    import torch, os
    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
    torch.cuda.set_device(int(args.gpu))
    
    # Pre-trained Model
    from zhijian.trainers.finetune import get_model
    model, model_args, device = get_model(args)
    
    # Target Dataset
    from zhijian.data.base import prepare_vision_dataloader
    train_loader, val_loader, num_classes = prepare_vision_dataloader(args, model_args)
    
    # Optimizer
    import torch.optim as optim
    optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.wd)
    lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, args.max_epoch, eta_min=args.eta_min)
    criterion = torch.nn.CrossEntropyLoss()
    
    # Trainer
    trainer = prepare_trainer(
        args,
        model=model, model_args=model_args, device=device,
        train_loader=train_loader, val_loader=val_loader, num_classes=num_classes,
        optimizer=optimizer, lr_scheduler=lr_scheduler, criterion=criterion
    )
    
    trainer.fit()
    trainer.test()

    For more information, please see the tutorials.

 

Documentation

📚 The tutorials and API documentation are hosted on ZhiJian.readthedocs.io

 

Why ZhiJian?

[Architecture figure]

| Related Library | # of Alg. (1) | # of Model (1) | # of Dataset (1) | # of Fields (2) | LLM Supp. | Docs. |
|---|---|---|---|---|---|---|
| PEFT | 6 | ~15 | (3) | 1 (a) | ✔️ | ✔️ |
| adapter-transformers | 10 | ~15 | (3) | 1 (a) | – | ✔️ |
| LLaMA-Efficient-Tuning | 4 | 5 | ~20 | 1 (a) | ✔️ | – |
| Knowledge-Distillation-Zoo | 20 | 2 | 2 | 1 (b) | – | – |
| Easy Few-Shot Learning | 10 | 3 | 2 | 1 (b) | – | – |
| Model soups | 3 | 3 | 5 | 1 (c) | – | – |
| Git Re-Basin | 3 | 5 | 4 | 1 (c) | – | – |
| ZhiJian 🙌 | 30+ | ~50 | 19 | 3 (a,b,c) | ✔️ | ✔️ |

(1): access date: 2023-08-05. (2): fields: (a) Architect; (b) Tuner; (c) Merger.

📦 Reproducible SoTA Results

ZhiJian fixes the random seed to ensure reproducibility of its results, with only minor variations across different devices.
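
A typical seed-fixing routine looks like the sketch below; the exact helper ZhiJian uses may differ, and the residual cross-device variation comes mainly from non-deterministic CUDA kernels.

    import os, random
    import numpy as np
    import torch

    def fix_seed(seed=42):
        # fix the common sources of randomness (a generic sketch,
        # not necessarily ZhiJian's own helper)
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        os.environ['PYTHONHASHSEED'] = str(seed)

    fix_seed(42)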

VTAB-M: Multi-Reuse-Tasks Challenge

We develop a robust classification challenge called VTAB-M (Visual Task Adaptation Benchmark for Multi-Reuse-Tasks), building upon the VTAB. This challenge involves tackling a diverse set of 18 visual tasks concurrently, while harnessing the power of pre-trained knowledge. The primary objective is to equip models with versatile capabilities that span across natural, specialized, and structured visual domains.

The challenge incorporates the following 18 datasets: CIFAR-100, CLEVR-Count, CLEVR-Distance, Caltech101, DTD, Diabetic-Retinopathy, Dmlab, EuroSAT, KITTI, Oxford-Flowers-102, Oxford-IIIT-Pet, PatchCamelyon, RESISC45, SVHN, dSprites-Location, dSprites-Orientation, smallNORB-Azimuth, and smallNORB-Elevation. Following the VTAB-1k standard, we sample a training set of 1,000 examples from each dataset and evaluate models on each full test set. VTAB-M thus serves as a comprehensive framework for assessing models' generalization and adaptation across diverse visual tasks, pushing pre-trained models to become more versatile and proficient through reuse methods.

More results will be released gradually in upcoming updates. Please stay tuned for more information.

|  | Adapter | LoRA | VPT / Deep | Linear Probing | Partial-1 |
|---|---|---|---|---|---|
| Tuned / Total Params (M) | 0.73 / 86.53 | 0.71 / 86.51 | 0.45 / 86.24 | 0.42 / 86.22 | 7.51 / 86.22 |
| Mixed Mean | 57.14 | 57.61 | 53.12 | 48.59 | 51.60 |
| Caltech101 | 84.16 | 84.75 | 83.15 | 80.93 | 81.87 |
| CIFAR-100 | 66.74 | 63.92 | 52.39 | 37.15 | 42.01 |
| CLEVR-Count | 30.43 | 33.25 | 23.49 | 14.07 | 25.50 |
| CLEVR-Distance | 22.97 | 27.85 | 20.67 | 22.27 | 24.34 |
| Diabetic-Retinopathy | 75.92 | 76.37 | 75.13 | 74.68 | 75.20 |
| Dmlab | 46.29 | 44.90 | 39.37 | 35.32 | 39.39 |
| dSprites-Location | 3.76 | 4.54 | 2.84 | 3.29 | 2.08 |
| dSprites-Orientation | 26.47 | 24.72 | 23.06 | 18.51 | 24.29 |
| DTD | 68.03 | 68.56 | 66.12 | 60.69 | 63.94 |
| EuroSAT | 95.13 | 94.33 | 93.13 | 88.72 | 91.37 |
| KITTI | 49.09 | 50.91 | 42.33 | 40.08 | 34.60 |
| Oxford-Flowers-102 | 98.63 | 98.80 | 97.82 | 97.59 | 97.82 |
| Oxford-IIIT-Pet | 91.47 | 91.66 | 90.00 | 88.09 | 89.48 |
| PatchCamelyon | 79.21 | 82.57 | 77.45 | 79.36 | 79.50 |
| RESISC45 | 82.25 | 82.71 | 79.75 | 72.98 | 77.57 |
| smallNORB-Azimuth | 7.99 | 5.92 | 7.65 | 7.42 | 7.65 |
| smallNORB-Elevation | 23.20 | 27.00 | 18.02 | 15.09 | 21.85 |
| SVHN | 76.71 | 74.30 | 63.87 | 38.34 | 50.35 |

 

Contributing

ZhiJian is under active development, and we warmly welcome any contributions that enhance its capabilities. Whether you have insights to share regarding pre-trained models, data, or innovative reuse methods, we eagerly invite you to join us in making ZhiJian even better. To submit your contributions, please click here.

 

Citing ZhiJian

@misc{zhang2023zhijian,
  title={ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse}, 
  author={Yi-Kai Zhang and Lu Ren and Chao Yi and Qi-Wei Wang and De-Chuan Zhan and Han-Jia Ye},
  year={2023},
  eprint={2308.09158},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{zhijian2023,
  author = {ZhiJian Contributors},
  title = {LAMDA-ZhiJian},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zhangyikaii/LAMDA-ZhiJian}}
}