Integrate ncnn with TVM

In this branch, I explore how to use TVM's BYOC (Bring Your Own Codegen) framework to offload computation to ncnn, a highly optimized inference engine designed for mobile devices. The implementation is based on TVM v0.13.0.
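
At a high level, BYOC matches supported operators in the Relay graph, annotates them for an external compiler, and partitions them into separate functions that the external codegen lowers. Below is a minimal sketch of that standard partitioning flow, assuming this branch registers its codegen and pattern table under the name "ncnn" (the actual registered name may differ):

    import tvm
    from tvm import relay
    from tvm.relay.build_module import bind_params_by_name
    from tvm.relay.op.contrib import get_pattern_table

    def partition_for_ncnn(mod, params):
        # Bind constant weights into the graph so patterns can match them.
        mod["main"] = bind_params_by_name(mod["main"], params)
        seq = tvm.transform.Sequential([
            # Fuse operator groups (e.g. dense + bias_add + relu) into
            # composite functions using the registered pattern table.
            relay.transform.MergeComposite(get_pattern_table("ncnn")),
            # Mark each supported op/composite for the "ncnn" compiler.
            relay.transform.AnnotateTarget("ncnn"),
            # Merge adjacent annotated regions into larger subgraphs.
            relay.transform.MergeCompilerRegions(),
            # Split the annotated regions out into external functions.
            relay.transform.PartitionGraph(),
        ])
        return seq(mod)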

Current Progress

  • Core codegen and runtime logic and Relay pattern matching are done: nn.dense layers are parsed from Relay, their layer information is recovered at runtime, and the computation is dispatched to ncnn.
  • Layer support progress (composite patterns are sketched after this list):
    • Merge nn.dense + nn.bias_add composites
    • reshape layer
    • Merge activation functions with nn.dense; nn.dense + bias_add + relu is supported for now - 25/Sep/2023
    • nn.conv2d - 26/Sep/2023
    • Merge nn.conv2d + nn.bias_add + nn.relu - 26/Sep/2023
    • nn.depthwise_conv2d
    • ...
  • Reduce memory traffic by allocating input and output tensors when the engine is initialized instead of on every run; a 20% speedup for AlexNet - 5/Oct/2023
  • Set thread number based on hardware
  • Fallback to layout packing
  • Support dispatching whole subgraphs instead of individual layers
  • Reduce memory traffic when copying weights and tensors from TVM to ncnn, perhaps by using TVM as ncnn::Mat's allocator
  • Performance benchmark: for AlexNet on a Raspberry Pi 4B (image size 227x227, 100 runs), the Arm Compute Library takes 12.455 seconds while ncnn takes 8.536 seconds, a 31.46% speedup - 6/Oct/2023
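
The dense composites listed above map naturally onto Relay's dataflow pattern language. The following sketch shows one plausible shape for such a pattern table; the "ncnn" table name and the composite names are assumptions, so the branch's actual patterns may differ:

    from tvm.relay.dataflow_pattern import is_op, wildcard
    from tvm.relay.op.contrib.register import register_pattern_table

    def make_dense_pattern(with_bias=False, with_relu=False):
        # nn.dense, optionally followed by nn.bias_add and nn.relu.
        pat = is_op("nn.dense")(wildcard(), wildcard())
        if with_bias:
            pat = is_op("nn.bias_add")(pat, wildcard())
        if with_relu:
            pat = is_op("nn.relu")(pat)
        return pat

    @register_pattern_table("ncnn")
    def ncnn_pattern_table():
        # List the most specific patterns first so MergeComposite
        # prefers the largest fusion it can find.
        return [
            ("ncnn.dense_bias_relu", make_dense_pattern(True, True)),
            ("ncnn.dense_bias", make_dense_pattern(with_bias=True)),
            ("ncnn.dense", make_dense_pattern()),
        ]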

How to Use

  1. Download the repo with the prepared Dockerfile: git clone --recursive https://github.com/digital-nomad-cheng/tvm/ && cd tvm
  2. Build the Docker image: docker build . -t ncnn_codegen
  3. Run the Docker container: docker run -it ncnn_codegen:latest
  4. Run the test: cd ../tvm_project_course/byoc && python alexnet_ncnn_codegen.py (a sketch of what such a script does follows this list)
  5. ncnn supports x86 CPUs, so the test above runs inside the container. To benchmark against the Arm Compute Library, you need an Arm device, for example a Raspberry Pi.
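
For reference, a test script like alexnet_ncnn_codegen.py plausibly follows the flow below: import AlexNet into Relay, partition for ncnn, build, and benchmark. This is a hedged sketch, not the repo's exact script; it assumes a PyTorch AlexNet, the partition_for_ncnn helper sketched earlier, and the 100-run setting from the benchmark notes:

    import torch
    import torchvision
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    # Trace AlexNet and import it into Relay.
    model = torchvision.models.alexnet().eval()
    inp = torch.randn(1, 3, 227, 227)
    scripted = torch.jit.trace(model, inp)
    mod, params = relay.frontend.from_pytorch(scripted, [("data", (1, 3, 227, 227))])

    # Offload supported subgraphs to the ncnn codegen (helper sketched earlier).
    mod = partition_for_ncnn(mod, params)

    # Compile; partitioned functions are lowered by the external codegen.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)

    dev = tvm.cpu()
    rt = graph_executor.GraphModule(lib["default"](dev))
    rt.set_input("data", inp.numpy())
    print(rt.benchmark(dev, repeat=1, number=100))  # 100 runs, as in the benchmark above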

Open Deep Learning Compiler Stack

Documentation | Contributors | Community | Release Notes

Apache TVM is a compiler stack for deep learning systems. It is designed to close the gap between productivity-focused deep learning frameworks and performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end-to-end compilation to different backends.

License

TVM is licensed under the Apache-2.0 license.

Getting Started

Check out the TVM Documentation site for installation instructions, tutorials, examples, and more. The Getting Started with TVM tutorial is a great place to start.

Contribute to TVM

TVM adopts the Apache committer model; we aim to create an open-source project that is maintained and owned by the community. Check out the Contributor Guide.

Acknowledgement

We learned a lot from the following projects when building TVM.

  • Halide: Part of TVM's TIR and arithmetic simplification module originates from Halide. We also learned from and adapted parts of the lowering pipeline from Halide.
  • Loopy: use of integer set analysis and its loop transformation primitives.
  • Theano: the design inspiration for the symbolic scan operator for recurrence.