# torchinfo recipes

We have to pass some dummy input data in, presumably so torchinfo can trace a forward pass, which forces PyTorch to materialise the model.

## t5-large recipe

```python
import torch
import torchinfo
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config.from_pretrained('t5-large')
model = T5ForConditionalGeneration.from_pretrained('t5-large')

# The dummy inputs will be different for different model types. torchinfo passes
# the tuple elements as positional args to forward(), which for T5 means
# input_ids, attention_mask and decoder_input_ids.
input_data = (torch.ones(1, config.max_length, dtype=torch.int),) * 3

summary = torchinfo.summary(model, input_data=input_data, device="cpu")
```

(We learned the required shape of `input_data` from this Stack Overflow comment: https://stackoverflow.com/questions/65140400/valueerror-you-have-to-specify-either-decoder-input-ids-or-decoder-inputs-embed#comment115193484_65140400)

You can `repr(summary)` to get a nice table, but `summary` is also an object with attributes and methods you can query programmatically.

```
Total params: 1,138,405,888
Params size (MB): 3082.27
Estimated Total Size (MB): 3194.34
```

This suggests that t5-large is a ~1.1B-parameter model (not 770M as I had read elsewhere).
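Rather than reading the numbers out of the table, you can also pull them off the returned object directly. A minimal sketch, assuming the `ModelStatistics` object returned by `torchinfo.summary` exposes `total_params` and `trainable_params` (attribute names may vary between torchinfo versions):

```python
# summary is the ModelStatistics object returned by torchinfo.summary above
print(f"total params:     {summary.total_params:,}")
print(f"trainable params: {summary.trainable_params:,}")
```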

fastT5 (https://github.com/Ki6an/fastT5) might be useful in some way in future.

## gpt2-xl recipe

```python
import torch
import torchinfo
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# GPT-2 only needs input_ids, so a single dummy tensor is enough here.
summary = torchinfo.summary(
    model,
    input_data=torch.ones(1, config.max_length, dtype=torch.int),
    device="cpu",
)
```

showing:

```
Total params: 1,638,022,400
Params size (MB): 6552.09
Estimated Total Size (MB): 6696.07
```

Presumably these are float32 weights at the moment, hence the params size in bytes is ~4× the number of parameters (1,638,022,400 × 4 bytes ≈ 6,552 MB).
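As a sanity check, the reported size follows directly from the parameter count under that assumption (torchinfo appears to report decimal megabytes):

```python
total_params = 1_638_022_400                   # gpt2-xl, from the summary above
bytes_per_param = 4                            # float32
print(total_params * bytes_per_param / 1e6)    # 6552.0896 -> "Params size (MB): 6552.09"
```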

Anyway, it looks like t5-large may be feasible, since it's smaller than gpt2-xl.

I'm not sure how this fits with https://github.com/smpanaro/more-ane-transformers/blob/main/src/experiments/NOTES.md, where they get gpt2-xl running on the ANE while also identifying a 3 GB size limit for models to run on it. I don't quite follow the pipeline trick they describe (it seems to involve breaking the model into chunks?). The limit may actually be 3 GiB (3 GiB ≈ 3.221 GB).
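For future reference, my rough reading of the trick: split the network into consecutive chunks that each fit under the limit, convert each chunk to its own Core ML model, and chain them back together as a Core ML pipeline so they still behave like one model. This is only a sketch of that shape, assuming coremltools' `ct.utils.make_pipeline` utility (newer coremltools releases); the split point, the made-up `FirstHalf`/`SecondHalf` wrappers, and the feature names are all for illustration, and attention masks / KV caching are ignored entirely:

```python
import numpy as np
import coremltools as ct
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2-xl").eval()
split = len(model.transformer.h) // 2  # half of the 48 blocks in each chunk

class FirstHalf(torch.nn.Module):
    """Embeddings + the first `split` transformer blocks."""
    def __init__(self, m):
        super().__init__()
        self.wte, self.wpe = m.transformer.wte, m.transformer.wpe
        self.blocks = m.transformer.h[:split]
    def forward(self, input_ids):
        pos = torch.arange(input_ids.shape[1])
        hidden = self.wte(input_ids) + self.wpe(pos)
        for block in self.blocks:
            hidden = block(hidden)[0]
        return hidden

class SecondHalf(torch.nn.Module):
    """Remaining blocks + final layer norm + LM head."""
    def __init__(self, m):
        super().__init__()
        self.blocks = m.transformer.h[split:]
        self.ln_f, self.lm_head = m.transformer.ln_f, m.lm_head
    def forward(self, hidden):
        for block in self.blocks:
            hidden = block(hidden)[0]
        return self.lm_head(self.ln_f(hidden))

ids = torch.ones(1, 64, dtype=torch.int)
with torch.no_grad():
    hidden = FirstHalf(model)(ids)

chunk1 = ct.convert(torch.jit.trace(FirstHalf(model), ids),
                    inputs=[ct.TensorType(name="input_ids", shape=ids.shape, dtype=np.int32)])
chunk2 = ct.convert(torch.jit.trace(SecondHalf(model), hidden),
                    inputs=[ct.TensorType(name="hidden", shape=hidden.shape)])

# Chain the chunks; pipeline wiring is by feature name, so in practice chunk1's
# output would need to be renamed to match chunk2's "hidden" input.
pipeline = ct.utils.make_pipeline(chunk1, chunk2)
pipeline.save("gpt2-xl-pipeline.mlpackage")
```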

NOTE: a CoreML Pipeline seems to be different from a HuggingFace Pipeline

In CoreML, a Pipeline is a sequence of models chained together. In HuggingFace, it seems to be a high-level abstraction for common tasks against a single model, which simplifies preparing and passing the input and extracting the desired output.

FWIW, I think there's room for building the HF-style abstraction on top of CoreML for running these models.
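Something shaped like this, perhaps (entirely hypothetical: the class, the `"input_ids"`/`"logits"` feature names, and the greedy single-token decode are all invented for illustration and would depend on how the model was converted):

```python
import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

class CoreMLTextGenerator:
    """Hypothetical HF-pipeline-style wrapper around a converted Core ML model."""

    def __init__(self, mlpackage_path, tokenizer_name):
        self.model = ct.models.MLModel(mlpackage_path)
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

    def __call__(self, text):
        inputs = self.tokenizer(text, return_tensors="np")
        # Feature names depend on how the model was converted.
        outputs = self.model.predict({"input_ids": inputs["input_ids"].astype(np.int32)})
        next_id = int(outputs["logits"][0, -1].argmax())
        return self.tokenizer.decode([next_id])

# generator = CoreMLTextGenerator("gpt2-xl-pipeline.mlpackage", "gpt2-xl")
# print(generator("The Neural Engine is"))
```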