Am I loading my trained models correctly? #9

luxiant · 2023-03-05T17:53:35Z

I'm facing another problem while testing my fine-tuned model. I trained a BERT tokenizer and transformer model in Python and saved the following files.

tokenizer: saved in Python via the 'tokenizer.save_pretrained()' function.
special_tokens_map.json
spiece.model
tokenizer_config.json
vocab.txt
bert transformer: saved via the 'model.save_pretrained()' function.
bert_config.json
pytorch_model.bin
pytorch weights: saved via 'torch.save()'.
sentiment_model.pt

I also saved a torch jit model, sentiment_model_jit.pt, by tracing with the following Python code.

example = '오늘 비트에다 롱친 흑우새끼 없제?'
example = text_preprocess(example)
data = [example, '0']
data_list = [data]
data_set = BERTDataset(data_list, 0, 1, tok, vocab, max_len, True, False)
example_dataloader = torch.utils.data.DataLoader(data_set, batch_size = batch_size, num_workers = 5)
model.eval()

trace_list = []
for batch_id, (token_ids, valid_length, segment_ids, label) in enumerate(example_dataloader):
    token_ids = token_ids.long().to(device)
    segment_ids = segment_ids.long().to(device)
    valid_length= valid_length
    label = label.long().to(device)
    attention_mask = model.gen_attention_mask(token_ids, valid_length)

    trace_list.append(token_ids)
    trace_list.append(attention_mask)

traced_model = torch.jit.trace(bertmodel, trace_list, strict = False)
traced_model.save(PATH + 'sentiment_model_jit.pt')

In my Go code, I tried to load my trained models with the following code.

func loadModel() (*customModel, error) {
	defer fmt.Println("Successfully loaded model")
	return &customModel{
		tokenizer: loadTokenizer(),
		bertModel: loadBertModel(),
	}, nil
}

func loadTokenizer() *tokenizer.Tokenizer {
	currDir, err := os.Getwd()
	if err != nil {
		log.Fatal(err)
	}
	util.CdToThis()
	defer util.CdBack(currDir)
	model, err := wordpiece.NewWordPieceFromFile("model/tokenizer/vocab.txt", "[UNK]") // load tokenizer on the basis of vocab.txt
	if err != nil {
		log.Fatal(err)
	}
	tk := tokenizer.NewTokenizer(model)
	tk.WithNormalizer(normalizer.NewBertNormalizer(true, true, true, true))
	tk.WithPreTokenizer(pretokenizer.NewBertPreTokenizer())
	tk.WithDecoder(decoder.DefaultWordpieceDecoder())

	var specialTokens []tokenizer.AddedToken
	specialTokens = append(specialTokens, tokenizer.NewAddedToken("[MASK]", true))
	tk.AddSpecialTokens(specialTokens)

	sepId, ok := tk.TokenToId("[SEP]")
	if !ok {
		log.Fatalf("Cannot find ID for [SEP] token.\n")
	}
	sep := processor.PostToken{Id: sepId, Value: "[SEP]"}
	clsId, ok := tk.TokenToId("[CLS]")
	if !ok {
		log.Fatalf("Cannot find ID for [CLS] token.\n")
	}
	cls := processor.PostToken{Id: clsId, Value: "[CLS]"}
	tk.WithPostProcessor(processor.NewBertProcessing(sep, cls))
	return tk
}

func loadBertModel() *bert.BertForSequenceClassification {
	vs := nn.NewVarStore(*device)
	errVarstore := vs.Load("model/sentiment_model_jit.pt") // load model weight
	if errVarstore != nil {
		log.Fatalf("cannot load weights to varstore: %s", errVarstore)
	}
	bertConfig, _ := bert.ConfigFromFile("model/bert_config.json") // load bert config file
	var dummyLabelMap map[int64]string = make(map[int64]string)
	dummyLabelMap[0] = "long"
	dummyLabelMap[1] = "neutral"
	dummyLabelMap[2] = "short"
	bertConfig.Id2Label = dummyLabelMap
	bertConfig.OutputAttentions = true
	bertConfig.OutputHiddenStates = true
	return bert.NewBertForSequenceClassification(vs.Root().Sub("bert"), bertConfig, false)
}

and tried to run a forward pass on a sentence with this code

...
func (m *customModel) processSentenceIntoInput(sentence string) *ts.Tensor {
        ...
	...
        return ts.MustStack(
		[]ts.Tensor{*ts.TensorFrom(tokInput)},
		0,
	).MustUnsqueeze(0, true)
}

func (m *customModel) bertSentimentProcess(dataframe dataframe.DataFrame) sentimentRow {
	var logit []float64
	ts.NoGrad(func() {
		input := m.processSentenceIntoInput(dataframe.Col("text").Records()[0])
		torchResult, _, _ := m.bertModel.ForwardT(
			input,
			ts.None,
			ts.None,
			ts.None,
			ts.None,
			false,
		)
		output := torchResult.MustSoftmax(-1, gotch.Double, true)
		logit = output.Float64Values(true)
		input.MustDrop()
	})
	long := &logit[0]
	neutral := &logit[1]
	short := &logit[2]
        ...
        ...

I tested this code on a sample of 100 texts (I can't scale this up further because the memory leak is still happening; I'll have to work on that later) and got an unexpected result. The model does not sort the texts into the three labels effectively. On every run, all of the texts are classified into a single label, and neither the probabilities nor the label are consistent between runs (sometimes every sample text is classified as long, sometimes as neutral, sometimes as short). I suspect I need to load additional configuration, but I can't find any hint on how to proceed.

sugarme · 2023-03-05T20:27:55Z

Maintainer

@luxiant

The model should be created before you load pretrained weights. Once the BERT model is constructed, you have several options for loading pretrained weights.

1. Load a JIT model: see this example of loading a JIT model here
2. Load normal (pickled) Python model weights directly with the gotch/pickle subpackage. Something like

import "github.com/sugarme/gotch/pickle"

device := gotch.CPU // or device := gotch.CudaIfAvailable() if using CUDA
vs := nn.NewVarStore(device)
err := pickle.LoadAll(vs, modelFile)
if err != nil {
  log.Fatalf("Load model weight error: \n%v", err)
}

3. Convert the PyTorch model weights and save them as NumPy arrays. Then you can load the weights into the gotch model.

Hope that helps.

1 reply

luxiant Mar 6, 2023
Author

Can I get more detail about option 2? In Python, I created the pkl from sentiment_model.pt with this code in a Google Colab environment. I fine-tuned from a pretrained Korean BERT model called KoBERT.

!pip install transformers==4.8.2
!pip3 install torch --extra-index-url https://download.pytorch.org/whl/cu116
!pip install git+https://git@github.com/SKTBrain/KoBERT.git@master

import os
import re
import torch
import pickle
import numpy as np
import pandas as pd
from transformers import BertModel
from kobert.pytorch_kobert import get_pytorch_kobert_model
from google.colab import drive
drive.mount('/content/drive')

PATH = '/content/drive/MyDrive/analytics/market_sentiment_index/model/'
model = torch.load(PATH + 'sentiment_model.pt', map_location=torch.device('cpu'))

# copy into dictionary
model_dict = {}
for key, value in model.state_dict().items():
    model_dict[key] = value.cpu().numpy()

# save into pkl
with open(PATH + 'sentiment_model.pkl', 'wb') as f:
    pickle.dump(model_dict, f)

Then I changed this function in my Go code as follows. I guessed that the weights should be loaded into the varstore after the new BERT model is declared, so I changed the order and tried to load the weights from the pkl produced by the Python code above.

func loadBertModel() *bert.BertForSequenceClassification {
	vs := nn.NewVarStore(*device)
	var mymodel *bert.BertForSequenceClassification

        // load config
	bertConfig, _ := bert.ConfigFromFile("model/bert_config.json") // load bert config file
	var dummyLabelMap map[int64]string = make(map[int64]string)
	dummyLabelMap[0] = "long"
	dummyLabelMap[1] = "neutral"
	dummyLabelMap[2] = "short"
	bertConfig.Id2Label = dummyLabelMap
	bertConfig.OutputAttentions = true
	bertConfig.OutputHiddenStates = true

	mymodel = bert.NewBertForSequenceClassification(vs.Root().Sub("bert"), bertConfig, false)

        // load model weight
	errWeight := pickle.LoadAll(vs, "model/sentiment_model.pkl")
	if errWeight != nil {
		log.Fatalf("cannot load weights to varstore: %s", errWeight)
	}

	return mymodel
}

but this shows an error

cannot load weights to varstore: LoadAll() failed: Decode() failed: Unpickler.Load() failed: loadReduce() failed: REDUCE requires a Callable object: &pickle.GenericClass{Module:"numpy.core.multiarray", Name:"_reconstruct"}
exit status 1

What am I missing?
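
The error message names numpy.core.multiarray._reconstruct, which is the function pickle references when it serializes a NumPy array, and the pkl above stores NumPy arrays rather than torch tensors. One way to see what a pickle file references before handing it to another loader is to list its global references with Python's standard pickletools module. The following is a minimal sketch, not part of the original thread: referenced_globals is a hypothetical helper, its handling of newer pickle protocols is a heuristic, and the sample pickles an OrderedDict only so that it runs without NumPy installed.

```python
import pickle
import pickletools
from collections import OrderedDict


def referenced_globals(data: bytes) -> set:
    """Return the 'module.name' globals that a pickle byte string refers to.

    Heuristic sketch: for STACK_GLOBAL (protocol 4+) it assumes the module
    and name are the two most recent string opcodes, so names fetched from
    the memo (BINGET) are not resolved.
    """
    found = set()
    strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name"
            module, name = arg.split(" ", 1)
            found.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            if len(strings) >= 2:
                found.add(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


# Stand-in object for illustration; a real check would read the pkl file's bytes.
sample = pickle.dumps(OrderedDict(a=1), protocol=2)
print(sorted(referenced_globals(sample)))
```

If the listing for a weight file shows NumPy reconstruction functions instead of torch storage classes, the file holds NumPy arrays, which would match the REDUCE error above.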

sugarme · 2023-03-08T11:24:14Z

Maintainer

Hi @luxiant ,

The error means there was either an issue mapping named tensors between your model and your weights, or gotch/pickle cannot handle some of the pickled weights.

I often print out Python and Go variable names to compare.

import torch
from kobert.pytorch_kobert import get_pytorch_kobert_model

model, vocab  = get_pytorch_kobert_model()
for key, value in model.state_dict().items():
    print(f"{key} - {value.shape}")

Also, you can print out your Python pickled weight names and shapes from Go using gotch/pickle (if it can handle them) as follows:

err := pickle.LoadInfo(modelFile)
if err != nil {
    log.Fatal(err)
}

device := gotch.CPU
vs := nn.NewVarStore(device)
path := vs.Root()
model = bert.NewBertForSequenceClassification(path, config, false)

// print out model info
vs.Summary()

If the Python weights (names and shapes) include all of the Go weights (names and shapes), then it is safe to load the weights partially:

numMissingVariables, err := vs.LoadPartial(modelFile)
if err != nil{
    panic(err)
}
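
The name-and-shape comparison described above can also be done mechanically once both listings are available. This is a minimal sketch under stated assumptions: compare_weights is a hypothetical helper, and the names and shapes in the two dictionaries are made up for illustration, not taken from the actual KoBERT model or the gotch varstore.

```python
def compare_weights(saved, expected):
    """Compare two {name: shape} mappings.

    saved:    names and shapes found in the exported weight file
    expected: names and shapes the target model registers
    Returns (missing, mismatched, unused) as sorted lists of names.
    """
    missing = sorted(n for n in expected if n not in saved)
    mismatched = sorted(
        n for n in expected
        if n in saved and tuple(saved[n]) != tuple(expected[n])
    )
    unused = sorted(n for n in saved if n not in expected)
    return missing, mismatched, unused


# Hypothetical names and shapes, for illustration only.
saved = {
    "bert.embeddings.word_embeddings.weight": (8002, 768),
    "bert.pooler.dense.weight": (768, 768),
}
expected = {
    "bert.embeddings.word_embeddings.weight": (8002, 768),
    "classifier.weight": (3, 768),
}

missing, mismatched, unused = compare_weights(saved, expected)
print("missing from file:", missing)
print("shape mismatch:", mismatched)
print("in file but unused:", unused)
```

A name that appears under "missing from file" is never loaded, so that variable keeps whatever values it was initialized with.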

I can't find the time to get the older PyTorch version required by KoBERT working on my machine, but if you can share mock-up model weights (just initialize your Python model and save its weights), I might be able to give you detailed code or suggestions for loading the pretrained model in Go.

1 reply

luxiant Mar 8, 2023
Author

I finally solved it. I'm leaving my code here for the record.

I failed with the first two options and finally found the cause while working on the third.

The problem was that the weights of the classifier layer were not saved correctly when I saved my trained model. I made this mistake because the 'KoBERT' model requires its own libraries and functions, which confused me.

Here is how I set up my model in Python.

class BERTClassifier(nn.Module):
    def __init__(self, bert, hidden_size = 768, num_classes = 3, dr_rate = None, params = None):
        super(BERTClassifier, self).__init__()
        self.bert = bert
        self.dr_rate = dr_rate
        self.classifier = nn.Linear(hidden_size, num_classes)
        if dr_rate:
            self.dropout = nn.Dropout(p = dr_rate)
    
    def gen_attention_mask(self, token_ids, valid_length):
        attention_mask = torch.zeros_like(token_ids)
        for i, v in enumerate(valid_length):
            attention_mask[i][:v] = 1
        return attention_mask.float()

    def forward(self, token_ids, valid_length, segment_ids):
        attention_mask = self.gen_attention_mask(token_ids, valid_length)
        _, pooler = self.bert(input_ids = token_ids, token_type_ids = segment_ids.long(), attention_mask = attention_mask.float().to(token_ids.device), return_dict = False)
        if self.dr_rate:
            pooler = self.dropout(pooler)
        out = self.classifier(pooler)
        return out

model = BERTClassifier(bertmodel, dr_rate=0.5).to(device)

and saved it with this code

# save bert config
bertmodel.save_pretrained(PATH)

# save tokenizer
tokenizer.save_pretrained(PATH + 'tokenizer/')

# save tokenizer vocabulary
vocab = tokenizer.get_vocab()
with open(PATH + 'tokenizer/vocab.txt', 'w', encoding='utf-8') as f:
    for token in vocab:
        f.write(token + '\n')

# save as npz
model_dict = {}
for key, value in model.state_dict().items():
    key = key.replace("gamma", "weight").replace("beta", "bias")
    model_dict[key] = np.ascontiguousarray(value.cpu().numpy())
np.savez(PATH + 'sentiment_model.npz', **model_dict)

I then converted the npz file into a gt file with the code you provided and ran my code again.

Now the problem is solved, and I get consistent, expected results from the sample file.
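
The save code above renames keys with str.replace, which rewrites "gamma" or "beta" wherever they occur in a key. A narrower variant that only touches the last dotted component is sketched below; rename_layernorm_key is a hypothetical helper and not the code used in this thread.

```python
def rename_layernorm_key(key: str) -> str:
    """Rename a trailing 'gamma'/'beta' component to 'weight'/'bias'.

    Only the last dotted component is changed; every other key is
    returned unchanged.
    """
    head, sep, last = key.rpartition(".")
    renamed = {"gamma": "weight", "beta": "bias"}.get(last, last)
    return f"{head}{sep}{renamed}"


# Example keys for illustration only.
for key in ("bert.embeddings.LayerNorm.gamma", "classifier.weight"):
    print(key, "->", rename_layernorm_key(key))
```

In the saving loop this would stand in for the chained replace calls, for example key = rename_layernorm_key(key).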
