Konsti resnet implementation #105

Merged: 27 commits from Konsti_resnet_implementation into main, Nov 17, 2023
Conversation

@KonstiNik (Member) commented Nov 3, 2023

Implementation of a flax ResNet from HuggingFace.

  • Implement a pre-defined model. That way, a pre-trained model can easily be fine-tuned (see the sketch below).
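
A minimal sketch of what loading such a pre-defined, pre-trained Flax ResNet could look like (the checkpoint name here is illustrative, not taken from this PR):

    # Load a pre-trained HuggingFace Flax ResNet whose parameters can be fine-tuned.
    from transformers import FlaxResNetForImageClassification

    model = FlaxResNetForImageClassification.from_pretrained("microsoft/resnet-50")
    params = model.params  # Flax parameter pytree, ready to be wrapped in a TrainState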

KonstiNik and others added 7 commits November 1, 2023 22:22
- Make small changes in the JaxModel class to allow the resnet implementation
- write huggingface Flax implementation
- test the NTK calculation

Todo:
- test for models beyond resnets
- update example script
- run black
- fix imports
@KonstiNik requested a review from SamTov, November 3, 2023 13:05
@SamTov (Member) commented Nov 3, 2023

Thanks for the PR! In this case, I would have said the Black part should have been done down the chain in a separate PR. It makes it very difficult to review the larger changes to the code, as there are now 109 files that need looking into. Can you highlight which modules you have changed in the ResNet PR? Alternatively, make a new PR to main where you only do the black formatting and then merge that one here.

@SamTov (Member) left a review:

I think this is one of the biggest (importance-wise) PRs that ZnNL has seen for a long time, so awesome work. I have a couple of points scattered throughout the code, but also two things I want to raise here:

  1. Can we add one or two training passes to the test?
  2. Did you not have to update the training procedure? When I got this working, I needed to take into consideration the batch statistics and all these other things being passed correctly. I don't see these changes here; what was the solution?

examples/ResNet-Example.ipynb
CI/unit_tests/models/test_huggingface_flax_model.py
znnl/models/jax_model.py
@KonstiNik (Member, Author) replied:

> Did you not have to update the training procedure? When I got this working, I needed to take into consideration the batch statistics and all these other things being passed correctly. I don't see these changes here; what was the solution?

No, I did not have to. The HF call method is directly compatible with constructing a TrainState, and after constructing it the rest is straightforward. Where exactly did you run into issues?

@SamTov (Member) commented Nov 6, 2023

> Did you not have to update the training procedure? When I got this working, I needed to take into consideration the batch statistics and all these other things being passed correctly. I don't see these changes here; what was the solution?

> No, I did not have to. The HF call method is directly compatible with constructing a TrainState, and after constructing it the rest is straightforward. Where exactly did you run into issues?

The call to the network has a new return signature? It should return the batch stats along with the logits, and these batch stats have to be propagated to the network in the forward passes and during updates. We deal with this in the NTK calculation, but unless it snuck through the last time I worked on it, there won't be any batch stats passed.

@KonstiNik (Member, Author) commented Nov 6, 2023

> Did you not have to update the training procedure? When I got this working, I needed to take into consideration the batch statistics and all these other things being passed correctly. I don't see these changes here; what was the solution?

> No, I did not have to. The HF call method is directly compatible with constructing a TrainState, and after constructing it the rest is straightforward. Where exactly did you run into issues?

> The call to the network has a new return signature? It should return the batch stats along with the logits, and these batch stats have to be propagated to the network in the forward passes and during updates. We deal with this in the NTK calculation, but unless it snuck through the last time I worked on it, there won't be any batch stats passed.

Batch_stats are included in our model_state.params. But good point, I have to check whether the batch_stats get handled properly.

@SamTov (Member) commented Nov 6, 2023

> Did you not have to update the training procedure? When I got this working, I needed to take into consideration the batch statistics and all these other things being passed correctly. I don't see these changes here; what was the solution?

> No, I did not have to. The HF call method is directly compatible with constructing a TrainState, and after constructing it the rest is straightforward. Where exactly did you run into issues?

> The call to the network has a new return signature? It should return the batch stats along with the logits, and these batch stats have to be propagated to the network in the forward passes and during updates. We deal with this in the NTK calculation, but unless it snuck through the last time I worked on it, there won't be any batch stats passed.

> Batch_stats are included in our model_state.params. But good point, I have to check whether the batch_stats get handled properly.

Here, for example, each time they call the model, they collect the other part of the output tuple:

    logits, new_model_state = state.apply_fn(
        {'params': params, 'batch_stats': state.batch_stats},
        batch['image'],
        mutable=['batch_stats'],
    )

Sorry, it isn't the batch stats, it is this model state part. This has to be passed to other functions in order for the model to train properly. From my memory, it had something to do with ensuring the batch stats are used and updated correctly. We don't do this in normal training; we just ignore this additional output.

Now in their weight update, they do the following:

    new_state = state.apply_gradients(
        grads=grads, batch_stats=new_model_state['batch_stats']
    )

so they need these stats.

They also seem to always pass it explicitly in model forward passes:

    variables = {'params': state.params, 'batch_stats': state.batch_stats}
    logits = state.apply_fn(variables, batch['image'], train=False, mutable=False)

This is in the eval step, so nothing involved in training. It may not be necessary, as the initial object is a dict anyway, but we do need to be sure.

- extend the TrainState to capture batch statistics (see the sketch below)
- add a train_apply method to the jax model to distinguish between evaluating and training a model
- adapt the nt, flax and hfflax models to the changes
- rewrite the train step to account for batch statistics
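
A minimal sketch of what such an extended TrainState and train step could look like, following the pattern from the Flax example quoted above (illustrative only, not the ZnNL implementation; the loss is a placeholder):

    from typing import Any

    import jax
    from flax.training import train_state


    class TrainState(train_state.TrainState):
        # Extra field holding the BatchNorm running statistics.
        batch_stats: Any = None


    def train_step(state: TrainState, batch):
        # One update step that threads batch_stats through the apply function.
        def loss_fn(params):
            logits, new_model_state = state.apply_fn(
                {"params": params, "batch_stats": state.batch_stats},
                batch["image"],
                mutable=["batch_stats"],
            )
            loss = ((logits - batch["label"]) ** 2).mean()  # placeholder loss
            return loss, new_model_state

        (loss, new_model_state), grads = jax.value_and_grad(loss_fn, has_aux=True)(
            state.params
        )
        # Apply the gradients and store the updated batch statistics.
        state = state.apply_gradients(grads=grads)
        state = state.replace(batch_stats=new_model_state["batch_stats"])
        return state, loss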
@KonstiNik (Member, Author) commented Nov 9, 2023

Further changes need to include:

  • Adapt Training Strategies to the updated TrainState
  • Jit larger training functions (sketched below)
  • Make the ResNet example more descriptive
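
For the jitting point, a possible shape, assuming a pure train step like the sketch above, is simply to compile the step once and reuse it (data_generator is illustrative):

    import jax

    # Compile the (pure) train step once; subsequent calls reuse the compiled version.
    jitted_train_step = jax.jit(train_step)

    # for batch in data_generator:
    #     state, loss = jitted_train_step(state, batch)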

- Fix the parameter handling for other strategies.
@SamTov (Member) commented Nov 10, 2023

> Further Changes need to include:
>
>   • Adapt Training Strategies to the updated TrainState
>   • Jit larger training functions
>   • Adapt trace opt to updated TrainState
>   • Make ResNet example more descriptive

You can remove trace opt from this; I can take care of it in my other trace opt PR. I have made enough changes to it there that this would just set the whole thing back.

knikolaou added 4 commits November 10, 2023 17:04
@KonstiNik (Member, Author) commented:

There was an issue with HF FlaxResNets when using smaller models with layer_type='basic' instead of layer_type='bottleneck':
huggingface/transformers#27257
It has been fixed and merged into the main branch of hf-transformers, so it should be released soon.

For my examples and tests to pass, I have therefore used layer_type='bottleneck' (see the snippet below).
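
A sketch of the workaround, building the HF Flax ResNet with bottleneck blocks instead of basic ones (the config values here are illustrative):

    from transformers import FlaxResNetForImageClassification, ResNetConfig

    config = ResNetConfig(
        layer_type="bottleneck",  # 'basic' triggers the bug linked above
        num_labels=10,
    )
    model = FlaxResNetForImageClassification(config)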

@SamTov (Member) left a review:

Such a cool PR! I am excited to see it in action. I do have a few more comments that it would be great to discuss.

znnl/training_strategies/training_steps.py
znnl/training_strategies/training_decorator.py
znnl/models/jax_model.py
znnl/models/huggingface_flax_model.py
znnl/models/flax_model.py
znnl/models/flax_model.py
CI/unit_tests/models/test_huggingface_flax_model.py
@SamTov (Member) left a review:

Noice

@KonstiNik merged commit ff88aca into main, Nov 17, 2023
6 checks passed
@KonstiNik deleted the Konsti_resnet_implementation branch, November 17, 2023 17:09
Labels: none
Projects: none
Participants: 2