Train generic vision models w/ support for robust training + experimentation, i.e. AMP + DDP + configs. Just standard boilerplate amassed from different codebases.

tyleryzhu/generic-vision-trainer

GenViT: Generic Vision Trainer

This repository is a generic, extensible, and robust vision model trainer. It is designed to be a good starting point for any vision project and to be easily extensible to new models, datasets, and tasks, while also offering robust support for multi-GPU training and automatic mixed-precision training. It's an amalgamation of all the parts that I like from the repos I've worked with over the years.

While many repositories are lighter-weight and more hackable, I usually want something more robust for experimentation, which offers reproducibility while still being reasonably hackable. As a result, I've chosen to adopt a yacs-based configuration system, which works reasonably well for most of my image-recognition tasks.

I generally subscribe to the philosophy that everything should be written in the config: a config plus the same version of the code should be all that is needed to reproduce a result. However, I'm not strict about there being only one way to do this, since command-line args are very useful for hacking. I also factor the codebase into a few key, separate components, which makes it easier to hack on each one independently. Every component is built in its corresponding build.py file, and configs should be handled entirely within that file so that the components themselves can be used in isolation or in other applications.
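
As a rough illustration of the build.py pattern described above, here is a minimal sketch. It is not code from this repository: `TinyClassifier`, `build_model`, `apply_overrides`, and the config keys are all hypothetical, and a plain `SimpleNamespace` stands in for the yacs config node so that the example runs with only the standard library.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class TinyClassifier:
    """Hypothetical component: takes plain arguments and knows nothing about configs."""
    depth: int
    num_classes: int


def build_model(cfg):
    """Hypothetical build.py entry point: the only place that reads the config."""
    if cfg.MODEL.NAME == "tiny":
        return TinyClassifier(depth=cfg.MODEL.DEPTH, num_classes=cfg.DATA.NUM_CLASSES)
    raise ValueError(f"unknown model: {cfg.MODEL.NAME}")


def apply_overrides(cfg, opts):
    """Apply command-line style overrides given as [KEY1, VALUE1, KEY2, VALUE2, ...]."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = getattr(node, name)
        # Cast the string override to the type of the existing default
        # (good enough for int/float/str in this sketch).
        setattr(node, leaf, type(getattr(node, leaf))(value))
    return cfg


# Stand-in for a config file; key names are assumptions for illustration.
cfg = SimpleNamespace(
    MODEL=SimpleNamespace(NAME="tiny", DEPTH=18),
    DATA=SimpleNamespace(NUM_CLASSES=10),
)
apply_overrides(cfg, ["MODEL.DEPTH", "34"])  # quick hack from the command line
model = build_model(cfg)
```

Because `TinyClassifier` never touches the config, it can be constructed directly (`TinyClassifier(depth=18, num_classes=10)`) in a notebook or another project, which is the point of keeping config handling inside the build function.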

The optimizer is also step-based rather than epoch-based, which is a bit more flexible for my use cases.
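
To make the step-based idea concrete, here is a minimal sketch that assumes "step-based" means the training length and learning-rate schedule are expressed in optimizer steps rather than epochs. The linear-warmup plus cosine-decay shape, the function name `lr_at_step`, and all parameter names are assumptions for illustration, not this repository's actual schedule.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate as a function of the global optimizer step (hypothetical schedule)."""
    if step < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup budget that has elapsed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# The schedule depends only on the step count, so changing the dataset size
# or batch size does not silently change the length of training.
schedule = [lr_at_step(s, base_lr=0.1, warmup_steps=10, total_steps=100) for s in range(100)]
```

One reason to prefer this form is that evaluation, checkpointing, and decay points can all be specified as step counts, independent of how many samples make up an epoch.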

Quickstart

To run an example, run the train_resnet18.sh script, which trains a ResNet-18 on the CIFAR-10 dataset.

Todos:

  • Multiple slurm submit options (PySlurm and submitit).
  • Proper wandb support with loggers (D2 like?).

Nice repository references and inspirations

Tiny, hackable codebases:

  • Karpathy's minGPT
  • Bring Your Own Latent (BYOL)
  • Kaiming's Masked Autoencoders (absolutely superb for submitit, helpful for slurm).
  • Karpathy's nanoGPT (slightly more structure but enough to train large models)
  • CleanRL, nice for RL... obviously

Medium sized repositories with good abstractions (actual folder breakdowns):

  • Berkeley's nerfstudio (for graphics however).
  • Microsoft Swin Transformer. Most of the code is taken from here because I enjoyed my time working with this repository.

Larger codebases that are decent:

  • Facebook's PySlowFast, but too bloated to hack with (much like detectron2).
