not perfect, but it works so far. ship fast, fix later. this is a learning exercise, after all.
inspirations / references: micrograd, tinygrad, pytorch, paperswithcode
features
- tensors w/ autodiff
- forward pass and backpropagation
- loss functions (mse, cross-entropy, etc.)
- optimizers (sgd, adam, etc.)
- extendable optimizer class
- modules (linear, sequential, etc.)
- extendable module class
- activations (relu, tanh, etc.)
- functional lib (like torch.nn.functional)
- persistent and temporary buffers
- saving state dicts
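to make "tensors w/ autodiff" concrete, here's a minimal scalar sketch of the mechanics (in the micrograd spirit): each op records its parents and a local backward rule, and `backward()` walks the graph in reverse topological order. names like `Value` are illustrative, not this library's actual api.

```python
class Value:
    """scalar that records the ops applied to it, for backprop."""

    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None  # filled in by the op that made this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # product rule: each side gets the other side's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # topological order so a node's grad is complete before its parents run
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


# d(x*y + y)/dx = y, d(x*y + y)/dy = x + 1
x, y = Value(2.0), Value(3.0)
z = x * y + y
z.backward()
print(x.grad, y.grad)  # 3.0 3.0
```

a real tensor version swaps the scalar `data` for arrays and adds broadcasting, but the graph-plus-topo-sort core is the same.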
todo (not guaranteed)
- gpu support (via jax?)
- loading state dicts
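since saving state dicts is already in and loading is on the todo list, here's one possible shape for the pair, assuming parameters are plain numpy arrays keyed by dotted names (as in torch's `state_dict`). function names and the npz format are assumptions, not this library's api.

```python
import numpy as np


def save_state_dict(state, path):
    # illustrative: one array per parameter name, bundled into an .npz file
    np.savez(path, **state)


def load_state_dict(path):
    # inverse of the above: rebuild the name -> array mapping
    with np.load(path) as f:
        return {k: f[k] for k in f.files}


params = {"linear.weight": np.ones((2, 3)), "linear.bias": np.zeros(3)}
save_state_dict(params, "model.npz")
restored = load_state_dict("model.npz")
assert np.array_equal(restored["linear.weight"], params["linear.weight"])
```

a loader on a module would then just copy each restored array into the matching parameter buffer.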