-
change llama-ssp to handle ssp evals
- in measure_model_score, call to ssp rather than sample_model when draft given
- in llama_ssp, allow call with draft model
- in xp script, allow ssp evals
-
cleaner output: Nb of multiplications, nb of successes, ratio, confidence intervals
- Output confidence intervals
- Get valid evals -- with confidence intervals -- for 7 to 65B and 13 to 65B ssp
- split in 2: 7 & 13B, and 13B ssp on the xpg5 machine
- 30 & 65B / regular and ssp / on the xpfat machine
- Write the results in the readme
- Comm with the link to the results
- ability to make small runs that pass as integration tests
- change cli code to accept an eval size
- change xp.sh to have a --test arg
- readme with the new cli
- fabricmanager handled
- import base in sample model
- repeated sequential prompting
- 8bit issue with temperature sampling
- change evals to multiplications
- add script to run regular eval xps
- Additions benchmark; few shot prompt with additions from 1 to 300
- measure bench results on all models + ssp models
- More "releasable" version (same)
- clean readme recipe to reproduce everything
- (opt) Use meta code to do the same
-
make the ideal readme structure
-
correct the timings for Ssp improvements
-
explain 65B does not perform well
-
caveats
-
Experiments to get a good speedup on p4
- make the vid script and describe the view
-
Cleaner logs
- Clear completions & timings for each example
- use logging for debugging output
- change command line to get output & errors on files too
-
Run timings on all models on A100 machine, get all the speeds
-
Run SSP timings for various model configurations
-
Try sampling with temperature 1 & observing on additions
-
Capture in readme all the stuff