- Currently, all of these models would have been converted with openelm-coreml.py
- Review ops, layers, precision for a model
- Review apple/ml-recurrent-drafter
- modeling_llama.py
- this model also appears to be an ANE-optimized Llama, with the ANE principles implemented
- lines 161 and 162 deal with key_states and value_states
- class LlamaAttention
- key_states = torch.repeat_interleave(key_states, dim=1, repeats=self.n_kv_groups)
- value_states = torch.repeat_interleave(value_states, dim=1, repeats=self.n_kv_groups)
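The two lines above are the standard grouped-query attention (GQA) trick: the model has fewer key/value heads than query heads, and `repeat_interleave` along the head dimension copies each KV head so the tensors line up head-for-head with the queries. A minimal sketch, with hypothetical shapes (the actual head counts come from the model config, not from this note):

```python
import torch

# Hypothetical sizes for illustration; the real values come from the config.
batch, n_kv_heads, seq_len, head_dim = 1, 4, 8, 64
n_heads = 32                          # query heads
n_kv_groups = n_heads // n_kv_heads   # 8 query heads share each KV head

key_states = torch.randn(batch, n_kv_heads, seq_len, head_dim)
value_states = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# repeat_interleave along dim=1 copies each KV head n_kv_groups times,
# expanding (b, n_kv_heads, s, d) -> (b, n_heads, s, d).
key_states = torch.repeat_interleave(key_states, dim=1, repeats=n_kv_groups)
value_states = torch.repeat_interleave(value_states, dim=1, repeats=n_kv_groups)

print(key_states.shape)  # torch.Size([1, 32, 8, 64])
```

Note that `repeat_interleave` materializes the copies, which is simpler to express for CoreML/ANE conversion than the `expand`/`reshape` pattern some Hugging Face implementations use.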
- Review chunk_mlprogram.py (changed from apple/ml-stable-diffusion)
- Optimize for chunking text LLMs
- needs to check PSNR (chunked pipeline output vs. the original model)
- the random_gen_input_feature_type func is not working: because of how the model was converted, it does not properly expose a value type, so the func cannot tell how to generate those input features (this seems to be the issue)
- the program itself does work
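The PSNR check mentioned above compares the stitched chunked pipeline's outputs against the original model's outputs to catch numerical drift from splitting. A minimal sketch of such a check, in plain NumPy (the function name and thresholds here are assumptions, not taken from chunk_mlprogram.py):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB between two output tensors.

    Hypothetical helper: compare the unchunked model's output against
    the chunked pipeline's output; a high value means the chunks match.
    """
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    peak = np.max(np.abs(reference))
    return 20.0 * np.log10(peak / np.sqrt(mse))

# Illustrative usage with synthetic tensors standing in for model outputs.
ref = np.random.randn(1, 128).astype(np.float32)
chunked_out = ref + np.random.randn(1, 128).astype(np.float32) * 1e-3
print(psnr(ref, chunked_out))  # small perturbation -> high PSNR
```

A threshold around 40 dB is a common sanity bar for "numerically close enough" after chunking, but the appropriate cutoff depends on the model and precision.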
- The differences between the two inspection tools: how they get info, how they display it, and the environment packages they require
- smpanaro/CoreMLInspect
- this works in basically any environment
- layer-iteration.py
- this requires an environment similar to the ml-explore/mlx-examples env
- due to a missing PIL package, I had issues running it in my Python venv
- OpenELM-270M-Instruct
- OpenELM-1B-Instruct (may not be converted; have to determine whether the failure is a RAM or storage issue)