A simple flask application that runs a local copy of the AMD OLMo language model.
- run
python main.py
in the terminal then open http://127.0.0.1:5000 or localhost:5000 in your browser.
- The model is roughly 5GB and has to be downloaded the first time the application is run. Please account for your device storage and download speed as appropriate.
- A test run I did had the model load in 16 seconds and give a response in 1 minute 17 seconds to get a response on my laptop as I used my CPU only (an Intel Core i5, 8GB laptop). Model response times would vary based on your device specifications e.g RAM size, CPU speed etc.
A running sample is given below
The response is