llama-vision-chat.py
is a Python script leveraging the Llama 3.2-Vision model to analyze images and generate detailed descriptions. The script can output the analysis to a file or display it directly in the console.
- Utilizes the Llama 3.2-Vision model for image analysis.
- Utilizes Ollama to run the model locally.
- Outputs analysis to a specified file or prints it to the console.
- Provides comprehensive descriptions of image content, including any text detected.
- Python 3.x
- Ollama 0.4.1
-
Install the required Python package:
pip install ollama
-
Pull the Llama 3.2-Vision model:
ollama pull llama3.2-vision
To execute the script, use the following command:
python llama-vision-chat.py <image_path> [output_file]
<image_path>
: The path to the image file to be analyzed.[output_file]
(optional): The path to the output file where the response will be written. If not provided, the response will be printed to the console.
python llama-vision-chat.py food-plate.jpg food-plate-output.txt
This command analyzes the image food-plate.jpg
and writes the response to food-plate-output.txt
.
Sample outputs from trial runs for two images are available in the samples
folder.
Analyzes an image using the Llama 3.2-Vision model and writes the response to a file or prints it.
image_path
(str): The path to the image file to be analyzed.output_file
(str, optional): The path to the output file where the response will be written. If not provided, the response will be printed.
- None
The script parses command-line arguments and invokes the analyze_image
function.
python llama-vision-chat.py <image_path> [output_file]
image_path
(str): The path to the image file to be analyzed.output_file
(str, optional): The path to the output file where the response will be written. If not provided, the response will be printed.
This project is licensed under the MIT License. See the LICENSE file for details.
The base code was derived from a sample in Ollama's blog and subsequently enhanced using GitHub Copilot chat with several prompts utilizing GPT-4. Minor adjustments were made to improve and customize functionality.
This project is provided "as is" without any warranties or guarantees. The author assumes no responsibility or liability for:
- The accuracy, reliability, or completeness of any outputs generated by the code
- Any misuse or misinterpretation of the results
- Any issues arising from the use of third-party packages including but not limited to Llama and Ollama
- Any decisions or actions taken based on the image analysis results
This software is intended for educational and learning purposes only. Users should exercise their own judgment when interpreting results. The underlying AI models may produce incorrect, incomplete, or biased outputs.
All trademarks, logos, model names, and brand names mentioned are the property of their respective owners.