Capollama is a command-line tool that generates image captions using Ollama's vision models. It can process single images or entire directories, optionally saving the captions as text files alongside the images.
- Process single images or recursively scan directories
- Support for JPG, JPEG, and PNG formats
- Customizable caption prompts
- Optional prefix and suffix for captions
- Automatic caption file generation with dry-run option
- Configurable vision model selection
- Skips hidden directories (starting with '.')
- Skip existing captions by default with force option available
- Ollama installed and running as a server
- A vision-capable model pulled (like llava or llama3.2-vision)
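To check both prerequisites before running the tool, a small program can ask the Ollama server for its list of local models. The sketch below assumes Ollama listens on its default address `http://localhost:11434` and uses the `GET /api/tags` endpoint; the helper names `hasModel` and `checkOllama` and the model choice `llava` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// tagsResponse mirrors the part of the /api/tags reply used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// hasModel reports whether the /api/tags body lists the wanted model.
// A name without a tag ("llava") also matches any tag of it ("llava:latest").
func hasModel(body []byte, want string) (bool, error) {
	var tags tagsResponse
	if err := json.Unmarshal(body, &tags); err != nil {
		return false, err
	}
	for _, m := range tags.Models {
		if m.Name == want {
			return true, nil
		}
		if !strings.Contains(want, ":") && strings.HasPrefix(m.Name, want+":") {
			return true, nil
		}
	}
	return false, nil
}

// checkOllama returns an error if the server is unreachable or the model
// has not been pulled.
func checkOllama(baseURL, model string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/api/tags")
	if err != nil {
		return fmt.Errorf("cannot reach Ollama: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status from Ollama: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	ok, err := hasModel(body, model)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("model %q is not pulled; try: ollama pull %s", model, model)
	}
	return nil
}

func main() {
	// Assumed default address and a hypothetical model choice.
	if err := checkOllama("http://localhost:11434", "llava"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("Ollama is running and the model is available")
}
```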
Install from the release page, or with Go:
go install github.com/oderwat/capollama@latest
Basic usage:
capollama path/to/image.jpg
Process a directory:
capollama path/to/images/directory
Usage: capollama [--dry-run] [--start START] [--end END] [--prompt PROMPT] [--model MODEL] [--force] PATH

Positional arguments:
  PATH                   Path to an image or a directory with images

Options:
  --dry-run, -n          Don't write captions as .txt (stripping the original extension)
  --start START, -s START
                         Start the caption with this (image of Leela the dog,)
  --end END, -e END      End the caption with this (in the style of 'something')
  --prompt PROMPT, -p PROMPT
                         The prompt to use [default: Please describe the content and style of this image in detail. Answer only with one sentence that is starting with "A ..."]
  --model MODEL, -m MODEL
                         The model that will be used (must be a vision model like "llava") [default: x/llama3.2-vision]
  --force, -f            Also process the image if a file with .txt extension exists
  --help, -h             display this help and exit
  --version              display version and exit
Generate a caption for a single image (will save as .txt):
capollama image.jpg
Process all images in a directory without writing files (dry run):
capollama --dry-run path/to/images/
Force regeneration of all captions, even if they exist:
capollama --force path/to/images/
Use a custom prompt and model:
capollama --prompt "Describe this image briefly" --model llava image.jpg
Add prefix and suffix to captions:
capollama --start "A photo showing" --end "in vintage style" image.jpg
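How the prefix and suffix combine with the generated text can be pictured with a small sketch. The exact joining rule capollama uses is not stated here, so joining the non-empty parts with single spaces is an assumption, and `buildCaption` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// buildCaption joins the non-empty parts with single spaces.
// The separator is an assumption made for this illustration.
func buildCaption(start, caption, end string) string {
	var parts []string
	for _, p := range []string{start, caption, end} {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	// Prints: A photo showing a dog on a beach in vintage style
	fmt.Println(buildCaption("A photo showing", "a dog on a beach", "in vintage style"))
}
```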
By default:
- Captions are printed to stdout in the format:
path/to/image.jpg: A detailed caption generated by the model
- Caption files are automatically created alongside images:
path/to/image.jpg → path/to/image.txt
- Existing caption files are skipped unless --force is used
- Use --dry-run to prevent writing caption files
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This tool uses: