Capollama

Capollama is a command-line tool that generates image captions using Ollama's vision models. It can process single images or entire directories, optionally saving the captions as text files alongside the images.

Features

  • Process a single image or recursively scan a directory
  • Supports JPG, JPEG, and PNG files
  • Customizable caption prompt
  • Optional prefix and suffix for captions
  • Writes caption files automatically, with a dry-run option
  • Configurable vision model
  • Skips hidden directories (those starting with '.')
  • Skips images that already have a caption file unless forced

Prerequisites

  • Ollama installed and running as a server
  • A vision-capable model pulled, such as llava or llama3.2-vision (see the example commands below)
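
For example, with Ollama installed you can pull a vision model and start the server if it is not already running in the background (llama3.2-vision is just one possible choice):

# pull a vision-capable model
ollama pull llama3.2-vision
# start the Ollama server (skip this if Ollama already runs as a service or desktop app)
ollama serve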

Installation (precompiled binary)

Install a prebuilt binary from the Releases page.

Installation from source (requires Go >= 1.22)

go install github.com/oderwat/capollama@latest
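
go install places the binary in $GOBIN, or in $(go env GOPATH)/bin if $GOBIN is not set. If your shell does not find the capollama command afterwards, add that directory to your PATH, for example:

export PATH="$PATH:$(go env GOPATH)/bin"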

Usage

Basic usage:

capollama path/to/image.jpg

Process a directory:

capollama path/to/images/directory

Command Line Arguments

Usage: capollama [--dry-run] [--start START] [--end END] [--prompt PROMPT] [--model MODEL] [--force] PATH

Positional arguments:
  PATH                   Path to an image or a directory with images

Options:
  --dry-run, -n          Don't write captions as .txt (stripping the original extension)
  --start START, -s START
                         Start the caption with this (image of Leela the dog,)
  --end END, -e END      End the caption with this (in the style of 'something')
  --prompt PROMPT, -p PROMPT
                         The prompt to use [default: Please describe the content and style of this image in detail. Answer only with one sentence that is starting with "A ..."]
  --model MODEL, -m MODEL
                         The model that will be used (must be a vision model like "llava") [default: x/llama3.2-vision]
  --force, -f            Also process the image if a file with .txt extension exists
  --help, -h             display this help and exit
  --version              display version and exit

Examples

Generate a caption for a single image (will save as .txt):

capollama image.jpg

Process all images in a directory without writing files (dry run):

capollama --dry-run path/to/images/

Force regeneration of all captions, even if they exist:

capollama --force path/to/images/

Use a custom prompt and model:

capollama --prompt "Describe this image briefly" --model llava image.jpg

Add prefix and suffix to captions:

capollama --start "A photo showing" --end "in vintage style" image.jpg

Output

By default:

  • Captions are printed to stdout in the format:
    path/to/image.jpg: A detailed caption generated by the model
    
  • Caption files are automatically created alongside images:
    path/to/image.jpg
    path/to/image.txt
    
  • Existing caption files are skipped unless --force is used
  • Use --dry-run to prevent writing caption files (see the sample run below)
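
For illustration, a run over a small directory might look like this (the file names and caption text below are made up):

capollama photos/
photos/dog.jpg: A brown dog lying on a wooden porch in warm afternoon light.
photos/beach.png: A wide sandy beach under a cloudy sky with two small figures near the water.

Unless --dry-run was given, the same captions are also written to photos/dog.txt and photos/beach.txt.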

License

MIT License

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgments

This tool uses:

  • Ollama for local LLM inference
  • go-arg for argument parsing