Skip to content

🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup

License

Notifications You must be signed in to change notification settings

logicalroot/gpt-4v-demos

Repository files navigation

GPT-4V Demos

Python 3.8+ Streamlit App Open in GitHub Codespaces

This mobile-friendly web app provides some basic demos to test the vision capabilities of GPT-4V.

Streamlit was selected as a framework for this project to enable rapid prototyping of new ideas.

Examples

📷 CameraTake a photo with your device's camera and generate a caption.
Turkey An unexpected traveler struts confidently across the asphalt, its iridescent feathers gleaming in the sunlight. This wild turkey, with its distinctive fan of tail feathers on full display, appears unfazed by the nearby human presence. A dash of wilderness encounters suburbia as the bird navigates between nature and civilization. The scene is a gentle reminder of the ever-present connection between human and animal territories.
👕 Product DescriptionsGenerate a product description for an image.
Pre-generated Hoodie Additional Input
{ "product_attributes": { "brand_name": "Logical Root", "product_name": "Pre-generated Hoodie", "materials": "100% digital cotton" } }
Output
{ "description": "Embrace the fusion of art and comfort with the Logical Root Pre-generated Hoodie, a masterpiece crafted from 100% digital cotton to provide unparalleled softness and durability. The hoodie comes in a classic, versatile shade of black, boasting a bold graphic print at its core that captures a whirlwind of vibrant colors in an abstract design, promising to turn heads and spark conversations. With a spacious front pocket to keep your essentials close and a snug hoodie with adjustable drawstrings for those extra chilly days, this piece is the epitome of functional fashion. The ribbed cuffs and hem ensure a perfect fit while adding to the overall sleek silhouette, making it a must-have addition to your wardrobe whether you're aiming for a casual day out or a statement-making ensemble." }
🧾 OCRExtract the text from an image.
Soup Can The text on the can reads:
- Campbell's®
- CONDENSED
- 90 CALORIES PER 1/2 CUP
- Tomato
- SOUP
- NET WT. 10 3/4 OZ. (305g)
There is also text within a gold seal that reads:
- "PARIS INTERNATIONAL EXPOSITION 1900"
📋 Quality ControlGenerate a QC report for an image.
Strawberries Additional Input
"issue_critical": true if inedible, "issue_category": string, "issue_description": single-paragraph string
Output
{ "issues": [ { "issue_critical": true, "issue_category": "Contamination", "issue_description": "There is visible mold on one of the strawberries in the bottom left corner of the image, indicating spoilage and potential health risk if consumed." }, { "issue_critical": false, "issue_category": "Physical Damage", "issue_description": "Several strawberries appear to have minor physical damage, such as dents and bruises, which may affect their shelf life and aesthetic appeal but are not necessarily a health hazard." } ] }
🗣️ SpeechGenerate audio from an image using GPT-4V + OpenAI TTS.
Alice Download audio | Play audio on CodePen

Prerequisites

  • Python 3.8+
  • OpenAI API key

How can I access GPT-4?

Local setup

Here's how you can get started.

  1. Clone this repository.
git clone https://github.com/logicalroot/gpt-4v-demos.git
cd gpt-4v-demos
  1. Install the necessary packages:
pip install streamlit
  1. Run the application:
streamlit run 🏠_Home.py
  1. To remove the missing secrets warning, create a blank secrets.toml file in your .streamlit folder.

Tip

To avoid inputting your OpenAI API key every run, you can add it to secrets.toml with the following line. Paste your key between the double quotes.

OPENAI_API_KEY = "YOUR KEY"

For safety, ensure secrets.toml is in your .gitignore file.

Limitations

To use the camera input on iOS devices, Streamlit must be configured to use SSL. See Streamlit docs.

Contributing

Feel free to experiment and share new demos using the code!

About GPT-4V

License

This project is licensed under the terms of the MIT license.

About

🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup

Topics

Resources

License

Stars

Watchers

Forks

Languages