A project to showcase various AI vision capabilities.
Included:
- Image captioning based on your webcam
  - Live: a real-time scene description. The main challenge here, aside from the engineering, is the UX: how do you describe a scene in real time and still provide a decent experience? This is an open question.
  - Snapshot: describes the scene at a single point in time.
- Image generation
  - Through prompting, e.g. "Generate an image of..."
  - Based on your webcam
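
The live captioning UX question is left open in this README. As one possible starting point, and not how this project does it, the sketch below limits how often the displayed description changes and skips captions that are nearly identical to the one already shown. The class name `CaptionStabilizer`, its parameters, and the thresholds are all hypothetical; the captions are assumed to come from whatever captioning model is in use.

```python
from difflib import SequenceMatcher
from typing import Optional


class CaptionStabilizer:
    """Hypothetical helper: decides whether a fresh caption is worth displaying."""

    def __init__(self, min_interval: float = 2.0, max_similarity: float = 0.8):
        self.min_interval = min_interval      # assumed minimum seconds between updates
        self.max_similarity = max_similarity  # at or above this, captions count as unchanged
        self._shown: Optional[str] = None
        self._shown_at: float = float("-inf")

    def update(self, caption: str, now: float) -> Optional[str]:
        """Return the caption if it should replace the displayed one, else None."""
        if now - self._shown_at < self.min_interval:
            return None  # too soon: keep the current text on screen
        if self._shown is not None:
            ratio = SequenceMatcher(None, self._shown.lower(), caption.lower()).ratio()
            if ratio >= self.max_similarity:
                return None  # scene description has barely changed
        self._shown, self._shown_at = caption, now
        return caption
```

In a capture loop, each new caption would be passed to `update` along with a timestamp such as `time.monotonic()`, and the UI would only redraw when the return value is not `None`. Whether this feels right in practice depends on the model's latency and how quickly the scene changes, so the thresholds would need tuning.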