A project to showcase various AI vision capabilities.
Image captioning based on your webcam
- Live
This is a real-time scene description. Aside from the engineering, the main challenge here is the UX: how do you transcribe a scene in real time and still provide a decent UX? This remains an open question.
- Snapshot
This describes the scene at a single point in time.
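
One possible angle on the live-mode UX question is to avoid replacing the on-screen caption too often. The sketch below is only an illustration, not how this project works: it assumes captions arrive as plain strings from some captioning model, and the class name `CaptionThrottle` and the parameters `min_interval_s` and `max_similarity` are hypothetical. It skips a new caption if the current one has been visible for too short a time, or if the new one is nearly identical to it.

```python
import time
from difflib import SequenceMatcher


class CaptionThrottle:
    """Decides whether a freshly generated caption should replace the one on screen."""

    def __init__(self, min_interval_s=2.0, max_similarity=0.8):
        # Hypothetical tuning knobs, not values taken from this project.
        self.min_interval_s = min_interval_s  # shortest time a caption stays visible
        self.max_similarity = max_similarity  # at or above this, captions count as the same
        self.current = None       # caption currently displayed
        self.last_update = None   # timestamp of the last accepted caption

    def offer(self, caption, now=None):
        """Return True and store the caption if it should be displayed."""
        now = time.monotonic() if now is None else now
        if self.current is not None:
            # Rule 1: do not flicker; keep the current caption for a minimum time.
            if now - self.last_update < self.min_interval_s:
                return False
            # Rule 2: ignore captions that barely differ from the current one.
            similarity = SequenceMatcher(
                None, self.current.lower(), caption.lower()
            ).ratio()
            if similarity >= self.max_similarity:
                return False
        self.current = caption
        self.last_update = now
        return True
```

In a capture loop, each new caption would be passed to `offer()` and the display updated only when it returns `True`. Whether a time-based rule like this actually feels good to use is part of the open question above.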
Image Generation
Through prompting, e.g. "Generate an image of..."
Based on your webcam