https://github.com/dmarx/video-killed-the-radio-star
use Grad-CAM on output tokens to infer per-token timestamps
see also: https://github.com/dmarx/bench-warmers/blob/main/subtitles-to-storyboard.md
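
A rough sketch of the Grad-CAM idea, not an implementation from either linked repo. Assumptions: the model is an encoder-decoder speech model; `activations` are taken from some audio-encoder layer with shape `[channels][frames]`; `gradients` are the gradients of one output token's logit with respect to those activations; and each frame covers a fixed duration (`seconds_per_frame`, whose value depends on the model). The function names are hypothetical.

```python
from typing import List, Optional


def grad_cam_1d(activations: List[List[float]],
                gradients: List[List[float]]) -> List[float]:
    """Grad-CAM over the time axis for one output token.

    Both inputs are [channels][frames]. Each channel weight is the
    gradient averaged over time; the map is the ReLU of the weighted
    sum of channel activations at each frame.
    """
    n_frames = len(activations[0])
    weights = [sum(g) / n_frames for g in gradients]
    cam = []
    for t in range(n_frames):
        score = sum(w * a[t] for w, a in zip(weights, activations))
        cam.append(max(score, 0.0))  # ReLU: keep positive evidence only
    return cam


def token_timestamp(cam: List[float],
                    seconds_per_frame: float) -> Optional[float]:
    """Estimate a token's time as the centroid of its Grad-CAM map.

    Returns None when the map is all zeros (no positive evidence).
    """
    total = sum(cam)
    if total == 0:
        return None
    centroid = sum(t * v for t, v in enumerate(cam)) / total
    return centroid * seconds_per_frame


# Toy example: 2 channels, 4 frames, assumed 20 ms per frame.
acts = [[0, 1, 0, 0], [0, 0, 0, 1]]
grads = [[1, 1, 1, 1], [-1, -1, -1, -1]]
cam = grad_cam_1d(acts, grads)
ts = token_timestamp(cam, seconds_per_frame=0.02)
```

Repeating this for every output token gives one timestamp per token. Taking the centroid rather than the argmax is a design choice here; it is smoother, but it can land between two separate peaks.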