Commit

Update README.md
sergiosolorzano authored Sep 3, 2023
1 parent 0b4138c commit 6381f4b
Showing 1 changed file with 18 additions and 18 deletions.
36 changes: 18 additions & 18 deletions README.md
@@ -10,33 +10,33 @@ The AI models in the Unity project of this repo are powered by Microsoft's cross

We run two of the AI models locally, [Whisper-Tiny](https://huggingface.co/openai/whisper-tiny) and [Stable Diffusion in U-Net architecture](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx); we access a third, [ChatGPT](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line&pivots=programming-language-csharp), remotely via an API.

<img width="600" alt="diagram-flow" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/e7e43d4b-1def-4324-8937-966ef7899f0c">
<img width="600" alt="diagram-flow" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/e7e43d4b-1def-4324-8937-966ef7899f0c">
<p>&nbsp;</p>

In a Unity scene we loop the AI Models over each podcast audio section to generate the contextual images.

<video src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/d8b81ca5-a478-449c-b102-24d75c8b2e45" controls="controls" muted="muted" playsinline="playsinline">
<video src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/d8b81ca5-a478-449c-b102-24d75c8b2e45" controls="controls" muted="muted" playsinline="playsinline">
</video>
<p>&nbsp;</p>
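
The per-section loop described above can be sketched as follows. This is a minimal illustration in Python, not the project's Unity C# code: `run_pipeline` and the three callables passed to it are hypothetical stand-ins for the Whisper-Tiny transcription, the ChatGPT image-description request and the Stable Diffusion image generation.

```python
from typing import Callable, Dict, List


def run_pipeline(
    sections: List[str],
    transcribe: Callable[[str], str],
    describe: Callable[[str], str],
    generate_image: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Run the three models in order for every podcast section.

    All names here are hypothetical: the callables stand in for the
    speech-to-text, chat and image models used in the Unity scene.
    """
    results = []
    for section in sections:
        transcript = transcribe(section)   # audio section -> text
        prompt = describe(transcript)      # text -> image description
        image = generate_image(prompt)     # image description -> image
        results.append(
            {
                "section": section,
                "transcript": transcript,
                "prompt": prompt,
                "image": image,
            }
        )
    return results
```

Passing the models in as callables keeps the loop independent of how each model is hosted (locally through ONNX or remotely through an API).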

Finally, once the models have generated all images, we upscale them from 512×512 to a crisper 2048×2048 resolution with the Real-ESRGAN AI model. Suggested implementation steps are in our [blog](https://tapgaze.com/blog/podcast-to-image-slider/#real-esrgan).

*512×512* <img width="246" alt="512×512_image" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/01e41039-8f9b-444a-bc57-a800d6db53c1"> *2048x2048* <img width="246" alt="512×512_image" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/9a76a919-2616-4bdd-b7b0-080d9d346847">
*512×512* <img width="246" alt="512×512_image" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/01e41039-8f9b-444a-bc57-a800d6db53c1"> *2048x2048* <img width="246" alt="512×512_image" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/9a76a919-2616-4bdd-b7b0-080d9d346847">
<p>&nbsp;</p>

## Proof of Concept: Chatomic - "<i>Chat</i> and Create a <i>Comic</i>"
I am thrilled and truly grateful to Maurizio Raffone at [Tech Shift F9 Podcast](https://linktr.ee/techshiftf9) for trusting me to run a proof of concept of the Chatomic app with the audio file of a fantastic episode in this podcast.
## Proof of Concept: Talkomic - "<i>Talk</i> and Create a <i>Comic</i>"
I am thrilled and truly grateful to Maurizio Raffone at [Tech Shift F9 Podcast](https://linktr.ee/techshiftf9) for trusting me to run a proof of concept of the Talkomic app with the audio file of a fantastic episode in this podcast.

+ Watch The [Trailer🎬](https://tapgaze.com/blog/podcast-to-image-slider/#podcast-trailer)
<video src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/80671f8f-7975-449d-ab5f-d30ab8a3cd77" controls="controls" playsinline="playsinline">
<video src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/80671f8f-7975-449d-ab5f-d30ab8a3cd77" controls="controls" playsinline="playsinline">
</video>
+ Watch The <i>Chatomicd</i> [Complete Podcast📽️](https://youtu.be/pWK4vFLD6_E)
+ View and download the [Podcast's AI Image Gallery🎨](https://tapgaze.com/blog/techshift-f9-chatomic-images/#podcast-gallery)
+ Watch The <i>Talkomicd</i> [Complete Podcast📽️](https://youtu.be/pWK4vFLD6_E)
+ View and download the [Podcast's AI Image Gallery🎨](https://tapgaze.com/blog/techshift-f9-talkomic-images/#podcast-gallery)
+ See the Podcast's AI Images in Augmented Reality😎 with the [Tapgaze app](https://apps.apple.com/gb/app/tapgaze/id1534427791)
<p>&nbsp;</p>

## Project's Blog Post
Read [the Chatomic app blog](https://tapgaze.com/blog/podcast-to-image-slider/) for the suggested steps to build the project in Unity:
Read [the Talkomic app blog](https://tapgaze.com/blog/podcast-to-image-slider/) for the suggested steps to build the project in Unity:
+ Use Olive to convert the [whisper-tiny text-transcription AI model](https://tapgaze.com/blog/podcast-to-image-slider/#whisper-olive) to ONNX format
+ Process [chunked podcast audio](https://tapgaze.com/blog/podcast-to-image-slider/#whisper-chunks) for Whisper
+ [ChatGPT API request](https://tapgaze.com/blog/podcast-to-image-slider/#chatgpt)
@@ -49,29 +49,29 @@ Read [the Chatomic app blog](https://tapgaze.com/blog/podcast-to-image-slider/)

* Required native DLLs (Onnxruntime, NAudio, etc.): the project should include the following packages in Visual Studio (tested in VS2022 v.17.7.3) and the following DLLs in Unity's Assets/Plugins directory.

<img width="252" alt="native-dlls" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/1fd10f26-bf85-400f-b3f1-609b20ebadee">
<img width="252" alt="native-dlls" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/1fd10f26-bf85-400f-b3f1-609b20ebadee">

<img width="512" alt="native-dlls_vs2022" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/f6f20f8f-5337-466c-b337-f999147c2cf4">
<img width="512" alt="native-dlls_vs2022" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/f6f20f8f-5337-466c-b337-f999147c2cf4">

* Download the [weights.pb](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx/unet) weights file and save it into Assets/Models/unet/. This step is also required for this repo's Release package (the file is too large to include).
If that download is unavailable, try [here](https://drive.google.com/file/d/1NvYhoGyw_fuYx9n6KdzWc24n5q6VavOH/view).

* Podcast Audio Section List Required: Create in script ChatomicManager.cs at GenerateSummaryAndTimesAudioQueueAndDirectories() a list for each section in the podcast audio with the section_name and its start time in minutes:seconds.
* Podcast Audio Section List Required: Create in script TalkomicManager.cs at GenerateSummaryAndTimesAudioQueueAndDirectories() a list for each section in the podcast audio with the section_name and its start time in minutes:seconds.

+ Unity will generate an output directory for each section and save there the transcribed text, the ChatGPT image description and the generated images.

<img width="400" alt="output-snapshot" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/4cb4a278-83b8-493f-b4b1-3af8d62faeb6">
<img width="400" alt="output-snapshot" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/4cb4a278-83b8-493f-b4b1-3af8d62faeb6">
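
As an illustration of how a list of section names and minutes:seconds start times can be turned into time spans, here is a minimal Python sketch. It is not the project's C# code: `parse_start_time`, `section_spans` and the `total_seconds` parameter are hypothetical names, and we assume each section ends where the next one starts.

```python
from typing import List, Tuple


def parse_start_time(stamp: str) -> int:
    """Convert a 'minutes:seconds' string such as '2:15' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def section_spans(
    sections: List[Tuple[str, str]], total_seconds: int
) -> List[Tuple[str, int, int]]:
    """Return (section_name, start, end) in seconds for each section.

    Assumption: sections are listed in order, and each one ends where
    the next begins; the last one ends at total_seconds.
    """
    starts = [parse_start_time(stamp) for _, stamp in sections]
    ends = starts[1:] + [total_seconds]
    return [
        (name, start, end)
        for (name, _), start, end in zip(sections, starts, ends)
    ]
```

For example, `section_spans([("intro", "0:00"), ("web3", "2:15")], 300)` gives one span from 0 to 135 seconds and one from 135 to 300 seconds.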

* Podcast Audio Chunks: The Whisper model is designed to work on audio samples of up to 30s in duration. We therefore split each section's audio into chunks of at most 30 seconds and load them into Whisper-tiny as a queue, one queue per podcast section.

<img width="400" alt="audio-chunks" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/502ad067-00db-466c-8f9c-48d466021905">
<img width="400" alt="audio-chunks" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/502ad067-00db-466c-8f9c-48d466021905">
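
The chunking idea can be sketched in a few lines. This is a hedged Python illustration rather than the project's C# implementation: `chunk_section` and `MAX_CHUNK_SECONDS` are hypothetical names, and the function only computes chunk boundaries in seconds, it does not read audio.

```python
from collections import deque
from typing import Deque, Tuple

# Whisper works on audio samples of up to 30 seconds.
MAX_CHUNK_SECONDS = 30


def chunk_section(
    start: float, end: float, max_len: float = MAX_CHUNK_SECONDS
) -> Deque[Tuple[float, float]]:
    """Split one section [start, end) into a queue of (start, end) chunks.

    Every chunk is at most max_len seconds long; the last chunk may be
    shorter. Chunks are queued in playback order.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if end < start:
        raise ValueError("end must not be before start")
    queue: Deque[Tuple[float, float]] = deque()
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + max_len, end)
        queue.append((cursor, chunk_end))
        cursor = chunk_end
    return queue
```

A 75-second section starting at 0 yields three chunks: 0 to 30, 30 to 60 and 60 to 75 seconds. Using a queue keeps the chunks in order, so the transcribed text for a section can be joined back together as each chunk is processed.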

* AI Generated Images: Shown in the scene along with the transcribed text and ChatGPT image description:

<img width="1000" alt="scene-progress" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/67c2f52b-6535-4f9d-a1cb-7f8372b2071a">
<img width="1000" alt="scene-progress" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/67c2f52b-6535-4f9d-a1cb-7f8372b2071a">

* Scene Control Input Variables:
* Script: ChatomicManager.cs:
* Script: TalkomicManager.cs:

+ pathToAudioFile: full path to the podcast audio file. The audio file must be in sync with the list of section names and start times created in coroutine GenerateSummaryAndTimesAudioQueueAndDirectories()

@@ -93,11 +93,11 @@ Read [the Chatomic app blog](https://tapgaze.com/blog/podcast-to-image-slider/)

+ Create the object and add it as a property of the RunChatgpt.cs component on the Hierarchy object "RunChatGPT"

<img width="400" alt="scriptable-object-snap" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/d3f25819-28f3-491b-80d9-bdf1d8fa5a0e">
<img width="400" alt="scriptable-object-snap" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/d3f25819-28f3-491b-80d9-bdf1d8fa5a0e">

+ Enter credentials and request arguments

<img width="419" alt="scriptable-credentials-example" src="https://github.com/sergiosolorzano/ChatomicApp-Unity/assets/24430655/9b8ed4b2-cd5c-4f80-a100-0c20031d0f4c">
<img width="419" alt="scriptable-credentials-example" src="https://github.com/sergiosolorzano/TalkomicApp-Unity/assets/24430655/9b8ed4b2-cd5c-4f80-a100-0c20031d0f4c">

## Project Software
Unity version: Unity 2021.3.26f1. Only run in the Editor; builds have not been tested.