Imagine being able to converse with a version of yourself that can provide answers to any question you may have. This project aimed to achieve just that!
You can now engage in a conversation with yourself and seek answers to questions that may have previously eluded you.
Using machine learning techniques such as NLP, voice cloning, and computer vision, we have created a virtual version of you that can communicate with you seamlessly. This AI-powered version of yourself is designed to understand the nuances of human language and to give insightful responses to your queries.
Upon receiving the input prompt, the program generates an answer. It then extracts relevant keywords and phrases from the generated response and summarizes the answer to improve the quality of the final text. Once the final answer is ready, Styleformer is applied, letting you switch between formal and casual styles.
To enhance the understanding of the content, the program creates two images: the first is generated from the extracted keywords, and the second from the phrases extracted from the generated text. Using the same keywords and phrases, the program also generates two videos. Next, using voice cloning and a dataset of your recorded voice in .wav format, the program synthesizes the text into speech with your own vocal characteristics. Finally, using a picture of you in .jpg format, the program automatically generates a talking-head video that takes your input image as its base and is synchronized with the cloned voice.
So, whether you're seeking advice on a personal matter, looking for guidance in your career, or simply curious about the world around you, this AI-powered version of yourself is always at your disposal.
With this project, the possibilities are endless. Your AI-powered virtual self is ready for conversation at any time.
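The text part of the flow described above (answer generation, keyword and phrase extraction, summarization, style switching) can be sketched as a small orchestration function. This is only an illustration under stated assumptions: `run_text_stage`, `TextStageResult`, and the callables passed in are hypothetical names that do not come from Self_Talker.ipynb. In the real notebook, GPT-2, KeyBERT, a summarizer, and Styleformer would take the place of the stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextStageResult:
    answer: str          # raw generated response
    keywords: List[str]  # single-word terms (input for the first image and video)
    phrases: List[str]   # multi-word terms (input for the second image and video)
    summary: str         # shortened version of the answer
    styled_answer: str   # answer after the optional style switch
    styled_summary: str  # summary after the optional style switch


def run_text_stage(
    prompt: str,
    generate: Callable[[str], str],
    extract_keywords: Callable[[str], List[str]],
    extract_phrases: Callable[[str], List[str]],
    summarize: Callable[[str], str],
    restyle: Callable[[str], str] = lambda text: text,  # default: keep the style
) -> TextStageResult:
    """Run the text steps in the order the README describes them."""
    answer = generate(prompt)
    keywords = extract_keywords(answer)
    phrases = extract_phrases(answer)
    summary = summarize(answer)
    return TextStageResult(
        answer=answer,
        keywords=keywords,
        phrases=phrases,
        summary=summary,
        styled_answer=restyle(answer),
        styled_summary=restyle(summary),
    )


if __name__ == "__main__":
    # Hypothetical stand-ins so the sketch runs without any model downloads.
    result = run_text_stage(
        "How do I get rich?",
        generate=lambda p: "Investing is one way to build wealth. Start early.",
        extract_keywords=lambda t: ["investing", "wealth"],
        extract_phrases=lambda t: ["build wealth"],
        summarize=lambda t: t.split(". ")[0] + ".",
    )
    print(result.summary)
```

Passing each stage in as a callable keeps the order of the steps visible in one place and makes it easy to swap a model for a stub while testing. The image, video, voice, and talking-head steps would consume the fields of the returned result.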
Welcome to the Features section!
We've numbered the features below to highlight their specific order and importance. This helps you understand the logical progression of our product's capabilities, making it easier to grasp the underlying flow and prioritize accordingly. Enjoy exploring the exciting features we have in store for you!
1. Text generation powered by fine-tuned GPT-2
2. Keyword Extraction with KeyBERT
3. Text summarization
4. Styleformer Integration: Seamless Style Switching in Conversations
5. AI Text-to-Image with minimal DALL-E Mini
6. Text-to-video with Diffusers
7. Voice Cloning
8. MakeItTalk: Speaker-Aware Talking-Head Animation
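As an illustration of the keyword extraction feature, here is a minimal sketch of how keywords and phrases might be pulled from a generated answer with KeyBERT. It uses KeyBERT's `extract_keywords` method. The helper names, the n-gram range used for phrases, and `top_n=5` are assumptions for this example and may differ from what Self_Talker.ipynb does.

```python
from typing import List, Optional, Tuple


def extract_terms(model, text: str, ngram_range: Tuple[int, int], top_n: int = 5) -> List[str]:
    """Return the top-scoring terms for `text`, dropping the scores."""
    scored = model.extract_keywords(
        text,
        keyphrase_ngram_range=ngram_range,
        stop_words="english",
        top_n=top_n,
    )
    # For a single document KeyBERT returns a list of (term, score) pairs.
    return [term for term, _score in scored]


def keywords_and_phrases(text: str, model: Optional[object] = None):
    """Extract single-word keywords and multi-word phrases from `text`."""
    if model is None:
        # Imported lazily so the helper can also be exercised with a stand-in model.
        from keybert import KeyBERT
        model = KeyBERT()  # loads KeyBERT's default embedding model
    keywords = extract_terms(model, text, (1, 1))  # single words
    phrases = extract_terms(model, text, (2, 3))   # assumed phrase length: 2-3 words
    return keywords, phrases
```

The model is created once and reused for both calls, since loading the embedding model is the slow part. The two returned lists correspond to the keyword-based and phrase-based prompts used for the images and videos.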
- Python 3.x
- Google Colab or Jupyter
- As this is a Jupyter Notebook file, it is not OS-specific and can be run on any operating system that supports Python, including Windows, macOS, and Linux.
- If you want to convert the Jupyter notebook (.ipynb) into a standalone web application with a user interface (UI), you can use Anvil.
- Clone the repository:
git clone https://github.com/Amirrezahmi/SelfTalker.git
- Everything is explained in Self_Talker.ipynb, but here is a summary. First, download my fine-tuned GPT-2 model from here. Alternatively, if you have your own dataset and want to fine-tune your own GPT-2 model, I have provided both my Jupyter notebook and my dataset in the fine-tune GPT2 directory. This folder contains the code I used to fine-tune GPT-2 on my own dataset.
- The sixth step of Self_Talker.ipynb covers voice cloning, so a dataset of your voice is required. The files must be in .wav format. Ten files of 10-15 seconds each are enough for your dataset.
- The last step of Self_Talker.ipynb asks for a $256 \times 256$ picture of yourself in .jpg format.
- After completing the previous steps, run the cells of the Self_Talker.ipynb Jupyter notebook. All you need to run this project is a Google Colab or Jupyter environment. Running the "requirements" section of the notebook installs all requirements. If you are using Google Colab, change the runtime type to GPU and import the fine-tuned GPT-2 model, along with your voice dataset.
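Before running the notebook, it can help to check that your inputs match the requirements above (about ten .wav clips of 10-15 seconds each, and a 256 x 256 .jpg picture). The helper below is a sketch, not part of the repository: the function names and the folder layout are assumptions. The voice check uses only the standard library and expects uncompressed PCM .wav files. The picture check assumes Pillow is installed.

```python
import wave
from pathlib import Path
from typing import List


def clip_seconds(path: Path) -> float:
    """Duration of an uncompressed PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_voice_dataset(folder, min_clips: int = 10,
                        min_s: float = 10.0, max_s: float = 15.0) -> List[str]:
    """Return a list of problems found; an empty list means the dataset looks fine."""
    problems = []
    clips = sorted(Path(folder).glob("*.wav"))
    if len(clips) < min_clips:
        problems.append(f"found {len(clips)} .wav files, expected at least {min_clips}")
    for clip in clips:
        seconds = clip_seconds(clip)
        if not min_s <= seconds <= max_s:
            problems.append(f"{clip.name}: {seconds:.1f}s is outside {min_s:.0f}-{max_s:.0f}s")
    return problems


def check_portrait(path, size=(256, 256)) -> List[str]:
    """Check that the picture is a JPEG with the expected (width, height)."""
    from PIL import Image  # Pillow, imported lazily; assumed to be installed
    problems = []
    with Image.open(path) as img:
        if img.format != "JPEG":
            problems.append(f"expected a JPEG file, got {img.format}")
        if img.size != size:
            problems.append(f"expected {size[0]}x{size[1]}, got {img.size[0]}x{img.size[1]}")
    return problems
```

Returning a list of messages instead of raising on the first problem lets you fix every issue in one pass, for example with `print(check_voice_dataset("my_voice"))`, where `my_voice` is a hypothetical folder name.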
Here's an example:
If you want to be rich, you need to have a lot of money. If you don't have enough money, it will be difficult for you to achieve your goals, because you will not be able to buy the things that you really want. The only way to make money is by investing. Investing in stocks, bonds, real estate, and other assets is one of the best ways to get rich. There are many methods of investing, but the most important is to invest your money in the stock market. For example, if you invest $ 10,000 in a mutual fund, the fund will give you a return of 10% per year for the next 10 years. This is called a compound annual growth rate (CAGR). The more money you put into a fund the greater the return.
If you want to be rich, you need to have a lot of money. The only way to make money is by investing. Investing in stocks, bonds, real estate, and other assets is one of the best ways to get rich. For example, if you invest $10,000 in a mutual fund, the fund will give you a return of 10% per year for the next 10 years.
NOTE: The final text comprises the generated response combined with our summarized text.
NOTE: Using Styleformer, we convert the generated response and the summarized text from formal to casual style. Please note that the text generated by our fine-tuned GPT-2 model is primarily formal in nature. Visit Self_Talker.ipynb for more details.
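A minimal sketch of the formal-to-casual conversion is shown below. It assumes the Styleformer package's documented interface, where `Styleformer(style=1)` selects formal-to-casual transfer and `transfer()` handles one sentence at a time and may return `None` when no good candidate is found. The helper names and the simple sentence splitter are assumptions for this example and are not taken from Self_Talker.ipynb.

```python
import re
from typing import List, Optional


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def to_casual(text: str, styler: Optional[object] = None) -> str:
    """Convert `text` to casual style sentence by sentence."""
    if styler is None:
        # Imported lazily so the helper can also be exercised with a stand-in styler.
        from styleformer import Styleformer
        styler = Styleformer(style=1)  # 1 = formal to casual
    converted = []
    for sentence in split_sentences(text):
        casual = styler.transfer(sentence)
        # Keep the original sentence when no acceptable transfer is available.
        converted.append(casual if casual else sentence)
    return " ".join(converted)
```

Falling back to the original sentence keeps the output complete even when a single sentence cannot be restyled. The same function can be applied to both the generated response and the summary.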
- Changing generated response language style to casual:
if you want to be rich, you have to have money. You will have a hard time achieving your goals if you're not rich enough to buy what you really want. Investing is the only way to make money. investing in stocks, bonds, real estate and other assets is one of the best ways to get rich. there are many ways to invest, but the most important is to invest your money in the stock market. like if you invest $ 10,000 in a mutual fund, that mutual will return you 10% each year for the next 10 years. it's called a compound annual growth rate. the more money you put into a fund the better the return.
- Changing summarized text language style to casual:
if you want to be rich, you have to have money. You can only make money by investing. Investing stocks, bonds, real estate is one of the best ways to get rich. like if you invest $10,000 in a mutual fund, the fund will give you a return of 10% per year for the next 10 years.
NOTE: In this example, I continued with the formal text, but in your case the choice is yours.
nn.mp4
vidm.mp4
In this step, we cloned Elon Musk's voice using a dataset of his own recordings. The result of our text-to-speech synthesis can be found in the following .mov file:
elon.mov
The text synthesized here is the summarized version of the generated response.
download.2.mp4
download.3.mp4
Contributions are welcome! If you'd like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch: git checkout -b my-new-branch.
- Make your changes and commit them: git commit -m 'Add some feature'.
- Push to the branch: git push origin my-new-branch.
- Submit a pull request.
This project is licensed under the MIT License.
- GPT-2: for the text generation step.
- KeyBERT: for the keyword extraction step.
- Styleformer: for style switching in conversations.
- DALL·E Mini: for the AI text-to-image step.
- Diffusers: for the text-to-video step.
- TorToiSe: for the voice cloning step.
- MakeItTalk: for the speaker-aware talking-head animation step.
For any questions or inquiries, please contact amirrezahmi2002@gmail.com