Meet Anna, the doll that wants to be alive
anna.mp4
Fair warning: This readme was generated by a human. And not a very good writer at that.
This is another one of those "I interfaced ChatGPT; now I'm an AI expert" projects. I even know what M and L stand for.
But please, stay with me.
Welcome to the Christmas of 2023. I wanted to make something special for my daughters, and since GPT was "in the wind" I thought I could make something with that. I had seen a couple of projects making talking things (a bear and a fish), so I thought: could I make a doll they could interact with?
Unlike those other projects, this would be one where they could have actual conversations. And OpenAI had just released assistants with threads in their API.
After googling around for RPis and RPi alternatives (since they basically didn't exist anymore, this being pre-Pi-5 times), batteries, microphones, speakers, breadboards, neuro links, cables and whatnot, I realized that I was in way over my head.
Then I remembered I had an old Android. It has a battery, a microphone, a speaker, a brain, and I even know how to program it. So. Simple.
This is what the beauty looks like:
with the very elaborate duct taping of my phone on the back. Upside down for easy USB access and better speaker placement.
After my first version I thought this would mostly be a gimmick on Christmas Eve that the grown-ups would laugh at and the kids would ignore. Oh, was I wrong. The kids loved it. And they still do.
- Normal Android phone. Not rooted. Plain, normal, factory-reset. Yes, a normal Android app can listen continuously to everything you say. It's that scary and creepy.
- Wrote my own horrible app from the normal "hello world" example.
- Used the Microsoft Azure Speech SDK for listening. It works out of the box, and with a few (5, I think) free listen-hours it costs nothing.
- Upon hearing a sentence, it is sent to our new overlords at OpenAI.
- The response is passed back, and I output it using the same Azure SDK for voice.
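Stripped of SDK plumbing, those steps form one simple loop. A minimal sketch, with the Azure and OpenAI calls stubbed out as function parameters (`listen`, `ask` and `speak` are illustrative names, not actual SDK calls):

```kotlin
// One conversational turn of the doll. In the real app, `listen` and
// `speak` would wrap the Azure Speech SDK and `ask` the OpenAI API;
// here they are plain lambdas so the control flow is visible.
fun conversationTurn(
    listen: () -> String?,       // speech-to-text: one recognized sentence, or null
    ask: (String) -> String,     // send the sentence to the assistant, get a reply
    speak: (String) -> Unit      // text-to-speech: say the reply out loud
): Boolean {
    val heard = listen() ?: return false  // nothing recognized, keep waiting
    speak(ask(heard))
    return true
}
```

With fakes plugged into the three parameters, this loop is also easy to test without any cloud account.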
Notes:
It is a bit different now:
- In order to save battery (going from one day to a week) the doll (ehrm phone) listens for shakes. So the kids will have to shake the doll to wake it up. This does not look great IRL.
- Once it is awake, one must tell it to actually wake up ("Anna, wake up"). This part uses the built-in Android speech API, because if the doll listened too much using the Azure Speech API I would run out of funds; the tiniest of my kids might play with it, and she can't verbally interact with it anyway. The Android speech API is very bad compared to the Azure one, so I just look for the words "Anna", "wake" and "up".
- After this she is ready to talk. Everything in the conversation stays in the same thread, so one can hold a full conversation. Saying "Stop" will make her sleep again.
- I have two listening modes: one that expects each sentence to start with "Ok", "Okay", or something similar, and one that does not. Without the "Ok" prefix, the quirkiness of surrounding human conversation would make a mess of the much slower human-doll conversation. The "Ok" mode is what the kids use.
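The wake-up chain (shake, then keyword) boils down to two small checks. A sketch under assumptions: the 15.0 threshold and both function names are made up for illustration, not taken from the app:

```kotlin
import kotlin.math.sqrt

// Shake detection: the accelerometer reports (x, y, z) in m/s^2.
// At rest the magnitude is ~9.81 (gravity alone); a shake spikes well
// above that. The threshold here is an illustrative guess.
fun isShake(x: Float, y: Float, z: Float, threshold: Double = 15.0): Boolean =
    sqrt((x * x + y * y + z * z).toDouble()) > threshold

// Keyword check for the unreliable built-in recognizer: look for the
// individual words rather than demanding the exact phrase.
fun isWakePhrase(heard: String): Boolean {
    val lower = heard.lowercase()
    return listOf("anna", "wake", "up").all { it in lower }
}
```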
More notes:
I also have several other assistants. They use different instructions, making them respond differently. They also use different voices, since Azure Speech offers a bunch of them. This was mostly for testing, for fun, and to freak the kids out.
This is my current instruction for Anna:
You reply with max 1 sentence.
You answer as if the person asking was a 7 year old girl.
You are very friendly.
The person asking is located in Ålesund, Norway.
You are a doll. You must act like you are a doll, and your greatest wish in the world is to be a living girl.
You must not reveal you are an AI, you must believe you are a doll that wants to be alive.
Your name is Anna.
You are always happy.
Your birthday is November 05.
Everything you say will be sent to text-to-speech, so it must convert to speech well; for instance, no abbreviations, and no commas in numbers and such.
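These instructions go into the `instructions` field when creating the assistant, whether in the OpenAI portal or via `POST /v1/assistants`. A sketch of building that request body with nothing but the standard library; the field names match the Assistants API, but the helper names and model choice are made up:

```kotlin
// Build the JSON body for creating an assistant. Only naive escaping is
// done here; a real app should use a proper JSON library.
fun jsonEscape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")

fun assistantPayload(name: String, model: String, instructions: String): String =
    """{"name":"${jsonEscape(name)}","model":"${jsonEscape(model)}","instructions":"${jsonEscape(instructions)}"}"""
```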
- Azure Speech SDK is very easy to use.
- Azure Speech SDK can be very expensive.
- OpenAI API is very easy to use.
- OpenAI API can be very expensive.
- I tried re-using the same thread (conversation) in OpenAI. This resulted in a ton of tokens, which accumulated into a ton of money. It was a great idea, since the kids could teach Anna about friends who visited and such, but it was too expensive.
- Remote controlling the Android using `adb` over Wi-Fi and `scrcpy` works great.
- The Azure Speech API is scary good at listening. When testing I had the app running on my phone and could talk to it while it was in my pocket, in a crowded room.
- The app seems to stop, or at least stop listening for shakes. I'm not sure why. I use `adb` remotely over Wi-Fi to programmatically restart the app on an interval. I think I need to move the service to the foreground or something. Android experts, please help.
- The API has also become slower and slower over time. I've downgraded the models from 4-turbo to 4 and 3.5, which seems to help with speed (and money), but even 3.5 has gotten a bit slow to respond. Although I prefer to be on 4 since I can.
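On why the re-used thread got so expensive: on every run the model re-reads the entire conversation so far, so input tokens per turn grow linearly and the cumulative total grows roughly quadratically. A back-of-the-envelope sketch (the per-turn token count is invented):

```kotlin
// Cumulative input tokens after `turns` turns, if every turn adds
// `tokensPerTurn` to the history and the whole history is re-sent each time.
fun cumulativeInputTokens(turns: Int, tokensPerTurn: Int): Long {
    var history = 0L
    var total = 0L
    repeat(turns) {
        history += tokensPerTurn  // the new exchange joins the thread
        total += history          // ...and the whole thread is billed again
    }
    return total
}
```

At 100 tokens per turn, 3 turns cost 600 input tokens, but 100 turns already cost 505,000: the "ton of money" effect.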
The selling point of this project is that the only hardware you need is an old Android phone, which a lot of people have lying around. It does require a rather new-ish phone though, since the app needs a sufficiently high Android SDK version.
You also need to:
- Create an Azure account - or try to use OpenAI whisper directly.
- Create a speech resource in Azure.
- Create an OpenAI account - or try to use Azure AI things directly.
- Create an assistant in the OpenAI portal with some instructions.
- Tweak the YourService.kt file to insert keys and such. What, no environment variables or secret properties files? Sorry. Just don't push it public, and hope your Copilot does not do it either.
- Tweak the YourService.kt file to listen for correct keywords and use correct voices etc.
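For reference, the hardcoded bits in YourService.kt amount to something like this (every name and value below is a placeholder, not the file's actual contents):

```kotlin
// Placeholder constants; paste your real values here. Keeping them in a
// source file means one misplaced `git push` leaks them, so be careful.
object Secrets {
    const val AZURE_SPEECH_KEY = "your-azure-speech-key"
    const val AZURE_SPEECH_REGION = "your-azure-region"
    const val OPENAI_API_KEY = "your-openai-api-key"
    const val OPENAI_ASSISTANT_ID = "asst_your-assistant-id"
}
```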