#7. Create a lip-syncing character that responds to your messages by talking!

Create your own talking character that can respond to text-based messages by talking back to you: image-generation + text-generation + text-to-speech + lip-syncing!

Feb 01, 2024

Helloooo AI Alchemists!!! 🤖🧪

Ever since sharing this project live on LinkedIn, I have gotten a TON of requests to make lip-syncing chatty avatars for a wide range of use-cases. This is incredibly popular, and there is a crazy amount of potential for extending and monetising this.

You can go deep on custom personalities, extended conversation memory, interactions with the real world, multiple character conversations, avatar-based FAQs and SO much more.

In this newsletter, I’ll teach you how to generate a custom character on the fly that will respond to your messages by talking. You can take it from there 💖🎁

Let’s dive into this weeks GenAI MVP Project!!!

🎉 🥳 🥰

📣 Celebration Time !!

Something magical happened the other day. One of you magical readers paid for a subscription to this newsletter. I didn’t even have payments set up.

Thank you 💖 😭🥰😭🎉

Also, Fairylights now has a shiny new website as a go-to for live projects and newsletters!

GenAI project 5/100: Chatty Character!

Introducing Chatty Character!

This was an utterly joyful project to build, just, eep! 🥳

Take a look at this 👀😄

🚀 Try out the live version here!

Describe a character through text and get back a custom 3D avatar character you can chat with. You can message them via text, and they will respond back to you by talking! The voice matches character gender too (male, female, other).

Wanna make this? Let’s goooo! 🚀

How to build a character avatar that chats back!

Generate a custom character image.
Create a simple text-based chatbot.
Send a message to the chatbot and get back a text-based response.
Convert the response to speech.
Turn the character image and speech file into a talking avatar.

Step 1: Generate a custom character avatar

Generating custom characters is a silly amount of fun. Unfortunately, even the best lip-syncing software isn’t very good at handling non-human characters. I tried my very best to make this adorable baby dinosaur speak, but alas it wasn’t possible, yet!

Good golly gumdrops how cute is that!? 🥰

Here is the character I generated as the base of this lip-syncing project. I forgot to save the original image, which was much crispier than this, so I had to take a screenshot from the video.

This is the complete code I used to generate this image in Python, using OpenAI’s DALLE-3 model. You can access a copy and paste version here.

Github Gist (raw code): Generate a character image with OpenAI’s DALLE 3 model.

In the live MVP version, you can create your own character by entering character details with your own appearance and clothing descriptions. It’s addictive.

Step 2: Create a simple text-based chatbot

Next, we need to create a simple text-based chatbot. For this project, I created a bare-bones chatbot that is basically a clone of ChatGPT except it knows what the character looks like, and the responses are limited to 2-3 sentences only.

Conversation example 1: Tell me a dinosaur joke

input: Please tell me a dinosaur joke.
output: Sure, here is a dinosaur joke: Why don’t you ever hear a pterodactyl go to the bathroom? Because the ‘P’ is silent!

Conversation example 2: What are you wearing?

input: What is that you’re wearing?
output: I’m wearing a green hoodie that has a cute dinosaur on the front.

You can access a copy-paste version of the code here.

Github Gist (raw code): Generate a simple text-based chatbot with OpenAI’s GPT-3.5-Turbo model (responds in 1-3 sentences)

This chatbot doesn’t remember any of the conversation history besides the most recent response. There is so much potential here for creating an incredibly custom character personality (I’ve written another newsletter on how I did that for Harry Potter characters), remembering past conversations, and more complex behaviours.

I didn’t do any of that for this one because the aim was to show you how to create a bare-bones lip-syncing character you can extend and build on top of.

Step 3: Convert text to audio

Converting text to audio is super easy with OpenAI’s Text-To-Speech model. It’s cheap and cheerful and has 6 voice options. If I were to make a production version of this lip-syncing avatar, I’d personally go with ElevenLabs. They have a huuge library of voice options, and you can use it to clone your own voice too. Very cool.

The following code snippet shows two methods. The first chooses a male, female or gender neutral voice, and the second uses that voice to produce an audio file speaking the text you provide out loud.

You can access a copy-paste version of the code here.

Github Gist (raw code): **Convert text into speech with OpenAI’s TTS model**

Step 4: Create lip-syncing video from image and audio

This is where the magic happens. Once you have a humanoid avatar image and an audio file of someone speaking, you can use those to create an lip-syncing avatar that appears to speak the words in the recording.

I used a free lip-syncing tool called GooeyAI to create the lip-syncing avatar in the demo video. The more human-like and 3D the character is, the better the lip-syncing results.

However, if I were to build a production version of this use-case, I would go with HeyGen or D-ID, which are best-in-class lip-syncing tools, but also a little pricey.

The following method uses the GooeyAI API to generate a talking avatar video URL from an image and an audio file. In this example, we are getting the image from local filepaths, but there are also options to pass in image and audio URLs instead if you’d rather use some hosted on Vimeo or Imgur or something like that.

Github Gist (raw code): **Generate talking avatar video URL from image & audio.**

Step 5: Chain it all together

Generate a custom character image.
Create a simple text-based chatbot.
Send a message to the chatbot and get back a text-based response.
Convert the response to speech.
Turn the character image and speech file into a talking avatar.

Production Tips and Tricks

If I were going to build a production version of this, I would:

ElevenLabs for the amazing variety of voices and the ability to clone your voice.
HeyGen or D-ID for the lip-syncing, which produces much higher quality results than the free API I used for this proof-of-concept MVP.
GPT-4 for the chat conversation, and I’d also put a lot more effort in designing the character personality than I did for this one. I’d also extend it to add conversation history, and extend the functionality with knowledge retrieval and real-world action capabilities (like scheduling, booking, retrieving documents etc).
I’d look into streaming the text responses from GPT, the text to speech and the lip-syncing generation to speed everything up and get it as close to real time as possible. I will probably end up doing all of this because of how much I love this project.

Some cool use-cases

Here are some of my favourite use-cases for lip-syncing avatars that I’ve tried so far!

This is not actually me speaking

This is a lip-syncing avatar I made of myself using HeyGen for the lip-syncing and Elevenlabs for cloning my voice.

Quick and snappy product demos

This is a little screencast with avatar presenter I made with the help of Canva for putting the screencast and avatar video all together.

Subscriiiibe message

This is a little subscribe message I shared on LinkedIn. I got 15 new subscribers in an hour of posting this. Suuuper happy with that result given how early days this newsletter is (pre-100 subs).

✨ Fairylights | 100 GenAI Projects

Discussion about this post