This repo contains the official implementation for the paper *A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents*, published as a demonstration at the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), by Rajmund Nagy, Taras Kucherenko, Birger Moëll, André Pereira, Hedvig Kjellström and Ulysses Bernardet.
We present a framework for integrating recent data-driven gesture generation models into interactive conversational agents in Unity. Our video demonstration is available below:
This branch contains the Blenderbot version of our implementation, with a built-in chatbot and TTS. You may visit the `dialogflow_demo` branch for an alternative version that integrates DialogFlow into the project for speech generation.
Please follow the instructions in INSTALLATION.md to install and run the project.
Our framework is designed to be fully modular, so it can be applied to different voices, chatbot backends, gesture generation models and 3D characters. However, using it in a new project will require some coding, for which we provide guidance below.
The source code of the Unity scene with DialogFlow integration is available on this link, while the Blenderbot version is available here. The relevant C# scripts are found in the `Assets/Scripts/` folder. The entry point of the Python code is the `main.py` file, while the bulk of the implementation is found in `gesture_generator_service.py`.
- The C# and the Python scripts communicate over ActiveMQ, as implemented in the `ActiveMQClient.cs` and `messaging_server.py` files (a minimal sketch of the Python side is given after this list).
- Once the generated motion arrives at the 3D agent, the `MotionVisualizer.cs` file animates its model by modifying the `localRotation` values of each joint.
- There is no clear convention for how 3D models handle joint rotations in Unity. The 3D joint angles generated by Gesticulator follow the BVH format; applying them to new character models will require Unity knowledge and some tinkering.
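For reference, here is a minimal sketch of what the Python side of this message exchange could look like with the `stomp.py` client (assuming its 8.x API). The queue names and the `handle_speech` callback are hypothetical placeholders, not the actual contents of `messaging_server.py`:

```python
import time

import stomp


class SpeechListener(stomp.ConnectionListener):
    """Receives speech requests from Unity and replies with generated motion."""

    def __init__(self, connection, handle_speech):
        self.connection = connection
        self.handle_speech = handle_speech  # e.g. runs the gesture generation model

    def on_message(self, frame):
        # Unity publishes the agent's speech; we answer with the joint angles.
        motion = self.handle_speech(frame.body)
        self.connection.send(destination="/queue/motion", body=motion)


conn = stomp.Connection([("localhost", 61613)])  # default ActiveMQ STOMP port
conn.set_listener("speech", SpeechListener(conn, handle_speech=lambda text: text))
conn.connect(wait=True)
conn.subscribe(destination="/queue/speech", id=1, ack="auto")

while True:  # keep the process alive so the listener can receive messages
    time.sleep(1)
```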
- In the `dialogflow_demo`, the agent's responses are generated by DialogFlow, which is integrated in the C# script `DialogFlowCommunicator.cs`.
- In the `blenderbot_demo`, the responses are generated with Facebook's Blenderbot and synthesized with Glow-TTS as implemented in Mozilla's TTS library. See `blenderbot.py` and `tts_interface.py` for details, and the illustrative snippet below.
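On the chatbot side, Blenderbot is commonly loaded through Hugging Face's `transformers` package. The sketch below illustrates that generic API rather than mirroring `blenderbot.py`, and the checkpoint name is an assumption:

```python
from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

# The checkpoint name is illustrative; other Blenderbot checkpoints work the same way.
MODEL_NAME = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(MODEL_NAME)
model = BlenderbotForConditionalGeneration.from_pretrained(MODEL_NAME)


def generate_response(user_utterance: str) -> str:
    """Return the chatbot's reply to a single user utterance."""
    inputs = tokenizer([user_utterance], return_tensors="pt")
    reply_ids = model.generate(**inputs)
    return tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0]


print(generate_response("Hello! How are you today?"))
```

The reply string can then be passed to the TTS component (see `tts_interface.py`) to synthesize the speech audio.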
- We use the Gesticulator model in both demonstrations, which generates motion as 3D joint angles using speech text and audio as input.
- In order to use other models, the following have to be considered (see the interface sketch after this list):
  - 3D joint angles are necessary to animate the 3D model in Unity; therefore, the gesture generation model must return the motion in that format.
  - For any model, an interface must be implemented that returns the generated gestures for a given piece of speech. The `GesturePredictor` class shows how we implemented that for Gesticulator.
  - StyleGestures is a good alternative model with a compatible codebase.
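As a rough illustration of such an interface, a model-agnostic predictor could look like the sketch below. The class names, method signature and array shape are hypothetical; consult the actual `GesturePredictor` class for the Gesticulator-specific details:

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseGesturePredictor(ABC):
    """Hypothetical base class for adapting a gesture model to the framework."""

    @abstractmethod
    def predict_gestures(self, text: str, audio: np.ndarray) -> np.ndarray:
        """Map speech (transcript + waveform) to a motion sequence.

        Should return BVH-style joint angles, e.g. an array of shape
        (n_frames, n_joints, 3), which Unity applies as localRotation values.
        """


class GesticulatorPredictor(BaseGesturePredictor):
    """Sketch of how Gesticulator could fit the interface above."""

    def __init__(self, model):
        self.model = model  # a loaded Gesticulator checkpoint

    def predict_gestures(self, text, audio):
        # Gesticulator consumes both modalities; other models may use only one.
        return self.model.predict(text, audio)  # hypothetical call
```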
The authors would like to thank Lewis King for sharing the source code of his JimBot project with us.
If you use this code in your research, please cite our paper:
    @inproceedings{Nagy2021gesturebot,
      author = {Nagy, Rajmund and Kucherenko, Taras and Moell, Birger and Pereira, Andr\'{e} and Kjellstr\"{o}m, Hedvig and Bernardet, Ulysses},
      title = {A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents},
      year = {2021},
      isbn = {9781450383073},
      publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
      address = {Richland, SC},
      booktitle = {Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems},
      location = {Virtual Event, United Kingdom},
      series = {AAMAS '21}
    }