Controlling robots through natural language instructions is a complex task that requires integrating advanced AI models with robotic systems. This project aims to simplify robot control by leveraging Gemini AI, and LLaVA models to interpret and execute natural language commands, making robotic interactions more intuitive and accessible.
- Python 3.8
- ROS Noetic
- Tiago Pal robot
- Simulation environment (gazebo)
- Flask
- Google Cloud credentials for Vertex AI
- Ollama and LLaVA model
-
User Interface: A web-based interface built with Flask to input commands.
-
LLM Models: Integration of Gemini, Ollama, and LLaVA for generating and interpreting commands.
-
Robot Control: ROS-based control of the Tiago Pal robot, including movement, arm manipulation, and sensory feedback.
Install ROS Noetic on your system following the instructions from the official ROS website.
https://wiki.ros.org/Robots/TIAGo/Tutorials/Installation/InstallUbuntuAndROS
Install the Tiago Pal robot simulation packages by following the instructions from the official ROS website:
https://wiki.ros.org/Robots/TIAGo/Tutorials/Installation/Testing_simulation
Launch the Tiago Pal simulation:
roslaunch tiago_gazebo tiago_gazebo.launch public_sim:=true
Install the Vertex AI Python SDK:
pip install google-cloud-aiplatform
Set up your Google Cloud credentials:
export GOOGLE_APPLICATION_CREDENTIALS=<path_to_your_credentials_file.json>
Initialize Vertex AI:
import vertexai
vertexai.init(project="YOUR_PROJECT_ID", location="YOUR_REGION")
To install ollama
curl -fsSL https://ollama.com/install.sh | sh
Install the necessary Python packages:
pip install ollama
This project uses only llava-llama3 (but in theory you can swap in another capable VLM) to install that version code for using gemini 1.5 as a vlm has been removed temporarily:
ollama run llava-llama3
Running the Simulation and Flask App Launch the ROS simulation:
cd ~/tiago_public_ws
source devel/setup.bash
roslaunch tiago_gazebo tiago_gazebo.launch public_sim:=true
Start the Flask app:
export FLASK_APP=app.py
flask run
Open your web browser and navigate to http://127.0.0.1:5000 to access the control interface.
Use the web interface to input commands. The Flask app will process these commands using Gemini, Ollama, and LLaVA.
The robot will execute the commands, providing feedback on each action.
-
"Move forward"
-
"Pick up the object"
-
"Extend arm"
-
"Rotate head left"
Monitoring and Feedback system
The app provides real-time feedback on the robot's actions, ensuring each step is completed before proceeding to the next. Check the console output for detailed logs and any error messages.