February 24, 2024

GPT-4-Driven Robot Revolutionizes Human-Like Gestures and Actions

Researchers at the University of Tokyo have made a groundbreaking advancement in the field of robotics by incorporating GPT-4, a large language model, into a humanoid robot named Alter3. The integration of GPT-4 allows the robot to perform various actions, such as taking a selfie, tossing a ball, eating popcorn, and even playing air guitar, with more human-like gestures and movements.

In the past, each of these activities would have required its own action-specific code. By using GPT-4, the researchers let the robot act on natural language instructions, removing the need to hand-program hardware-dependent controls for every motion.

According to a recent study by the team, earlier work combining artificial intelligence (AI) and robots focused mainly on basic communication between humans and robots within a computer environment, with language models used to interpret input and generate lifelike responses. Incorporating GPT-4 goes a step further: it gives direct control of the robot’s body by mapping human actions expressed in language onto physical movements through program code. The researchers describe this as a paradigm shift in robotics.

Alter3, the humanoid robot powered by GPT-4, has intricate upper body movement capabilities, including detailed facial expressions. Its 43 axes simulate human musculoskeletal movement, allowing it to mimic human-like movements and gestures. The robot rests on a base and cannot walk, but it can imitate the act of walking. Coordinating the robot’s many joints was previously a complex task that required extensive manual coding and repeated refinement. With the integration of GPT-4, the researchers are freed from this labor-intensive process.

Instead of manually coding each movement, researchers can now describe the desired actions in natural language. They then prompt GPT-4 to generate Python code that runs on the android engine, the robot's control software, enabling Alter3 to execute the instructed movements. Alter3 can also retain its activities in memory, allowing researchers to refine and adjust its actions over time. This iterative process leads to faster, smoother, and more accurate movements.
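
The workflow described above (instruction in, generated Python out, code executed against the robot, result kept in memory) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `set_axis` function, the `FakeAndroidEngine` class, the 0 to 255 value range, and the canned `fake_llm` response are assumptions, not the researchers' actual interface or GPT-4 output. Only the figure of 43 axes comes from the article.

```python
from typing import Callable, Dict, List, Tuple

NUM_AXES = 43            # axis count reported in the article
VALUE_RANGE = (0, 255)   # assumed value range, for illustration only


class FakeAndroidEngine:
    """Hypothetical stand-in for the robot's control engine; it only records commands."""

    def __init__(self) -> None:
        self.commands: List[Tuple[int, int]] = []

    def set_axis(self, axis: int, value: int) -> None:
        low, high = VALUE_RANGE
        if not 1 <= axis <= NUM_AXES:
            raise ValueError(f"axis {axis} is outside 1..{NUM_AXES}")
        if not low <= value <= high:
            raise ValueError(f"value {value} is outside {low}..{high}")
        self.commands.append((axis, value))


def fake_llm(prompt: str) -> str:
    """Placeholder for a language model call; returns canned code, not real GPT-4 output."""
    return "set_axis(1, 200)\nset_axis(16, 128)\n"


def run_instruction(
    instruction: str,
    llm: Callable[[str], str],
    engine: FakeAndroidEngine,
    memory: Dict[str, str],
) -> List[Tuple[int, int]]:
    """Turn an instruction into code (reusing memory if present) and execute it."""
    if instruction not in memory:
        prompt = (
            "Write Python calls to set_axis(axis, value) "
            f"that make the robot: {instruction}"
        )
        memory[instruction] = llm(prompt)
    start = len(engine.commands)
    # Expose only set_axis to the generated code. This is not a real sandbox;
    # a production system would need much stronger isolation.
    exec(memory[instruction], {"__builtins__": {}}, {"set_axis": engine.set_axis})
    return engine.commands[start:]


if __name__ == "__main__":
    engine = FakeAndroidEngine()
    memory: Dict[str, str] = {}
    print(run_instruction("take a selfie", fake_llm, engine, memory))
```

In this sketch, the `memory` dictionary plays the role of the retained activities: a stored entry can be replayed without another model call, or overwritten with revised code after feedback.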

To illustrate the capabilities of Alter3 guided by GPT-4, the authors of the study provide an example of natural language instructions given to the robot for taking a selfie. By simply describing the desired movements in words, researchers can prompt the robot to position itself and capture a selfie.

The integration of GPT-4 into Alter3 represents a significant step forward in the field of robotics. The ability to control robots with natural language instructions opens up numerous possibilities for human-robot interaction and collaboration. With further advancements in language models like GPT-4, robots may become better at understanding and executing complex tasks instructed by humans. This work brings us closer to a future where robots integrate seamlessly into our daily lives and our interactions with them feel more natural.

Note:
1. Source: Coherent Market Insights, Public sources, Desk research
2. We have leveraged AI tools to mine information and compile it