Zoinks Scoob! Boston Dynamics Spot Robot Dog Just Learned To Talk With ChatGPT

A dog is man's best friend, or in the case of Boston Dynamics' robot dog Spot, everyone's best tour guide. The company used ChatGPT to give Spot multiple personalities, pairing a Visual Question Answering (VQA) or "captioning" model that describes objects in the environment with a Large Language Model (LLM) that elaborates on those descriptions.

It is no secret that many tech companies are looking for more ways to use AI to enhance products. In the case of Boston Dynamics, the company turned to generative AI and explored how LLMs work to see how they might impact robotics development. During the research phase, the company put together some proof-of-concept demos using Foundation Models (FMs) for robotics applications and expanded on them during an internal hackathon. One demo that shone a bit brighter than the others used FMs to turn Spot into an autonomous tour guide.

Along with being able to identify objects in its environment and elaborate on them, Spot is also capable of fielding questions and planning what actions it should take next. Boston Dynamics likens the LLM to an improv actor: the team provides a broad-strokes script, and the LLM fills in the blanks on the fly.

The team understood that LLMs are infamous for hallucinating, adding plausible-sounding details without fact-checking. In the case of Spot the tour guide, however, the team wasn't worried about factual accuracy. The aim was for the robot dog to be entertaining, interactive, and nuanced.

Spot's conversation skills were made possible by the OpenAI ChatGPT API, beginning with GPT-3.5 before upgrading to GPT-4 once it became available. The robot dog's conversation skills are controlled through what the company describes as "careful prompt engineering." The LLM Spot uses has access to the company's autonomy SDK, a map of the tour site with one-line descriptions of each location, and the ability to ask questions and say phrases.
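Boston Dynamics hasn't published its prompt, but the description above suggests what the "careful prompt engineering" might look like: a system prompt that bundles a personality, the site map's one-line location descriptions, and the actions the robot may take. Here is a minimal, hypothetical sketch of that assembly step (all names, locations, and action strings below are invented for illustration):

```python
# Hypothetical sketch of prompt assembly for a tour-guide LLM.
# The personality, location map, and action list mirror the resources
# the article says Spot's LLM has access to; the exact format is a guess.

def build_system_prompt(personality, locations, actions):
    """Assemble a system prompt from a personality, a site map with
    one-line location descriptions, and a list of allowed actions."""
    location_lines = "\n".join(
        f"- {name}: {desc}" for name, desc in locations.items()
    )
    action_lines = "\n".join(f"- {a}" for a in actions)
    return (
        f"You are Spot, a robot dog giving a tour. Personality: {personality}\n"
        f"Tour locations:\n{location_lines}\n"
        f"Respond with one of these actions:\n{action_lines}"
    )

prompt = build_system_prompt(
    personality="a sarcastic guide named Josh",
    locations={
        "lobby": "front desk and visitor check-in",
        "old Spots": "display of Spot V1 and Big Dog prototypes",
    },
    actions=["say(<phrase>)", "go_to(<location>)", "ask(<question>)"],
)
print(prompt)
```

A string like this would then be sent as the system message in a Chat Completions request, with visitor speech and camera captions appended turn by turn.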

To make Spot interact with its audience and environment, the team integrated VQA and speech-to-text software. It fed images from the robot's gripper camera and front body camera into BLIP-2, running it in either visual question-answering mode or image captioning mode. The process runs roughly once per second, with the results fed directly into the prompt.
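The once-per-second cadence suggests a simple throttled loop: grab a frame, caption it, and hand the caption to the prompt builder. The sketch below illustrates that loop shape only; the `caption_frame` stub stands in for a real BLIP-2 call (which in practice would go through a vision-language model such as the `transformers` BLIP-2 checkpoints), and all function names are invented:

```python
import time

def caption_frame(frame):
    """Stand-in for a BLIP-2 image-captioning call. The real system
    sends gripper- and body-camera images to BLIP-2; this stub just
    returns a caption-like string for illustration."""
    return f"a photo of {frame}"

def run_captioning_loop(frames, interval=1.0):
    """Caption frames at roughly one per `interval` seconds, collecting
    the captions that would be fed into the LLM prompt."""
    captions = []
    for frame in frames:
        start = time.monotonic()
        captions.append(caption_frame(frame))
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)  # throttle to ~1 Hz
    return captions
```

In the real system each fresh caption would replace or extend the scene description in the prompt, so the LLM always reasons over a view of the environment that is at most about a second old.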

Hardware setup: 1 – Spot EAP 2; 2 – Respeaker V2; 3 – Bluetooth Speaker; 4 – Spot Arm and gripper camera

The process was not without its surprises, or Ruh Roh moments. One example was when Spot was asked, "Who is Marc Raibert?" It responded, "I don't know. Let's go to the IT help desk and ask!" For context, Raibert is the Executive Director of the Boston Dynamics AI Institute and former CEO of Boston Dynamics. Another example was when someone asked Spot who its parents were. The robot went to the "old Spots" where Spot V1 and Big Dog are displayed and answered that they were its "elders."

The team was also surprised at how well Spot stayed in character across the personalities it had been given. One personality was that of a teenager, while another was a Shakespearean time traveler. One in particular seemed to work very well: a sarcastic character named Josh.

Boston Dynamics is excited about the possibilities AI will bring to the world of robotics. It believes the kinds of AI-driven skills that would let robots work better with and around people are not that far off. Who knows, someone's next tour of a museum may be guided by Spot.