How do you teach human interaction to a robot? Lots of TV

In this Wednesday, June 22, 2016, photo, Massachusetts Institute of Technology researcher Carl Vondrick looks through a protective door while standing next to a computer server cluster, right, on the MIT campus, in Cambridge, Mass. MIT says a computer that binge-watched TV shows such as “The Office,” “Big Bang Theory” and “Desperate Housewives” learned how to predict whether the actors were about to hug, kiss, shake hands or slap high-fives, a breakthrough that eventually could help the next generation of artificial intelligence function less clumsily. (AP Photo/Steven Senne)

CAMBRIDGE, Mass. (AP) — Remember the Jetsons’ robot maid, Rosie? Massachusetts Institute of Technology researchers think her future real-life incarnations can learn a thing or two from Steve Carell and other sitcom stars.

MIT says a computer that binge-watched YouTube videos and TV shows such as “The Office,” “Big Bang Theory” and “Desperate Housewives” learned how to predict whether the actors were about to hug, kiss, shake hands or slap high-fives, advances that eventually could help the next generation of artificial intelligence function less clumsily.

“It could help a robot move more fluidly through your living space,” lead researcher Carl Vondrick told The Associated Press in an interview. “The robot won’t want to start pouring milk if it thinks you’re about to pull the glass away.”

Vondrick also sees potential health-care applications: “If you can predict that someone’s about to fall down or start a fire or hurt themselves, it might give you a few seconds’ advance notice to intervene.”

The findings, two years in the making at MIT’s Computer Science and Artificial Intelligence Laboratory, will be presented at next week’s IEEE Conference on Computer Vision and Pattern Recognition in Las Vegas.

Vondrick, a doctoral candidate focusing on computer vision and machine learning whose work is supported by grants from Google and the National Science Foundation, worked with MIT professor Antonio Torralba and Hamed Pirsiavash, now at the University of Maryland. The trio wanted to see whether they could create an algorithm that mimics a human being’s intuition for anticipating what will happen next when two people meet.

To refine what’s known in artificial intelligence studies as “predictive vision,” they needed to expose their machine-learning system to video showing humans greeting one another.

Cue what Vondrick acknowledges were “random videos off YouTube.” Six hundred hours of them, to be precise.

The researchers downloaded the videos and converted them into visual representations, a sort of numerical interpretation of the pixels on screen that the algorithm could read and search for complex patterns.

They then showed the computer clips from TV sitcoms it had never seen before, such as interactions between “Big Bang Theory” stars Jim Parsons (Sheldon Cooper) and Kaley Cuoco (Penny), and asked the algorithm to predict whether, one second later, the two would hug, kiss, shake hands or high-five.

The computer got it right more than 43 percent of the time. That may not sound like much, but it beats existing algorithms, which have a 36 percent success rate. Humans make the right call 71 percent of the time.

In a video trailer of the study that showed the algorithm blowing it on a clip from “The Office,” the researchers quipped: “So it’s not perfect … still a long way to go.”

That likely will involve even more binge-watching. Six hundred hours of video sounds like a lot, but it’s not really that much. By the time we’re 10 years old, we’ve logged nearly 60,000 waking hours of experience.

“Humans are really good at predicting the immediate future,” Pirsiavash, the team member now based in Baltimore, said Wednesday. “To have robots interact with humans seamlessly, the robot should be able to reason about the immediate future of our actions.”

Martial Hebert, director of the robotics institute at Carnegie Mellon University in Pittsburgh, who was not involved in the MIT study, called it “an important work.”

“Some argue that prediction is a central part of (artificial) intelligence,” Hebert said. “If you have a robot that can predict, you can map a deeper and more complicated understanding of the environment around it.”

The researchers’ biggest relief? The computer did all the binge-watching.

“We never had to watch the videos,” Vondrick said.