Tony Stark, a.k.a. Iron Man, is a genius with an incredibly strong, rocket-powered suit. But he’s not alone.
Indeed, the Marvel hero has an active collaborator, J.A.R.V.I.S., a form of artificial intelligence that helps manage Stark Industries and gives Tony advice during battle.
But does Tony actually listen to J.A.R.V.I.S.? And for that matter, how well do humans collaborate with AI in general?
That question is at the heart of new, provocative research by USC Viterbi’s Information Sciences Institute (ISI) scientists Andres Abeliuk, Daniel M. Benjamin, Fred Morstatter, and Aram Galstyan, director of ISI’s Artificial Intelligence Division and PI of the project.
Given how prevalent the role of collaborative AI is in our lives, the team set about asking some very important questions in their article featured in Nature’s Scientific Reports journal, “Quantifying Machine Influence Over Human Forecasters.” Do humans trust AI assistants? If so, when? If not, why? Do human-AI collaborations fare better than AI or human intuition alone?
Incremental collaborations between humans and AI are already evolving, such as with self-driving cars. “The dialogue around [self-driving cars] acts as if it’s an all-or-nothing proposition,” said Benjamin. “But we’ve slowly been acclimated to automation in cars for years with automatic transmissions, cruise control, anti-lock brakes, etc.”
To quantify such human-machine relationships, we need a lot of humans willing to try to collaborate with AI in a lab environment. This is exactly what the ISI team is working on: SAGE, short for Synergistic Anticipation of Geopolitical Events, where laypeople collaborate with AI tools to predict the future. Non-experts accurately predicted last April that North Korea would launch its missile test before July, which it did, indicating their potential to generate accurate forecasts.
Now, SAGE and the possibilities of human-AI collaboration are being put to the test in the Intelligence Advanced Research Projects Agency (IARPA) Hybrid Forecasting Competition (HFC). In 2017, ISI received a four-year, multimillion-dollar IARPA grant to develop a human-machine hybrid forecasting system. The ISI team was one of three chosen by IARPA.
“SAGE aims to develop a system that leverages human and machine capabilities to improve upon the accuracy of either type on its own,” Benjamin said. “This Hybrid Forecasting Competition (HFC) provided a unique setting to study how people interact with computer models. [Other] studies typically involve one-off or short-term participation—the HFC recruited participants to provide forecasts for many months.”
J.A.R.V.I.S.: Sir, please, may I request just a few hours to collaborate?
Tony Stark: Nope! Micro-repeater implanting sequence complete!
J.A.R.V.I.S.: As you wish, sir. I’ve also prepared a safety briefing for you to entirely ignore.
Tony Stark: Which I will.
HFC users participated week in and week out on questions that stayed open for weeks. Hundreds of participants of varying demographics volunteered to answer predictive questions such as: How many earthquakes of magnitude 5 or stronger will occur worldwide in a given month? What will be the closing price of gold on a given date? Some participants were exposed to AI predictions while others were not. Participants were free to choose whether to rely on AI predictions or to go with their intuitions.
So what’s the verdict? Do human-AI collaborations beat humans alone? Yes, they do, and this hybrid team also beats an AI that’s working alone!
“At the start of the HFC,” Morstatter explained, “some of our teammates thought it was a foregone conclusion that the machine models would outperform the human forecasters—a hypothesis proven false.”
It turns out that the ISI team was in for quite a few surprises. “Our key finding was that users used the statistical models more rarely than we anticipated, in a pattern that resembled how people use human advice,” said Abeliuk. “We expected many instances where forecasters over-relied on the models. Instead, we found people over-relied on their personal information. Forecasters readily dismissed the model prediction when it disagreed with their pre-existing beliefs (known as confirmation bias).”
Despite evidence that listening to the AI helped overall, people couldn’t optimally heed its suggestions. “Overall, the addition of statistical models into a forecasting system did improve accuracy,” Benjamin said. “However, it shouldn’t be a foregone conclusion that humans will use the tools well or at all.”
This has huge implications. It’s not enough to design a tool that succeeds at a task if the tool isn’t used well. What good is a driver-assist technology if the person behind the wheel doesn’t comply with the system’s requests to slow down when there’s a curve up ahead? “To optimize a human-computer system, trust in machines must be earned,” remarked Abeliuk. “[And] trust in machines, much like trust in other humans, is easily lost.”
The implications aren’t merely for engineers who build such AI tools, but also for customers who use them. “The average person should learn to be more deliberate in how they interact with new technology,” Abeliuk said. “The better forecasters in our study were able to determine when to trust the model and when to trust their own research—the average forecaster was not.”
Published on March 2nd, 2021
Last updated on April 29th, 2022