- Workers using OpenAI's ChatGPT may actually perform more poorly than those who don't, new BCG research finds.
- That's because some workers take the AI chatbot's outputs at face value and don't check for errors.
- BCG's findings may be a cautionary tale for workers turning to AI.
Using AI at work can actually hurt your job performance — if it's used for tasks outside its capabilities, new research from Boston Consulting Group found.
A group of researchers from BCG, Harvard, Wharton, and MIT conducted an experiment to see how access to AI impacts white-collar workers' productivity and quality of work.
To test this, researchers randomly assigned 758 BCG consultants to one of three groups: one with no access to AI; one with access to ChatGPT powered by GPT-4; and one with access to ChatGPT plus instructional videos and documents on prompt-engineering strategies.
After establishing performance baselines, consultants in each group were assigned one of two categories of tasks.
One category included 18 tasks that exist "inside the frontier" of what the AI can do, like brainstorming innovative beverage concepts or coming up with a thorough business plan for a new footwear concept.
The other category contained more open-ended tasks that exist "outside the frontier" of AI's capabilities. While "consultants would excel" at these tasks, "AI would struggle without extensive guidance," the study said.
For example, consultants assigned this set of tasks were asked to offer recommendations to the CEO of a hypothetical company by using internal financial data and interviews with company insiders — information the AI didn't have access to.
Researchers found stark differences in the results of the three groups, depending on their access to ChatGPT.
For tasks "inside the frontier," consultants using AI were "significantly more productive" and "produced significantly higher quality results" than those who weren't using the chatbot.
However, consultants using AI to complete tasks "outside the frontier" were "19 percentage points less likely to produce correct solutions compared to those without AI." That's because consultants with AI tended to accept its output indiscriminately — even when the answers were wrong.
These findings demonstrate AI's "uneven" capabilities.
While the study's findings show that AI is "exceedingly good" at helping humans with some tasks, humans should exercise caution when using the technology to avoid errors, Saren Rajendran, one of the researchers involved in the study, told Insider in an email.
"We should be mindful when using GenAI," he added.
BCG's findings offer a cautionary tale for workers thinking about using ChatGPT to help do their jobs. Since ChatGPT came out last November, workers across industries have been using the AI chatbot — sometimes without telling their bosses — to develop code, create marketing materials, and generate lesson plans.
However, ChatGPT's outputs aren't perfect and can contain "hallucinations."
Tech publication CNET was put on blast earlier this year after readers noticed that a number of its AI-generated articles included factual errors.
As of September 28, media watchdog NewsGuard has identified 487 "unreliable" AI-generated news sites with "little to no human oversight."
In an ad for Google's Bard, the AI chatbot made a factual error when asked about the James Webb Space Telescope.
AI-generated errors may only get worse: In a recent paper, AI researchers found that generative AI models could soon be trained on AI-generated content, degrading their outputs over successive generations — a phenomenon they call "model collapse." The result could be more low-quality outputs in the near future.
"As the boundaries of AI capabilities continue to expand, often exponentially, it becomes incumbent upon human professionals to recalibrate their understanding of the frontier and for organizations to prepare for a new world of work combining humans and AI," the researchers wrote.