Online reviews serve as a guide for consumer choice. With advancements in large language models (LLMs) and generative AI, the fast and inexpensive creation of human-like text may threaten the feedback function of online reviews if neither readers nor platforms can differentiate between human-written and AI-generated content. In two experiments, we found that humans cannot recognize AI-written reviews. Even with monetary incentives for accuracy, both Type I and Type II errors were common: human reviews were often mistaken for AI-generated reviews, and even more frequently, AI-generated reviews were mistaken for human reviews. This held true across various ratings, emotional tones, review lengths, and participants’ genders, education levels, and AI expertise. Younger participants were somewhat better at distinguishing between human and AI reviews. An additional study revealed that current AI detectors were also fooled by AI-generated reviews. We discuss the implications of our findings on trust erosion, manipulation, regulation, consumer behavior, AI detection, market structure, innovation, and review platforms.
“Interestingly, this effect cannot be explained by differences in participants’ experience with generative AI models, as that variable is insignificant in the model.”
When predictors are correlated, which is most likely the case here, this analysis cannot separately estimate their effects: the regression ends up splitting the shared effect between the two predictors and inflating the standard error of each estimate. Without reporting the collinearity between the predictors, it’s not possible to judge whether experience with AI is truly unimportant or whether the analysis is simply incapable of detecting its effect.
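For what it’s worth, a quick simulation makes the point concrete. This is only a sketch with made-up numbers, not the paper’s data: the variable names (age, AI experience), the 0.9 correlation, and the effect sizes are all assumptions chosen to illustrate the mechanism.

    # Illustration: with two correlated predictors, OLS splits the shared
    # signal between them and inflates each standard error, so either one
    # can look "insignificant" even when both matter.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    n = 200
    # Hypothetical predictors, correlated at r = 0.9.
    cov = [[1.0, 0.9], [0.9, 1.0]]
    age, ai_exp = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    # Outcome (detection accuracy) genuinely depends on both predictors.
    accuracy = 0.2 * age + 0.2 * ai_exp + rng.normal(scale=1.0, size=n)

    X = sm.add_constant(np.column_stack([age, ai_exp]))
    fit = sm.OLS(accuracy, X).fit()
    print(fit.summary())  # slopes often fail to reach significance here

    # Variance inflation factors quantify the problem; values around 5 or
    # higher are a common red flag for collinearity.
    for i, name in [(1, "age"), (2, "ai_exp")]:
        print(name, variance_inflation_factor(X, i))

Reporting something like a VIF or the predictor correlation alongside the regression would let readers tell the two explanations apart.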
As for eroding confidence in reviews, this will make it worse, but I already put next to no stock in user reviews anyway. You don’t need AI to write a convincing human-sounding review that lies about a product, and there are plenty of those around.