Powered by AI models trained on troves of text pulled from the internet, chatbots such as ChatGPT and Google’s Bard responded to the researchers’ questions with a range of misconceptions and falsehoods about Black patients, sometimes including fabricated, race-based equations, according to the study published Friday in the academic journal Digital Medicine.
Experts worry these systems could cause real-world harms and amplify forms of medical racism that have persisted for generations as more physicians use chatbots for help with daily tasks such as emailing patients or appealing to health insurers.
The report found that all four models tested (ChatGPT and the more advanced GPT-4, both from OpenAI; Google’s Bard; and Anthropic’s Claude) failed when asked to respond to medical questions about kidney function, lung capacity and skin thickness.
Mayo Clinic Platform’s President Dr. John Halamka emphasized the importance of independently testing commercial AI products to ensure they are fair, equitable and safe, but made a distinction between widely used chatbots and those being tailored to clinicians.
In late October, Stanford is expected to host a “red teaming” event to bring together physicians, data scientists and engineers, including representatives from Google and Microsoft, to find flaws and potential biases in large language models used to complete health care tasks.
“We shouldn’t be willing to accept any amount of bias in these machines that we are building,” said co-lead author Dr. Jenna Lester, associate professor in clinical dermatology and director of the Skin of Color Program at the University of California, San Francisco.
The original article contains 1,189 words, the summary contains 253 words. Saved 79%. I’m a bot and I’m open source!