AI outperforms clinicians in satisfaction ratings for medical advice responses

Laypeople rated AI-generated medical responses as more satisfying than clinician responses, with the highest satisfaction in cardiology, while endocrinology responses scored highest on empathy and quality.


Researchers at Stanford University found that AI-generated responses to patient messages achieved higher satisfaction ratings than clinician responses, although empathy and information quality were strongest in endocrinology. In the study, published as a research letter in JAMA Network Open, the team evaluated laypeople's satisfaction with artificial intelligence (AI)-generated responses relative to clinician-to-patient messages.

Generative AI can potentially help clinicians respond to patients' messages. While AI-generated responses exhibit acceptable quality and a low risk of harm, laypersons' perspectives on such responses have rarely been explored in detail.

The study and findings

In this cross-sectional study, researchers investigated laypersons' satisfaction with AI-generated responses compared to clinician-to-patient messages.

They screened 3,769,023 patient medical advice requests in health records and included 59 clinical questions for analysis. Two generative AI models were used: Stanford Generative Pretrained Transformer (GPT) and ChatGPT-4. These tools generated responses with and without prompt engineering.
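Neither the study's prompts nor the Stanford GPT configuration are public. Purely as an illustration, a prompt-engineered request to a GPT-4-class model through the OpenAI Python SDK might look like the sketch below; the system prompt, function name, and parameters are hypothetical, not the study's actual setup.

```python
# Illustrative sketch only: the study's real prompts and model settings
# are not reproduced here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical "engineered" system prompt for the prompt-engineering arm.
ENGINEERED_SYSTEM_PROMPT = (
    "You are a physician replying to a patient portal message. "
    "Answer accurately, empathetically, and at a lay reading level."
)

def draft_reply(patient_message: str, engineered: bool = True) -> str:
    """Generate a draft reply, with or without the engineered system prompt."""
    messages = []
    if engineered:
        messages.append({"role": "system", "content": ENGINEERED_SYSTEM_PROMPT})
    messages.append({"role": "user", "content": patient_message})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content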

For the final analysis, the AI responses generated with prompt engineering were selected because they offered higher-quality information and greater empathy. Six licensed clinicians rated the original clinician responses and the AI responses on a five-point Likert scale, with 5 indicating the best and 1 the worst. Additionally, 30 participants, recruited through the Stanford Research Registry, assessed AI and clinician responses for satisfaction.

Each response was independently evaluated by three participants, with a score of 5 indicating extremely satisfied and 1 extremely dissatisfied. To account for potential biases and variability among evaluators, the researchers fitted mixed-effects models to estimate effects for empathy, satisfaction, and information quality. The team used multivariable linear regression to examine associations between response length and satisfaction, adjusting for sex, age, race, and ethnicity.
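As an illustration of the analysis described above, here is a minimal sketch in Python using statsmodels. The column names (score, source, evaluator_id, length_chars, and the demographic covariates) are hypothetical, and the study's actual model specifications may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per (response, evaluator) rating.
ratings = pd.read_csv("ratings.csv")

# Mixed-effects model: a random intercept per evaluator absorbs
# rater-specific leniency when estimating the AI-vs-clinician effect.
satisfaction_model = smf.mixedlm(
    "score ~ source",              # source: 'ai' or 'clinician'
    data=ratings,
    groups=ratings["evaluator_id"],
).fit()
print(satisfaction_model.summary())

# Multivariable linear regression: satisfaction vs. response length,
# adjusted for evaluator sex, age, race, and ethnicity.
length_model = smf.ols(
    "score ~ length_chars + age + sex + race + ethnicity",
    data=ratings,
).fit()
print(length_model.summary())
```

The random intercept per evaluator is what lets the first model separate an individual rater's leniency from genuine differences between AI and clinician responses.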

Overall, 2,118 assessments of AI response quality and 408 assessments of satisfaction were included. Notably, satisfaction estimates for AI responses (mean 3.96) were significantly higher than for clinician responses (mean 3.05), both overall and by specialty. The highest satisfaction estimates were for AI responses to cardiology questions, whereas responses to endocrinology questions showed the highest empathy and information quality. Clinician responses were shorter, averaging 254 characters, compared with AI responses, which averaged 1,471 characters.

Interestingly, the length of clinician responses was associated with satisfaction, particularly in cardiology questions, whereas no such association was found for AI response length.

Conclusions

The study assessed satisfaction with AI responses to patients' questions in health records. The findings showed that AI-generated responses had consistently higher satisfaction than clinician responses.

However, satisfaction was not necessarily concordant with information quality and empathy: responses to cardiology questions had the highest satisfaction, yet endocrinology questions were rated highest in empathy and information quality. Further, the length of clinician responses, but not of AI responses, was associated with satisfaction, suggesting that brevity in clinician-patient communication might lower satisfaction. A limitation of the study is that satisfaction was assessed by survey participants rather than by the patients who originally submitted the questions.

Thus, original patients' satisfaction might differ. Future studies should assess satisfaction with AI responses across various settings, including different medical centers, regions, patient populations, and specialties. Overall, the study underscores the importance of patients as stakeholders in developing and implementing AI in clinician-patient communications for optimal integration into practice.

Kim J, Chen ML, Rezaei SJ, et al. Perspectives on Artificial Intelligence–Generated Responses to Patient Messages. JAMA Network Open. 2024. DOI: 10.1001/jamanetworkopen.2024.38535, https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2824919