From Hallucinations to Humility: A Systematic Pipeline for Medical AI Red Teaming

The transition of generative AI from theoretical models to production-grade clinical tools requires a shift from standard performance metrics to adversarial stress testing. As healthcare institutions integrate Large Language Models (LLMs) into decision support and patient interaction, the risk of safety failures, hallucinations, and model degradation under pressure becomes a critical liability. This session explores a systematic approach to clinical red teaming, moving beyond generic jailbreaking to context-specific adversarial mechanics. Drawing on insights from a comprehensive scoping review of clinical AI safety, I will present a methodology for stress testing models using noisy, high-dimensional patient data.

The presentation focuses on identifying specific failure points where LLMs encounter real-world clinical noise, such as inconsistent electronic health record documentation and complex multimodal imaging inputs. We will discuss the development of a systematic analysis pipeline designed to visualize the relationships between AI architecture and adversarial vulnerabilities. This framework goes beyond simple accuracy to evaluate clinical humility and intersectional equity, so that models are tested for robustness across diverse patient populations. Attendees will gain a technical roadmap for implementing red teaming protocols that prioritize clinical safety and institutional resilience in the face of unpredictable generative behaviors.

About the speaker

L. Raymond Guo

Assistant Professor at West Virginia University

Dr. L. Raymond Guo is an Assistant Professor in the Department of Pharmaceutical Systems and Policy. As a transdisciplinary health scientist, Dr. Guo operates at the intersection of Artificial Intelligence, Health Outcomes, and Health Policy. His research agenda is anchored in the concept of Health Data Impact. He leverages advanced analytics, including machine learning and geospatial intelligence, to translate high-dimensional health data into strategies that enhance the quality, safety, and equity of care. Dr. Guo focuses on developing intelligent health systems that can predict patient trajectories and optimize clinical decision-making. His research portfolio examines the implementation of digital health technologies, the evaluation of health policies using real-world evidence, and the ethical deployment of AI in diverse healthcare settings. By bridging the gap between big data and clinical application, Dr. Guo aims to address complex challenges in healthcare delivery and access.