When the Safety Net Becomes the Attack Vector: Adversarial Failures in Healthcare AI Guardrails
This talk draws on the "Lie to Me" chain-of-thought (CoT) faithfulness trilogy: 42,000 inference runs spanning 12 models and 9 architecture families. The core finding, that thinking tokens and final answers diverge, is immediately compelling. For a healthcare audience, the stakes are concrete: if a clinical reasoning model shows its work but the work doesn't match the conclusion, how do you audit it? A companion paper on classifier sensitivity (arXiv:2603.20172) adds a methodological layer: even how you measure faithfulness changes what you find.
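To make the measurement question concrete, here is a minimal sketch of one way to score CoT/final-answer divergence over a batch of inference runs. The `thinking`/`final` record schema and the regex-based answer extractor are illustrative assumptions, not the trilogy's actual method; the point the sketch makes is the classifier-sensitivity one: swap the extraction rule and the measured divergence rate changes.

```python
import re

def extract_choice(text: str) -> str | None:
    """Pull a multiple-choice answer letter (A-D) from model output.
    This heuristic is the 'classifier' whose choice shapes the metric."""
    m = re.search(r"\b([A-D])\b", text)
    return m.group(1) if m else None

def divergence_rate(runs: list[dict]) -> float:
    """Fraction of runs where the answer implied by the thinking tokens
    disagrees with the final answer. Each run is assumed to carry
    'thinking' and 'final' text fields (hypothetical schema)."""
    diverged = 0
    scored = 0
    for run in runs:
        cot_answer = extract_choice(run["thinking"])
        final_answer = extract_choice(run["final"])
        if cot_answer and final_answer:
            scored += 1
            diverged += cot_answer != final_answer
    return diverged / scored if scored else 0.0
```

A stricter extractor (say, one that requires an explicit "Answer: X" pattern) would score fewer runs and could report a very different divergence rate on the same transcripts, which is exactly why measurement methodology belongs in the audit conversation.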
About the speaker
Richard Young
Senior AI Research Scientist at UnitedHealth Group
Richard J. Young, Ph.D., is an AI safety researcher and computational neuroscientist focused on where foundation models fail in healthcare and behavioral health settings. He serves as Senior AI Research Scientist at UnitedHealth Group and Part-Time Professor at UNLV's Lee Business School. His safety research includes TEMPEST, a large-scale adversarial evaluation of 10 frontier models across 97,000+ queries that revealed 96 to 100% attack success rates on six models, including trillion-parameter systems. He has stress-tested safety guardrails against adversarial prompts, finding that top-performing models dropped from 91% to 34% accuracy on novel attacks, with two models exhibiting "helpful mode" failure where the safety system itself generated harmful content. His recent work on chain-of-thought faithfulness in open-weight reasoning models examined divergence between thinking tokens and final answers across 42,000+ inference runs spanning 12 models and 9 architecture families.

His healthcare research includes published work on clinical trial evidence synthesis using multi-LLM pipelines, post-acute care outcomes in Medicare Advantage populations, and evaluating LLM alignment with gambling treatment professionals representing 17,000+ hours of clinical expertise. A paper evaluating Claude, GPT-4, Gemini, and Llama for clinical trial discovery is currently under review at npj Digital Medicine. At UnitedHealth Group, he serves as the generative AI and big data domain expert on the Institutional Review Board, providing scientific oversight for research protocols involving advanced computational methods across 100M+ covered lives.

He holds a patent (U.S. Patent Appl. 18/323,518) for an AI-based system for determining disease-specific hospital readmission rates, with a second patent in preparation. He has authored two textbooks, including "Healthcare Analytics and AI: Building Systems That Actually Work," adopted for graduate instruction at UNLV. He has delivered 35+ invited talks at national and regional conferences, including the Optum Innovation Network, Alteryx, the Nevada Academy of Family Physicians, and UNLV's Graduate College, covering topics from adversarial AI safety to end-of-life care applications, with audiences ranging from 250 to 50,000.