Interpretability Techniques for Large Language Models in Healthcare NLP: A Comparative Analysis of Token-Level, Model-Level, and Behavior-Level Methods

Large language models (LLMs) are transforming healthcare NLP applications, including clinical summarization, diagnosis assistance, adverse event detection, and risk prediction, but their black-box nature remains a barrier to trust, safety, and regulatory adoption. This talk presents a comparative analysis of leading interpretability techniques for LLMs, spanning:

  • Token-level attribution
  • Model-internal reasoning
  • Behavior-level safety evaluation

We examine methods such as LIME, SHAP, Integrated Gradients, and Attention Rollout for explaining which clinical tokens influence model predictions, along with probing classifiers and neuron activation analysis to reveal how LLMs encode:

  • Medical terminology
  • Negation
  • Temporality
  • Clinical relations
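
To make the token-level methods above concrete, here is a minimal sketch of Integrated Gradients on a toy model. It is not the speaker's implementation. The vocabulary, weights, bias, and example note are all hypothetical, and a small logistic regression over token-presence features stands in for an LLM so that the gradient can be written by hand. With a real model, the same path integral would typically be taken over input embeddings using automatic differentiation.

```python
import math

# Hypothetical toy model: logistic regression over token-presence features.
# Vocabulary, weights, and bias are invented for illustration only.
WEIGHTS = {"chest": 0.4, "pain": 1.2, "no": -1.5, "fever": 0.8}
BIAS = -0.5

def predict(x):
    """Positive-class probability for feature values x (dict: token -> float)."""
    z = BIAS + sum(WEIGHTS[t] * x[t] for t in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x):
    """Analytic gradient of predict() with respect to each feature."""
    p = predict(x)
    return {t: p * (1.0 - p) * WEIGHTS[t] for t in WEIGHTS}

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    total = {t: 0.0 for t in WEIGHTS}
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position on the straight path baseline -> x
        point = {t: baseline[t] + alpha * (x[t] - baseline[t]) for t in WEIGHTS}
        grad = gradient(point)
        for t in WEIGHTS:
            total[t] += grad[t]
    # Scale the average gradient along the path by the input difference.
    return {t: (x[t] - baseline[t]) * total[t] / steps for t in WEIGHTS}

# Hypothetical note containing "no chest pain"; the baseline is the empty note.
note = {"chest": 1.0, "pain": 1.0, "no": 1.0, "fever": 0.0}
baseline = {t: 0.0 for t in WEIGHTS}

attributions = integrated_gradients(note, baseline)
for token, score in sorted(attributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{token:>6}: {score:+.4f}")
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model's output on the note and on the baseline. In this toy setup the negation cue "no" receives a negative attribution, and the absent token "fever" receives exactly zero.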

Additionally, we evaluate counterfactual explanations, bias and fairness assessments, and hallucination detection to measure the real-world reliability of LLM behavior.
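
As one illustration of the behavior-level idea, a counterfactual explanation can be approximated by searching for the smallest edit to a note that changes the model's output. The sketch below tries every single-token deletion. The `reports_symptom` function is a hypothetical rule-based stand-in for a clinical classifier, and the cue and symptom word lists are invented for the example. In practice the `model` argument would wrap a real LLM call.

```python
# Hypothetical stand-in for a clinical classifier: a note "reports a symptom"
# unless the symptom word follows a negation cue. Word lists are illustrative.
NEGATION_CUES = {"no", "denies", "without"}
SYMPTOMS = {"pain", "fever", "nausea"}

def reports_symptom(tokens):
    negated = False
    for tok in tokens:
        if tok in NEGATION_CUES:
            negated = True
        elif tok in SYMPTOMS:
            if not negated:
                return True
            negated = False  # the cue is consumed by this symptom mention
    return False

def single_token_counterfactuals(tokens, model):
    """Return (index, token, edited_tokens) for each deletion that flips the label."""
    original = model(tokens)
    flips = []
    for i, tok in enumerate(tokens):
        edited = tokens[:i] + tokens[i + 1:]
        if model(edited) != original:
            flips.append((i, tok, edited))
    return flips

note = "patient denies chest pain".split()
for i, tok, edited in single_token_counterfactuals(note, reports_symptom):
    print(f"deleting {tok!r} at position {i} flips the label: {' '.join(edited)}")
```

For this note, only removing "denies" changes the prediction, which suggests the stand-in model depends on the negation cue rather than on the symptom word alone.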

Through a unified comparison framework, we highlight the strengths, limitations, and clinical relevance of each technique and offer practical recommendations for deploying interpretable and trustworthy LLMs in healthcare settings. This session provides clinicians, researchers, and AI practitioners with actionable insights to enhance transparency, mitigate risk, and improve the safety of AI-driven clinical decision support systems.

About the speaker

Jahnavi Anilkumar Kachhia

Senior Software Engineer at Accompany Health

Jahnavi Kachhia is a Senior Software Engineer at Accompany Health, where she builds scalable, AI-driven systems focused on advancing healthcare technology and improving patient-centric solutions. Previously at Meta’s Reality Labs, she contributed to AR/VR innovation and LLM-based intelligent systems supporting large-scale user experiences.

An active leader in the AI research community, she serves on the Program Committees of IJCAI 2025/2026 and PAKDD 2026, and reviews for leading venues including AAAI, IJCNN, IEEE conferences, MIDL, ECIS, and HRI. Her peer-reviewed publications in IEEE Xplore and Hindawi span radar signal processing, deep learning, and applied AI systems. Passionate about AI in healthcare, she bridges cutting-edge research with real-world impact through trustworthy, inclusive, and data-driven innovation.