Applied AI Summit
Free online conference | October 14-16, 2025
Where LLMs Fail: Engineering Regulatory-Grade, Multimodal Clinical Data De-identification [Keynote]
Frontier LLMs are powerful—but when the task is de-identifying clinical data to a regulatory standard, “pretty good” isn’t good enough. This session explains why sending text, images, or files to a general-purpose LLM remains unsafe, expensive, and brittle, and how John Snow Labs’ domain-specific pipelines achieve regulatory-grade results across modalities:
- Unstructured Text. In a 2025 peer-reviewed benchmark of commercial APIs on expert-annotated clinical notes, John Snow Labs outperformed Azure, AWS, and OpenAI, while being over 80% cheaper thanks to local deployment. A 93–94% accuracy level isn’t enough here, because any level below human expert accuracy still requires full manual review. Real-world deployments show automated pipelines can replace manual review entirely, de-identifying hundreds of millions of notes under certified Expert Determination.
- DICOM (radiology imaging) & WSI (pathology imaging). De-identification must scrub PHI from both pixels and metadata across diverse DICOM modalities and vendors, often at multi-GB file sizes. We’ll walk through data pipelines and cloud-native reference architectures that remove PHI from metadata and images inside your perimeter, with benchmarks on throughput and cost.
- Determinism & consistency. Unlike LLM prompts, production de-identification requires deterministic behavior, consistent obfuscation (the same fake identifiers everywhere), and tokenization that preserves longitudinal linkage across notes, PDFs, and imaging. We’ll detail how these pipelines support research while keeping re-identification risk “very small” under HIPAA Expert Determination.
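To make the determinism and consistency point concrete, here is a minimal Python sketch of keyed, repeatable pseudonymization. It is an illustration only, not John Snow Labs' implementation: the function names, the tiny surrogate name pool, the 30-day shift window, and the example key are all assumptions introduced for this example. The idea is that a keyed hash (HMAC-SHA256) of a normalized identifier always yields the same token, the same fake name, and the same per-patient date offset, so records stay linkable across notes, PDFs, and imaging without exposing the original value.

```python
import hashlib
import hmac

# Hypothetical surrogate pool. A real system would need a much larger pool
# (this one will map many different real names to the same fake name).
FAKE_NAMES = ["Alex Morgan", "Jordan Lee", "Casey Rivera", "Taylor Kim", "Riley Chen"]


def _digest(secret_key: bytes, kind: str, value: str) -> bytes:
    """Keyed hash of a normalized identifier; the same input always gives the same output."""
    normalized = " ".join(value.lower().split())  # ignore case and extra whitespace
    message = f"{kind}:{normalized}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).digest()


def patient_token(secret_key: bytes, mrn: str) -> str:
    """Stable opaque token that links one patient's records across documents."""
    return "PT-" + _digest(secret_key, "mrn", mrn).hex()[:16]


def surrogate_name(secret_key: bytes, real_name: str) -> str:
    """Pick the same fake name for the same real name every time."""
    raw = int.from_bytes(_digest(secret_key, "name", real_name)[:8], "big")
    return FAKE_NAMES[raw % len(FAKE_NAMES)]


def date_shift_days(secret_key: bytes, mrn: str, max_shift: int = 30) -> int:
    """Per-patient day offset in [-max_shift, max_shift]; intervals between a patient's events are preserved."""
    raw = int.from_bytes(_digest(secret_key, "shift", mrn)[:8], "big")
    return raw % (2 * max_shift + 1) - max_shift


if __name__ == "__main__":
    key = b"example-only-key"  # assumption: in practice the key lives in a secrets manager
    # The same patient, written two ways, maps to the same token.
    print(patient_token(key, "MRN 00123") == patient_token(key, "mrn  00123"))
    print(surrogate_name(key, "Jane Q. Public"))
    print(date_shift_days(key, "MRN 00123"))
```

Because the mapping depends only on the key and the normalized input, rerunning the pipeline reproduces the same output, which a sampled LLM prompt does not guarantee. Rotating or discarding the key breaks linkage to earlier outputs, so key management becomes part of the overall re-identification risk assessment.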
Attendees will leave with concrete benchmarks, architectures, and practical checklists to move from “LLM redaction” to regulatory-grade, multimodal de-identification.
About the speaker
Dia Trambitas
Head of Product at John Snow Labs
Dia Trambitas is the Head of Product at John Snow Labs. With deep expertise in Natural Language Processing and applied Generative AI, Dia has led the development of the Generative AI Lab (a no‑code platform for data annotation and model training) and the Medical Chatbot, a secure, domain-specific conversational AI assistant tailored for clinical environments. With a strong focus on practical deployments of cutting‑edge AI, she has worked at the intersection of healthcare and technology, driving product innovation that empowers users to harness large language models safely and effectively. Passionate about transforming unstructured data into actionable insights, Dia brings a strategic and user‑centered approach to building AI tools that are both powerful and accessible.