Applied AI Summit

Free online conference | October 14-16, 2025

Automating the Oncology Patient Registry with Multimodal AI, Agentic Workflows, and Human Oversight [Keynote]

Oncology teams need high-quality patient registries to power real-world evidence, cohort selection, trial design (including external/control arms), outcomes research, and operations. These registries are meant to capture structured data such as pathological and clinical TNM staging, treatment histories, and outcomes. In practice, such values are rarely stated explicitly and must be inferred across thousands of pages of clinical notes, pathology, radiology, genomics, and other documents. At the same time, the SEER and AJCC guidelines, which serve as reference documents for oncology-specific data extraction, run over a thousand pages with branching visuals and cancer-specific taxonomies exceeding a thousand fields. The task therefore demands both multimodal extraction and longitudinal patient-level reasoning.

This keynote presents a pragmatic, production-tested approach from John Snow Labs. We’ll demo a registrar-friendly UI that renders a longitudinal, patient-level view with the oncology taxonomy pre-filled and clickable provenance. Then we’ll unpack the architecture in four parts:

  • Data curation & information extraction: why “dumping the chart into an LLM” falls short, and how a deterministic pipeline instead combines small language models (SLMs) with domain NLP and vision models, layered summarization, and field-level accuracy metrics.
  • Patient-level reasoning: timeline assembly, encounter grouping, token budget optimization, and narrow, role-specific downstream agents to reduce context and raise accuracy.
  • Oncology-specific agents: selecting guideline logic by cancer type; translating SEER/AJCC flowcharts into executable decision graphs for staging, histology, biomarkers, and treatments.
  • UI for validation & feedback: workflows that registrars, auditors, and clinical coders can actually use, along with features for team collaboration, audit trails, and a learning loop that improves models over time.
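
To make the "executable decision graph" idea in the third bullet concrete, here is a minimal Python sketch. It is an illustration only, not the John Snow Labs implementation: the node names, field names (tumor_size_mm, positive_nodes), thresholds, and group labels are all hypothetical placeholders and do not reproduce any real SEER or AJCC rule. The sketch shows one plausible way a guideline flowchart could be encoded as data and walked deterministically while recording the path taken, which is the kind of trace a registrar could review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Decision:
    """Internal flowchart node: a yes/no question over extracted fields."""
    question: str
    test: Callable[[Dict[str, float]], bool]
    if_true: str   # id of the node to visit when the test passes
    if_false: str  # id of the node to visit when the test fails


@dataclass(frozen=True)
class Outcome:
    """Leaf node: the value assigned to the registry field."""
    label: str


Node = Union[Decision, Outcome]

# Toy graph. Thresholds and labels are placeholders, NOT real staging logic.
TOY_GRAPH: Dict[str, Node] = {
    "start": Decision("tumor size > 20 mm?",
                      lambda f: f["tumor_size_mm"] > 20, "large", "small"),
    "small": Decision("any positive nodes?",
                      lambda f: f["positive_nodes"] > 0, "group_b", "group_a"),
    "large": Decision("any positive nodes?",
                      lambda f: f["positive_nodes"] > 0, "group_c", "group_b"),
    "group_a": Outcome("toy-group-A"),
    "group_b": Outcome("toy-group-B"),
    "group_c": Outcome("toy-group-C"),
}


def walk(graph: Dict[str, Node], fields: Dict[str, float],
         start: str = "start") -> Tuple[str, List[str]]:
    """Follow the graph from `start` to a leaf; return (label, decision trace)."""
    trace: List[str] = []
    node = graph[start]
    while isinstance(node, Decision):
        answer = node.test(fields)
        trace.append(f"{node.question} -> {'yes' if answer else 'no'}")
        node = graph[node.if_true if answer else node.if_false]
    return node.label, trace


label, trace = walk(TOY_GRAPH, {"tumor_size_mm": 15, "positive_nodes": 0})
print(label)
for step in trace:
    print("  ", step)
```

Keeping the graph as plain data rather than nested if/else code is a design choice in this sketch: a separate graph could then be selected per cancer type, and the recorded trace gives each pre-filled value an inspectable justification.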

Attendees will leave with a blueprint to automate registry construction that is accurate, explainable, scalable, and ready for real clinical operations.

About the speaker

Veysel Kocaman

CTO at John Snow Labs

Veysel is a Lead Data Scientist at John Snow Labs, improving the Spark NLP for Healthcare library and delivering hands-on projects in healthcare and life sciences. He is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including machine learning, artificial intelligence, and big data. He is also pursuing his PhD in ML at Leiden University, Netherlands, and delivering graduate-level lectures in ML and distributed data processing. Veysel has broad experience consulting on statistics, data science, software architecture, DevOps, machine learning, and AI for start-ups, bootcamps, and companies around the globe. He also speaks at data science and AI events, conferences, and workshops, and has delivered more than 20 talks at international and national conferences and meetups.