Applied AI Summit

Free online conference | October 14-16, 2025

Transforming Clinical Notes into Research-Ready Data [Keynote]

The Ohio State University is building a unified, research-ready data asset that combines structured EHR tables with the rich signal locked in unstructured clinical notes—de-identified, continuously refreshed, longitudinally aligned, and ready for population health, academic collaboration, and investigator-initiated studies. This keynote presents a practical case study of how OSU is operationalizing that vision at scale, including an ongoing effort to de-identify >200M Epic notes via a HIPAA-approved pipeline. Built with John Snow Labs software and models, the end-to-end workflow includes: (1) cohort selection to target subpopulations, (2) de-identification with auditability and centralized configuration, (3) information extraction and coding to enrich notes with clinically meaningful labels, and (4) human-in-the-loop validation using the Generative AI Lab. Attendees will see a runnable, well-documented Python notebook that executes unchanged on Azure Databricks and on OSU’s on-prem environment (OSC)—demonstrating portable configuration, parallelization for high throughput, and consistent logging for monitoring and compliance. We’ll show how the same components underpin an augmentation pipeline for labeling with existing models, and how outputs are surfaced to researchers through OSU’s LifeScale environment. Takeaways include architecture patterns for cross-platform NLP at enterprise scale, lessons learned in protecting PHI while preserving utility, and concrete guidance for turning free-text notes into governed, longitudinal datasets that teams can trust and reuse.
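As a rough illustration of the four-stage workflow described above, the sketch below chains cohort selection, de-identification with an audit trail, label extraction, and a human-review queue, all driven by one shared configuration. It is a minimal sketch only: it does not use John Snow Labs software or reflect OSU's actual pipeline. The regex masking and keyword matching stand in for trained models, and every name in it (`CONFIG`, `Note`, `select_cohort`, `deidentify`, `extract_labels`, `run_pipeline`) is hypothetical.

```python
import logging
import re
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("notes_pipeline")

# Hypothetical central configuration shared by every stage (assumption).
CONFIG = {
    "cohort_keyword": "diabetes",
    "phi_patterns": {
        "DATE": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
        "PHONE": r"\b\d{3}-\d{3}-\d{4}\b",
    },
    "label_terms": {"metformin": "DRUG", "hypertension": "CONDITION"},
}


@dataclass
class Note:
    note_id: str
    text: str
    labels: list = field(default_factory=list)
    audit: list = field(default_factory=list)


def select_cohort(notes, cfg):
    """Stage 1: keep only notes that mention the cohort keyword."""
    keyword = cfg["cohort_keyword"]
    return [n for n in notes if keyword in n.text.lower()]


def deidentify(note, cfg):
    """Stage 2: mask PHI-like patterns and record what was masked."""
    text, audit = note.text, []
    for tag, pattern in cfg["phi_patterns"].items():
        text, count = re.subn(pattern, f"<{tag}>", text)
        if count:
            audit.append((tag, count))
    log.info("de-identified %s: %s", note.note_id, audit)
    return Note(note.note_id, text, audit=audit)


def extract_labels(note, cfg):
    """Stage 3: attach labels using simple keyword matching (a stand-in for a model)."""
    lowered = note.text.lower()
    note.labels = [(t, lab) for t, lab in cfg["label_terms"].items() if t in lowered]
    return note


def run_pipeline(notes, cfg):
    """Run stages 1-3, then route unlabeled notes to human review (stage 4)."""
    cohort = select_cohort(notes, cfg)
    processed = [extract_labels(deidentify(n, cfg), cfg) for n in cohort]
    review_queue = [n for n in processed if not n.labels]
    return processed, review_queue


notes = [
    Note("n1", "Seen 03/14/2024 for diabetes; started metformin. Call 614-555-0100."),
    Note("n2", "Routine visit on 01/02/2023, no complaints."),
    Note("n3", "Diabetes follow-up 5/6/2024."),
]
processed, queue = run_pipeline(notes, CONFIG)
for n in processed:
    print(n.note_id, n.text, n.labels, n.audit)
```

Keeping every pattern and term in one configuration object is what would let the same code run unchanged across environments. In a real deployment the per-note functions would be distributed across workers rather than run in a list comprehension.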

About the speaker

Timothy Huerta

CRIO & Associate Dean for Research Information Technology at The Ohio State University – Columbus, Ohio

Dr. Huerta is the Chief Research Information Officer (CRIO) and Associate Dean for Research Information Technology (ADR) for the Wexner Medical Center and the College of Medicine at The Ohio State University. Additionally, he is the Director of Biomedical Informatics for OSU’s NCATS-funded Center for Clinical and Translational Science (CCTS). In these roles, Dr. Huerta is responsible for the strategic advancement of OSU’s collective biomedical mission through the enabling use of technology. In 2019, he established the Department of Research Information Technology (RIT) within the Office of Research for the College of Medicine to provide informatics expertise, research data extraction services from the data warehouse, research integration in the Epic environment, data collection and management via REDCap for the University, IT support for research-focused CLIA-certified labs, endpoint support for advanced instrumentation, data science as a service, a software development team of more than 16 developers, and a research co-design team that delivers user-centered design as a service. Through the collaborative resources invested in a state-of-the-art informatics infrastructure, OSU has: expanded research in Clinical Decision Support (CDS) by creating pathways for accelerated integration of CDS software into its Epic environment; established the collaborative structure that has enabled the CCTS, the Wexner Medical Center and its seven associated hospitals, and the Health Sciences Colleges of The Ohio State University to deliver on informatics projects; and deployed a state-of-the-art data and analytics environment, based on Azure data services, to support artificial intelligence/machine learning research.
Through his leadership, and joint investment by the OSU-Nationwide Children’s CTSA, the Center for Clinical and Translational Science (CCTS), the Information Technology division of the Wexner Medical Center and the University through the Dean’s office, he continues to focus on developing Research Informatics infrastructure to provide data to other scholars for use in their work, support open and transparent interdisciplinary science, and test technology-mediated models for improving the holistic delivery of care. As a Professor with a joint appointment in the Departments of Family and Community Medicine and Biomedical Informatics, Dr. Huerta’s research is focused on the large-scale research infrastructure necessary to support discovery in some of the most complicated questions facing medicine today. With a diverse grant portfolio that has included funding from AHRQ, NCATS, NIA, NIDA, NSF, PCORI, and The State of Ohio, Dr. Huerta has played a central role in ensuring the success of large-scale discovery in over $150M of funded research. 
Among his many projects, his research includes: collecting data with community-based organizations throughout the state of Ohio to eliminate racial disparities in infant mortality; addressing the implementation science questions raised by large-scale deployment of community-based participatory research in the opioid crisis that continues to destroy families; facilitating common data collection in clinical care through standardized workflows that have been deployed to over 15 hospitals; identifying the technical components that help explain how healthcare providers use technology to engage patients through patient portals; and helping the Patient-Centered Outcomes Research Institute (PCORI) understand its portfolio of funded projects and, more recently, evaluate the use of Artificial Intelligence and Large Language Models to accelerate the speed with which science is classified. Dr. Huerta’s work has required novel technological designs that allow researchers to understand how we move from knowing what works to doing what works. For example, by taking a design focus, he has developed a state-wide project that collects data from Community-Based Organizations, allowing them to achieve data reporting parity with hospitals that have far more advanced capacities for data collection. His ability to work in this manner is grounded in his deep understanding of the intersection between health communication and technology. As a scholar who understands the technical aspects of how data are collected, he is a nationally recognized expert on issues related to health information technology (HIT), with extensive expertise in outcomes as they relate to Electronic Health Records (EHRs) and patient-centered HIT. He has developed software that integrates with Epic and enables patients and clinicians to engage in shared decision-making related to statins.
He also authored software that allows patients to identify service delivery issues and positive experiences throughout the medical center. Both of these systems are in active use and speak to his talent for using innovation to give patients a stronger voice in their care. During his career, he has had primary appointments in Colleges of Medicine (UBC and Ohio State), Business (Texas Tech), and Public Policy (USC), as well as appointments with the National Center for Supercomputing Applications (University of Illinois-Chicago). His post-doctoral work with the National Cancer Institute focused on collaboration and team science, resulting in some of the earliest quantitative work on the subject and the landmark NCI Monograph “Greater than the Sum: Systems Thinking in Tobacco Control,” which outlined how systems thinking, network analysis, and knowledge management principles could transform discovery. His analytic approach is highly quantitative, drawing heavily on econometrics, social network theory, and the mathematical side of complexity science. This diverse approach has produced a correspondingly diverse publication record, spanning Health Services Research, Implementation Science, Informatics, Medicine, and Social Science. His broad interdisciplinary background, his experience in user-centered design and in data and research governance, his deep knowledge of informatics, and his knowledge of high-performance computing allow him to serve as a boundary spanner of a kind that is rare in the academy.