Applied AI Summit Healthcare
Free online conference | April 14-15, 2026
From data archives to discovery engines: AI-powered semantic integration of massive public research data
The biomedical research community has generated over 9 million publicly available datasets and samples spanning genomics, transcriptomics, proteomics, and clinical data—representing an unprecedented opportunity for scientific discovery. Yet these resources remain severely underutilized due to fundamental barriers: fragmented metadata across repositories, inconsistent terminologies, and the inability to semantically connect datasets based on biological meaning rather than exact keyword matches.
This talk will demonstrate how AI/ML approaches are transforming our ability to unlock these data treasures. I’ll present novel frameworks that combine large language models, graph neural networks, and retrieval-augmented generation to: (1) automatically infer standardized annotations of datasets from their unstructured, plain-text metadata by extracting biomedical context from publications and ontologies and (2) create cross-modal spaces where textual descriptions and molecular profiles become translatable to generate novel hypotheses.
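To make the annotation task in (1) concrete, here is a minimal sketch of its input and output: a free-text sample description goes in, and a standardized term comes out. This is not the speaker's framework. It is a deliberately simple word-overlap baseline, and the mini-ontology, term IDs, function names, and threshold are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini-ontology (illustrative only): term ID -> label plus synonyms.
ONTOLOGY = {
    "TISSUE:liver": "liver hepatic hepatocyte",
    "TISSUE:brain": "brain cortex neuron cerebral",
    "TISSUE:blood": "blood pbmc plasma leukocyte",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate(description, ontology=ONTOLOGY, threshold=0.1):
    """Return the best-matching term ID, or None if nothing clears the threshold."""
    desc_vec = Counter(tokenize(description))
    scored = [
        (cosine(desc_vec, Counter(tokenize(text))), term_id)
        for term_id, text in ontology.items()
    ]
    best_score, best_term = max(scored)
    return best_term if best_score >= threshold else None


if __name__ == "__main__":
    print(annotate("RNA-seq of primary hepatocyte cultures from adult liver"))
    print(annotate("PBMC samples from healthy donors"))
    print(annotate("soil microbiome survey"))
```

Because this baseline depends on shared words, it fails whenever a description and a term mean the same thing but use different vocabulary. That is exactly the keyword-matching limitation the abstract describes, and it is the gap that the language-model and ontology-aware approaches in the talk are meant to close.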
Through concrete examples spanning rare disease research, drug repurposing, and high-content imaging, I’ll also show how these AI-powered approaches enable researchers to pose entirely novel biological questions to millions of existing samples, moving beyond pre-calculated results to generate custom analyses on demand. The implications extend beyond biomedicine: as massive data collections proliferate across scientific domains, AI-driven semantic integration represents a paradigm shift from treating public data as a static archive to treating it as a dynamic discovery engine that accelerates scientific breakthroughs while maximizing return on prior research investments.
About the speaker
Arjun Krishnan
Associate Professor at University of Colorado Anschutz
Arjun is an Associate Professor in the Department of Biomedical Informatics at the University of Colorado Anschutz. His group develops machine learning (ML)- and AI-based methods and tools that take advantage of massive public data collections to gain insights into complex disease mechanisms. In addition to data, algorithms, and computing, Arjun enjoys discussing academia and the scientific enterprise, research education and training, open science, and science communication.