Applied AI Summit
Free online conference | October 14-16, 2025
Automating Enterprise Data Discovery: Modern Data Catalogs Powered by Generative AI & LLMs
Enterprise data landscapes have become increasingly complex, with organizations managing thousands of datasets across cloud platforms, data lakes, and legacy systems. Traditional data catalogs require extensive manual curation, leading to outdated metadata, poor discoverability, and significant operational overhead. Studies show that data professionals spend 60% of their time searching for and preparing data rather than generating insights.
This keynote explores how Generative AI and Large Language Models are revolutionizing enterprise data discovery through intelligent automation. We’ll demonstrate how modern AI can automatically understand database schemas, generate human-readable descriptions, classify data sensitivity, and maintain dynamic data lineage—all without human intervention.
Attendees will learn to architect next-generation data catalog systems that leverage GPT-4, Claude, and other LLMs for automated metadata generation, semantic understanding, and natural language interfaces. The session covers practical implementation strategies including prompt engineering for data interpretation, embedding techniques for semantic search, and integration patterns with existing data infrastructure.
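As a rough illustration of the semantic search idea mentioned above, the following minimal Python sketch ranks catalog entries by cosine similarity between a query vector and description vectors. It is not taken from the session: the catalog entries, the `embed`, `cosine`, and `search` names, and the bag-of-words vectors are all assumptions chosen to keep the example self-contained. A real system would presumably replace `embed` with an LLM embedding model and store the vectors in a vector database.

```python
import math
import re
from collections import Counter

# Hypothetical catalog entries (table name -> description); illustrative only.
CATALOG = {
    "crm.customers": "customer contact details including email and phone",
    "sales.orders": "purchase orders with order totals and shipping dates",
    "iot.sensor_readings": "gas sensor telemetry readings with timestamps",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, catalog: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k catalog entries most similar to the query."""
    q = embed(query)
    scored = [(name, cosine(q, embed(desc))) for name, desc in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, score in search("customer email", CATALOG):
        print(f"{name}: {score:.3f}")
```

The ranking logic stays the same whichever embedding function is plugged in, which is why the sketch keeps `embed` separate from `search`.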
We’ll showcase real-world architectures combining vector databases, graph technologies, and generative AI to create self-maintaining catalog systems. Key technical topics include handling multi-source data ingestion, implementing automated data quality assessment, building conversational query interfaces, and ensuring governance compliance through AI-driven classification.
The presentation includes live demonstrations of automated data profiling, AI-generated documentation, and natural language data discovery workflows that transform how enterprises interact with their data assets.
By the end of the session, participants will understand how to implement fully automated data discovery systems that reduce manual catalog maintenance while dramatically improving data accessibility and user experience across their organizations.
About the speaker
Vipin Kataria
Senior Lead Architect (Data/ML) at Picarro Inc
Vipin Kataria is a Senior Lead Architect with over 21 years of experience designing enterprise-scale cloud platforms and ML/AI systems. Currently based in Fremont, California, he leads advanced data architecture initiatives at Picarro, where he specializes in environmental monitoring, hazardous gas detection, and real-time IoT sensor data processing. Vipin holds a Master of Science in Machine Learning/Data Science from the University of Illinois at Urbana-Champaign (UIUC) and a Bachelor of Engineering in Information Technology from the Indian Institute of Information Technology (IIIT), India. This academic foundation in information technology and machine learning underpins his practical expertise in enterprise-scale data systems.
Throughout his career, Vipin has worked in telecommunications and hardware integration, with deep domain knowledge in 3GPP protocols and Intel XMM modem platforms. He has held leadership positions at prominent technology companies, including Intel Corporation, where he spent eight years managing cross-functional teams and developing feature engineering services for Intel XMM modem platforms, and Amazon Lab126, where he designed cloud-based telemetry analysis tools for 3GPP modem platforms.
As a cloud platform architect, Vipin specializes in microservices architectures, event-driven systems, and streaming data pipelines that process terabytes of telemetry data from distributed IoT ecosystems. His work includes integrated MLOps frameworks for real-time predictive analytics, automated anomaly detection, and intelligent feature engineering supporting Fortune 500 enterprise deployments across multiple geographic regions. Vipin's recent focus includes data science and generative AI, where he architects end-to-end MLOps pipelines, transformer-based NLP models, conversational AI frameworks using large language models, and Retrieval Augmented Generation systems. His track record includes implementing predictive analytics and anomaly detection for mission-critical applications.
In addition to his industry leadership, Vipin is active in the academic and research community as a technical paper reviewer for conferences, journals, and technical awards. He has reviewed for several conferences and journals, including CCNCPS 2025 (Dubai, UAE), AI-SI 2025 and 2026 ISIBER, and the Journal of Computational Analysis and Applications. He has participated in international conferences such as ICBATS-3 2025 and EIRTM 2025. His research has been recognized through the Globee Awards for Artificial Intelligence 2025, and he maintains an active research profile with publications in peer-reviewed journals (ORCID: https://orcid.org/0009-0006-5332-7965). An active IEEE member, he has also served as a judge in several hackathons, reflecting his commitment to fostering innovation and mentoring emerging talent.
Beyond his technical expertise, Vipin is recognized as a champion of technical leadership and talent development who identifies, recruits, and mentors top engineering talent while driving data-driven digital transformations and legacy platform modernization initiatives that generate substantial revenue and competitive advantages for organizations.