Kaustubh Dholé

I’m a PhD researcher at Emory University’s Department of Computer Science working with Prof. Eugene Agichtein. I work on a wide variety of problems which generally fall under Natural Language Processing & Information Retrieval. My long-term research goal is to build autonomous systems that can reason with evidence and operate as trustworthy collaborators in high-stakes decision-making. I focus on methods that let artificial intelligence (AI) systems (1) retrieve and organize the external knowledge they rely on intelligently and autonomously, (2) provide evaluations that are interpretable, holistic, and optimizable, and (3) act as accountable and goal-aligned decision partners for language, clinical support, scientific discovery, and policy rhetoric.

I completed my bachelor’s at BITS Pilani, India, and spent a year and a half at the Tata Institute of Fundamental Research (TIFR, Mumbai). I then worked for 6 years building the AI Agent Amelia at IPsoft Amelia.ai (now SoundHound) in the wonderful cities of Bangalore & New York, collaborating with Prof. Chris Manning. I was fortunate to work with the leadership of Amelia, including Uday Chinta and Chetan Dube, and with many fellow managers of other teams. I led an R&D team of around 15-20 engineers and scientists working on diverse NLP topics: dialogue modeling (NLU and NLG); VerbNet & PropBank parsing; KB-based QA; data augmentation; semantic parsing; relation extraction; dialog retrieval, ranking, and generation; other real-life conversational AI problems (handling agent fallback, agent ambiguity, etc.); and offline AI, such as building interfaces for efficient and smart data annotation and active learning. Much of the work also involved managing a team of back-end, front-end, and UX developers for end-to-end shipping of different modules of the Amelia stack. To know more about Amelia, check the Science Behind Amelia (page 8).

In the summers of the past 4 years (2022 to 2025), I collaborated with the Natural Understanding Team (now Alexa AGI) at Amazon Alexa in New York and San Jose on multi-task learning for their LLMs and on creating simulators for training LLMs, with the Search Experience Science team in Seattle ⛰️, and with the Stores Foundational AI team on pretraining and midtraining dataset valuation for reasoning tasks.

Some of my Recent Publications:

(SIGIR 2026) RubricRAG: Towards Interpretable and Reliable LLM Evaluation via Domain Knowledge Retrieval for Rubric Generation [Link]
(SIGIR 2026) Designing Diverse RAG Benchmarks: A Hierarchical Framework for Synthetic Question Generation [Link]
(NAACL 2025) ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges [Link]
(*SEM 2025) AdvERSEM: Adversarial Robustness Testing and Training of LLM-based Groundedness Evaluators via Semantic Structure Manipulation [Link]
(SCI-CHAT EACL 2024) Kaucus: Knowledge Augmented User Simulators for Training Language Model Assistants [Link]
(ECIR 2024) Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback [Link]
(NAACL 2024) DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation [Link]
(Logical & Symbolic Reasoning in LMs @ AAAI 2026) Stabilizing Reinforcement Learning for Honesty Alignment in Language Models on Deductive Reasoning [Link]

The most up-to-date list can be found on Semantic Scholar and Google Scholar.

Experience

Applied Scientist (Summers)

~1 year (22, 23, 24, 25)

📍Amazon AGI, Alexa Search Experience Science, Stores Foundational AI

AI R&D Lead (Science)

~3.5 years 2017–2021

📍IPsoft Amelia.ai (now SoundHound), Bangalore & New York

AI R&D Engineer (Science)

~3 years 2015–2017

📍IPsoft Amelia.ai (now SoundHound), Bangalore & New York

Researcher

~1.5 years

📍TIFR, Mumbai

Education

AI PhD Researcher

RAG Evaluation (Present)

📍Emory University

Masters (CS Track)

RAG, Evaluation

📍Emory University

Engineering (Hons)

Electrical Engineering

📍BITS Pilani

Areas of Interest: Reasoning and Reinforcement Learning, NLG Evaluation, Retrieval, Retrieval Augmented Generation, RAG Evaluation
Other areas I'm happy to collaborate or have coffee chats on: User Simulators, Dialog Systems, Graph Neural Networks, Data Augmentation, Efficient Transformers, Privacy Preserving ML, Bigger Picture of LLMs

Workshops:
- Co-organizer of the Generation, Evaluation & Metrics (GEM) Workshops: GEM 2021, GEM 2022, GEM 2023, and GEM 2025.
- Co-organizer of the wisdom-of-researchers collaboration to create the largest data augmentation repository, NL-Augmenter, and a key contributor to the LLM task benchmark BIG-Bench.

Recent Mentoring/Speaking:
- Gave a talk, From Traditional NLP Agents to Reasoning and Retrieval Agents, at Stanford University on my work at Amelia.ai and Emory.
- Mentored 5 graduate students on efficient variants of GNNs at the London Geometry & Machine Learning Summer School, 2022.
- Presented some of the work on RAG evaluation at the Workshop on Task Focussed IR in the Era of Generative AI at Microsoft Research, Redmond.
- Gave a talk on Retrieval Augmented Generation at the University of Edinburgh in 2024, during my visit to Scotland to present LLM-based reformulation at ECIR 2024.
- Presented IR work from the Intelligence Advanced Research Projects Activity (IARPA), USA funded project BETTER at IARPA Demo Day, Maryland. Check related publications.
- Invited as Speaker & Guest of Honour at VIT's ICAITR 2021, Mumbai. Gave a short talk on "NLP in the Past Decade".
- My bioinformatics article was featured on Global Medical Discovery [ISSN 1929-8536] as a Key Scientific Article contributing to excellence in biomedical research.

Work in Media:
- Why India needs to counter AI bias and stereotypes, Lokmat Times, 2025 (Full Paper Pg. 10)
- AI Is Spreading Old Stereotypes to New Languages and Cultures, Wired Magazine, 2025
- This data set helps researchers spot harmful stereotypes in LLMs, MIT Technology Review, 2025
- 444 Authors From 132 Institutions Release BIG-bench: A 204-Task ‘Extremely Difficult and Diverse’ Benchmark for Large Language Models, Synced Technology Review, 2022
- 55 Researchers From 44 Institutions Propose GEM, a ‘Living Benchmark’ for NLG, Synced Technology Review, 2021

Recent Lectures on Retrieval Augmented Generation (April 2026):

Recent Lectures on Retrieval Augmented Generation (May 2024):

Other Projects:

If you want to get in touch or are interested in collaborating, feel free to reach me at kdhole AT emory DOT edu (or on LinkedIn or Twitter, where I’m sometimes active).

Long ago, I used to maintain a personal blog on WordPress where, on rare occasions, I wrote mostly non-NLP stuff! You can find some of my random writings on Politics and Linguistics, some book reviews, and the occasional post from when I’ve gone backpacking! One serious piece of advice: cook this! And if you want motivation to pursue a career in linguistics, NLP, or AI in general, virtually visit the language museum in DC!

Test out your AI skills at LLM Quiz Time and Quiz Badminton, or check which words are common between languages (United Lexicons).

Mentoring and Managing at Amelia R&D (2015 to 2021):
- R&D/Senior R&D Engineers & Scientists: Krishna Mohan Barakam, Ashish Srivastava, Aadesh Gupta, Abhinav Bhatt, Arpan Kulshreshtha, Priyank Soni, Venkatesh Magham, Anurag Kashyap, Kaustav Dutta, Ramavtar Malav, Vishwa Teja, Manjunath Hegde, Roopesh Mangal, Mohit Rohatgi, Rohit Kalra
- Interns: Bhargav Sagiraju, Chandra Reddy, Pranav Kamojjhala