Ignite AI: Minha Hwang on Scaling AI Experiments and Building Smarter Models with Less Data | Ep167

Episode 167 of the Ignite Podcast

The AI world moves fast—but few people think rigorously about how we know what’s actually working. In our latest episode of the Ignite Podcast, we spoke with Minha Hwang, Principal Applied Scientist at Microsoft, to break down the messy, high-stakes world of AI experimentation, model evaluation, and what comes after training your models.

With a unique journey spanning MIT, McKinsey, academia, and Microsoft, Minha combines deep technical expertise with a pragmatic business lens. This blog unpacks the most valuable lessons from our conversation.

🎯 From Data Storage to Decision Science

Minha’s path is anything but linear:

  • PhD #1 in materials science at MIT

  • McKinsey consultant working across industries

  • PhD #2 in marketing science—before becoming a professor at McGill

  • And today, he leads high-impact experimentation systems at Microsoft

What ties it all together? A relentless focus on data-driven decision making and understanding the real impact behind the numbers.

🧪 Why A/B Testing Isn’t Enough Anymore

Most companies lean heavily on A/B testing. But Minha warns of a harsh reality:

“False positives are shockingly common—especially when teams run too many tests, with too many metrics, on too little traffic.”

He outlines how Microsoft tackles this:

  • Proxy metrics to detect signal faster

  • Variance reduction techniques using ML

  • Repeat experiments to validate surprising results (“solidification”)

These practices help Microsoft scale experimentation without sacrificing trust in the data.
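To make the variance-reduction point concrete, here is a minimal sketch of one common industry approach: a CUPED-style adjustment that uses each user's pre-experiment metric as a covariate. The episode doesn't spell out Microsoft's exact method, so the data, column names, and numbers below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment: pre-experiment engagement strongly predicts
# in-experiment engagement, which is exactly what makes this adjustment work.
n = 10_000
pre = rng.normal(50, 10, n)                       # pre-period metric per user
treat = rng.integers(0, 2, n)                     # random 50/50 assignment
post = pre + rng.normal(0, 5, n) + 0.3 * treat    # small true lift of 0.3

# CUPED-style adjustment: remove the part of the metric explained by the
# pre-period covariate; this shrinks variance without biasing the effect.
theta = np.cov(post, pre, ddof=1)[0, 1] / np.var(pre, ddof=1)
adjusted = post - theta * (pre - pre.mean())

def diff_and_se(metric, assignment):
    """Difference in means between treatment and control, with its std. error."""
    t, c = metric[assignment == 1], metric[assignment == 0]
    se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
    return t.mean() - c.mean(), se

print("raw effect, SE:     ", diff_and_se(post, treat))
print("adjusted effect, SE:", diff_and_se(adjusted, treat))
# The adjusted standard error is roughly half the raw one, so the same true
# lift reaches significance with much less traffic.
```

The same logic explains the false-positive warning: with less noise per metric, teams need fewer tests and fewer "just peek at 20 metrics" decisions to find real effects.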

🔍 Causal Inference: The Most Underrated Skill in ML

While machine learning is great for prediction, Minha argues that causal inference—understanding what actually caused an outcome—is what truly drives business impact.

“Most ML teams are mapping X to Y. But businesses want to know: if I change X, what happens to Y?”

He highlights tools like observational causal inference, counterfactual reasoning, and A/B tests—but notes most data science programs underemphasize them.
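To illustrate the prediction-versus-causation gap he describes, here is a toy sketch (not from the episode) in which a confounder makes the naive X-to-Y relationship look far stronger than the true causal effect, and a simple observational adjustment recovers it. All variable names and coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical confounder: heavy users both see the feature more often (x)
# and spend more (y), so the naive x->y slope overstates the causal effect.
usage = rng.normal(0, 1, n)                        # confounder
x = 0.8 * usage + rng.normal(0, 1, n)              # exposure, driven by usage
y = 2.0 * usage + 0.5 * x + rng.normal(0, 1, n)    # true causal effect of x is 0.5

# Prediction-style fit: regress y on x alone.
naive_slope = np.polyfit(x, y, 1)[0]

# Simple observational adjustment: control for the confounder by
# including it in the regression (ordinary least squares).
X = np.column_stack([np.ones(n), x, usage])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"naive slope:    {naive_slope:.2f}")   # ~1.5, biased upward
print(f"adjusted slope: {beta[1]:.2f}")       # ~0.5, close to the true effect
```

Randomizing x, as an A/B test does, removes the confounding by design; observational adjustments like this are the fallback when running an experiment isn't feasible.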

🤖 Evaluating LLMs: The New Frontier

As Microsoft integrates large language models (LLMs) into more products, experimentation gets trickier:

  • A/B testing LLM features often lacks clean control groups

  • Standard metrics don’t always reflect user preference or quality

  • Evaluation becomes more about human preferences and offline metrics

This shift demands a new mindset—one that blends rigorous experimentation with deep qualitative insight.
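One concrete form this takes is pairwise human preference evaluation: raters compare a new model's answer against a baseline's for the same prompt, and you report a win rate with a confidence interval instead of an online metric. The sketch below is illustrative only; the counts and labels are invented and this is not a description of Microsoft's evaluation pipeline.

```python
import math

# Hypothetical pairwise judgments: for each prompt, a rater marks whether
# the new model's answer beat the baseline ("win"), lost, or tied.
judgments = ["win"] * 132 + ["loss"] * 88 + ["tie"] * 30

wins = judgments.count("win")
losses = judgments.count("loss")
decided = wins + losses              # ties excluded from the win rate here

win_rate = wins / decided
# Normal-approximation 95% confidence interval for the win rate.
se = math.sqrt(win_rate * (1 - win_rate) / decided)
lo, hi = win_rate - 1.96 * se, win_rate + 1.96 * se

print(f"win rate: {win_rate:.2f} (95% CI {lo:.2f}-{hi:.2f}) over {decided} decided pairs")
# If the interval stays above 0.50, the preference signal is credible even
# without a traditional online control group.
```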

🧠 The Case for Open Source and Reinforcement Learning

Minha is optimistic about:

  • Open-weight models like DeepSeek as democratizers of AI innovation

  • Reinforcement learning as a path beyond the limits of human-labeled data

“If we want models to go beyond human-level intelligence, we’ll need them to learn from experience—not just our data.”

He predicts RL and simulated environments will play a growing role in training next-gen AI.

🚀 What Comes After LLMs?

While LLMs dominate headlines, Minha is thinking ahead:

  • Smarter pricing agents for small businesses

  • Non-LLM applications with direct business value

  • Eventually, robotics and physical AI, where visual and tactile learning replaces pure text-based intelligence

The future, he believes, will demand more than language—it will require systems that understand, act, and adapt.

💡 Final Thought

Amid the AGI debates and benchmark hype, Minha offers a grounded view:

“As an engineer, I don’t care if it’s AGI. What matters is—does it solve the problem? Is it useful?”

That’s a philosophy worth holding onto in today’s rapidly evolving AI landscape.

🎧 Want to Go Deeper?

Listen to the full episode with Minha Hwang for stories, frameworks, and strategies you won’t hear anywhere else. Whether you're building AI systems or evaluating their business impact, this one’s a masterclass.

👂🎧 Watch, listen, and follow on your favorite platform: https://tr.ee/S2ayrbx_fL

🙏 Join the conversation on your favorite social network: https://linktr.ee/theignitepodcast

Chapters:

  • 00:00 Intro

  • 00:40 Minha’s Engineering Roots and PhD at MIT

  • 01:55 Jumping from Engineering to Consulting at McKinsey

  • 03:15 Why He Went Back for a Second PhD

  • 04:35 Transition from Academia to Applied Data Science

  • 06:00 Building McKinsey’s Data Science Arm

  • 07:30 Moving to Microsoft to Explore Unstructured Data

  • 08:40 Making A/B Testing More Sensitive with ML

  • 10:00 Why False Positives Are a Massive Problem

  • 11:05 How to Validate Experiments Through “Solidification”

  • 12:10 The Importance of Proxy and Debugging Metrics

  • 13:35 Model Compression and Quantization Explained

  • 15:00 Balancing Statistical Rigor with Product Speed

  • 16:30 Why Data, Not Model Training, Is the Bottleneck

  • 18:00 Causal Inference vs. Machine Learning

  • 20:00 Measuring What You Can’t Observe

  • 21:15 The Missing Role of Causality in AI Education

  • 22:15 Reinforcement Learning and the Data Scarcity Problem

  • 23:40 The Rise of Open-Weight Models Like DeepSeek

  • 25:00 Can Open Source Overtake Closed Labs?

  • 26:15 IP Grey Areas in Foundation Model Training

  • 27:35 Multimodal Models and the Future of Robotics

  • 29:20 Simulated Environments and Physical AI

  • 30:25 AGI, Overfitting, and the Benchmark Illusion

  • 32:00 Practical Usefulness > Philosophical Debates

  • 33:25 Most Underrated Metrics in A/B Testing

  • 34:35 Favorite AI Papers and Experimentation Tools

  • 36:30 Measuring Preferences with Discrete Choice Models

  • 36:55 Outro