A deep dive on AI model distillation attacks
In this solo episode of Risky Business Features, James Wilson explores how distillation techniques serve both as a legitimate way to train smaller models and as a way to steal model capabilities. It’s not just a problem for frontier labs! Any LLM-based product could have its competitive advantage stolen through these attacks.
James covers:
- High-level concept of distillation
- Why it matters, including the closed-weight / open-weight / open-source distinction
- Types of distillation and the prompts used
- The distillation pipeline end to end
- Distillation at scale and mitigation techniques
- Hardware resource constraints for distillation
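As background for the concepts covered above, here is a minimal sketch of the classic soft-label distillation objective: the student is trained to match the teacher's temperature-softened output distribution via KL divergence. This is illustrative pure Python, not code from the episode; note that in the black-box "attack" setting discussed (as in Alpaca or Vicuna), an attacker usually sees only sampled text from an API, not logits, and instead fine-tunes on collected prompt/response pairs.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits.
    Higher temperature flattens the distribution, exposing
    more of the teacher's 'dark knowledge' about wrong classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions -- the
    soft-label objective from Hinton et al.'s distillation paper."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A student whose logits already match the teacher's incurs zero loss; the further its distribution drifts, the larger the KL term, which is what gradient descent minimizes during student training.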
Show notes
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Alpaca: A Strong, Replicable Instruction-Following Model
- Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Zephyr: Direct Distillation of LM Alignment
- Stealing Part of a Production Language Model
- Microsoft probes if DeepSeek-linked group improperly obtained OpenAI data, Bloomberg News reports
- Detecting and preventing distillation attacks