A deep dive on AI model distillation attacks
In this solo episode of Risky Business Features, James Wilson explores how distillation techniques serve both as a legitimate way to train smaller models and as a way to steal model capabilities. It’s not just a problem for frontier labs! Any LLM-based product could have its competitive advantage stolen through these attacks.
James covers:
- High-level concept of distillation
- Why it matters, including the closed-weight / open-weight / open-source distinction
- Types of distillation and the prompts used
- The distillation pipeline end to end
- Distillation at scale and mitigation techniques
- Hardware resource constraints for distillation
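As background for the concepts covered above, here is a minimal sketch of the classic soft-label distillation objective: the student is trained to match the teacher's temperature-softened output distribution via KL divergence. This is illustrative pure Python, not code from the episode; note that in the black-box "attack" setting discussed (as in Alpaca or Vicuna), an attacker usually sees only sampled text from an API, not logits, and instead fine-tunes on collected prompt/response pairs.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits.
    Higher temperature flattens the distribution, exposing
    more of the teacher's 'dark knowledge' about wrong classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions -- the
    soft-label objective from Hinton et al.'s distillation paper."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A student whose logits already match the teacher's incurs zero loss; the further its distribution drifts, the larger the KL term, which is what gradient descent minimizes during student training.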
Show notes
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Alpaca: A Strong, Replicable Instruction-Following Model
- Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Zephyr: Direct Distillation of LM Alignment
- Stealing Part of a Production Language Model
- Microsoft probes if DeepSeek-linked group improperly obtained OpenAI data, Bloomberg News reports
- Detecting and preventing distillation attacks