Stanford CS25: V4 I Aligning Open Language Models
10 May 2024 (9 months ago)
Key Historical Milestones in Language Modeling
The Rise of Language Models
- 2020: GPT-3 release marked a significant improvement in language models.
- 2021: "Stochastic Parrots" paper raised questions about language model capabilities.
- 2022: ChatGPT's release reshaped the narrative around language models.
Reinforcement Learning from Human Feedback (RLHF)
- RLHF is crucial for advanced language models like ChatGPT.
- RLHF is cost-effective and time-effective, surprising NLP researchers.
- Examples of RLHF's impact, such as Anthropic's models.
Alignment and Open Alignment
- Timeline of alignment and open alignment, showcasing RLHF benefits.
- Various alignment concepts: instruction fine-tuning, supervised fine-tuning, RLHF, preference fine-tuning.
Recent Developments in Fine-Tuning and Alignment
- ChatGPT sparked discussions on open-sourcing models and forming development coalitions.
- The Llama Suite's Alpaca model and its instruction-tuned capabilities.
- Subsequent models like Vicuna introduced new prompt sources and the concept of an LLM as a judge.
- Diverse datasets like SharGPT accelerated progress in fine-tuning models.
- Legal considerations due to unlicensed datasets highlighted the need for responsible data collection.
- Recent datasets like LMIS Chat One 1M and WildChat address data quality and user consent issues.
- Weight differences between models due to licensing restrictions.
- Notable models: Dolly (human data integration), Open Assistant (human-generated prompts), Stable Vuno (early RHF proficiency).
- Efficient fine-tuning methods: QOR (low-rank adaptation), Cura (quantization and GPU tricks).
- New evaluation tools: Chatbot Arena, Alpaca of Val, Mt Bench, Open LLM Leaderboard.
- Challenges in interpretability and specificity of evaluation metrics.
Reinforcement Learning Fundamentals
- Review of reinforcement learning (RL) fundamentals, reward functions, and optimization.
- Introduction to direct preference optimization (DPO) as a simple and scalable training method.
- DPO involves using gradient ascent to directly optimize the loss function without learning a reward model.
- Successful scaling of DPO to a 70 billion parameter model, achieving performance close to GPT-3.5 on Chatbot Arena.
- Contributions from other projects like Nvidia's SteerM and Berkeley's Starling LM Alpha.
- PO currently outperforms DPO in alignment methods.
The Modern Ecosystem of Open Models
- Growth of the open models ecosystem with diverse models and companies.
- Emerging models like Gen-struck for rephrasing text and instruction models.
- Open models catching up to closed models, but demand for both types persists.
- Data limitations in alignment research, with a few datasets driving most work.
- Need for more diverse and robust datasets to improve model performance.
Ongoing Research and Future Directions
- Continued research on DPO with various extensions and improvements.
- Increasing prevalence of larger model sizes and alignment research at scale.
- Growing popularity of smaller language models for accessibility and local running.
- Personalized language models for enhanced user experience and capabilities.
- Active contributions to alignment research by organizations and individuals.
- Model merging as an emerging technique for easy model merging without a GPU.
- Alignment's impact beyond safety, improving user experience and capabilities in areas like code and math.
- Synthetic data limitations and the need for controlled and trusted domain-specific models.
- Ongoing search for a better evaluation method with a stronger or more robust signal.
- Importance of embracing new developments and rapid progress in the language model space.
- Alignment involves changing the distribution of the language model's output and can involve multiple tokens and different loss functions.
- Watermarking for language models seen as a losing battle, with a focus on proving human-made content rather than AI-generated content.
- Exploring different optimization functions beyond maximum likelihood estimation (MLE), such as reinforcement learning from human feedback (RHF).
- Layered approach required to defend against attacks like Crescendo, considering specific use cases and limiting model capabilities.
- Potential of quantization methods like Bitet and BitNet, but expertise needed for further exploration.
- Need to control large-scale data extraction from large language models, with synthetic data generation as a potential solution.
- Self-play as a broad field with no consensus on effective implementation.