An inside look at how GitHub uses LLMs, fine-tuning, and prompt engineering in GitHub Copilot
31 Oct 2024
Introduction
- GitHub Copilot is an AI assistant that provides code completions in the editor, using state-of-the-art machine learning techniques to improve suggestion quality and developer productivity (42s).
- The traditional editor experience is enhanced with Copilot, which provides code suggestions to developers in the form of "ghost text" that can be accepted or rejected (1m9s).
- To generate suggestions, Copilot builds an understanding of what the developer is trying to do by assembling various pieces of code and information from within the editor, including the code in the active file, neighboring tabs, and metadata such as file paths and programming language (2m5s).
- The assembled context is provided to a language model, which generates a completion suggestion that is then shown to the developer (2m25s).
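The context-assembly step described above can be sketched roughly as follows. This is a hypothetical illustration, not Copilot's internals: the function name, comment-based metadata format, and token budget are all assumptions.

```python
# Hypothetical sketch of assembling a completion prompt from editor context:
# file path, language, ranked neighbor-tab snippets, and the code before the
# cursor. Names and format are illustrative, not Copilot's actual internals.

def assemble_prompt(file_path, language, neighbor_snippets, prefix, budget=2000):
    """Pack context into a single prompt string, most important content last."""
    parts = [f"# Path: {file_path}", f"# Language: {language}"]
    for snippet in neighbor_snippets:
        parts.append(f"# Snippet from neighboring tab:\n{snippet}")
    parts.append(prefix)  # code before the cursor sits closest to generation
    prompt = "\n".join(parts)
    # Truncate from the front if over budget, keeping the most recent context.
    return prompt[-budget:]

prompt = assemble_prompt(
    "src/app.py", "python",
    ["def helper(x):\n    return x * 2"],
    "def main():\n    value = ",
)
```

Placing the active-file prefix last reflects the general intuition that text nearest the end of the prompt most directly conditions the completion.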
Context and Latency Optimization
- To surface the most relevant context, Copilot prioritizes code in the currently active file and breaks neighboring tabs into code snippets, ranking them by similarity to what the developer is currently working on (3m31s).
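One simple way to rank snippets by similarity, as described above, is token-set overlap. Copilot's actual scoring function is not public; this minimal sketch only illustrates the idea, using Jaccard similarity as an assumed stand-in.

```python
# Minimal sketch of ranking neighboring-tab snippets by similarity to the
# code being edited, here using Jaccard overlap of whitespace-split tokens.
# Copilot's real scoring is not public; this only illustrates the concept.

def jaccard(a, b):
    """Jaccard similarity between the token sets of two code strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def rank_snippets(current_code, snippets):
    """Return snippets ordered from most to least similar to current_code."""
    return sorted(snippets, key=lambda s: jaccard(current_code, s), reverse=True)

current = "def parse_config ( path )"
snips = ["def parse_config ( file )", "class Logger :", "return path . read ( )"]
ranked = rank_snippets(current, snips)
```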
- The goal of Copilot is to meet developers in their flow, returning completions quickly enough to accelerate rather than interrupt their work (4m11s).
- To minimize latency, Copilot deploys language models to data centers around the world and dynamically routes each request to the nearest one, reducing communication overhead (4m18s).
- Copilot also manages language model verbosity to further reduce latency, especially when a developer is completing the current line of code (4m26s).
- The system preemptively instructs the language model to stop generating code after completing the current line, which helps in quickly returning the completion to the developer. (5m8s)
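Stopping generation at the end of the current line is commonly done with a stop sequence. The sketch below shows only the client-side idea of truncating at the first newline; treating `"\n"` as the stop marker is an assumption for illustration.

```python
# Hedged sketch of line-scoped completion: when the developer is mid-line,
# generation can be cut at the first newline (a "stop sequence") so only the
# remainder of the current line is returned. Illustrative only.

def truncate_at_stop(generated, stop="\n"):
    """Cut a raw model generation at the first occurrence of the stop sequence."""
    idx = generated.find(stop)
    return generated if idx == -1 else generated[:idx]

raw = "x + 1)\nprint(result)\n"
completion = truncate_at_stop(raw)
```

In practice the stop sequence is usually passed to the model API so generation halts server-side, which is what saves the latency rather than trimming after the fact.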
Post-processing and Model Management
- Post-processing of code completions includes aligning indentation, cleaning up overlapping code, running safety filters, and comparing suggestions against public code databases to ensure safety and relevance (5m33s).
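Two of the post-processing steps mentioned above, indentation alignment and overlap cleanup, can be sketched as small string transformations. Both helpers are hypothetical simplifications of what a production pipeline would do.

```python
# Sketch of two post-processing steps: re-indenting a suggestion to match the
# cursor's indentation, and trimming a suggestion that duplicates code already
# present after the cursor. Hypothetical helpers, not Copilot's actual code.

def align_indentation(suggestion, indent):
    """Re-indent continuation lines of a suggestion to the cursor's indent."""
    lines = suggestion.split("\n")
    return "\n".join([lines[0]] + [indent + ln.lstrip() for ln in lines[1:]])

def trim_overlap(suggestion, suffix):
    """Drop the tail of the suggestion that duplicates the existing suffix."""
    for n in range(min(len(suggestion), len(suffix)), 0, -1):
        if suggestion.endswith(suffix[:n]):
            return suggestion[:-n]
    return suggestion
```

For example, if the line already ends in `)` and the model also emits a closing `)`, `trim_overlap` removes the duplicate so accepting the suggestion does not produce `))`.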
- The entire process of generating code completions takes less than 300 milliseconds on average, with millions of completions processed daily worldwide. (6m10s)
- GitHub Copilot has been powered by state-of-the-art large language models from OpenAI, and GitHub is testing new models from other providers to enhance the Copilot experience. (6m23s)
- GitHub continuously integrates new models and features, improving code quality, reducing latency, and increasing the context window from 2,000 to 8,000 tokens without sacrificing performance. (7m0s)
- Copilot now supports including code after the cursor and updates models to keep the training data current. (7m48s)
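Using code after the cursor is commonly done with fill-in-the-middle (FIM) style prompting, where the model is trained to generate the span between a prefix and a suffix. The sentinel tokens below are illustrative placeholders, not the actual tokens Copilot's models use.

```python
# Sketch of a fill-in-the-middle (FIM) prompt, one common way to let a model
# condition on code after the cursor as well as before it. The sentinel
# strings are placeholders; real models use their own special tokens.

def build_fim_prompt(prefix, suffix):
    """Arrange prefix and suffix so the model generates the middle span."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = build_fim_prompt("def area(r):\n    return ", "\n\nprint(area(2))")
```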
Model Training and Customization
- OpenAI models, which power Copilot, are trained on vast amounts of public code, making them effective generalist models for various programming languages and situations. (8m3s)
- Fine-tuning allows enterprises to train custom models on their data and coding practices, which is beneficial for those using proprietary or niche programming languages. (8m39s)
- GitHub Copilot allows enterprises to create custom models using their private data, enabling developers to have a more tailored coding experience by understanding their code base at a deeper level. This ensures that all training, model hosting, and inferencing remain private and unique to the enterprise, with data never shared or used to train other models. (9m18s)
- The custom model program is currently in a public beta phase, where admins can select repositories and filter supported languages for training. GitHub plans to bring this feature to general availability soon. (10m38s)
Fine-tuning and Partnerships
- GitHub is working on advancing large language models for code completion by leveraging fine-tuning techniques to improve code quality for enterprise developers. This involves researching language-specific or code-specific fine-tuned models to enhance completion quality in critical programming languages. (11m18s)
- GitHub has partnered with Microsoft to address areas where generalist models underperform, such as with newer .NET Framework libraries. This partnership has led to fine-tuning models to improve support for these programming areas. (12m52s)
Enhancing Developer Intent and Context
- GitHub is exploring ways to enhance understanding of developer intent by increasing the context window and using language-specific contextualization, which helps in building a deeper understanding of what developers are doing. (13m44s)
- GitHub has developed custom language-specific prompt crafting and contextualization rules for GitHub Copilot, allowing for more accurate code completions, such as importing header files in C++ regardless of whether the file is open or closed (14m24s).
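A contextualization rule of the kind described for C++ might look like the sketch below: scan the active file for project-local `#include` directives and pull those headers into the prompt even when they are not open in the editor. The regex and the `header_store` lookup are illustrative assumptions.

```python
# Sketch of a language-specific contextualization rule: for C++, collect the
# contents of project headers referenced by `#include "..."` directives so
# they can be added to the prompt. Illustrative, not GitHub's actual rule.
import re

def collect_local_includes(source, header_store):
    """Return contents of project headers referenced by #include "..." lines."""
    names = re.findall(r'#include\s+"([^"]+)"', source)
    return [header_store[n] for n in names if n in header_store]

headers = {"geometry.h": "struct Point { double x, y; };"}
src = '#include "geometry.h"\n#include <vector>\nint main() {}'
context = collect_local_includes(src, headers)
```

Note the rule deliberately skips angle-bracket includes like `<vector>`, since standard-library headers add little project-specific signal.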
- The company is also working on building a project-wide approach to code completions, using learnings from Copilot Chat, but faces challenges due to the latency sensitivity of code completions (15m2s).
Next Edit Suggestions (NES)
- Copilot code completions insert code at the current cursor location, but the company is exploring a new feature called Next Edit Suggestions (NES) to support coding changes beyond insertions, such as deletions and modifications (15m43s).
- NES predicts the changes that follow a user's edit and presents them as a sequence of edits, regardless of their location in the file, and can suggest insertions, modifications, and deletions (16m29s).
- To facilitate the review process of proposed changes, GitHub provides a rich diff view of the incoming changes, making it easier for developers to understand and review the changes (17m39s).
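Presenting a sequence of edits for review can be sketched with Python's standard `difflib`. This only illustrates the diff-review idea; it is not GitHub's NES rendering.

```python
# Sketch of rendering a proposed edit as a reviewable diff, using the
# standard-library difflib. Illustrative of the review flow only.
import difflib

def render_edit_diff(before, after):
    """Produce unified-diff lines a developer could review line by line."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="current", tofile="suggested", lineterm=""))

before = "total = 0\nfor x in xs:\n    total += x\n"
after = "total = sum(xs)\n"
diff = render_edit_diff(before, after)
```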
- The Next Edit Suggestions feature was originally developed by the GitHub Next team, and the company is rapidly iterating on suggestion quality and user experience (17m53s).
Offline and Online Evaluation
- GitHub is constantly exploring ways to improve Copilot code completions and is researching offline evaluation methods to estimate the impact of changes on production before introducing them to users (18m17s).
- Offline evaluation is a challenging task due to the open-ended nature of coding, but the company is committed to researching and improving this process (18m40s).
- Changes that show promise in offline evaluation are then validated through A/B testing to assess their effectiveness in production (19m3s).
- A small portion of real traffic, large enough to yield statistically significant results, is exposed to the changes alongside an isolated control group to evaluate the impact on users (19m11s).
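Checking whether an A/B difference is statistically significant can be sketched with a two-proportion z-test, for example on suggestion acceptance rates in treatment versus control. The counts below are made up, and the metric choice is an assumption for illustration.

```python
# Minimal sketch of an A/B significance check: a two-proportion z-test on a
# hypothetical acceptance-rate metric. Counts are invented for illustration.
import math

def two_proportion_z(acc_a, n_a, acc_b, n_b):
    """z statistic for the difference between two acceptance rates."""
    p_a, p_b = acc_a / n_a, acc_b / n_b
    p = (acc_a + acc_b) / (n_a + n_b)                 # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b)) # standard error
    return (p_a - p_b) / se

# Hypothetical counts: treatment accepts 3300/10000, control 3100/10000.
z = two_proportion_z(3300, 10000, 3100, 10000)
significant = abs(z) > 1.96  # 5% two-sided threshold
```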
- Online evaluations provide clear insight into how changes affect users, complementing offline evaluations that gauge and prioritize impactful changes (19m23s).
- The combination of offline and online evaluations provides both experimental agility and confidence, enabling GitHub to continuously make positive improvements to the Copilot experience (19m45s).