CI/CD beyond YAML: The Evolution Towards Pipelines-as-Code
28 Aug 2024
Challenges of YAML-Based CI/CD and Airbyte's Solution
- Bart created a back office service and implemented CI using YAML, which enabled him to build, test, and deploy quickly. (1m30s)
- As Bart developed more services (cart service, config service, inventory service), he ended up with multiple YAML files, making the CI setup more complex and difficult to manage. (1m45s)
- The company behind the Bart analogy had 7,000 lines of YAML code, relied heavily on shell scripts, experienced developer dissatisfaction with the CI process, incurred high costs ($5,000 per month), and had a long build time (45 minutes). (4m38s)
- YAML-based CI/CD pipelines, while initially simple, can become difficult to manage as they grow more complex, partly because YAML lacks tooling features such as "jump to definition". (6m42s)
- Extensive use of third-party GitHub Actions can lead to a "black box" effect, making it challenging to troubleshoot issues when numerous actions with potentially thousands of lines of code are employed. (7m37s)
- YAML's limitations as a DSL can make expressing complex logic, such as conditional deployments based on file changes, difficult and hard to understand. (9m48s)
- Over three months, the amount of YAML used was reduced by 50% by first targeting the most challenging builds and creating a proof of concept framework. (34m47s)
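The conditional-deployment case mentioned above (deploy only when certain files change) is one place where a general-purpose language tends to read more plainly than YAML expressions. The following is a minimal sketch in Python, not the approach shown in the talk: the service names, path prefixes, and the `services_to_deploy` helper are all hypothetical, and the list of changed files is assumed to come from elsewhere (for example, the output of `git diff --name-only`).

```python
# Hypothetical mapping: service name -> path prefixes that should trigger its deployment.
SERVICE_PATHS = {
    "cart-service": ("services/cart/", "libs/common/"),
    "inventory-service": ("services/inventory/", "libs/common/"),
    "config-service": ("services/config/",),
}


def services_to_deploy(changed_files, service_paths=SERVICE_PATHS):
    """Return the sorted names of services affected by the changed files."""
    affected = set()
    for path in changed_files:
        for service, prefixes in service_paths.items():
            # str.startswith accepts a tuple and matches any of its prefixes.
            if path.startswith(prefixes):
                affected.add(service)
    return sorted(affected)
```

Because the rule is an ordinary function, it can be unit-tested and navigated with "jump to definition" like any other code.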
Components and Importance of CI/CD Systems
- CI event triggers initiate CI jobs in response to events like pushing to the main branch, commenting on a pull request, or creating a pull request. (13m2s)
- Authentication is crucial for CI/CD systems to interact with remote systems, handle secret injection, and manage deployments, uploads, and logging. (14m32s)
- The orchestrator layer in CI/CD systems manages job execution, step breakdown, state management (passed, completed, failed, skipped), and workflow orchestration, often visualized as a directed acyclic graph (DAG). (15m4s)
- There are four main components of a CI/CD system: caching, the execution layer, the runner infrastructure, and reusable modules. (18m26s)
- The execution layer is the most important part of a CI/CD system because it is responsible for running the builds, tests, and deployments. (17m32s)
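The orchestrator's responsibilities described above (ordering jobs as a DAG and tracking their state) can be sketched in a few lines. This is an illustrative toy rather than how any particular CI product works: the `run_pipeline` function and the `State` enum are made up for this example, it uses the standard-library `graphlib` module (Python 3.9+), and it assumes every dependency named in `deps` is also a key in `jobs`.

```python
from enum import Enum
from graphlib import TopologicalSorter


class State(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


def run_pipeline(jobs, deps):
    """Run jobs in dependency order and return a dict of job name -> State.

    jobs: dict of job name -> zero-argument callable returning True on success.
    deps: dict of job name -> set of job names it depends on.
    A job is skipped when any of its dependencies did not pass.
    """
    graph = {name: set(deps.get(name, ())) for name in jobs}
    states = {}
    # static_order() yields each job only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if any(states[dep] is not State.PASSED for dep in graph[name]):
            states[name] = State.SKIPPED
            continue
        try:
            ok = jobs[name]()
        except Exception:
            ok = False  # treat an uncaught exception as a failed job
        states[name] = State.PASSED if ok else State.FAILED
    return states
```

For example, with `build -> test -> deploy` and a failing `test` job, `build` ends up passed, `test` failed, and `deploy` skipped.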
Designing User-Friendly CI/CD Systems
- When designing a CI/CD system, it is important to start with the bottom layer, which includes caching and the execution layer, because these components will inform the design decisions for the upper layers. (21m20s)
- Developers can wrap execution layers in a CLI because CLIs are developer-friendly, easy to create, and understandable by machines. (22m48s)
- Dagger, an SDK on top of BuildKit, allows for the definition of execution layers entirely as code and features an opt-out caching approach. (24m2s)
- At Airbyte, Prefect was chosen for orchestration due to its data-heavy background, but other options like general-purpose programming languages or tools like Airflow are available. (25m58s)
- End users need to be comfortable using CI/CD tools, and being able to work with familiar languages is important for adoption. (28m10s)
- AirCMD, a Python-based CLI tool, can execute composable commands (build, test, CI) offering flexibility for users. (29m6s)
- The tool leverages caching for performance, and a visualization tool helps users understand cache hits and misses. (30m21s)
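The idea of wrapping the execution layer in a CLI with composable commands can be sketched with the standard-library `argparse` module. This is not AirCMD's actual interface: the `pipeline` program name and the `run_build`, `run_tests`, and `run_ci` functions are hypothetical placeholders that only return the steps they would run.

```python
import argparse


def run_build(target):
    """Placeholder build step; a real one would call the execution layer."""
    return [f"build {target}"]


def run_tests(target):
    """Placeholder test step."""
    return [f"test {target}"]


def run_ci(target):
    """Compose the smaller commands into one pipeline."""
    return run_build(target) + run_tests(target)


COMMANDS = {"build": run_build, "test": run_tests, "ci": run_ci}


def main(argv=None):
    """Parse `<command> <target>` and run the matching command."""
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument("command", choices=sorted(COMMANDS))
    parser.add_argument("target")
    args = parser.parse_args(argv)  # argv=None makes argparse read sys.argv
    steps = COMMANDS[args.command](args.target)
    for step in steps:
        print(step)
    return steps
```

Calling `main(["ci", "cart-service"])` runs the build step followed by the test step, while `main(["build", "cart-service"])` runs only the build. The same functions work on a laptop and inside a CI runner, which is the point of keeping the CI vendor's configuration thin.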
Benefits and Challenges of Evolving CI/CD Approaches
- Standardized CI/CD solutions are quick to adopt and easy to grow with at first, but they become difficult to manage as complexity increases, for example when Python code ends up inlined in YAML. (41m17s)
- Vendor-independent solutions are important in the volatile CI/CD tooling market because they allow for easier migration between platforms. (44m47s)
- Managing and distributing reusable modules in microservices architectures is challenging, particularly when it comes to versioning tools, packages, and plugins, and to testing and deploying updates. (45m42s)
- One of the priorities is tackling the challenge of maintaining the ecosystem for reusable modules, which currently puts a significant burden on platform engineers. (46m15s)
- Dagger, a relatively new technology built on BuildKit, presents challenges due to its novelty and the unfamiliarity of containerized builds for some organizations. (47m8s)
Achieved Improvements with New CI/CD Approach
- A 90% cost savings was achieved, partially due to the ability to leverage the chosen tooling and system to easily identify and address caching bottlenecks. (35m21s)
- By switching from a system that spawned one machine per connector to a single machine approach that leveraged caching, a 70x reduction in machine time was achieved when testing 70 connectors. (36m22s)
- A key methodology for success in adopting new approaches to CI/CD is to demonstrate a dramatic win, such as improving the slowest or flakiest job. (39m50s)
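To illustrate why running many connector tests on one machine with a shared cache avoids repeated work, here is a toy sketch. It is not the caching mechanism used in the talk (Dagger and BuildKit handle caching at the container-layer level): the `StepCache` class, the step functions, and the Python version string are all hypothetical, and the cache is a plain in-memory dictionary keyed by a hash of each step's name and inputs.

```python
import hashlib
import json


class StepCache:
    """In-memory cache keyed by a hash of the step name and its inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, step_name, inputs, func):
        """Return the cached result for (step_name, inputs), computing it on a miss."""
        key_material = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
        key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = func(**inputs)
        self._store[key] = result
        return result


def install_base_image(python_version):
    """Stand-in for an expensive setup step shared by every connector."""
    return f"base-image-py{python_version}"


def check_connector(name, base):
    """Stand-in for a connector-specific test run."""
    return f"{name} tested on {base}"


def check_all(connectors, cache):
    """Test every connector, reusing the shared setup step through the cache."""
    results = []
    for name in connectors:
        base = cache.run("install_base_image", {"python_version": "3.11"}, install_base_image)
        results.append(cache.run("check_connector", {"name": name, "base": base}, check_connector))
    return results
```

With 70 connectors, the shared setup step runs once and is served from the cache 69 times, so `cache.hits` is 69 and `cache.misses` is 71 (one setup plus 70 connector-specific runs). Exposing the hit and miss counters is a simple stand-in for the kind of cache visualization mentioned earlier.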