Effective Performance Engineering at Twitter-Scale

25 Jun 2024

Performance Engineering Challenges

  • Traditional performance engineering approaches are no longer sufficient to handle the complexity of modern software applications.
  • Performance engineering used to be easier because general-purpose hardware improved consistently from one generation to the next; it is harder now that gains increasingly come from specialized hardware.
  • Modern software applications are highly complex, making it challenging to identify and optimize performance issues.

Systems Thinking for Performance Engineering

  • Systems thinking provides a language and framework for modeling and understanding complex relationships within a system, enabling performance engineers to identify and address bottlenecks and inefficiencies.
  • Performance engineering should be viewed as a counting exercise on top of a system model, where resources are counted and analyzed to determine their utilization and impact on performance.
  • Having a system model in place allows performance engineers to count resources at the appropriate granularity to accurately assess and optimize performance.
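
The "counting on top of a model" idea above can be illustrated with a small sketch. It is not taken from the talk: the `Sample` record, the host names, and the numbers are all hypothetical, chosen only to show how the same counts read differently at different granularities.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical resource reading; field names are illustrative."""
    host: str
    resource: str    # e.g. "cpu_cores" or "memory_gib"
    used: float
    capacity: float


def utilization(samples, key):
    """Sum used and capacity per group, then return used / capacity.

    `key` picks the granularity: it maps a sample to the group it is counted in.
    """
    used = defaultdict(float)
    capacity = defaultdict(float)
    for s in samples:
        group = key(s)
        used[group] += s.used
        capacity[group] += s.capacity
    return {g: used[g] / capacity[g] for g in capacity if capacity[g] > 0}


# Made-up readings for two hosts.
samples = [
    Sample("host-a", "cpu_cores", 6.0, 8.0),
    Sample("host-b", "cpu_cores", 2.0, 8.0),
    Sample("host-a", "memory_gib", 16.0, 64.0),
    Sample("host-b", "memory_gib", 48.0, 64.0),
]

# Coarse granularity: one number per resource type across the fleet.
by_resource = utilization(samples, key=lambda s: s.resource)

# Finer granularity: one number per (host, resource) pair.
by_host_resource = utilization(samples, key=lambda s: (s.host, s.resource))
```

In this made-up data the fleet-wide CPU utilization is 50%, which hides the fact that one host sits at 75% and the other at 25%. Choosing the grouping key is the "appropriate granularity" decision.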

Performance Engineering at Twitter

  • At Twitter's scale, performance engineering involves building a model of the system and collecting metrics at a fine granularity.
  • Custom samplers and systems are used to collect low-level telemetry with minimal overhead.
  • The data infrastructure built for performance engineering is also used to solve real problems, such as optimizing GC intervals and identifying underutilized resources.
  • The data generated is also used by other teams for capacity planning and service optimization.
  • Twitter's performance engineering team developed a data aggregation pipeline to analyze trace data and gain insights into the performance of their distributed systems.
  • They used tracing and profiling techniques to capture the interactions between different services and understand how they contribute to overall performance.
  • They built a service dependency explorer to visualize the connectivity and load amplification between services, allowing them to identify bottlenecks and optimize resource allocation.
  • The team also developed a model called "LatenSeer" to perform causal reasoning and determine the root causes of latency issues.
  • The data engineering efforts enabled the team to answer complex performance-related questions that were not possible with other data sets.
  • The team's work extended beyond performance engineering to include data privacy analysis, where they leveraged the system dependency data to identify sensitive information and access patterns.
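
As a rough illustration of the kind of aggregation a service dependency explorer relies on, the sketch below derives caller-to-callee edges and a load amplification factor from trace spans. It is not Twitter's pipeline: the span tuple layout, the service names, and the `load_amplification` helper are assumptions made for this example.

```python
from collections import Counter

# Hypothetical spans: (trace_id, span_id, parent_span_id, service).
# A parent_span_id of None marks the root span of a trace.
spans = [
    ("t1", 1, None, "gateway"),
    ("t1", 2, 1, "timeline"),
    ("t1", 3, 2, "cache"),
    ("t1", 4, 2, "cache"),
    ("t1", 5, 2, "cache"),
    ("t2", 1, None, "gateway"),
    ("t2", 2, 1, "timeline"),
    ("t2", 3, 2, "cache"),
]


def load_amplification(spans):
    """Return {(caller, callee): calls to callee per request handled by caller}."""
    # Look up which service owns each span so a child can find its caller.
    service_of = {(trace, span): svc for trace, span, _, svc in spans}
    # Every span counts as one request handled by its service.
    requests = Counter(svc for _, _, _, svc in spans)

    edges = Counter()
    for trace, _, parent, callee in spans:
        if parent is None:
            continue  # root span: no caller inside the trace
        caller = service_of.get((trace, parent))
        if caller is not None:  # skip spans whose parent was not captured
            edges[(caller, callee)] += 1

    return {edge: count / requests[edge[0]] for edge, count in edges.items()}


amplification = load_amplification(spans)
```

With these made-up spans, each `gateway` request produces one `timeline` call, while each `timeline` request fans out to two `cache` calls on average. A fan-out factor like that is what makes a downstream service a likely bottleneck.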

Building a Successful Performance Engineering Team

  • Performance engineering doesn't fit neatly into traditional organizational structures due to its cross-functional nature.
  • To be effective, performance engineers need to align their work with either top-line or bottom-line objectives.
  • Building a strong network of advocates and supporters within the organization is crucial for performance engineers to gain influence and drive change.
  • Having a dedicated performance engineering team carries symbolic weight: it signals that the organization takes performance optimization seriously.
  • Performance engineers should encourage and support other engineers to contribute to performance work, rather than trying to centralize all performance-related tasks.
  • To build a successful performance engineering team, you need a diverse team with different skill sets.
  • Start with a small team of people who are much better than you at some things.
  • Do a lot of odd jobs and favors to justify your existence and build trust.
  • Make mistakes, learn from them, and move on.
  • Write down your vision and methodology once you have a bit of trust and some small wins.
  • As your team grows, intentionally brand yourselves as the performance people.
  • Invest in talks, publishing, and writing papers to share your knowledge.
  • Once you're mature, start creating platforms and products that allow others to do similar things.
  • Design the team to fit the organization's structure; don't copy what others do.
  • Outreach and adoption are serious work. Treat them as such.
  • People make work happen and their strengths and personalities differ, so it's important to respect that and seek diversity in skills and perspectives.
  • Embrace chance and don't hold a predetermined mindset. As long as everyone is going in the right direction, the particular path is not that important.
  • Software engineering is a social enterprise, so to succeed in the long term, it's important to be helpful, generous, and make friends.

Tools and Techniques for Performance Engineering

  • Understanding the structure of a service and the distributed system is valuable for performance optimization.
  • eBPF is a powerful tool for gathering metrics, including ones traditionally collected in-process, often at lower overhead.
  • Traces are stored in a real-time data pipeline with different indices depending on the level of information needed.
  • Queries on traces may have a five-minute delay, but they can answer questions that no other data source can.
