Trends in Engineering Leadership: Observability, Agile Backlash, and Building Autonomous Teams
15 Nov 2024
Introduction and Coralogix
- The upcoming QCon conference in San Francisco will feature tracks on architectures, engineering productivity, and generative AI in production, where senior practicing software developers will share their experiences in adopting emerging trends (40s).
- Chris Cooney is the head of developer relations for Coralogix, a full-stack observability platform. He spent 11 years as a software engineer, starting as a Java engineer and moving through front-end engineering, SRE, and DevOps before transitioning into engineering leadership (1m27s).
- In leadership roles he has seen organizations grow and change, and his current position lets him research and understand trends in the market, including the adoption of observability (2m7s).
- Chris meets hundreds of people every month and collects their views and insights, which makes him excited to discuss various topics in the industry, including trends in engineering leadership (2m34s).
Agile Pushback and Observability
- One trend Chris has observed is the emotional pushback against Agile, not necessarily its core tenets, but the word itself, which has led to a certain amount of fatigue around it (3m38s).
- Another trend Chris is seeing is the move towards observability, a technical trend that he is closely involved with through his work at Coralogix (3m40s).
- Observability in organizations is shifting from focusing on what's going wrong to broader questions about the company's performance, such as Dev hours, mean time to recovery, and reduction, blurring the lines between business, technical, and people measures (3m54s).
- Measurement in the industry is becoming more abstract, with examples including measuring rage clicks and the emotionality of user interactions, providing a microcosm of the changes in the industry (4m17s).
- Technical trends like AI and ML are also impacting people and engineering teams, creating uncertainty and excitement (4m36s).
- Building observability into software allows for the measurement of people's experiences, including user behavior and emotions (4m48s).
- The evolution of observability has moved from basic metrics like CPU and memory to more complex measures like latency, response size, and marketing metrics, and is now focusing on higher-level abstractions like user experience (5m3s).
- Web vitals are being used to measure user experience, including metrics like when a user sees the content they want, not just when the page loads (5m37s).
- The distinction between signals and insights is important, with signals being specific technical measurements and insights being amalgamations of signals that provide useful information (6m4s).
- Insights should be understandable by non-technical stakeholders, such as the example of a user having a bad experience loading a product page (6m22s).
- People experience metrics are directly answering business questions, providing a new level of abstraction and understanding of user behavior (6m55s).
- Organizations are embracing observability, which involves making broad and high-level measurements to inform decision-making across the entire organization, not just technical teams, and this trend is expected to continue (7m18s).
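The distinction between signals and insights described above can be sketched in a few lines of Python. This is only an illustration, not how any particular observability product works: the `PageViewSignals` fields, the `summarize` function, and the rage-click limit are hypothetical, and the 4000 ms value is the commonly published "poor" boundary for Largest Contentful Paint.

```python
from dataclasses import dataclass


@dataclass
class PageViewSignals:
    """Raw technical signals for one page view (field names are hypothetical)."""
    page: str
    lcp_ms: float         # Largest Contentful Paint, in milliseconds
    rage_clicks: int      # rapid repeated clicks on the same element
    failed_requests: int  # HTTP requests that returned an error


# Assumed thresholds: 4000 ms is the commonly published "poor" boundary for LCP;
# the rage-click limit is purely illustrative.
POOR_LCP_MS = 4000
RAGE_CLICK_LIMIT = 3


def summarize(signals: PageViewSignals) -> str:
    """Combine several signals into one sentence a non-technical reader can act on."""
    problems = []
    if signals.lcp_ms > POOR_LCP_MS:
        problems.append("the main content was slow to appear")
    if signals.rage_clicks >= RAGE_CLICK_LIMIT:
        problems.append("the user clicked repeatedly in frustration")
    if signals.failed_requests > 0:
        problems.append("part of the page failed to load")
    if not problems:
        return f"The user had a good experience on the {signals.page}."
    return f"The user had a bad experience on the {signals.page}: " + "; ".join(problems) + "."


if __name__ == "__main__":
    print(summarize(PageViewSignals("product page", lcp_ms=5200, rage_clicks=4, failed_requests=0)))
```

Each field is a signal that only an engineer would read on its own; the returned sentence is the insight, phrased so that a product owner or executive can understand it without knowing what LCP means.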
Challenges and Benefits of Observability
- To achieve this, organizations must overcome two main challenges: a technical barrier to making information available and a barrier to making that information accessible to users without specialist knowledge (7m46s).
- The technical barrier involves making high-performance queries available to potentially thousands of users concurrently, which is a difficult task that requires a certain level of scale (8m12s).
- Assuming the technical barrier is overcome, the next challenge is to make it easy for users to get the answers they need quickly, which may involve using AI-powered natural language query tools (8m33s).
- Once these barriers are overcome, an organization can achieve a universal language around cross-cutting insights, which can shine a spotlight on areas that may have previously been hidden (9m56s).
- This universal language can be painful for some individuals, particularly those who have been "flying under the radar" in large organizations, and can make them feel vulnerable as their performance is measured against key performance indicators (KPIs) (10m12s).
- The introduction of Service Level Objectives (SLOs) and Service Level Indicators (SLIs) can help create a universal language around operational performance, as seen in the example of a principal engineer at Sainsbury's (9m5s).
- Effective access to and ownership of data is crucial, and this requires a cultural shift and a conversation that may be initially painful but is necessary for success (10m6s).
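As a rough illustration of why SLOs and SLIs work as a shared language, the sketch below computes an availability SLI and the share of the error budget still unspent. The function names and the request-count model are assumptions made for this example; the discussion does not describe a specific implementation.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """SLI: the observed fraction of requests that succeeded in the window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (total_requests - failed_requests) / total_requests


def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    slo_target is the objective as a fraction, e.g. 0.999 for
    "99.9% of requests succeed".
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The SLO implies how many failures are tolerable in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% objective over one million requests.
    sli = availability_sli(1_000_000, 250)
    budget = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"SLI: {sli:.4%}, error budget remaining: {budget:.0%}")
```

A single "budget remaining" figure can be read the same way by an engineer, a product owner, and a director, which is the point of the universal language; it is also the kind of number that exposes teams that were previously flying under the radar.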
Impact of Metrics on Human Behavior
- Implementing metrics to measure performance can significantly change human behavior, as people tend to focus on the metrics being measured, and this can lead to unintended consequences such as competitiveness between teams and artificially lowering deployment frequency to improve metrics (10m58s).
- The introduction of metrics can make some people feel nervous and uncomfortable, while others thrive and enjoy the challenge of beating their budgets, leading to a divide in behavior and reactions (10m44s).
- Great leaders see higher visibility as an opportunity to help and get involved in conversations, while less effective leaders may use metrics as a stick to force different behaviors and beat engineers with, which highlights the importance of good faith and trust in leadership (11m55s).
- Surfacing metrics and being forthright about the truth can be beneficial, but it's crucial to consider the potential consequences and ensure that everyone is treated well, especially if there are leaders who may misuse the measurements (12m10s).
- The success of implementing metrics depends on the premise that everyone is doing their best, and if this premise is not established, it can lead to trouble regardless of the quality of the measurements (12m31s).
Psychological Safety and Blameless Postmortems
- The experience of rolling out metrics across an organization was largely positive, but it took time to address the growing pains and challenges that arose (12m42s).
- Creating a culture of psychological safety is essential, and this can be achieved by adopting blameless postmortem methods and moving away from lengthy and detailed incident management reviews that often lead to bickering and finger-pointing (13m39s).
- Google's Project Aristotle is an example of research on psychological safety, and educating an organization on this topic can help create a more positive and supportive work environment (13m2s).
- A successful blameless postmortem meeting was conducted in 45 minutes, compared to a previous 6-hour meeting, by starting with a 5-minute briefing on the importance of blamelessness and creating an environment where mistakes can be surfaced without fear of retribution (13m53s).
- The goal of blameless postmortems is not to avoid acknowledging mistakes, but to create an environment where mistakes can be openly discussed and learned from, and to attribute mistakes to individuals or teams without malice (14m42s).
- The language used in some postmortem documents can be too indirect, avoiding blame by not naming individuals or teams, which can make it difficult to understand the cause and effect of the mistake (14m57s).
- It's crucial to maintain the relationship between cause and effect when teaching about blameless postmortems and psychological safety, and to show a clear link between actions and their consequences (15m32s).
- Organizations often get tripped up by prioritizing psychological safety over understanding the cause and effect of mistakes, which can lead to missing the point of postmortems and not learning from mistakes (15m56s).
Agile Backlash and Misunderstandings
- The Agile backlash refers to a trend where organizations are moving away from Agile methodologies, and this is partly due to a misunderstanding of the principles of Agile and a focus on frameworks and tools rather than the underlying methodology (16m20s).
- A good understanding of Agile principles can be achieved by studying the Agile Manifesto and working with experienced coaches who understand the methodology, rather than just following frameworks and tools (16m50s).
- A fundamental misunderstanding about Agile is that it's supposed to speed up the development process, when in fact it's about incrementally delivering working software to facilitate a feedback loop and continuous improvement (17m29s).
- Agile is not about getting things done faster, but rather about delivering a working product early on, giving organizations the opportunity to go live with it if they can productionize it creatively (17m44s).
- The Agile industrial complex, which involves organizations selling Scrum certifications and other Agile-related services, has contributed to the backlash against Agile, with some certifications holding little value (18m22s).
- Not all Scrum certifications are useless, and some can provide valuable insights into Agile principles and practices, such as the idea that the Scrum guide is a guide, not a holy text (18m59s).
- There is a backlash against Agile certifications due to the varying quality of services, with some being good and others being bad (19m21s).
- Agile coaches were given a significant amount of power in organizations, and while some were great, others provided bad advice that had a significant impact due to their influence on powerful people (19m32s).
- The impact of bad advice from Agile coaches can be enormous, and it's a problem that the software industry is still trying to figure out how to address (19m59s).
- The backlash against Agile is driven by three factors: those in power taking experts at their word without questioning them, a lack of understanding of software delivery, and a focus on interpersonal aspects rather than delivery. (20m24s)
- The principles of Agile are still valuable and ingrained in software engineering, but the term "Agile" has become less prevalent due to fatigue, with people instead talking about continuous delivery. (20m52s)
- The core principles of Agile are part of software engineering now and are here to stay, despite the backlash against the term. (21m18s)
Empowering Teams and the "Golden Path"
- To get teams aligned around a common goal and give them autonomy, it's essential to provide a "golden path" that allows teams to make effective decisions while moving in the same direction. (21m45s)
- A previous approach to consolidate tools and reduce costs by imposing a single tool on teams was unsuccessful, as teams didn't understand or adopt the tool. (22m27s)
- A more effective approach is to provide teams with the freedom to make meaningful decisions about their software while ensuring they're moving in the same direction, as seen in the successful migration of hundreds of developers onto a new platform. (22m51s)
- The Manchester PaaS (platform as a service), later renamed, was a successful project that allowed teams to make effective decisions while moving in the same direction, with a "golden path" in place. (23m12s)
- A platform was built using Kubernetes and Jenkins for CI/CD, with the requirement that every team in Manchester tag their resources and provide deployment information, including what was deployed and what went into it (23m27s).
- The goal was to automate the process of filling out change request forms, especially during busy periods, and make the platform easy to use and almost invisible, incentivizing teams to use it as intended (24m13s).
- The platform was designed to make the "horrible stuff" easy, including dashboard generation, alert generation, and metric generation, and was onboarded by all teams in a couple of weeks with no pushback (24m31s).
- One team was able to produce a working API and UI in their first sprint, with scalable infrastructure, using the platform's features such as HTTP service metrics and tracing (24m40s).
- The team was able to go live in weeks instead of months, and the project changed the conversation and led to a new project across the entire organization (25m24s).
- The story was shared to illustrate the importance of giving engineers autonomy and ownership of their work, and that a top-down edict approach can lead to the best people leaving and the worst people trying to work through it (25m34s).
- To give engineers autonomy, a platform should be built that is highly configurable, self-service, and automates painful tasks such as compliance and change request notifications (25m54s).
- The goal is to make the "golden path" the easy path, so that engineers can focus on their work without being bogged down by bureaucracy (26m14s).
- To encourage the right behavior, it's essential to make the right decision the easy one, so that going against it takes hard work, and to incentivize that behavior by making the painful parts of life easy, combining the "carrot" and the "stick" (26m30s).
- This approach involves incentivizing the behavior and letting individuals choose, which is a preferred method of leadership (26m47s).
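The trade described above, where teams supply tags and deployment information and the platform takes the paperwork off their hands, might look something like the following sketch. Everything here is hypothetical: the required tag names, the `Deployment` fields, and the change request layout are invented for illustration and are not taken from the platform discussed.

```python
from dataclasses import dataclass, field

# Hypothetical set of tags every team must supply.
REQUIRED_TAGS = ("team", "service", "environment")


@dataclass
class Deployment:
    """What was deployed and what went into it."""
    version: str
    commits: list[str]
    tags: dict[str, str] = field(default_factory=dict)


def missing_tags(deployment: Deployment) -> list[str]:
    """Return the required tags that are absent or empty."""
    return [tag for tag in REQUIRED_TAGS if not deployment.tags.get(tag)]


def change_request(deployment: Deployment) -> str:
    """Generate the change request text so nobody fills in the form by hand."""
    missing = missing_tags(deployment)
    if missing:
        raise ValueError(f"missing required tags: {', '.join(missing)}")
    lines = [
        f"Change request: {deployment.tags['service']} {deployment.version}",
        f"Team: {deployment.tags['team']}",
        f"Environment: {deployment.tags['environment']}",
        "Included changes:",
    ]
    lines += [f"- {commit}" for commit in deployment.commits]
    return "\n".join(lines)


if __name__ == "__main__":
    deployment = Deployment(
        version="1.4.0",
        commits=["Add basket endpoint", "Fix rounding in totals"],
        tags={"team": "checkout", "service": "basket-api", "environment": "production"},
    )
    print(change_request(deployment))
```

The design choice worth noting is that the only thing the platform demands (a few tags) is also what powers the automation, so following the golden path costs a team less effort than working around it.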
Connecting with Chris Cooney
- For those who want to continue the conversation, Chris Cooney can be found on LinkedIn by searching for his name, and he is usually active on the platform for work-related topics (26m59s).
- Searching LinkedIn for variations of his name, such as "Chris Cooney Coralogix" or "Chris Cooney observability", should also find him (27m10s).