Dungeons and deployments: The clusters of chaos

16 Nov 2024

Introduction to the D&D Game and Players

  • The presentation "Dungeons and deployments: The clusters of chaos" is a talk about Kubernetes security risks, presented in the unusual format of a Dungeons and Dragons game. (0s)
  • Noah Abrahams, an independent Cloud consultant, is the Dungeon Master (DM) for the game, acting as the Kubernetes API server, keeping notes, answering queries, and directing actions. (1m6s)
  • Kat Cosgrove, a Kubernetes project release lead, plays the role of a warlock named Tav, acting as a load generator to put stress on the application and ensure it's working at its best. (1m37s)
  • Ian Coldwater, a senior principal security architect at Docker and co-chair of SIG Security for the Kubernetes project, plays a rogue named Goose, representing chaos engineering by causing problems on purpose. (2m8s)
  • Natali Vlatko, an open source architect at Cisco and co-chair of the Special Interest Group for Documentation (SIG Docs) for the Kubernetes project, plays Era, a tiefling Grave cleric, acting as a CI/CD system responsible for defending the pipeline and the chain of healing supplies. (2m37s)
  • S Momes, a lead infrastructure engineer at Acuity MD and past member of various Kubernetes release teams, plays the role of a half-orc Druid named Jam, showcasing that Kubernetes is suitable for running large Java services. (3m19s)
  • The game is a short Dungeons and Dragons session where the players will embark on a quest to recover the magical bucket of Secrets, an artifact containing hidden information and answers to their questions. (4m37s)

The Quest for the Magical Bucket of Secrets

  • The players had previously sent three of their members, Tav, Era, and Jam, on a quest to recover the magical bucket of Secrets, which was stolen by a rainbow-plumed thief, Goose. (4m41s)
  • A wild goose chase has begun in the Duchy of DevOps. The journey starts by descending the rocky slopes of Mount Device to a village in disarray, where a goose has caused chaos by throwing rakes and keys into a lake (5m33s).
  • The villagers had no rules against the goose's actions, and now they are sorting out the chaos, highlighting the need for centralized policy enforcement (6m13s).

Encounter with Goose and the Concept of Chaos

  • A conversation with an old farmer reveals that the lack of rules allowed the goose to cause chaos, and the farmer agrees that having rules would be beneficial (6m34s).
  • The group decides to continue their journey to the city, following the road, and meets a mysterious figure named Goose, who claims that what they are calling anarchy is actually chaos, and that anarchy is order without power (7m44s).

Discovery of the Crate and Secure Configurations

  • Goose joins the group on their quest, and as they travel further down the road, they notice a wagon in front of them, from which a crate falls off, but the wagon's occupants don't seem to notice (8m50s).
  • The group decides to inspect the crate and finds free stuff, which they start digging through (9m19s).
  • Opening the crate reveals a large, polished, curved disc of glass wrapped in wool and straw, which is added to the character sheet. The wagon that lost it without noticing stands in for an insecure workload configuration, highlighting the importance of secure configurations in clusters and workloads (9m32s).
  • Common bad examples include running as root or running privileged containers. Tools such as Open Policy Agent can prevent these, but policies only work properly when written with knowledge of what the application actually needs (10m7s).
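Open Policy Agent policies are normally written in Rego. Purely as a language-neutral illustration, the Python sketch below performs the same kind of check an admission policy might: it assumes the pod spec has already been parsed into a dict using the standard Kubernetes field names (`securityContext`, `privileged`, `runAsNonRoot`, `runAsUser`). The function name `container_violations` is hypothetical, and the sketch ignores `initContainers` and many other fields a real policy would cover.

```python
from __future__ import annotations

from typing import Any


def container_violations(pod_spec: dict[str, Any]) -> list[str]:
    """Return policy violations for a parsed pod spec (hypothetical helper).

    Flags containers that are privileged or that may run as root.
    Container-level securityContext values override pod-level ones,
    mirroring how Kubernetes resolves these settings.
    """
    violations: list[str] = []
    pod_ctx = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        # "privileged" exists only at the container level.
        if ctx.get("privileged", False):
            violations.append(f"{name}: privileged container")
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        if run_as_user == 0:
            violations.append(f"{name}: runs as root (runAsUser: 0)")
        elif run_as_non_root is not True and run_as_user is None:
            # Nothing stops the image's default user, which is often root.
            violations.append(f"{name}: may run as root (runAsNonRoot not true)")
    return violations


pod = {
    "containers": [
        {"name": "web", "securityContext": {"runAsNonRoot": True}},
        {"name": "debug", "securityContext": {"privileged": True, "runAsUser": 0}},
    ]
}
print(container_violations(pod))
# ['debug: privileged container', 'debug: runs as root (runAsUser: 0)']
```

A policy engine would reject the pod whenever this list is non-empty; deciding which exceptions are legitimate is where knowledge of the application comes in.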

The City Gate and the Importance of Logging and Monitoring

  • The scene shifts to a city gate guarded by Logan and Martin, who let the adventurers into the city even though their list of allowed individuals has blown into the moat, a bad case of inadequate logging and monitoring (10m49s).
  • Centralized logging and monitoring keep compromises from going undetected, but alerting thresholds and escalation procedures are also needed so that teams are not overwhelmed by notifications (11m49s).
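One way to read "alerting thresholds and escalation procedures" is sketched below in Python: each time window's event count (for example, failed logins) is compared against a threshold, and only a sustained run of alerting windows escalates to a human. The `AlertPolicy` class, its field names, and the numbers used are hypothetical and illustrative, not recommendations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AlertPolicy:
    """Hypothetical alerting policy; the values are illustrative only."""
    threshold: int        # events per window before an alert fires
    escalate_after: int   # consecutive alerting windows before paging a human


def evaluate(counts: list[int], policy: AlertPolicy) -> list[str]:
    """Map per-window event counts to an action for each window."""
    actions: list[str] = []
    streak = 0  # consecutive windows above the threshold
    for count in counts:
        if count > policy.threshold:
            streak += 1
            actions.append("escalate" if streak >= policy.escalate_after else "alert")
        else:
            streak = 0
            actions.append("ok")
    return actions


policy = AlertPolicy(threshold=5, escalate_after=3)
print(evaluate([1, 7, 9, 12, 2, 8], policy))
# ['ok', 'alert', 'alert', 'escalate', 'ok', 'alert']
```

Requiring a streak before escalating keeps a single noisy window from paging anyone, while a persistent anomaly still reaches a human.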

Secrets Management and the Marketplace

  • The adventurers arrive in the main square of the city, where they meet a bard and a guild informant. The informant provides a password for entering the city through the back entrance, obtained by picking the bard's pocket (12m51s).
  • The password, "Hunter2", is a classic example of a secrets management failure. Kubernetes Secrets are still ordinary objects in the cluster, and their values are only base64-encoded, not encrypted, so anyone with read access to a Secret object can easily read it (13m37s).
  • Using Secrets is still better than putting passwords into YAML as environment variables, and encryption at rest can be enabled, but take care never to check the unencrypted object into a repository (14m5s).
  • The marketplace is chaotic, with goods spilling over onto neighboring stalls so that nothing can be told apart, and an Investigation roll is needed to find anything (14m42s).
  • The marketplace lacks network segmentation controls. Without such controls, Kubernetes defaults to one big flat network in which any pod in a cluster can potentially talk to any other pod (15m11s).
  • Network policies, CNI plugins, and service meshes can restrict communication so the cluster does not behave like one giant switch. Logical segmentation, such as putting all PCI traffic on a separate cluster, is ideal (15m13s).
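The point about base64 is easy to demonstrate. The Python sketch below builds a Secret manifest as a plain dict (the name `db-credentials` is hypothetical) and decodes its `data` field without any key, which is exactly what anyone with read access to the object can do.

```python
from __future__ import annotations

import base64

# A Secret as it might appear in a manifest or an API response.
# The "data" values are base64-encoded: an encoding, not encryption.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},  # hypothetical name
    "type": "Opaque",
    "data": {"password": base64.b64encode(b"Hunter2").decode("ascii")},
}


def reveal(secret_obj: dict) -> dict[str, str]:
    """Decode every value in a Secret's data field; no key is needed."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret_obj["data"].items()
    }


print(secret["data"]["password"])  # SHVudGVyMg==
print(reveal(secret))              # {'password': 'Hunter2'}
```

This is why read access to Secrets should be treated as access to the credentials themselves, and why an unencrypted manifest in a repository is as bad as a plaintext password.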

Outdated Components and Supply Chain Vulnerability

  • A vendor map is found, which helps in navigating the marketplace, and food vendors often know about the goings-on in their areas (15m1s).
  • A character eats old soup made with a memory leak, causing them to freeze, and this situation is compared to the problem of outdated or vulnerable Kubernetes components (17m1s).
  • Vulnerabilities can strike any component, subsystem, or application running in a cluster. Staying on top of CVE notifications and releases is crucial; otherwise you are left with systems that not only lack functionality but also no longer receive security updates (17m16s).
  • After getting directions, the character leaves the market for the docks, where barges are being unloaded. A crane's chain snaps while it is carrying crates, everyone must make a Dexterity saving throw, and the character takes a large amount of damage (18m6s).
  • Because the crates are badly labeled, nobody knows what is inside them, and it is best to leave the area. This represents a supply chain vulnerability, which can introduce security issues into a base image (18m54s).
  • Fixes for this lack of trust include referencing base images by SHA digest instead of mutable tags, strict image registry management, SBOMs, and enforcement through admission controllers (19m28s).
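Referencing images by digest can be checked mechanically. The Python sketch below shows the kind of rule an admission controller might enforce; it is not any specific controller's implementation. It treats a reference as pinned only when it ends in `@sha256:` followed by 64 lowercase hex characters. The helper names, the example registry `registry.example.com`, and the placeholder digest are hypothetical.

```python
from __future__ import annotations

import re

# A digest-pinned reference ends in "@sha256:" plus 64 lowercase hex characters.
# Tags such as ":latest" or ":3.12-slim" are mutable and can be repointed.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref: str) -> bool:
    """True if the reference names an immutable sha256 digest."""
    return _DIGEST_RE.search(image_ref) is not None


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return the references that rely on a mutable tag or on no tag at all."""
    return [ref for ref in image_refs if not is_pinned(ref)]


images = [
    "python:3.12-slim",                             # mutable tag
    "nginx",                                        # implicitly ":latest"
    "registry.example.com/app@sha256:" + "a" * 64,  # placeholder digest
]
print(unpinned_images(images))  # ['python:3.12-slim', 'nginx']
```

A reference that carries both a tag and a digest also passes this check, because only the trailing digest is examined.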

Broken Authentication and the Observatory

  • The character approaches a large archway with a long line of people but notices some individuals bypassing the line by walking through a hole in the wall next to the archway, highlighting broken authentication mechanisms (19m46s).
  • Authentication and authorization are both crucial: if either mechanism is broken, security suffers and identities may be compromised (20m26s).
  • The character has two options at the end of the docks: going up the stairs to a building or walking into the ocean, but the latter is not recommended (21m0s).
  • The character chooses to go up the stairs and finds an observatory with no visible security or ticketing, which seems wildly insecure, especially considering the potentially expensive equipment inside (21m27s).
  • Role-Based Access Control (RBAC) is not just about specifying which actions a user can take, but also about knowing what those actions are really doing, as some actions may give away more content than expected (22m14s).
  • The list and watch verbs are one example: their results contain full objects and all of their contents, so they give away more than expected (22m26s).
  • Be extra careful with cluster-admin and other grouped permission levels, and know exactly which permissions are being granted, to avoid unintentionally handing out more access than intended (22m42s).
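The RBAC advice can be turned into a simple audit. The Python sketch below scans Role or ClusterRole `rules` (parsed into dicts with the standard `resources` and `verbs` fields) and flags wildcards and any verb that exposes Secret contents, including `list` and `watch`. `risky_rules` is a hypothetical helper that ignores `apiGroups`, `resourceNames`, and aggregated roles, so it is a starting point rather than a complete review.

```python
from __future__ import annotations

from typing import Any

# get returns one Secret; list and watch return full Secret objects,
# data included, for everything in scope. "*" covers all of them.
SECRET_READ_VERBS = {"get", "list", "watch", "*"}


def risky_rules(rules: list[dict[str, Any]]) -> list[str]:
    """Flag RBAC rules that use wildcards or can read Secret contents."""
    findings: list[str] = []
    for i, rule in enumerate(rules):
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs or "*" in resources:
            findings.append(f"rule {i}: wildcard verbs or resources")
        readable = verbs & SECRET_READ_VERBS
        if readable and ("secrets" in resources or "*" in resources):
            findings.append(f"rule {i}: can read Secrets via {sorted(readable)}")
    return findings


rules = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["list", "watch"]},
    {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
]
for finding in risky_rules(rules):
    print(finding)
```

The second rule looks harmless because it never says "get", yet list and watch still hand over every Secret's data, which is the trap the talk describes.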

Repairing the Telescope and Cluster Security

  • A local law in the Duchy states that anyone who repairs a fixture of the town, such as the observatory, is entitled to a boon from the Duchess (23m47s).
  • To repair the telescope, the lens cover must be removed and the lens installed properly; an Arcana roll may then be needed to finish aligning and focusing it (24m11s).
  • The repaired telescope points toward the constellation Anum, the Guiding Goose, and fixing it earns cheers from the room full of astronomers (24m52s).
  • Kubernetes has many moving parts, and misconfigured cluster components can open the cluster up to various attack vectors, which is why regular auditing and keeping cluster configurations visible in version control matter (25m14s).
  • It's crucial to know what exactly is being run in the cluster, why it's there, and what it's supposed to be doing to avoid misconfigurations (25m43s).
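Keeping configuration in version control pays off when the live state is compared against it. The Python sketch below recursively diffs a desired configuration (as stored in a repository) against the live one and reports every path that differs. `anonymous-auth`, `authorization-mode`, and `profiling` are real kube-apiserver flags, but the values shown, the `diff_config` helper, and the assumption that live settings have already been collected into a dict are all illustrative.

```python
from __future__ import annotations

from typing import Any


def diff_config(desired: Any, live: Any, path: str = "") -> list[str]:
    """Recursively list the paths where live config differs from desired config."""
    if isinstance(desired, dict) and isinstance(live, dict):
        diffs: list[str] = []
        for key in sorted(set(desired) | set(live)):
            sub = f"{path}.{key}" if path else key
            if key not in live:
                diffs.append(f"{sub}: missing from live")
            elif key not in desired:
                diffs.append(f"{sub}: not in version control")
            else:
                diffs.extend(diff_config(desired[key], live[key], sub))
        return diffs
    # Leaf values (or mismatched types) are compared directly.
    return [] if desired == live else [f"{path}: expected {desired!r}, found {live!r}"]


# Desired state as committed to the repository (illustrative values).
desired = {"anonymous-auth": False, "authorization-mode": "Node,RBAC", "profiling": False}
# Live state as collected from the cluster (assumed to be available as a dict).
live = {"anonymous-auth": True, "authorization-mode": "Node,RBAC"}

for line in diff_config(desired, live):
    print(line)
```

Running a check like this on a schedule turns "know what is running and why" into a concrete list of drift to investigate.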

The Duchess and Certificate Management

  • A group of astronomers points toward the Duchess's manor, a gated community with a mysterious symbol on the gates that cannot be read without an ancient translation mechanism (26m11s).
  • The group approaches the gates and is waved through by a guard, who may be an outdated admission controller, possibly running Kubernetes 1.6 (26m51s).
  • The group enters the manor and is escorted to a sitting room where they meet Duchess Julia, the keeper of laws in the land, who greets them and explains that her family has been managing Boons and laws for generations (27m28s).
  • Duchess Julia notes that manual certificate management is a huge problem: it leaves room for errors and compromise and makes a human a critical failure point in automated infrastructure (28m10s).
  • Certificates that are handled incorrectly, that expire, or that users do not trust can cause disruptions and downtime, and can be exploited as a vector for compromising services (28m32s).
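Automating even a basic expiry check removes one human failure mode. The Python sketch below uses the standard library's `ssl.cert_time_to_seconds`, which parses the `notAfter` date format returned by `ssl.SSLSocket.getpeercert()`, to decide whether a certificate falls inside a renewal window. The 30-day window and the helper names are hypothetical choices for illustration.

```python
from __future__ import annotations

import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days left before notAfter, e.g. "Jan  5 09:34:43 2018 GMT" (negative if expired)."""
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: float, renew_before_days: float = 30) -> bool:
    """True if the certificate is inside the renewal window or already expired."""
    return days_until_expiry(not_after, now) <= renew_before_days


# A fixed "now" keeps the example reproducible; real code would use time.time().
now = ssl.cert_time_to_seconds("Jan  1 00:00:00 2025 GMT")
print(days_until_expiry("Jan 20 00:00:00 2025 GMT", now))  # 19.0
print(needs_renewal("Jan 20 00:00:00 2025 GMT", now))      # True
print(needs_renewal("Jun  1 00:00:00 2025 GMT", now))      # False
```

Wiring a check like this into monitoring, rather than relying on someone remembering renewal dates, addresses the expiration problem; trust and handling errors still need their own controls.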

The Revelation of Goose's Actions and Motives

  • The group shares their desires with Duchess Julia, including riches, soup, and passage on her boat to the island of the oracles (29m11s).
  • It is then revealed that Goose, one of the group members, is the very thief they have been looking for: Goose stole the S3 bucket and compromised the API server to get to the island (29m50s).
  • Goose explains that they took these measures to keep the people safe, as they believe The Guild, who runs the town, is not actually helping the community and keeping the people safe (30m22s).
  • They point out that some people are dedicated to seeking riches and power for themselves, and believe that power corrupts and that those who seek it cannot be trusted with it (30m44s).
  • The S3 bucket was deemed too powerful for anyone to have access to, so it was locked down, and plans were made to destroy it in the fires of Mount O on the island of the oracles (31m3s).
  • The group is invited to join a journey to the island of the oracles, with the goal of destroying the S3 bucket, and everyone is encouraged to come along (31m28s).

The Journey to the Island of the Oracles

  • The group is reminded that they have more power than they know, and that collectively, they can change the world and make things better without needing permission from others (31m37s).
  • The group decides to join the journey to the island of the oracles, and the next adventure is set to begin (32m19s).

Conclusion and Acknowledgements

  • The talk concludes with thanks to the OWASP Foundation for the OWASP Kubernetes Top 10, to Rick Astley for the symbol on the Duchess's gates, and to the audience for participating in "Dungeons and Deployments" (32m45s).
