0 → 1, Shipping Threads in 5 Months

13 Nov 2024

The Launch of Threads

  • Zahan, an engineer at Meta, discusses the launch of the Threads app, which was developed in response to Elon Musk's acquisition of Twitter and the subsequent changes made to the platform (17s).
  • In January of the previous year, Zahan was looking for a new project to work on, and the tech scene was buzzing about Elon Musk's business decisions since taking Twitter private in November (38s).
  • Mark Zuckerberg had previously pivoted Facebook and Instagram to focus on private communication between friends, leaving Twitter as the de facto forum for public debate and discussion (54s).
  • After Elon Musk took over Twitter, he made several controversial decisions, leading to concerns that the site would experience an extended outage, depriving millions of users of their favorite online platform (1m22s).
  • Meta saw an opportunity to create a competing product, but initially assumed that making text posts easier on Instagram would be enough; that approach proved unsuccessful (1m53s).
  • The team realized that they needed to build a new product with its own culture and norms, rather than trying to adapt an existing product like Instagram (2m21s).
  • The main concern was the time it would take to build a new product, as the window of opportunity to capitalize on Twitter's changes was fleeting (2m42s).
  • Time to market was prioritized, and every shortcut was considered to ensure that the product could be shipped quickly (3m0s).
  • The team defined the basic values of the product, including a text-first format, carrying over Instagram's design language and ethos, and openness, with a community that could craft its own experience with an API (3m22s).
  • The team considered the current social media landscape, where public content is widely available through embeds and people share their Twitter posts across platforms, and decided that a new walled-garden product wouldn't succeed (4m0s).
  • They took inspiration from Mastodon, the fediverse, and interoperable social networks, and prioritized the needs of creators, who produce the majority of content on social networks, with output typically following a Zipf distribution (4m15s).

Development Milestones and Strategy

  • The team outlined four milestones to achieve their goal, with each milestone designed to be a possible end-state where they could ship the product if needed, gradually adding essential functionality (4m59s).
  • Milestone one focused on standing up the app with basic functionality, such as logging in, making text posts, and associating them with an account (5m23s).
  • Milestone two added familiar features like tabs for feed, notifications, and profile, as well as basic integrity features like blocking and reporting profiles (5m35s).
  • Milestone three, the lean launch candidate, fleshed out neglected services like people search, full-screen media viewer, and conversation ranking, and added features like copying photographs from Instagram and muting profiles (5m48s).
  • Milestone four aimed to enable interoperability with the fediverse, but the team ultimately shipped something closer to a milestone 3.5: milestone three plus a much-needed round of iteration (6m17s).
  • The team took a "Polish for launch" mindset as each milestone neared, shifting from building features to refining the product, which was exhausting but helped simplify the product (6m42s).
  • The strategy of setting milestones and iterating on the product served as a strong forcing function to simplify the product and focus on essential features (7m10s).

Leveraging Instagram's Infrastructure

  • The team started working on the project in earnest in February and aimed to have it ready by the summer, with a "trick up their sleeves" that helped them achieve their goal (7m42s).
  • The initial plan was not to build the Threads platform from scratch, but rather to reuse existing features from Instagram, specifically broadcast sharing, which allows users to follow profiles, access a feed of posts, and respond to each other, building communities around interests (7m53s).
  • The decision was made to reuse Instagram's back-end wholesale, with some custom functionality for Threads, and to fork the Instagram app code base on each platform, allowing the team to start with a fully featured product and strip it down to the necessary functionality (8m12s).
  • The first prototype added a mode to the Instagram feed that surfaced text-only posts, reusing the ranking and layout, but with the caption on top and media on the bottom (8m37s).
  • This approach reduced the technical scope of the project, turning the problem of building a new text-based social network into a more specific one of customizing the Instagram feed to display a new text post format (8m58s).
  • The approach has major downsides, including accumulating technical debt, using a code base for something it wasn't designed to serve, and requiring a deep understanding of the legacy codebase (9m10s).
  • The team focused on simplicity, allowing users to onboard by logging in with Instagram, borrowing Instagram's design language, and surgically reusing what they could while rebuilding only where necessary (9m48s).
  • The Threads team is grateful for the foundation laid by the Instagram infrastructure and product teams over the years, and the focus on simplicity paid off: much of the praise received at launch was for the Spartan simplicity of the app (10m7s).

Focus on Simplicity and Safety

  • To tune the experience quickly, the team used server-driven UI, sending down a full view model that told the client exactly how to render the core interfaces, allowing for easy iteration and experimentation (11m1s).
  • Another big focus was making the space feel safe and welcoming for all, informed by the team's experience with earlier public social networks (11m32s).
  • The goal was a space where people can have meaningful conversations, which is challenging, as many online spaces devolve into angry environments where people don't listen to each other (11m40s).
  • Factors that contribute to achieving this goal include the product culture set by early users, tooling that helps people maintain control over their experience, and moderation (11m51s).
  • The product culture of Threads was influenced by its early users, who were experienced in online communities and promoted behaviors such as blocking early and often, not engaging with rage bait, and not using quote posts to dunk on people (11m55s).
  • Tooling features that help users maintain control over their experience include the ability to block, restrict, mute, and hide replies, which give users control over the conversations generated by their posts (12m8s).
  • Moderation is essential to making a space acceptable to a mainstream audience, and it involves addressing extreme speech and providing a safe environment for users (12m35s).
  • Moderation is considered a unique value proposition for a new social network, and it's what people are subscribing to (12m56s).
  • The team behind Threads brought a decade of experience from Facebook and Instagram, which helped them start from a good starting point in terms of moderation (13m5s).
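The server-driven UI approach mentioned above can be illustrated with a minimal sketch: the server composes a full view model describing what to render, and a generic client walks it. All names here (`build_post_view_model`, `render`, the component types) are hypothetical, not Meta's actual APIs.

```python
# Sketch of server-driven UI: the server sends down a complete view model,
# so layout experiments ship on the next fetch without a client update.

def build_post_view_model(post: dict, experiment_group: str) -> dict:
    """Compose a view model the client can render without product logic."""
    components = [
        {"type": "text", "content": post["text"]},
        {"type": "action_row", "actions": ["reply", "repost", "like", "share"]},
    ]
    # Server-side experimentation: reorder or swap components per group.
    if experiment_group == "reply_cta_on_top":
        components.insert(0, {"type": "reply_cta", "label": "Join the conversation"})
    return {"post_id": post["id"], "components": components}

def render(view_model: dict) -> str:
    """A 'dumb' client: walks the component list and renders each one."""
    lines = []
    for c in view_model["components"]:
        if c["type"] == "text":
            lines.append(c["content"])
        elif c["type"] == "reply_cta":
            lines.append(f"[{c['label']}]")
        elif c["type"] == "action_row":
            lines.append(" ".join(c["actions"]))
    return "\n".join(lines)
```

The trade-off is that the client becomes a thin renderer: iteration speed improves, but every interface change requires a server deploy rather than a client release.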

Pre-Launch Buzz and Technical Stack

  • Creating a buzz for the product before launch is also important, and this can be achieved through features such as the "golden ticket" Easter egg, which sparked interest and curiosity among users (13m25s).
  • The technical stack of Threads is based on a monorepo and uses a Python binary called Distillery, which talks to Meta's larger monolith, a massive PHP binary known as "www" (pronounced "dub-dub-dub") (14m25s).
  • The data for Threads users is stored in a variety of systems, including a graph data model and a sharded MySQL deployment, which stores almost all user data (14m56s).
  • The graph data model natively represents links between nodes and has optimized operations for querying the details of those links, with indexing systems on top that let developers annotate particular links with how they intend to query them (15m8s).
  • There are several services that support the decision-making process around restrictions, including a big HLL service, a key-value store called ZippyDB, a serverless compute platform, and a Kubernetes-like system for managing and scaling services (15m37s).
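The graph data model described above can be sketched as a tiny association store: typed, directed links between node IDs, with lookups keyed by source and link type so common queries are cheap. This mimics the shape of such an API, not Meta's actual one; all names are illustrative.

```python
# Minimal sketch of a graph data model with typed links between nodes.
from collections import defaultdict

class GraphStore:
    def __init__(self):
        # Index keyed by (source_id, link_type) so "all X linked from Y"
        # is a single lookup rather than a scan.
        self._links = defaultdict(list)

    def add_link(self, src, link_type, dst, annotation=None):
        """Create a directed, typed link, optionally annotated."""
        self._links[(src, link_type)].append((dst, annotation or {}))

    def get_links(self, src, link_type):
        """All targets of a given link type from one node."""
        return [dst for dst, _ in self._links[(src, link_type)]]

    def count_links(self, src, link_type):
        return len(self._links[(src, link_type)])

store = GraphStore()
store.add_link(1, "follows", 2)
store.add_link(1, "follows", 3)
store.add_link(2, "authored", 100)
```

In a real deployment the index lives in the storage layer (e.g. over sharded MySQL) rather than in process memory, but the query shape is the same.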

Launch Day Challenges and Rapid Growth

  • The product, Threads, was initially planned for a mid-July launch, but due to Twitter's plan to restrict consumption of tweets for free users, the launch was moved up to July 6th (16m18s).
  • To generate buzz, Easter eggs were opened, and an Early Access program was started on July 5th, allowing celebrities to try the product (16m38s).
  • On the day before the launch, engineers were preparing for the launch by upsizing systems, sketching out the run of show, and making other necessary preparations (17m4s).
  • A data engineer noticed tens of thousands of failed login attempts on the app, which was odd since no one should have had access to the app yet (17m12s).
  • The issue was quickly identified as being caused by the app store's pre-order feature, which made the app available in East Asian countries once it was past midnight (17m47s).
  • A war room and Zoom call were set up with close to 100 experts to address the issue, and a new target launch time was chosen to give the team time to prepare (18m13s).
  • The team spent the time before the new launch time upsizing all the systems that Threads touched, including a ZippyDB cache that needed to be resized to handle 100x the capacity it was provisioned for (18m45s).
  • The user growth in the first couple of days was significant, with a million people downloading and onboarding the app in the first hour, and 10 million in the first day (19m20s).
  • Threads gained 70 million users in the first two days and 100 million in the first five days after its launch, but the novelty effects started to wear off and the buzziness subsided after that (19m31s).
  • Despite the relatively smooth ride with no major visible downtime, the team had to deal with numerous fires, including capacity issues and database query problems (19m50s).

Post-Launch Fires and System Improvements

  • One of the fires involved Mark Zuckerberg's timeline, which was failing to load because his posts received an order of magnitude more interactions than anyone else's in the early network (20m28s).
  • The root cause was a database query, issued while rendering every post, whose runtime scaled with the number of reposts; adding an index fixed it (20m50s).
  • Another fire revolved around copying the follower graph from Instagram to Threads, which was a limited operation, but the issue arose when users wanted to follow people who hadn't signed up for Threads yet (21m11s).
  • The system originally designed couldn't handle the scale of big celebrities signing up, such as former President Barack Obama, and the team had to redesign the system to horizontally scale and orchestrate workers (22m10s).
  • The team manually worked through the backlog to ensure potential engagement wasn't left on the table, and the redesigned system has worked smoothly ever since (22m32s).
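The repost-query fire above can be reconstructed as a toy example: a per-post lookup that scans the whole repost table becomes an index seek once the right index exists. The table and column names are invented for illustration; SQLite stands in for the sharded MySQL deployment.

```python
# Toy reconstruction of the index fix: the same query goes from a full
# table scan (cost grows with total reposts) to an index seek.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reposts (post_id INTEGER, reposter_id INTEGER)")
conn.executemany(
    "INSERT INTO reposts VALUES (?, ?)",
    [(1, i) for i in range(1000)] + [(2, 9)],
)

def plan(sql: str) -> str:
    """Return the query plan as one string."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT COUNT(*) FROM reposts WHERE post_id = 2"
before = plan(query)   # full table scan: runtime scales with all reposts

conn.execute("CREATE INDEX idx_reposts_post ON reposts (post_id)")
after = plan(query)    # index seek: runtime scales with matching rows only
```

The failure mode is classic: the query is fast on every test account, then a single outlier (here, the most-interacted-with user on the network) pushes the scan cost past timeout.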

Post-Launch Feature Development and Future Plans

  • After the launch, the team pivoted to address features users were asking for, including opening to Europe, shipping a following feed, and releasing a web client (23m0s).
  • The team has shipped a limited release of an API that allows users to read and write posts, polished content search, and begun surfacing trending conversations on the network (23m24s).
  • The team is working on ranking the follow graph and understanding content to match people's revealed preferences, while also catering to the needs of power users from Twitter and new users (23m37s).
  • The team is adopting ActivityPub, an open protocol for interoperating between microblogging networks, and is building it piece by piece to ensure it is done correctly (24m7s).
  • The reason for not integrating ActivityPub quickly is that it cannot be integrated into the Legacy Instagram broadcast sharing stack, and the team wants to make strong guarantees about how they process imported data (24m37s).
  • The team is rewriting their business logic and backend stack, which is an ongoing process (24m50s).

Key Learnings and Reflections

  • The biggest takeaway from the experience is the power of keeping things simple, and being clear about the value provided can guide hard decisions on where to cut scope (25m9s).
  • Another learning is that cleaner, newer code isn't necessarily always better, and the little learnings encoded into an old, battle-tested codebase add up (25m30s).
  • Building a product often answers questions better than abstract analysis, and prototypes can answer questions quicker than slide decks (25m45s).
  • The team acknowledges how lucky they are for the opportunity and reception of the product, and none of it was guaranteed (26m4s).

Team Size and Structure

  • The engineering team started with 5-10 engineers in January and grew to 30-40 engineers by launch, with a total of around 100 engineers contributing in some way (27m9s).
  • The core product team stayed under 50 engineers, and they were able to reuse existing infrastructure and platform teams (27m48s).
  • The team ensured the product worked and didn't kill Instagram, despite having little time to test given the timeline, and they lived on the product from the time it became a standalone app (28m4s).
  • Instagram itself is a big organization of a couple of thousand people, and for at least a couple of months those employees used the app internally, sharing content that was all wiped before the public launch (28m31s).
  • The team got a fair amount of ad hoc testing, but they were not prepared for the scale, and they had to react quickly and rely on systems built for that scale (28m52s).

Testing and Code Reuse Challenges

  • On the back end, the team has a good test-driven development culture, writing a lot of integration tests for APIs, but there's no strict adherence to it (29m13s).
  • The team reused the code base from Instagram, which meant reusing it with all its edge case handling, but this made it difficult to know which context they were operating in (29m42s).
  • The team has a big project underway to disentangle the code base, which is a trade-off in architecture, and the right trade-off was to take on that tech debt to launch the product sooner (30m1s).
  • The team is rewriting the code base and teasing things apart, like changing the engine of a plane while it's in flight (30m28s).
  • The team started by forking the Instagram code but is not planning to reunify it, instead, the code bases are drifting apart over time as Threads becomes more unique (30m42s).
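The "which context are we in?" problem described above has a common remedy: thread an explicit app context through the shared code, so each call site declares which product it serves instead of relying on Instagram-era assumptions. This is a hypothetical sketch; the enum, function names, and limits are illustrative, not Meta's code.

```python
# Sketch of disentangling shared code: divergent product rules live behind
# one explicit switch instead of implicit Instagram assumptions.
from enum import Enum

class AppContext(Enum):
    INSTAGRAM = "instagram"
    THREADS = "threads"

def max_post_length(ctx: AppContext) -> int:
    # Illustrative per-product rule; the limits here are assumptions.
    return 500 if ctx is AppContext.THREADS else 2200

def validate_post(text: str, ctx: AppContext) -> bool:
    """Shared validation logic that is explicit about its caller's product."""
    return len(text) <= max_post_length(ctx)
```

Making the context a required parameter turns "which app am I in?" from an archaeology exercise into a type-checked fact at every call site, which is what makes the later teasing-apart tractable.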

Data Model and API Evolution

  • The data model and APIs for Threads and Instagram are similar but are being teased apart, with the team moving towards GraphQL, which lets clients specify the data they need more granularly against a server-defined schema (31m30s).
  • The approach used for building Threads involved inverting the data model, where the client requests specific pieces of data, and the server composites data together, allowing for labeling and teasing out specific data for Threads or Instagram (32m14s).
  • This approach enabled quick iteration on the UI from mobile applications, similar to how Uber uses a generic card format that the backend returns, but in this case, it was more product-specific (32m41s).
  • Frameworks were used to handle generic tasks, such as server-side rendering, but the data model used for Threads was custom, allowing for iteration and experimentation with different rendering ideas (33m2s).
  • The custom data model allowed for quick iteration and experimentation with different rendering ideas, such as how to render self-threads and reply CTAs, using a dogfooding community of a few thousand people (33m31s).
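The inverted model described in this section, where the client names the fields it wants and the server composites them, can be sketched with per-field resolvers, GraphQL-style. The resolver names and fields here are illustrative assumptions, not the actual Threads schema.

```python
# Sketch of client-specified data: one resolver per field, and the server
# returns only what the client asked for.

RESOLVERS = {
    "text":       lambda post: post["text"],
    "like_count": lambda post: len(post["likers"]),
    "reply_cta":  lambda post: "Reply" if post["replies_open"] else None,
}

def resolve(post: dict, requested_fields: list) -> dict:
    """Composite a response from the fields the client requested."""
    return {f: RESOLVERS[f](post) for f in requested_fields}

post = {"text": "hi", "likers": [1, 2, 3], "replies_open": True}
```

Because each field has its own resolver, a field can be labeled as Threads-specific or Instagram-specific and teased out independently, which is the property the team was after.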

Architectural Decisions and Technical Debt

  • If given the chance to do it again, some architectural decisions would be made differently, such as spending more time detangling specific data models, which would have saved time in the long run (34m30s).
  • The overall ethos of reusing what can be reused was successful and would not be changed, but some technical decisions would be approached differently (35m5s).
  • The team uses mature frameworks for data migrations, developed from past experiences such as splitting Messenger and Marketplace out of Facebook (35m42s).
  • These frameworks allow for tailing all mutations that happen on the main data store, MySQL, and enable writing observers that can double-write data and handle data migrations (36m5s).
  • The development process involved maintaining two versions of a data model, such as the user model, to keep up to date, and then migrating product logic to read from both places, depending on the context, whether it's Instagram or Threads (36m32s).
  • The data model is deep, with a post having many linked things that recursively need to be either Threads data or Instagram data, requiring annotations and updates that can be done statically or dynamically at runtime (37m0s).
  • When reusing code, there's a risk of reusing technical debt and mistakes, but this was a trade-off to build the product quickly, with the plan to pay for it later (37m45s).
  • The decision to reuse code and take on technical debt was made to build the product in six months, rather than starting from scratch and taking a year or more (38m10s).
  • The team is balancing the need to evolve the product with the need to make the code base more maintainable and pay down technical debt, with a separate team focused on migration, but this will eventually impact all product developers (39m25s).
  • The process of decoupling code from Instagram and making it more maintainable involves allocating resources and prioritizing tasks, with the goal of making the code easier to read and maintain in the future (38m51s).
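The migration mechanics described above, tailing mutations on the main store and double-writing via observers, can be sketched as follows. This is an illustrative reduction of the pattern, not Meta's framework; the class names are invented.

```python
# Sketch of a double-write migration: every mutation on the primary store is
# fed to observers, one of which mirrors writes into the migration target.

class PrimaryStore:
    def __init__(self):
        self.rows = {}
        self.observers = []

    def write(self, key, value):
        self.rows[key] = value
        # Stand-in for tailing the mutation log of the real data store.
        for observer in self.observers:
            observer(key, value)

class DoubleWriter:
    """Observer that mirrors every mutation into the new store."""
    def __init__(self, target: dict):
        self.target = target

    def __call__(self, key, value):
        self.target[key] = value

old_store = PrimaryStore()
new_store = {}
old_store.observers.append(DoubleWriter(new_store))
old_store.write("user:1", {"name": "zahan"})
```

Once both copies are kept consistent this way, product logic can be migrated to read from either place depending on context, and the old write path can eventually be retired.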

Influence, Future Applications, and Organizational Structure

  • Development decisions made for Threads have influenced changes that have gone back into other products, although specific examples are not mentioned (40m30s).
  • The potential future applications of Threads include interoperating with other networks and utilizing ActivityPub, although this has not happened yet (40m41s).
  • The lean nature of Threads has been a valuable lesson for Meta, as it was developed with a small team, contrasting with Meta's typical approach of building large teams around important product goals (41m1s).
  • On the technical side, Threads is doing new and interesting work with ranking, particularly in handling the real-time nature of the product and reducing latency (41m32s).
  • Unlike Instagram, Threads requires posts to be eligible for ranking in near real-time, as it is often used for breaking news (42m25s).
  • Organizationally, Threads is still a team within Instagram, and while the two apps are not a straight fork, they reuse the same code, so improvements can flow to both platforms (43m30s).

Technical Innovation and Short-Term Focus

  • Meta has a system called "better engineering" that incentivizes employees to work on technical debt in their spare time, similar to Google's 20% time policy (43m52s).
  • The main short-term focus for Threads is making the data models clear between apps (44m17s).
  • The decision to use Instagram's architecture for Threads was likely due to Instagram's directed graph nature, as opposed to Facebook's bidirectional friends, which has significant technical implications (44m35s).
  • The choice of architecture was also influenced by Conway's Law, which states that the design of a system is often a reflection of the team that built it (45m3s).
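The directed-versus-bidirectional distinction drawn above is worth making concrete: a follow is a single one-way edge, while a friendship must be written (or at least readable) in both directions, which changes write paths and fan-out throughout the system. A toy illustration:

```python
# Toy contrast between Instagram-style directed follows and Facebook-style
# bidirectional friendships.
from collections import defaultdict

follows = defaultdict(set)   # directed: one edge per follow
friends = defaultdict(set)   # bidirectional: every edge written twice

def follow(a, b):
    follows[a].add(b)        # b need not approve or reciprocate

def befriend(a, b):
    friends[a].add(b)        # symmetric by construction:
    friends[b].add(a)        # both sides see the relationship

follow("alice", "bob")
befriend("carol", "dave")
```

Since Threads needed asymmetric follows, starting from a codebase whose data model already assumed directed edges avoided reworking that assumption everywhere.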

Architectural Choices and Team Dynamics

  • The team that built Threads was familiar with the Instagram tech stack, which made it easier to repurpose for the new app; the realistic options were Facebook and Instagram, and Instagram seemed the better fit (45m11s).
  • Initially, around 10 engineers were involved in the project, which eventually grew to around 50 engineers by the end, but the team deliberately kept it small to stay nimble and pivot the product easily (45m46s).
  • The team was able to lean on platform teams, which made a big difference, and around 100 engineers touched something that made it into the Threads app (45m57s).
  • The team was worried about leak risks and kept the product simple, with the goal of making it easy to pivot and make changes (46m17s).

Combating Bots and Abuse

  • Bots were a problem for the team, but they were able to rely on the long history of teams fighting bots on Instagram and Facebook, and they are taking steps to address the issue (46m46s).
  • The team is fighting hard against spammy replies and other forms of abuse, and they consider integrity to be a firefighting domain where they have to continually evolve to stay ahead of adversaries (47m17s).
  • One of the novel challenges the team faced was that Threads is more about linking out to the web, which creates challenges around understanding what a link points to and handling web redirects (47m47s).
