Now Anyone Can Code: How AI Agents Can Build Your Whole App

18 Oct 2024

Coming up (0s)

The Mac brought personal computing to the masses in 1984; in 2024, personal software has arrived, letting users orchestrate a giant army of agents (8s).
This new ability is compared to Mickey Mouse's magical experience in Fantasia, where he learns to control a menagerie of objects, illustrating the potential of personal software to build whatever users want (22s).
With personal software, users can bring their ideas to life quickly, as seen in an example where someone built their 15-year-old idea in just 15 minutes and recorded their emotional reaction (34s).

Intro (47s)

The hosts of the video are Garry, Jared, Harj, and Diana, who have collectively funded companies worth hundreds of billions of dollars. (49s)
They are joined by Amjad, one of their best alumni, who has just launched a product called Replit Agent. (1m1s)
Replit Agent is currently in Early Access, meaning it is still in the beta stage of development, and while it has generated excitement, it still contains many bugs. (1m13s)
Despite its early stage, a live demo of Replit Agent will be shown, and Amjad plans to build an application using the product during the demonstration. (1m27s)

Making an app with Replit (1m29s)

A personal app is being created to track morning mood correlated with activities done the previous day, such as coffee consumption, alcohol intake, and exercise, with the goal of logging mood and activities in the morning and sending the data to an agent (1m30s).
The app is built using a chat interface where the agent reads the message and thinks about the next steps, similar to a multiplayer experience on Replit (2m6s).
The agent creates a plan for the app, suggesting features such as visualization, reminders, and a database connection, and picks a tech stack including Flask, vanilla JS, and Postgres (2m28s).
The progress pane shows the AI installing packages, writing code, and building a database connection, making it easy for new software engineers to get started without worrying about dependencies and packages (2m58s).
The app is built with a backend, Postgres, and can be deployed, allowing users to log their mood, view history, and rate the app (3m41s).
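The data side of a mood tracker like this can be sketched in a few lines. This is a minimal illustration, not the code the agent generated: the demo app used Flask and Postgres, while this sketch keeps entries in memory, and the `Entry` fields and the choice of Pearson correlation are assumptions made for the example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Entry:
    """One morning log: mood plus the previous day's activities (hypothetical fields)."""
    mood: int          # 1 (bad) to 5 (great)
    coffees: int
    drinks: int
    exercised: bool


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mood_correlations(entries):
    """Correlate each logged activity with the next morning's mood."""
    moods = [e.mood for e in entries]
    return {
        "coffees": pearson([e.coffees for e in entries], moods),
        "drinks": pearson([e.drinks for e in entries], moods),
        "exercised": pearson([float(e.exercised) for e in entries], moods),
    }


# Made-up sample data, only to exercise the functions.
log = [
    Entry(mood=4, coffees=1, drinks=0, exercised=True),
    Entry(mood=2, coffees=3, drinks=2, exercised=False),
    Entry(mood=5, coffees=1, drinks=0, exercised=True),
    Entry(mood=1, coffees=2, drinks=3, exercised=False),
]
correlations = mood_correlations(log)
```

With this sample log, the `drinks` correlation comes out negative and the `exercised` correlation positive, which is the kind of signal the app is meant to surface.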
The agent tests the app on its own, taking a screenshot and using computer vision to check if something is presented, but also asks for human testing and QA (4m3s).
The models used are multimodal, including Claude 3.5 Sonnet and GPT-4, alongside in-house models such as a fast embedding model and a retrieval system, which are important for making the agents work (4m23s).
The retrieval system is a key part of the agent's functionality, allowing it to find the right places to edit in the code, and is a notable achievement (4m57s).
The agent can create data and deploy the app, making it possible to go from an idea to a deployed web app that anyone can access in a short amount of time (5m56s).
The idea of personal software is exciting, allowing anyone to create apps that are tailored to their individual needs (6m19s).

Feel the AGI, personal software era (6m23s)

The concept of personal software has emerged in 2024, similar to how Mac brought personal computing to the masses, with the potential to revolutionize the way people interact with technology (6m23s).
Karpathy tweeted about Replit Agent, describing it as a "feel the AGI moment," which refers to the experience of using artificial general intelligence (AGI) to build software (6m31s).
Using Replit Agent, a Hacker News clone was created, and the experience felt like having a development partner, with the AI agent asking questions and making design decisions on its own (6m41s).
The AI agent demonstrated good intuition about what to build and how to design it, such as creating a slider bar with emojis without being explicitly instructed to do so (6m50s).
The AI agent got stuck at one point, but was able to ask for help and continue working on the project after receiving guidance (7m15s).
The experience of working with the AI agent felt like collaborating with a developer, and it is suggested that having different modes or personalities for the AI agent could be useful, such as a "grumpy programmer" or an "over engineer" mode (7m37s).
The idea of having a toggle to switch between different modes or personalities for the AI agent is proposed, but it is unclear if this feature is currently functional (7m54s).

Having AI code the way humans do (8m7s)

AI programmers do not possess super intelligence that can build an entire app perfectly from start to finish without making mistakes; instead, they code in a way similar to humans, writing code, trying it out, and fixing bugs as needed (8m7s).
The design decision behind AI programmers is to have them act as coworkers, allowing users to step in and fix the agent's code themselves if needed (8m28s).
The goal is for non-coders to learn a little bit of coding along the way as they work with the AI agent, similar to how people in the past learned to code by making small edits to their Myspace page or other online projects (8m46s).
There is a need to revive the incremental learning curve, where people can learn to code through fun side projects, rather than requiring a computer science degree or boot camp (9m6s).
Fully automated software engineering agents are still far from being developed, and people should still learn how to code, although they will have to do less coding and focus more on reading and debugging code (9m29s).
AI agents can get users fairly far in the coding process, but sometimes they will get stuck, and users will need to go into the code and figure it out themselves (9m41s).

You should still learn to code! (9m51s)

Many young people, such as freshmen, believe that with the advancement of technology, they no longer need to study how to code, but this is not true, and knowing how to code is more important and powerful than ever before (9m51s).
Having the ability to code will allow individuals to orchestrate and leverage the power of AI agents, making them more powerful and able to build whatever they want, whenever they want (10m16s).
The return on learning to code is increasing rapidly, with the usefulness of knowing how to code doubling every six months, similar to Moore's Law (10m46s).
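Taken at face value, the doubling claim implies a simple exponential. The symbols below are introduced only for illustration: $U_0$ is the usefulness of coding at some starting point and $t$ is the time elapsed in years. The figure is a rhetorical comparison from the discussion, not a measurement.

```latex
U(t) = U_0 \cdot 2^{t / 0.5} = U_0 \cdot 4^{t},
\qquad
\frac{U(4)}{U(0)} = 2^{8} = 256
```

Under that assumption, four years of doubling every six months (for example, 2020 to 2024) would compound to a 256-fold increase.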
In 2020, knowing how to code was not that useful, as individuals would still get blocked by deployment and configuration issues, but with the help of chat-based AI assistants in 2023, knowing how to code a little bit can get individuals fairly far (11m1s).
By 2024, knowing how to code a little bit is massive leverage, thanks to the availability of AI agents and tools like Cursor, which can help individuals extend their abilities and get super far (11m20s).
Programmers are on a massive trajectory of increased power, with their abilities and leverage continuing to grow rapidly (11m36s).

The underlying tech (11m42s)

The technology behind the system is a multi-agent system with a core ReAct-like loop, built on chain-of-thought-style prompting that has been around for a couple of years; most agents are built on this concept (11m45s).
The agents are given a large set of tools through tool calling, and these are the same tools exposed to people, which requires careful consideration of how the tools are exposed and how the agent sees them (12m9s).
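The core loop described above alternates between the model reasoning about what to do, calling a tool, and reading the result. The following is a minimal sketch of that pattern, not Replit's implementation: `scripted_model` is a hypothetical stand-in for an LLM call, and the `list_files` tool and the step dictionary format are assumptions made for the example.

```python
def react_loop(model, tools, task, max_steps=5):
    """Alternate model thoughts/actions with tool observations until the model finishes."""
    transcript = [("task", task)]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(("thought", step["thought"]))
        if step["action"] == "finish":
            return step["input"], transcript
        # Run the chosen tool and feed its output back in as an observation.
        observation = tools[step["action"]](step["input"])
        transcript.append(("observation", observation))
    return None, transcript  # gave up after max_steps


def scripted_model(transcript):
    """Hypothetical stand-in for an LLM: picks the next step from what it has seen so far."""
    observations = [text for kind, text in transcript if kind == "observation"]
    if not observations:
        return {"thought": "I need the file list first.",
                "action": "list_files", "input": "."}
    return {"thought": "I have what I need.",
            "action": "finish", "input": f"Found: {observations[-1]}"}


# Hypothetical tool; a real agent would expose editing, packaging, deployment, and so on.
tools = {"list_files": lambda path: "main.py, models.py"}
answer, transcript = react_loop(scripted_model, tools, "What files exist?")
```

The `max_steps` cap is one simple guard against an agent that never decides it is finished.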
The edit tool returns errors from the language server, a Python language server, which provides feedback to the agent as it codes, similar to how a human coder would receive feedback (12m29s).
The agent gets feedback from the language server for any action, allowing it to react to that feedback, and it has access to various tools such as package management, editing, deployment, and database management (12m44s).
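The idea of an edit tool that hands back diagnostics can be sketched as follows. This is an illustration under stated assumptions: Python's built-in `compile()` stands in for a real language server, so only syntax errors are caught, and the `edit_file` name and return shape are hypothetical.

```python
def edit_file(files, path, new_source):
    """Apply an edit and return diagnostics so the caller sees errors immediately."""
    files[path] = new_source
    try:
        # Stand-in for a language server: compile() only detects syntax errors.
        compile(new_source, path, "exec")
    except SyntaxError as err:
        return {"ok": False, "diagnostics": [f"{path}:{err.lineno}: {err.msg}"]}
    return {"ok": True, "diagnostics": []}


workspace = {}
bad = edit_file(workspace, "app.py", "def home(:\n    return 'hi'\n")
good = edit_file(workspace, "app.py", "def home():\n    return 'hi'\n")
```

Because the diagnostics come back as the tool's return value, the agent can react to them in its next step, much as a human reacts to red squiggles in an editor.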
To prevent the agent from going off the rails, there are mechanisms in place, including a reflection loop that constantly evaluates whether the agent is doing the right thing, and the use of LangChain tools like LangGraph and LangSmith (13m22s).
LangGraph is a tool for building agent DAGs, and LangSmith is used for debugging, providing a way to visualize the graph and look at the traces for the DAGs (13m29s).
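An agent graph with a reflection step can be sketched with the standard library alone. This deliberately does not use the LangGraph or LangSmith APIs; it uses `graphlib.TopologicalSorter` (Python 3.9+) to run hypothetical `plan`, `write_code`, and `reflect` nodes in dependency order and records a trace of what ran.

```python
from graphlib import TopologicalSorter


def plan(state):
    state["plan"] = "add a /health route"
    return state


def write_code(state):
    state["code"] = "def health():\n    return 'ok'\n"
    return state


def reflect(state):
    # Reflection step: check the work against the plan before moving on.
    state["on_track"] = "health" in state["code"] and "health" in state["plan"]
    return state


nodes = {"plan": plan, "write_code": write_code, "reflect": reflect}
# Each key maps to the set of nodes that must run before it.
edges = {"plan": set(), "write_code": {"plan"}, "reflect": {"write_code"}}

state, trace = {}, []
for name in TopologicalSorter(edges).static_order():
    state = nodes[name](state)
    trace.append(name)  # a trace like this is what a debugging tool would display
```

A production system would loop back to planning when `on_track` is false; the sketch only shows the ordering and the check.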
Retrieval is crucial for the system, requiring a neuro-symbolic approach that can do RAG-style embeddings retrieval and look up functions and symbols inside the code (14m5s).
Even with large context windows, specialized tools will still be needed for lookups, and context management is necessary to prevent the model from biasing towards certain information (14m33s).
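The combination of symbol lookup and embedding-style retrieval can be illustrated with a toy example. Everything here is an assumption made for the sketch: the symbolic side uses Python's `ast` module, and a bag-of-words vector with cosine similarity stands in for a real embedding model.

```python
import ast
import math
import re
from collections import Counter


def symbol_index(files):
    """Symbolic side: map each function/class name to the file and line defining it."""
    index = {}
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                index[node.name] = (path, node.lineno)
    return index


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, files):
    """Prefer an exact symbol hit; otherwise fall back to the most similar file."""
    index = symbol_index(files)
    for word in re.findall(r"\w+", query):
        if word in index:
            return index[word]
    best = max(files, key=lambda path: cosine(embed(query), embed(files[path])))
    return (best, None)


files = {
    "db.py": "def save_mood(score):\n    # store mood score in database\n    pass\n",
    "views.py": "def render_chart(rows):\n    # draw history chart for the page\n    pass\n",
}
by_symbol = retrieve("fix save_mood", files)
by_similarity = retrieve("where is the chart drawn", files)
```

The exact lookup gives a precise file and line when the query names a symbol, while the similarity fallback handles vaguer requests, which is one reason a dedicated lookup tool stays useful even with a large context window.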
The agent uses a memory bank to store information from each step, and it needs to be able to pick the right memories and put them in context for each subsequent step (15m9s).
Memory management is crucial when working with AI agents: the system has to pick the right memories for the right tasks rather than putting the entire memory in context, because models are fragile and not great at following instructions when overloaded (15m44s).
The "situational awareness" idea and the sci-fi argument that AGI will kill us tomorrow are rebutted by practical experience: simply scaling up parameters and adding more GPUs will not work, and there is utility in having agents work together and in being smart about intermediate representations (16m6s).
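Selecting a few relevant memories instead of loading the whole bank can be sketched like this. The scoring is a deliberately crude assumption (shared non-stopword terms); a real system would more likely rank with embeddings, and the `select_memories` name and sample memories are hypothetical.

```python
STOPWORDS = {"a", "an", "the", "to", "in", "is", "with", "of"}


def keywords(text):
    """Lowercased words with common stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def select_memories(memories, step, k=2):
    """Return up to k stored memories that share the most keywords with the current step."""
    step_words = keywords(step)
    scored = [(len(step_words & keywords(m)), m) for m in memories]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:k]]


memory_bank = [
    "user wants a mood slider with emojis",
    "database is postgres with a moods table",
    "deployment target is a public web url",
    "flask serves the api routes",
]
context = select_memories(memory_bank, "add a column to the moods table in the database")
```

Only the database memory is placed in context for this step; unrelated memories about the slider or deployment are left out so they cannot bias the model.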
Building a system with AI agents can be humbling and sets expectations about AI progress in a different way, as the systems are fragile and not great at following instructions, and people often talk about the hallucination problem, but the bigger problem is getting them to follow orders (16m52s).
The challenge of getting AI agents to follow instructions and do the right thing is significant, and it's hard to get them to actually do what is intended (17m12s).

The path to AGI (17m19s)

The path to Artificial General Intelligence (AGI) may lead to "functional AGI," which involves automating economically useful tasks, and this goal is considered fairly within reach as it can be approached as a brute force problem (17m20s).
Achieving functional AGI might require building and fine-tuning orchestrations of groups of agents for each task, similar to what has been done for programming, and eventually combining them into one model (17m45s).
The history of machine learning has shown that systems are created and grown around models, but eventually, the model replaces those systems, and hopefully, this will lead to an end-to-end machine learning system that can perform various tasks (18m0s).
The example of Tesla's development is mentioned, where they moved from using logic and other systems to end-to-end training, and it is expected that eventually, a similar end-to-end system will be developed for other tasks (18m17s).
However, functional AGI would not be considered "true AGI" because it would not be able to handle tasks outside of its training data, and true AGI would require efficient learning and the ability to navigate new environments (18m32s).
True AGI would need to be able to learn skills required to navigate new environments with no prior information, and current systems, such as LLMs, are not efficient learners and require additional layers, such as symbolic representation, to work effectively (18m50s).
Symbolic representation and classical computer science concepts, such as backtracking and Turing completeness, are specialized rather than generalized, and while incredibly useful, they are not a sign of true AGI (19m20s).

What users made with Replit (19m41s)

People have already created impressive and interesting projects with a new tool, including a personal app that allows users to put memories on a map and attach files and audio files, which was built in 15 minutes by someone who had the idea for 15 years but didn't have the tools to build it (19m53s).
Another user, Meck, built a Stripe coupon tool in 5-10 minutes, which would be difficult to build with no-code tools and would likely require using multiple tools like Bubble and Zapier (20m36s).
The new tool is seeing a lot of traction, which is a challenge for no-code tools that often start as no-code but then find that users want to build more complex projects that push the limits of what the tool can do (21m23s).
The new tool allows no-code users to switch to coding by initially using prompts and then gradually becoming programmers by editing the code (21m44s).
The tool can be used to build complex projects, such as a recruiting CRM with role-based permissions, which would normally be a $10,000 a month Enterprise feature (22m8s).
The tool has been used to build projects much faster than traditional methods, including generating an app in 10 minutes that took 18 months to build and building an app in an hour that took a year to build (22m42s).
The current AI system can save millions of dollars in human hours, but it's still in its early stages and can't be applied to existing coding stacks yet (23m7s).
A retrieval system has been built to index codebases quickly and provide intelligence about the codebase, with features like summaries of files and projects created using large language models (LLMs) (23m24s).
The next step is to add more autonomy to the system, allowing it to work in the background, fork projects, and send pull requests or report problems when encountered (23m54s).
The vision for the system includes a bounties program where users can submit problems or projects they want to build, and the community can help fix them for a price (24m21s).
The system can also summon a human expert, known as a "bounty hunter," to help with problems the AI agent can't solve, using a real-time market to find an expert for a set price (24m53s).
The idea is to create a human-machine symbiosis, where humans and AI agents work together as part of a greater intelligence orchestration system, with humans being another agent in the system (25m18s).
This approach is inspired by the concept of human-machine symbiosis, which emphasizes the importance of computers being extensions of humans rather than competitors (25m25s).
The ultimate goal is to create a system where humans and AI agents collaborate seamlessly, with humans being able to prompt the agent or edit the code themselves (25m11s).

Challenges in resetting the org (25m56s)

A company had a significant moment earlier in the year with a demo that impressed many, which was the result of hard work on remaking the way software is deployed and written. (25m57s)
The company had previously raised a large round of funding and felt the need to grow, but this led to a layoff and a reset of the organization. (26m18s)
The company was initially very lean, with only four or five employees for many years, despite having millions of users. (26m52s)
The decision to grow and hire more people, including executives, led to a more complex management structure, which ultimately became miserable and unproductive. (27m1s)
The company has since flattened its organization, eliminating multiple layers and meetings, and now focuses on only a few key projects. (27m50s)
The founder is involved in all of these projects and believes that the company has become more productive as a result of getting smaller. (27m55s)
The founder notes that the temptation to add more bureaucracy and management layers can be strong, especially when there are many ideas and resources available. (28m33s)
The company is trying to stay disciplined and focused on a few key projects, rather than trying to do too many things at once. (28m55s)
The concept of the "compound startup" is mentioned, where multiple product lines are treated as separate startups, each with their own governance and decision-making processes. (29m7s)
Parker Conrad, the founder of Rippling, has a unique hiring tactic where he hires former founders and puts them in charge of a product line, which has worked well for the company, but may be challenging for others to replicate due to the difficulty in hiring high-quality former founders unless the company is already successful or has a top-tier recruiter (29m25s).
Parker Conrad also emphasizes the importance of staying connected to customers by answering customer support tickets, which provides a direct line of information on what's really going on with the customer (30m10s).
The development of an AI agent involved building a new technology that the team wasn't used to working on, and it required a big effort to pull it off organizationally (30m33s).
The AI agent was built by a task force consisting of people from different teams, including the IDE team, DevX team, UX and design team, and the AI team, which was at the center and connected to all the other teams (31m13s).
The task force was organized similarly to Karpathy's LLM OS diagram, with the AI team as the kernel OS and the other teams creating tools that connected to it (31m34s).
The product team worked on the entry points and structure of the AI agent, which was a challenging task that required frequent meetings and rapid progress (31m55s).
The development process involved regular meetings, including a war room meeting on Mondays and an agent salon on Fridays, where the team would review progress, prioritize tasks, and make changes to the product (32m5s).
Doing a "run" with the AI agent meant literally testing it and reviewing its performance to identify what was working and what was broken (32m36s).
The team went through the product, identified where it broke, and determined the priorities to fix the issues that arose during the process (32m40s).
Each team built its own agent, with some teams requiring this due to the specific needs of their tasks, such as the IDE team creating a screenshot agent (32m47s).
The IDE team developed the screenshot agent, which utilized AI to analyze screenshots, generate thoughts, and return them to the main manager agent (32m56s).
The package management team built a tech-stack-setup type of configuration, which was a unique and effective approach (33m7s).
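The manager and sub-agent relationship described here can be sketched as a simple delegation pattern. The class names and the dictionary standing in for a screenshot are hypothetical; a real screenshot agent would pass image data to a multimodal model.

```python
class ScreenshotAgent:
    """Hypothetical sub-agent: inspects a 'screenshot' and reports what it sees."""

    def run(self, screenshot):
        # The screenshot is a plain dict describing the page so the sketch stays
        # runnable; a real agent would analyze image bytes with a vision model.
        if not screenshot.get("visible_elements"):
            return "The page rendered blank."
        return "Visible: " + ", ".join(screenshot["visible_elements"])


class ManagerAgent:
    """Hypothetical manager: delegates work and collects the thoughts that come back."""

    def __init__(self, sub_agents):
        self.sub_agents = sub_agents
        self.thoughts = []

    def delegate(self, agent_name, payload):
        thought = self.sub_agents[agent_name].run(payload)
        self.thoughts.append((agent_name, thought))
        return thought


manager = ManagerAgent({"screenshot": ScreenshotAgent()})
report = manager.delegate("screenshot", {"visible_elements": ["mood slider", "submit button"]})
blank = manager.delegate("screenshot", {"visible_elements": []})
```

Keeping each sub-agent behind a single `run` method means a team can swap in its own agent without the manager changing.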
The overall structure and organization of the teams and their agents worked out surprisingly well, with the AI acting as the central user (33m16s).
The success of this approach is attributed to its similarity to how teams worked in the past, with the AI now taking on the role of the central user (33m22s).

Future plans (33m29s)

The next big leap forward for the AI agent is reliability, ensuring it doesn't break or spin, and expanding it to support any stack the user wants (33m35s).
Currently, the agent doesn't listen to user requirements for the stack, but the goal is to accept user requirements and support various stacks, including Python (33m46s).
The agent's UI is being improved to make it more user-friendly, with the possibility of interacting with the AI agent through drawing and voice commands (34m26s).
Future plans include allowing users to draw on a canvas to communicate with the AI agent, making it possible to express ideas more creatively (34m57s).
The iPad app will also be improved to make it more fun and creative, allowing users to hand-sketch UI mockups and have the agent implement them (35m7s).
Simpler agentic tools will be added, allowing more advanced users to have more control over the code they're writing, including single-step or single-action agents (35m24s).
These single-action agents will allow users to review and accept or reject changes before they are implemented, giving them more agency over the code (35m42s).
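A single-action flow with an accept or reject step can be sketched using the standard library's `difflib`. The `propose_edit` and `review` names are hypothetical; the point is only that the change is shown as a diff and nothing is applied until the user decides.

```python
import difflib


def propose_edit(original, updated, path="app.py"):
    """Package a single change as a reviewable diff; nothing is written yet."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return {"diff": "".join(diff), "original": original, "updated": updated}


def review(proposal, accept):
    """Return the file contents that result from the user's decision."""
    return proposal["updated"] if accept else proposal["original"]


before = "def greet():\n    return 'hi'\n"
after = "def greet(name):\n    return f'hi {name}'\n"
proposal = propose_edit(before, after)
kept = review(proposal, accept=False)     # user rejects: file is unchanged
applied = review(proposal, accept=True)   # user accepts: change is applied
```

Printing `proposal["diff"]` gives the user the familiar unified-diff view to base the decision on.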
The AI agent is still in beta, and users are advised to be cautious when using it, but the goal is to make it more reliable and user-friendly in the future (36m7s).

Outro (36m12s)

To test the AI agent, users can sign up for the Core plan on Replit, but it's expensive and not free (36m12s).
Once signed up, users can find the module on the homepage that says "what do you want to build today" and start working with the agents (36m27s).
To get started, users should have an idea in mind, write a couple of sentences, and keep it simple, without making it too complicated or technical (36m34s).
Working with the agent should be pretty intuitive, and users can share their projects to get feedback and support (36m47s).
The community is encouraged to share their projects built with the agent, and the team is happy to reshare and retweet them (36m49s).
The video concludes with a mention of "feeling the AGI" and a promise to see the viewers next week (36m53s).