Open Source Friday with LlamaCoder - generate small apps with one prompt

12 Oct 2024

GitHub Universe Conference

  • The GitHub Universe conference is a favorite among developers, offering a unique blend of topics and opportunities to meet people in person, creating a super energizing and nerdy atmosphere (14s).
  • The community at the conference is great, consisting of diverse people who share a passion for the GitHub platform and strive to improve themselves and their productivity (30s).
  • The conference features various topics of conversation, opportunities to solve problems with GitHub experts, and a vibrant community with quirky and fun elements, such as programmable name tags (51s).
  • The atmosphere at the conference is approachable and cool, allowing attendees to make friends and feel comfortable, unlike more corporate-driven conferences (1m20s).

Introduction to GitHub Models

  • GitHub believes that every developer can be an AI developer with the right tools and training, which is why they launched GitHub Models on the GitHub Marketplace (1m32s).
  • GitHub Models offers a handpicked collection of top models, with entitlements attached to the user's GitHub account, allowing for exploration and experimentation (1m43s).
  • The GPT-4o model can be tried against an initial prompt, and parameters can be adjusted in the playground to experiment with different settings (1m50s).
  • The Phi-3 mini instruct model can also be used to handle the same scenario, and its response can be evaluated and compared to other models (2m21s).
  • The model's details page provides more information through the readme, evaluation, and transparency tabs, helping users make informed decisions (2m31s).
  • Users can start using the models with code by clicking the code button, which provides getting-started instructions and access to a preconfigured development environment (2m41s).
  • The model API calls use entitlements that come with the user's GitHub account, eliminating the need for an API key or signing up for other services (2m57s); a minimal sketch of this flow follows this list.
  • The GitHub CLI can be used to call AI models and combine them with other CLI commands, such as summarizing commits or creating questions for computer science students (3m26s).
  • GitHub Models helps minimize friction when exploring and experimenting with AI models, making it easier to build AI-powered apps (3m48s).
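
As a rough sketch of the getting-started flow described above, the OpenAI SDK can be pointed at the GitHub Models endpoint and authenticated with a regular GitHub token. The endpoint URL and model ID below follow GitHub's published samples, but treat them as assumptions and check the model's code tab for current values:

```ts
// Minimal sketch: calling GitHub Models with the OpenAI SDK and a GitHub token.
// Endpoint and model ID are taken from GitHub's getting-started samples
// (assumptions -- verify against the "code" tab on the model's page).
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://models.inference.ai.azure.com", // GitHub Models endpoint
  apiKey: process.env.GITHUB_TOKEN, // entitlements come with your GitHub account
});

async function main() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Write a haiku about open source." },
    ],
  });
  console.log(response.choices[0].message.content);
}

main();
```

Because the token is the ordinary GITHUB_TOKEN, there is no separate signup or billing step while exploring, which is the friction reduction described above.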

Open Source Friday with Hassan

  • The show "Open Source Friday" features maintainers talking about the apps they maintain, and today's guest is Hassan, a software developer who loves working on open source AI projects (6m6s).
  • Hassan is a software engineer based in New York and leads developer relations for Together AI, a company that allows users to fine-tune and deploy open source models (7m25s).
  • Together AI exclusively hosts open source models, including Llama models, Stable Diffusion, and Flux image models, through their API (8m1s).
  • Hassan's first AI project was using GPT-3 to autogenerate captions for hundreds of images for a conference, which was successful and sparked his interest in AI (9m18s).
  • Hassan is on the show to talk about LlamaCoder, but the host is also interested in learning more about his other projects and Together AI (8m25s).
  • Hassan's projects often get millions of views and thousands of users, and the host is excited to learn about his story and how he builds and pushes so many different applications (6m36s).

Hassan's AI Projects

  • The first project involved autogenerating captions for hundreds of images, which was a mind-blowing experience that led to further exploration of AI capabilities (9m49s).
  • The caption-generating application was shared on Twitter in December, two years prior, marking the beginning of Hassan's forays into AI (10m12s).
  • The second project originated from a desire to upscale old, blurry family photos using AI, which led to the development of a photo upscaling model (10m38s).
  • The photo upscaling model was built over a couple of weekends and launched, eventually gaining a few million users and currently having around 200,000 monthly users (11m14s).
  • The process of building applications involves identifying the simplest thing to build and launch quickly, as many ideas may not succeed (11m44s).
  • The goal is to create a simple prototype, test the idea, and then decide whether to add new features or move on (12m26s).
  • The go-to tech stack includes Next.js, a full-stack React framework, Tailwind, and Vercel for deployment (12m29s).
  • Most applications are one or two pages, doing one thing well, and often involve a single API call (12m51s).
  • After launching, the decision to add new features or move on is based on the application's performance (13m7s).
  • The approach to new projects involves descoping to focus on one thing and doing it well, a methodology that has proven successful across various projects (13m32s).
  • When considering a new project, it's essential to identify one problem that can be solved with technology and focus on doing that one thing well, then launch and iterate based on feedback (14m5s).

LlamaCoder: A React App Generator

  • LlamaCoder was inspired by Meta's Llama launch and the desire to use open-source models to generate small React apps from a single prompt (14m30s).
  • The idea for LlamaCoder was tested with various open-source LLMs, but none performed well until the release of Llama 3.1 405B, which proved to be a good coding model (15m9s).
  • LlamaCoder can generate small React apps, but it's limited in scope and can't handle large-scale applications; it's best suited for small apps or components (15m35s).
  • The tool uses Sandpack from CodeSandbox to visualize the generated code and provide an interactive code editor in the browser (16m16s).
  • The combination of LlamaCoder and Sandpack allows users to type in a prompt, see the generated code, and interact with it in real time (16m30s).
  • The process of building a project involves sharing early versions to gauge interest and reception, and if the response is positive, the project is continued and eventually published (16m45s).
  • This approach is similar to applying developer relations skills to side projects, where interest is generated and maintained throughout the development process (17m16s).
  • Even if a project is left behind, the knowledge and research gained can be applied to future projects, making them progress faster (17m32s).
  • LlamaCoder is an open-source tool that can be used to generate small apps with a single prompt, and it allows users to view and edit the code (17m43s).
  • The tool offers different models to choose from, including Gemma 2 27B, Llama 3.1 405B, and Qwen 2.5 Coder, models that are considered underrated or highly effective (18m21s).
  • A demo of LlamaCoder is shown, where it is used to generate a calculator app, and the code is displayed in a streaming text editor (18m41s).
  • The generated app can be rendered and used, and users can also edit the code and make changes, such as adding an H1 element or changing the theme (18m55s).
  • The app can also be published to the internet, and a URL is generated that can be shared with others (19m36s).
  • The system can regenerate the calculator in a blue theme, and it can be asked to make a quiz app about the Olympics, generating a little quiz app that asks questions and shows scores (20m1s).
  • The system can also be used to build landing pages, such as an e-commerce landing page, and it is possible to iterate on the design and add features (20m26s).
  • The demo runs on localhost, and it is possible to open up the entire code and edit it in a sandbox environment, allowing users to take the project and continue working on it (20m31s).
  • The system provides an initial MVP (Minimum Viable Product) that users can build upon, and it can generate landing pages and other types of apps (21m15s).

LlamaCoder Technical Details

  • The project was initially teased on social media and received a lot of interest, leading to its launch on Twitter and the creation of a GitHub repository (21m39s).
  • The tech stack includes Together AI for inference, Sandpack from CodeSandbox, Next.js, Tailwind, and TypeScript, as well as Plausible for analytics and Helicone for observability (21m58s).
  • The project has received around 331,000 unique visitors since its launch in early August, with a significant portion of users accessing the site on their phones (22m22s).
  • The system uses CodeSandbox's Sandpack for the editor and sandbox environment, allowing users to edit and test their code in an interactive environment (23m3s).
  • The project utilizes the Sandpack component, which includes the editor and preview, and allows users to open a sandbox to view the project's architecture (23m17s).
  • The project's architecture involves a user selecting a model, inputting a prompt, and sending an API request to the generate code route, which calls the Together AI API to utilize the LLaMA 3.1 405B model (24m1s).
  • The API request takes the user's prompt, calls the LLaMA model, and returns a response that is sent back to the frontend, where Sandpack renders the code and preview (24m10s).
  • The project's simplicity is a key aspect, with a single page and a single API request (24m34s).
  • Together AI facilitates LLaMA inference by providing an API that can run open-source models, allowing web developers to utilize these models without needing extensive machine learning knowledge (25m38s).
  • LLaMA inference is essentially an API call to any supported model, and Together AI supports both TypeScript and Python clients (25m42s).
  • The API call involves defining the client, choosing a model, sending a prompt, and printing the response as it streams back (26m9s); see the route-handler sketch after this list.
  • The code is a Next.js app, which allows both frontend and backend code to live in one place, making it a colocated app (26m41s).
  • The main folder contains the homepage code in the page.tsx file, plus a backend API route called generateCode (27m1s).
  • The generateCode API route is similar to a Lambda function and is used to generate code based on a user's prompt (27m13s).
  • The app uses Prisma and a Postgres database, but these are additional features that were added later (27m35s).
  • The main way the app works is by taking a user's prompt, calling the backend API route, and then responding with the generated code (27m47s).
  • The createApp function calls the backend API route and passes in the user's prompt, which is held as a piece of state in React (28m13s).
  • The backend API route uses Together AI to generate the code, specifying that the API request should go to Llama 3.1 405B (28m40s).
  • The API request tells Together AI to generate code for the user's prompt, with the model assuming the role of an expert frontend or React engineer (28m51s).
  • The generated code is then sent back to the frontend, where it is rendered in the app, and can be streamed back using a streaming helper (29m21s).
  • To run the app in the browser, the Sandpack component is used, which is a simple way to render the generated code (29m53s).
  • The code used to render the generated application is only about four lines: Sandpack is imported and given a React-with-TypeScript template, with the app code hardcoded at first but intended to be generated and updated dynamically (29m59s); see the frontend sketch after this list.
  • A new piece of state is defined for the LLM's response from the backend, which is updated when the API is called, and the generated code is rendered using Sandpack if it exists (30m28s).
  • The application uses a single API call to generate code and get it back, with the code being rendered dynamically (30m57s).
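
The description above maps onto two small pieces of code. First, a minimal sketch of a generateCode-style Next.js route handler that streams from Together AI; the model ID and message shape follow the together-ai TypeScript client as I understand it, and the system prompt here is a stand-in, not the production prompt:

```ts
// app/api/generateCode/route.ts -- hypothetical sketch of the backend route.
import Together from "together-ai";

const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });

export async function POST(req: Request) {
  const { prompt } = await req.json();

  // Ask Llama 3.1 405B for code, streaming tokens as they are generated.
  const stream = await together.chat.completions.create({
    model: "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    messages: [
      { role: "system", content: "You are an expert React engineer. Return only code." },
      { role: "user", content: prompt },
    ],
    stream: true,
  });

  // Forward the token stream to the frontend as plain text.
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.choices[0]?.delta?.content ?? ""));
      }
      controller.close();
    },
  });
  return new Response(body, { headers: { "Content-Type": "text/plain" } });
}
```

Second, the frontend half: a createApp-style function that reads the stream into React state, plus the roughly four lines of Sandpack that render whatever code exists so far (names like generatedCode mirror the description but are assumptions about the exact code):

```tsx
"use client";
// Hypothetical sketch of the page component: stream the response into state
// and re-render Sandpack live as the generated code grows.
import { useState } from "react";
import { Sandpack } from "@codesandbox/sandpack-react";

export default function Home() {
  const [prompt, setPrompt] = useState("");
  const [generatedCode, setGeneratedCode] = useState("");

  async function createApp() {
    const res = await fetch("/api/generateCode", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
    });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    let code = "";
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      code += decoder.decode(value, { stream: true });
      setGeneratedCode(code); // update state as each chunk arrives
    }
  }

  return (
    <div>
      <input value={prompt} onChange={(e) => setPrompt(e.target.value)} />
      <button onClick={createApp}>Generate</button>
      {generatedCode && (
        <Sandpack template="react-ts" files={{ "/App.tsx": generatedCode }} />
      )}
    </div>
  );
}
```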

Prompt Engineering for LlamaCoder

  • A stream is a way to see the AI's output as it's being generated, providing a better user experience by not having to wait for the entire response before seeing any output (31m20s).
  • The alternative to streaming is waiting for the entire response to be generated, which can take 20 or 30 seconds, resulting in a poor user experience (32m42s).
  • Streaming allows for instant gratification and a more interactive-feeling experience, even if it's not truly interactive, by providing a continuous flow of output as it's generated (33m0s).
  • Prompt engineering techniques are used to come up with effective prompts for the LLM, although specific techniques are not mentioned (33m35s).
  • LlamaCoder's backend receives a POST request from the frontend with the model selected by the user, the prompt they entered, and whether to use shadcn/ui components (34m27s).
  • The system prompt is defined, and a query is made to the model with the system prompt and the user's prompt, asking the model to return only code (34m57s).
  • The temperature of the model, which determines the degree of randomness, is set to 0.2, which was found to work fairly well for coding (35m9s); see the prompt sketch after this list.
  • The system prompt is the most interesting part of the process, and prompt engineering involves a lot of trial and error (35m25s).
  • Accepted techniques in prompt engineering include asking the LLM to think carefully, telling it that something is important, and relating it to a real-world use case (35m36s).
  • It is recommended to start with a simple prompt and make it more specific, and using multiple LLMs can also be helpful (36m10s).
  • There are plans for LlamaCoder to use multiple LLMs, where the first LLM plans out the project and the second LLM codes the plan (36m35s).
  • Using multiple LLMs in this way can lead to interesting results and can be a useful technique in prompt engineering (37m0s).
  • The process of generating small apps with one prompt involves iteration, starting with a simple prompt and gradually adding more complexity, with the goal of achieving better results (37m5s).
  • To improve the performance of the language model (LLM), it's helpful to provide it with documentation and examples of components to use, as well as information on how to import and use them (37m31s).
  • Passing in examples of what you want the LLM to generate can also improve its performance, as it allows the model to learn from the examples and generate better output (38m12s).
  • Providing an example of a good landing page, for instance, can help the LLM generate a better landing page, as it has a reference point to work from (38m28s).
  • The key learnings from working with LLMs include not being shy with your prompting, as providing more information and examples can lead to better results, even if it means approaching the token limit (40m25s).
  • The use of examples and documentation can significantly improve the quality of the generated output, as seen in the comparison between the production version and the local version of the landing page (39m19s).
  • The local version, which had access to an example of a good landing page, generated a much better output, with a header, hero section, featured game, testimonial section, CTA, and footer, whereas the production version lacked these features (39m49s).
  • The process of generating small apps with one prompt requires experimentation and iteration, as well as a willingness to try new approaches and provide more information to the LLM (37m10s).
  • When interacting with large language models (LLMs), it's essential to provide detailed prompts and not be shy about experimenting, as this can lead to more effective results (40m39s).
  • Giving the LLM an example to work with is also highly effective, and it's crucial not to get frustrated if the first attempt doesn't yield the desired outcome, as iteration and trying again can lead to better results (40m56s).
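
As an illustration of the techniques above, here is a hypothetical system prompt combining explicit instructions, notes on allowed components, and an inline example the model can imitate. This is a sketch in the spirit of the discussion, not LlamaCoder's actual production prompt:

```ts
// Hypothetical system prompt: instructions + allowed libraries + one example.
// None of this wording is taken from the real codebase.
const systemPrompt = `
You are an expert frontend React engineer. Think carefully, then return ONLY code.
- Create a single React component in TypeScript, exported as the default export.
- Style with Tailwind classes; do not import CSS files.
- You may import icons from lucide-react and charts from recharts.

Here is an example of a well-structured component:

import { useState } from "react";

export default function Counter() {
  const [count, setCount] = useState(0);
  return (
    <button
      className="rounded bg-blue-500 px-4 py-2 text-white"
      onClick={() => setCount(count + 1)}
    >
      Count: {count}
    </button>
  );
}
`;
```

Embedding an example like this, or a full reference landing page, is exactly the "don't be shy with your prompting" advice in practice: more grounding material generally beats a terse prompt, as long as it fits in the context window.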

Improving LLM Performance

  • Providing the LLM with stakes or incentives, such as a hypothetical reward, can also encourage it to perform better, much like a human coworker would (41m11s).
  • While LLMs do have context limits, newer models are being released with larger context window sizes, allowing for more extensive prompts and information to be processed (41m44s).
  • The context limit for the LLM being discussed is 128,000 tokens, which is significantly more than the 2,000 tokens used in the example prompt (42m14s).
  • Recapping the integration with Sandpack: the generateCode route takes in the model, prompt, and other parameters, sends them to Together AI's Llama 3.1 405B, and streams the result back to the frontend (42m58s).
  • The frontend code includes a createApp function that makes an API request to generate code, reads the results, and saves them to a local state called generatedCode (43m26s).
  • The UI elements include a header, input, select, and code view, where the generated code is displayed, and the Sandpack component is used to show the code (44m7s).
  • A code viewer component is used to display the app's code; it is a separate component that imports Sandpack from the CodeSandbox React package and allows customization of options such as showing the navigator, height, and tabs (45m12s); see the sketch after this list.
  • The code viewer component passes in files, including the main app.tsx file, and shared files such as shadcn/ui components, which are passed in as individual files inside a components folder (45m42s).
  • The component also passes in dependencies, including libraries like Lucide React and Recharts, which Sandpack installs, allowing generated apps to use icons and shadcn/ui components (46m18s).
  • The code is organized in a simple and easy-to-navigate way, making it easy to follow the trail and breadcrumbs to see what's happening where (46m46s).
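
A sketch of such a code viewer, under the assumption that it uses @codesandbox/sandpack-react's documented options and customSetup props; the shared-file contents and dependency versions are placeholders:

```tsx
// Hypothetical CodeViewer: Sandpack with display options, shared files, and
// extra npm dependencies that Sandpack installs into the sandbox.
import { Sandpack } from "@codesandbox/sandpack-react";

export function CodeViewer({ appCode }: { appCode: string }) {
  return (
    <Sandpack
      template="react-ts"
      options={{ showNavigator: true, showTabs: false, editorHeight: 500 }}
      files={{
        "/App.tsx": appCode,
        // Shared files, e.g. shadcn/ui components, passed in alongside the app:
        "/components/ui/button.tsx": "/* shadcn/ui button source here */",
      }}
      customSetup={{
        dependencies: { "lucide-react": "latest", recharts: "latest" },
      }}
    />
  );
}
```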

Deployment and Cost Considerations

  • When deploying projects, API costs add up: the total for the entire app has been a few thousand dollars, though the ability to run it on internal GPUs reduces what would otherwise be spent on API keys (47m31s).
  • The app runs on the internal GPUs of Together AI, where Hassan works, which avoids paying for external GPUs or API keys (47m41s).
  • The app was picked up by the media and has thousands of users, hundreds of thousands of clones, and almost 3,000 stars on GitHub; Helicone, an observability tool, provides an overview of its performance, including the number of requests, tokens sent, and generation time (47m53s).
  • The app also sends a lot of business to Together AI, as users who run their own copies need to create an account, which can get expensive if a project is picked up by a large number of users (48m12s).
  • The Helicone dashboard provides useful insights, such as the number of requests, tokens sent, and generation time, which averages around 11 seconds (48m30s).
  • If an individual developer were to deploy their own AI projects, they would need to use their own API keys, which can get expensive if the project becomes popular (49m1s).
  • To mitigate this, adding authentication and charging for the app is a good idea, though that doesn't fit a developer who simply wants to launch the app for free (49m18s).
  • The example app is open source, and many people have forked it, added pricing and new features, and are making money from it, which fulfills the goal of enabling other builders (49m36s).
  • The idea of open-sourcing the app is to enable other builders, and the creator's purpose is to facilitate this, rather than making money directly from the app (49m48s).

Inspiration and Idea Generation

  • The creator gets inspiration from a long-running idea list, other people, and other projects, including Claude Artifacts, which was a big inspiration for this project (50m23s).
  • Claude Artifacts allows users to ask for app generation, similar to what the creator did with their app, and this inspired the creator to generate real React apps (50m43s).
  • The initial inspiration for building an app came from seeing the potential of a model to go from prompt to code, and wanting to create something similar using open-source models, making it fully open-source for experimentation (51m12s).
  • The idea for building apps often comes from seeing something cool, especially if it's a closed-source or private project by a big company, and wanting to create an open-source version (51m36s).
  • Other ideas come from having an interesting concept, such as building a real-time image editor, which was created by accident while building an image playground using the optimized Flux Schnell model (51m50s).
  • The Flux Schnell model is optimized to run fast, generating images in under a second, and was used to create a real-time image editor that generates images as the user types (52m16s).
  • Keeping an idea list and getting inspiration from other ideas can help with the creative process, and there's no shortage of ideas, with many side projects and ideas already planned (52m38s).
  • One issue is turning side projects into full deployments, but relying on a consistent stack, such as Next.js, can make the process seamless (53m20s).
  • Practice and sticking with one stack can help improve the process of building and deploying projects, avoiding "shiny object syndrome" and allowing for faster and more efficient development (53m33s).
  • Using a consistent stack, such as Next.js, Tailwind, and TypeScript, can help improve development speed and efficiency, and allow for faster project deployment (53m51s).
  • To quickly generate and publish small apps, it's essential to keep things simple and optimize for simplicity, which becomes easier with more projects completed (54m16s).
  • Sticking to one stack and learning it well is crucial for rapid development, with many amazing stacks available, such as Next.js, Laravel, Vue, and Svelte (54m37s).
  • Building multiple apps with a chosen stack helps gain practice and makes it easier to ship things quickly (54m55s).

Tips for Building and Deploying Apps

  • Over 800,000 people have used LlamaCoder since August, demonstrating its popularity (55m16s).
  • The dashboard used in the presentation is from Helicone, and users can access one by creating a Helicone account and integrating it with their API (55m34s).
  • To enable the Helicone dashboard, users pass in the Helicone-specific URL and API key, which requires only about three lines of code (55m52s); see the sketch after this list.
  • There is no strong preference among the Claude, OpenAI, or Perplexity APIs; all of them are great, and the choice depends on individual needs (56m10s).
  • The preference is for open-source models, and the speaker is biased towards them due to their work at a company that only hosts open-source models (56m20s).
  • The speaker uses Next.js for full-stack applications and Vercel for deployment, allowing for easy integration of the front end and back end (57m0s).
  • The speaker is available for questions after the stream and can be reached via Twitter at @nutlope (57m22s).
  • Building small apps can bring joy and it is encouraged to turn ideas into reality by actually building them, as it is easier and more fun than most people think (57m46s).
  • The importance of taking action on ideas is emphasized, rather than just writing them down and forgetting about them (58m6s).
  • Viewers are encouraged to follow Hassan (@nutlope) and the host on Twitter (58m15s).
  • The LlamaCoder repository can be found on GitHub at github.com/nutlope/llamacoder (58m27s).
  • Viewers are motivated to build their minimum viable product (MVP) applications, iterate on them, and publish them for others to see (58m32s).
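
For the Helicone integration mentioned above, here is a sketch of what those few lines might look like when routing Together AI traffic through Helicone's gateway; the gateway URL and header name follow Helicone's Together AI integration as I understand it, so verify them against Helicone's docs:

```ts
// Hypothetical: point the Together client at Helicone's gateway so every
// request is logged to the Helicone dashboard (URL and header are assumptions).
import Together from "together-ai";

const together = new Together({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "https://together.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});
```

Every call made through this client then shows up in the dashboard with request counts, token usage, and latency, which is where figures like the 11-second average generation time come from.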
