The Hunt for State of the Art (with Suhail Doshi)

20 Sep 2024

Intro (0s)

A product launch is imminent, with significant changes having been made to the product in the final stages of development. (4s)
The current version of the product is impressive, but the next iteration is expected to be even more so. (20s)
Achieving a state-of-the-art (SOTA) level of quality requires meticulous attention to detail, even down to aspects like kerning. (23s)

What is Playground? (1m7s)

Suhail Doshi is the founder and CEO of Playground. (1m9s)
Playground is an image generation model with a user-friendly interface. (1m20s)
Playground has recently been launched. (1m20s)

What Garry was able to make using Playground (1m47s)

Garry created t-shirt designs in Playground, which lets users upload images and extract their aesthetic. (2m11s)
Garry was able to give the tool specific instructions, such as adding a GPU with two fans to a design. (2m57s)
Designs can be edited in natural language, for example by requesting a white background instead of a yellowish one. (3m48s)

The focus on text accuracy (7m4s)

Text accuracy was a primary focus, aiming to enhance the utility of graphics and design by making them more than just aesthetically pleasing art. (7m7s)
Development was difficult: text accuracy was initially as low as 45% and was improved over time. (7m33s)
The model's ability to generate utilitarian and useful designs, including logos, t-shirts, and font sizes, positions it as a potential replacement for traditional graphic design software like Adobe Illustrator. (7m46s)
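The summary does not say how text accuracy was measured. As a minimal sketch, assuming accuracy means the share of generated images whose rendered text exactly matches the requested text, a figure like the 45% above could be computed as follows. The function names, the normalization rule, and the sample data are all hypothetical, and the rendered text is assumed to come from an OCR or transcription step that is not shown.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def text_accuracy(samples: list[tuple[str, str]]) -> float:
    """Fraction of samples whose rendered text matches the requested text.

    Each sample is (requested_text, rendered_text). The exact-match rule is
    an assumption for illustration, not Playground's actual metric.
    """
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(normalize(want) == normalize(got) for want, got in samples)
    return hits / len(samples)


# Hypothetical data, for illustration only.
samples = [
    ("Hello World", "Hello World"),       # exact match
    ("Grand Opening", "Grand Openning"),  # misspelled in the image
    ("SALE 50% OFF", "sale 50%  off"),    # matches after normalization
    ("Playground", "Playgroumd"),         # misspelled in the image
]
print(text_accuracy(samples))
```

An exact-match rule is strict by design: a single wrong letter makes a logo or t-shirt unusable, so partial credit would overstate how often the output is usable.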

Building a marketplace for Playground (10m44s)

It was observed that users found it challenging to use text prompts effectively, leading to a high rate of unsuccessful attempts and a need for multiple retries to achieve desired results. (13m33s)
To address this, a decision was made to prioritize a visual-first approach, incorporating templates similar to those found in Canva, to simplify the design process and reduce reliance on complex text prompts. (11m23s)
This shift required extensive research and development to ensure coherence and maintain visual consistency, as existing open-source models like Stable Diffusion were not equipped to handle such intricate modifications. (12m15s)

Prompts are like HTML for graphics (16m0s)

Users care less about aesthetics and more about prompt understanding and text generation accuracy. (16m23s)
Extremely detailed prompts are used to train the model, but users can still input simple prompts like "nature scene". (18m8s)
The product removes the need for prompt engineering by expanding simple prompts into a multi-level caption system. (21m3s)

Creating new design professions (22m25s)

A new profession of "AI designers" is emerging, with companies actively hiring individuals skilled in using AI for design purposes. (22m44s)
The development of this AI model prioritized achieving high text accuracy and detailed image reconstruction, surpassing the limitations of existing models like Stable Diffusion. (25m0s)
The model's architecture is entirely novel, diverging from both Stable Diffusion and other open-source models like those using Transformer architectures. (24m38s)

Using tailwinds of what is happening in language (26m13s)

The model's impressive prompt understanding is partly attributed to advancements in language models, particularly those from companies like Google and Meta. (27m56s)
The current model's language comprehension is comparable to GPT-3, a significant improvement over previous models that were more akin to the Word2Vec model from 2013. (29m13s)
Despite its advancements, the model still struggles with concepts like "film grain" and with spatial positioning (left and right), and it requires further development. (29m34s)

Problems with aesthetics evals (30m6s)

A new issue discovered is that AI image generators that adhere too closely to user prompts can receive lower aesthetic scores in A/B testing. (30m27s)
For example, an AI image generator that accurately followed a prompt to create a split-plane image of a woman was rated lower than a generator that created a more aesthetically pleasing single image of the woman. (31m1s)
This complicates evaluation: a lower aesthetic score may mean that a generator is less aesthetically pleasing, or only that it adhered more closely to the prompt. (31m49s)
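One possible way to separate the two effects, offered here only as a sketch and not as the method used in the episode, is to record prompt adherence alongside each aesthetic vote and report the aesthetic win rate per adherence bucket. The `win_rates` function and the vote data below are hypothetical.

```python
from collections import defaultdict


def win_rates(votes):
    """Aesthetic win rate of one model, split by prompt adherence.

    votes: iterable of (followed_prompt, won_aesthetic_vote) booleans,
    one pair per A/B comparison. Both labels are assumed to come from
    human raters; how they are collected is not shown here.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [wins, comparisons]
    for followed, won in votes:
        totals[followed][0] += int(won)
        totals[followed][1] += 1
    return {bucket: wins / n for bucket, (wins, n) in totals.items()}


# Hypothetical votes, for illustration only.
votes = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, True),
]
print(win_rates(votes))
```

With these made-up votes the pooled win rate is 50%, but it is 25% when the model followed the prompt and 75% when it did not, which is the kind of gap a single pooled aesthetic score would hide.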

The commercial applications (32m42s)

Companies that are successfully utilizing AI tools like Playground are replacing traditional roles, such as graphic designers, indicating a significant commercial shift. (32m54s)
AI tools are empowering individuals, such as musicians, by granting them greater control over their creative process, eliminating the reliance on external parties like designers. (33m23s)
Y Combinator encourages founders to recognize the potential of AI in enhancing their core products and services. (33m51s)

When the users you get are not the users you want (33m54s)

Users of an image model were primarily generating near-pornographic content, which led to a decision not to build a business around that use case. (34m8s)
A similar situation occurred with a previous analytics company where gaming companies, despite generating substantial revenue, had poor retention and were not a desirable long-term market. (37m2s)
The decision to focus on the larger market of the entire internet and mobile, as opposed to just gaming, proved successful. (38m17s)

Reflections on going through YC twice (40m30s)

Doshi went from a successful company to Mighty, a browser-based company that aimed to create a new kind of computer by streaming the browser. Mighty hit a wall when it could not significantly improve speed, and he decided to move on. (40m47s)
Doshi believes that strategy is valuable in Silicon Valley and that Mighty was trying to solve a real problem with browser limitations. The landscape changed with Apple's M1 chip, however, and the team could not overcome the technical challenges and ran out of ideas, which led to the decision to pivot. (41m43s)
Doshi's interest in AI led him to explore AI applications and even to try to intern at AI companies. Despite these early efforts and his recognition of AI's potential, he misjudged the timing and missed the initial wave of AI advancements. (47m3s)

Running a research lab/startup hybrid vs a pure startup (48m30s)

It is difficult to conduct research within a fast-paced startup environment because research requires time and cannot be rushed. (49m0s)
Allowing researchers some freedom to explore their own interests can lead to impressive results, as seen in the approach of OpenAI. (50m17s)
Current evaluation methods for language models, often focused on academic benchmarks, may not accurately reflect real-world user needs and could explain the popularity of certain applications like homework assistance. (51m52s)

What it takes to make a state of the art model (53m35s)

Achieving a state-of-the-art (SOTA) model requires meticulous attention to detail and a dedication to perfecting even the smallest aspects. (53m36s)
This dedication involves constantly analyzing and refining the model's capabilities, such as text generation, with a focus on minute details like kerning and skin texture. (53m47s)
This iterative process of identifying and improving upon even seemingly insignificant flaws is crucial for pushing the boundaries of model performance and achieving SOTA results. (54m41s)

Outro (55m9s)

Achieving a state-of-the-art result is difficult, but it is possible. (55m10s)
Playground is available in the browser at playground.com and in the Android and iOS app stores. (55m24s)
Unlike many other products, Playground did not have a waitlist and was available on launch day. (55m32s)
