Why The Next AI Breakthroughs Will Be In Reasoning, Not Scaling
Intro (0s)
- A conversation about achieving Artificial General Intelligence (AGI) took place around a year ago, with one argument suggesting that AI will eventually become capable of designing better chips than humans, eliminating a bottleneck to greater intelligence (5s).
- The idea is that this development will put us on a pathway to AGI in a way that wasn't possible before (15s).
- In a previous episode, the topic of discussion was what to do with two more orders of magnitude of compute, but since then, Sam Altman has expressed a desire to go four orders of magnitude further (25s).
- Currently, AI models are rapidly improving, with capabilities emerging that weren't possible a month ago (37s).
- This rapid progress is considered a significant moment in history (43s).
- The hosts of the podcast are Gary, Jared, Harj, and Diana, who are affiliated with Y Combinator, which has funded companies now collectively worth over $600 billion and funds hundreds of companies every year (58s).
- Recently, Sam Altman wrote an article that is relevant to the topic of discussion (1m14s).
The intelligence age (1m15s)
- A wild essay predicted that Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI) are coming within thousands of days, an estimated timeframe of roughly 4 to 15 years (1m22s).
- The essay's ideas are similar to those Sam Altman, co-founder of OpenAI, discussed back in 2015, which at the time sounded like the ideas of a "crazy person" but now seem plausible (1m51s).
- In 2015, Sam Altman believed that AGI would be better at doing science than humans and would accelerate the rate of scientific progress in every field (2m57s).
- One of the motivations behind OpenAI was to create an AI that could accelerate scientific progress, an idea still connected to the current work on advanced reasoning capabilities (3m10s).
- The development of advanced reasoning capabilities is crucial for AI to be able to do science and accelerate technological progress (3m32s).
- OpenAI's report on o1 highlights the model's capabilities and potential for the future, including its strong performance in chip design (3m49s).
- An AI that can design chips better than humans could eliminate one of the bottlenecks to achieving greater intelligence (4m1s).
- The current progress in AI development, including the capabilities of o1, suggests that we are on the pathway to achieving AGI (4m10s).
YC o1 hackathon (4m18s)
- Diode Computer is a company that builds AI designers for circuit design; their previous product could handle PCB design, which involves four major steps: system design, component selection, schematic design, and layout and routing (4m37s).
- The company's previous product could automate schematic design and, to some extent, routing, but it could not handle system design or component selection (5m34s).
- At the hackathon, the company demonstrated a new o1-powered version that can automate system design and component selection, reading datasheets and selecting the right components for a specific project (5m50s).
- The new system can take high-level constraints, such as building a wearable heart rate monitor, and match the specific components needed, including a microcontroller, accelerometer, and heart rate sensor (6m7s).
- It can then output a system diagram and generate code in a language called Arile, which can be used to build a PCB (6m57s).
- That output can be used to generate a layout for the board, which can then be fine-tuned into a fully working printed circuit board (7m12s).
- The system can also call an auto-router on the specific board, allowing for the creation of a fully working PCB (7m41s).
- This goes beyond the traditional EDA (Electronic Design Automation) process of design, simulation, and verification by automating the entire flow from system design through component selection to layout (7m58s).
- Diode used different models for different tasks in the workflow, such as 4o-mini for PDF extraction and o1 for reasoning, to select the correct components before placing them on the board; mixing models this way is a common pattern in building interesting products with AI (8m26s).
- Selecting components such as servos, motors, and sensors requires a lot of thinking and is a hard task even for humans, making it a suitable application for reasoning models like o1 (9m12s).
- Diode tried to use GPT-4o for component selection and failed, then succeeded with o1 on the same task, demonstrating a step-function capability unlock (9m25s).
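The multi-model pattern described above can be sketched in a few lines. Everything here is hypothetical (the function names, the datasheet format, the stand-in logic replacing real model calls); it only illustrates the shape of the pipeline: cheap structured extraction for every datasheet, then one harder reasoning step over the extracted candidates.

```python
# Hypothetical sketch of the "small model extracts, reasoning model selects"
# pattern. Real systems would call an LLM API in both stand-in functions.

def extract_part_specs(datasheet_text: str) -> dict:
    """Stand-in for a cheap extraction model (e.g. 4o-mini):
    pull key/value fields out of a datasheet."""
    specs = {}
    for line in datasheet_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            specs[key.strip().lower()] = value.strip()
    return specs

def select_component(candidates: list[dict], constraints: dict) -> dict:
    """Stand-in for the reasoning step (e.g. o1): pick the part that
    satisfies every constraint, preferring the cheapest match."""
    matching = [c for c in candidates
                if all(c.get(k) == v for k, v in constraints.items())]
    return min(matching, key=lambda c: float(c["price"]))

datasheets = [
    "name: MCU-A\ninterface: i2c\nprice: 1.20",
    "name: MCU-B\ninterface: spi\nprice: 0.90",
    "name: MCU-C\ninterface: i2c\nprice: 0.80",
]
parts = [extract_part_specs(d) for d in datasheets]     # extraction stage
choice = select_component(parts, {"interface": "i2c"})  # reasoning stage
print(choice["name"])  # cheapest I2C-capable part: MCU-C
```

The design point is that the expensive reasoning call runs once over already-structured data, rather than over every raw PDF.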
- A hackathon organized by Diana featured actual YC-funded startups building features for their products using o1, showcasing how the model can unlock capabilities for real companies (9m52s).
- Camper uses o1 to create CAD designs from natural language input, allowing users to design complex parts like airfoils optimized for specific conditions without extensive mechanical engineering knowledge (10m34s).
- Camper's system can run multiple simulations simultaneously and solve partial differential equations, acting as a co-pilot for SolidWorks and letting users design complex systems with ease (11m13s).
- The system can even write and solve equations, such as the Navier-Stokes equations, to work through airfoil design problems, demonstrating o1's capabilities in reasoning and problem-solving (11m39s).
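As a toy illustration of the kind of numerical work such a co-pilot delegates to a simulation backend (this is not Camper's actual solver, and far simpler than Navier-Stokes), here is a 1D heat equation u_t = u_xx relaxed with an explicit finite-difference scheme:

```python
# Toy PDE solve: explicit finite differences on the 1D heat equation.
# alpha = dt/dx^2 must stay <= 0.5 for this scheme to be stable.

def step(u, alpha=0.25):
    """Advance one explicit Euler step; endpoints are held fixed."""
    return [u[0]] + [
        u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]

# Cold boundaries, a hot spot in the middle; heat diffuses outward.
u = [0.0] * 5 + [1.0] + [0.0] * 5
for _ in range(200):
    u = step(u)
print(max(u) < 0.2)  # True: the peak has flattened toward the boundaries
```

A real tool would hand this kind of loop to a meshed 3D solver; the point is only that "write and solve the equations" bottoms out in iterative numerics like this.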
- Sam Altman wants to scale compute spending up by four orders of magnitude, reaching a trillion-dollar spend, which is considered a significant and ambitious goal (12m0s).
4 orders of magnitude (12m9s)
- Abstracting complex concepts, such as understanding the nature of physics, could be possible if scaling laws hold, making it plausible to tackle difficult engineering challenges like room temperature fusion, weather prediction, and complex physical phenomena (12m21s).
- These complex physical phenomena are hard to solve and typically require PhDs, but advancements in AI, particularly with Chain of Thought and reasoning, could lead to breakthroughs in these areas (12m57s).
- The concept of providing feedback not just on outputs, but on the steps to get there, is a key idea in teaching models how to think, allowing for fine-tuning of various steps to ensure the model thinks as desired (13m21s).
- This approach is similar to the AGI conversations, which focus on teaching models to think better, rather than just producing correct answers, and is made possible by the scaling laws, which provide more surface area for throwing compute at the problem (13m47s).
- The scaling laws enable iterative improvement of results by spending more money and time, similar to what might be expected from a human scientific organization, and potentially even more consistently (14m10s).
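The "feedback on the steps, not just the output" idea above can be sketched with a toy verifier. Both functions and the trace format are hypothetical stand-ins for a learned process reward model; the contrast is that an outcome-only reward would look at the final answer alone.

```python
# Toy process supervision: score each reasoning step in a trace.
# Each step is of the form "add N = claimed_total".

def verify_step(state: int, step: str) -> bool:
    """Check one step: the claimed running total must be consistent."""
    op, claimed = step.split("=")
    n = int(op.strip().removeprefix("add "))
    return state + n == int(claimed)

def score_trace(steps: list[str]) -> float:
    """Fraction of steps that check out (process reward). An outcome-only
    reward would instead inspect just the last line."""
    state, correct = 0, 0
    for step in steps:
        if verify_step(state, step):
            correct += 1
        state = int(step.split("=")[1])  # carry the claimed state forward
    return correct / len(steps)

trace = ["add 2 = 2", "add 3 = 5", "add 4 = 8", "add 1 = 9"]  # 3rd step wrong
print(score_trace(trace))  # 0.75: three of four steps are locally valid
```

Here an outcome reward would score the whole trace 0 (the true total is 10, not 9), while step-level scoring pinpoints exactly which step to correct, which is what makes fine-tuning the intermediate reasoning possible.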
- The architecture of the o1 model is inspired by previous work, going back to the beginning of OpenAI, and has been developed over many years (14m34s).
The architecture of o1 (14m42s)
- The AI model that won video game competitions, specifically DOTA, was a breakthrough in the tech industry, showcasing the power of reinforcement learning techniques, which were also inspired by AlphaGo and AlphaZero (14m42s).
- DOTA is a complex game that requires resource management and planning, and the AI model's success came from learning by playing against itself millions of times, with Q-learning cited as the fundamental algorithm behind reinforcement learning (14m42s).
- The connection between DOTA and GPT-type models lies in incorporating reinforcement learning into the generative model, which requires a large amount of factually correct data and a reward function to reason about the output (15m57s).
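As a minimal illustration of the Q-learning update the hosts mention (vastly simpler than anything behind DOTA or o1), here is tabular Q-learning on a five-cell corridor where the agent earns a reward for reaching the right end:

```python
import random

# Tabular Q-learning on a 5-cell corridor: actions are move left (-1)
# or right (+1); reward 1.0 for reaching the rightmost cell.

random.seed(0)
N, ACTIONS = 5, (-1, +1)
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(500):  # episodes of trial and error
    s = 0
    while s != N - 1:
        if random.random() < eps:
            a = random.choice(ACTIONS)          # explore
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])  # exploit
        s2 = min(max(s + a, 0), N - 1)          # clamp to the corridor
        r = 1.0 if s2 == N - 1 else 0.0
        # The Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the learned policy prefers moving right everywhere.
print(all(Q[(s, 1)] > Q[(s, -1)] for s in range(N - 1)))  # True
```

The connection to the bullet above: the reward function ("did we reach the goal?") plus repeated self-generated experience is enough to learn good behavior, which is the same loop, scaled enormously, that reinforcement learning applies to a generative model's reasoning.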
- The training process likely involved interesting techniques, including the use of secret data sources, such as math and science problems, to improve the model's performance (16m20s).
- There are two research directions being explored in parallel: scaling up the underlying language model (LM) and using reinforcement learning to unlock the model's potential in the real world (17m16s).
- The o1 model, which uses reinforcement learning, is a significant step forward; its full version is expected to be a huge improvement over o1-preview, with o2 and o3 models not far behind (17m35s).
- The o1 model is still opaque, and its development required creating a new dataset to train the chains of thought, which was a costly endeavor (18m0s).
- Large language models (LLMs) can be improved by breaking down tasks into steps and using evaluation sets, as discovered by Jake Heller of Casetext, and this approach is applicable to other tasks as well (18m29s).
- The prescription for improving LLMs involves two parts: breaking down tasks into steps and using evaluation sets, with the latter being crucial even if the model can break down tasks on its own (18m52s).
- Some companies have achieved 100% success by following Jake Heller's recommendations, which include having a large evaluation set and carefully testing every step of the reasoning pipeline (19m28s).
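The "break tasks into steps, eval every step" recipe can be sketched as follows. The two stage functions and their eval cases are hypothetical; the point is only that per-step eval sets localize a regression to the stage that caused it, instead of leaving one opaque end-to-end score.

```python
# Hypothetical two-stage support pipeline, with a labelled eval set per stage.

def classify_ticket(text: str) -> str:
    """Stage 1 stand-in: route a ticket to a category."""
    return "refund" if "money back" in text else "other"

def draft_reply(category: str) -> str:
    """Stage 2 stand-in: turn a category into a reply."""
    return {"refund": "We've started your refund.",
            "other": "Thanks, we'll look into it."}[category]

# Each stage gets its own (input, expected output) eval cases.
step_evals = {
    classify_ticket: [("I want my money back", "refund"),
                      ("App crashes on login", "other")],
    draft_reply:     [("refund", "We've started your refund.")],
}

def run_evals() -> dict:
    """Per-step accuracy over the eval sets."""
    return {fn.__name__: sum(fn(x) == want for x, want in cases) / len(cases)
            for fn, cases in step_evals.items()}

print(run_evals())  # {'classify_ticket': 1.0, 'draft_reply': 1.0}
```

With this structure, swapping in a new model for one stage only requires re-running that stage's cases, which is what makes "carefully testing every step" tractable as the pipeline grows.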
- The key to future AI breakthroughs may lie in reasoning, not scaling, with a focus on creating large evaluation sets and proprietary data that is not readily available online (19m42s).
- The value of a company's AI moat may ultimately lie in its ability to access and utilize proprietary data that is not publicly available, which can be acquired through enterprise sales and partnerships (20m25s).
- Startups may benefit from targeting customers who are willing to pay for high accuracy and perfection, such as those in industries that require precise and specialized knowledge (21m41s).
- Companies like Camper may be a good example of this approach, as they focus on providing high-quality and specialized products that require precise and accurate information (21m51s).
- The next AI breakthroughs may come from companies that are willing to do the hard work of collecting and utilizing proprietary data, rather than relying solely on publicly available information (21m36s).
Getting that final 10-15% of accuracy (21m52s)
- There is a growing interest in text-to-CAD design, particularly among hobbyists and those who want to quickly prototype and test their ideas, but also among professionals who require high accuracy and precision, such as those designing airplane parts (21m53s).
- The strongest technical teams will have the option to go all the way and cater to customers who demand 100% accuracy and are willing to pay a premium for it (22m20s).
- The use of AI in design and prototyping is unlikely to commoditize the technology and make strong technical teams less important; rather, the opposite: value will likely be captured by the strongest technical teams, who can build on top of existing technology and achieve the final 10% of accuracy (22m49s).
- The key differentiators for companies using AI in design and prototyping will be the prompts, evaluations, UI layer, and integrations, as simply having good prompts is not enough for a company to adopt the technology (23m2s).
- Distribution, branding, and difficulty in switching will also be important factors in the success of companies using AI in design and prototyping (23m25s).
- The classic moats of software still apply, and companies that can establish a strong brand and make it difficult for customers to switch will have an advantage (23m45s).
- Evaluations will still be crucial in the world of AI, as founders will need to figure out how to build the best product on top of the technology (23m56s).
- Gigl, a company that was funded for a different idea, pivoted to helping companies fine-tune open-source models to achieve performance equivalent to OpenAI's (25m17s).
- That turned out not to be a great business: model costs keep decreasing and open-source models keep improving, making fine-tuning less and less necessary (25m30s).
- Companies like AO pivoted to finding vertical applications for their AI expertise, such as AI customer support, which is a competitive space but allows for squeezing out a comparative edge (26m1s).
- AI customer support deals with many edge cases and squishy problems, making it challenging, but intensely technical teams can still find ways to gain an edge (26m34s).
- Despite the potential, hardly any adoption of AI customer support has happened yet, and the space remains wide open (26m52s).
- Rules-based systems work well for simple cases, and there's a lack of trust in AI's ability to solve complex problems, which contributes to the slow adoption (27m10s).
- Companies like Zepto have started to adopt AI customer support, with Zepto automating 30,000 tickets per day and having over 1,000 people working on those tickets previously (27m50s).
- The automation of customer support jobs, although potentially replacing human jobs, can also free people from rote and unfulfilling work, allowing them to pursue more meaningful careers (28m20s).
- A previous implementation of a model had a 70% error rate, but after applying the technique described by Jake Heller, the error rate dropped to 5%, an improvement of more than an order of magnitude (29m29s).
- This improvement is particularly notable for complex, time-consuming, and expensive problems that were previously unsolvable, with the model now achieving 85% accuracy, up from 0% (30m9s).
- The model in question is o1-preview; the technique is still being developed and refined, with OpenAI protecting its advantage by hiding the raw chain of thought and exposing only a summarized version that shows how problems are broken down into steps (30m38s).
- The next step for o1 is expected to be interpretability and directability, letting users see the reasoning steps and edit them, which would be a significant unlock for the model's capabilities (30m56s).
- Currently, o1 can output a chain of thought, but it cannot be edited; the ability to edit each step would take fine-tuning of the model to the next level (31m26s).
- The current state of models like o1 is the worst they will ever be, with rapid improvements week to week and new capabilities emerging that were not possible just a month ago (31m39s).
- The improvement in models like o1 is expected to have a significant impact across companies and ideas, with some benefiting greatly from the uplift while others are less affected (31m56s).
- The kinds of ideas that may not benefit are not spelled out at this point; the discussion turns instead to companies that may need to pivot because of o1 (32m5s).
The companies/ideas that should pivot because of o1 (32m6s)
- Companies building AI coding agents or AI programming engineers may need to reassess their strategies, since o1 outperforms at solving programming problems (32m31s).
- Chain-of-Thought infrastructure, which some teams have invested in heavily, may no longer be a significant differentiator, since o1 performs that step-by-step reasoning natively (32m45s).
- The opaque nature of the Chain of Thought and the difficulty in altering its path once it starts are current challenges for users and systems (33m2s).
- New model capabilities can unlock new startup ideas, and the recent advancements in AI models have made phone calling-related startups successful (33m29s).
- The o1 series of models may enable new startup ideas that can improve the physical world, particularly in fields like mechanical engineering, electrical engineering, chemical engineering, and bioengineering (33m54s).
- These advancements could lead to real-world abundance and improvements in people's lives, rather than just minor conveniences (34m29s).
- There may be a sense of urgency to develop and apply these technologies, as there is a fear of AI in society, and it is up to the developers to create positive change (34m38s).