Building AI Models Faster And Cheaper Than You Think

29 Mar 2024

Coming Up (0s)

  • Recent advancements in AI have made many science fiction concepts a reality.
  • Generative AI models like GPT-4, Midjourney, and now Sora are pushing the boundaries of what's possible.
  • YC companies are building foundation models during the batch with just $500,000.
  • These models are being developed by young college graduates in a relatively short time frame.
  • This demonstrates that it's possible to be on the cutting edge of AI research without significant resources.

Sora Videos (1m13s)

  • One Sora video shows a humanoid robot walking a golden retriever down a suburban street.
  • It also shows a big improvement in text rendering: the model spells out "help" correctly, and the frames are high-definition.
  • The physics of the robot's and dog's movements are mostly accurate, capturing the lifelike gait of a golden retriever.
  • The prompt was followed closely, though there are minor imperfections, such as the dog appearing to float and inconsistencies in the street and surrounding structures.
  • Sora's videos exhibit long-term visual consistency, maintaining a consistent architectural style and environment throughout the minute-long clip.
  • In another clip, a drone camera circles the Golden Gate Bridge, showing the cliffs, ocean waves, and San Francisco in the background.
  • The resolution is impressive, capturing intricate details of the bridge and the city.
  • Geographical accuracy is not perfect, with terrain and city layout differing from the real world.
  • Minor imperfections include disjointed bridge columns at certain angles and cars driving on the wrong side of the road.
  • Simulating fluid motion remains a challenge, resulting in slightly static waves.

How does Sora work under the hood? (5m5s)

  • Sora combines a transformer, the architecture typically used for text, with a diffusion model, the approach behind image generators like DALL-E and Midjourney.
  • It adds a temporal component to keep frames consistent over time.
  • Sora is trained on videos split into "spacetime patches": small blocks of pixels (3x3 in the example given) that also extend across several frames, so each patch carries both spatial and temporal information.
  • Patch size can vary, and the patches are fed into a large transformer architecture.
  • Spacetime patches are the video equivalent of tokens, building on prior work applying transformers to images and robotics.
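The patching idea can be sketched in a few lines of NumPy. This is a minimal illustration assuming raw RGB pixels and a fixed patch size; the function name and shapes are hypothetical, OpenAI has not published Sora's code, and its technical report describes patching a compressed latent representation of the video rather than raw pixels.

```python
import numpy as np

def to_spacetime_patches(video: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    Each patch covers `pt` consecutive frames and a `ph` x `pw` pixel region,
    so it carries both spatial and temporal information. Returns an array of
    shape (num_patches, pt * ph * pw * C): one row per "token".
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, extent within a patch).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the patch-index axes first and the within-patch axes last.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten each patch into a single vector.
    return x.reshape(-1, pt * ph * pw * C)

# Usage: an 8-frame, 9x9 RGB clip with 2x3x3 patches yields 4*3*3 = 36 tokens.
clip = np.random.rand(8, 9, 9, 3)
tokens = to_spacetime_patches(clip, pt=2, ph=3, pw=3)
print(tokens.shape)  # (36, 54)
```

Each row can then be linearly projected into an embedding and treated like a text token by the transformer.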

How expensive is it to generate video vs. text? (8m19s)

  • Generating videos is more computationally expensive than generating text due to the additional dimension of time.
  • GPT-4 reportedly has around a trillion parameters; video, with its extra dimension, would likely need an order of magnitude more, perhaps around 10 trillion.
  • Training such a model would likely take about 10 times the GPUs used for GPT-4, which is estimated at around 20,000-30,000.
  • Some YC companies have achieved comparable results in their domains with far fewer resources by making smart choices about data, compute, and expertise.
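A rough back-of-envelope calculation makes the cost of the extra dimension concrete. The clip length, resolution, patch size, and text length below are assumptions chosen for illustration, not Sora's actual settings; the GPU figures simply multiply the episode's GPT-4 estimate by 10.

```python
import math

def num_patches(seconds, fps, height, width, pt, ph, pw):
    """Count spacetime patches (tokens) for a clip, rounding partial patches up."""
    frames = seconds * fps
    return math.ceil(frames / pt) * math.ceil(height / ph) * math.ceil(width / pw)

# Hypothetical clip: one minute of 720p video at 30 fps, cut into 2x16x16 patches.
video_tokens = num_patches(60, 30, 720, 1280, pt=2, ph=16, pw=16)
print(f"{video_tokens:,} patches")  # 3,240,000 patches

# Hypothetical long text prompt of 4,000 tokens, for comparison.
text_tokens = 4_000
print(video_tokens / text_tokens)  # 810.0

# Scale the episode's GPT-4 GPU estimate by the suggested 10x.
gpt4_gpus = (20_000, 30_000)
sora_gpus = tuple(10 * g for g in gpt4_gpus)
print(sora_gpus)  # (200000, 300000)
```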

Infinity AI (10m1s)

  • Makes deepfake videos of a specific person.
  • Trained its model on the first three episodes of the Lightcone podcast.
  • Needed only about an hour of YouTube footage to produce an accurate likeness.

Sync Labs (11m23s)

  • Offers an API for real-time lip-syncing.
  • Trained its models on a single A100 GPU.
  • Compressed much of the data and used low-resolution video to reduce the amount of data needed.
  • Partnered with Azure to get access to a dedicated GPU cluster, which let them iterate 100 times faster.
  • YC companies get over half a million dollars in credits and access to a GPU cluster within 24 hours.
  • The companies in the YC batch didn't have to use any of the YC money to train their models.
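To see why low-resolution video helps, note that shrinking each frame by a factor k in both height and width cuts the pixel count by k². The sketch below uses simple average pooling on a hypothetical clip; the `downsample` helper and the clip size are illustrative assumptions, not Sync Labs' actual preprocessing.

```python
import numpy as np

def downsample(video: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool each frame of a (T, H, W, C) video by `factor` in H and W."""
    T, H, W, C = video.shape
    H2, W2 = H // factor, W // factor
    # Crop so height and width divide evenly by the pooling factor.
    cropped = video[:, : H2 * factor, : W2 * factor, :]
    # Group pixels into factor x factor blocks and average each block.
    blocks = cropped.reshape(T, H2, factor, W2, factor, C)
    return blocks.mean(axis=(2, 4))

# Hypothetical clip: one second of 512x512 RGB video at 25 fps.
frames = np.zeros((25, 512, 512, 3), dtype=np.float32)
small = downsample(frames, 4)
print(small.shape)                # (25, 128, 128, 3)
print(frames.size // small.size)  # 16
```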

Sonauto (13m41s)

  • Sonauto is a company that has built a text-to-song model.
  • The model can generate songs from given lyrics and a specified singer.
  • The founders are 21 years old and built the model within months, teaching themselves along the way.
  • The generated songs have understandable lyrics and sound like they are sung by a person.

Metalware (15m44s)

  • Metalware is a company that is building a co-pilot for hardware design.
  • The founders of Metalware had a background in hardware engineering but not in AI.
  • They trained a foundation model for hardware design during the batch without much AI expertise.
  • Metalware used high-quality data from textbooks and a smaller model (roughly "GPT-2.5" scale) to reduce compute requirements.
  • By constraining the task, using high-quality data, and choosing a smaller model, Metalware showed that foundation models can be built for applications beyond generating video or text.
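Why a smaller model saves so much can be estimated with a widely used rule of thumb: transformer training compute is roughly C ≈ 6·N·D FLOPs, where N is the parameter count and D the number of training tokens. The sketch below applies it to two hypothetical model sizes; the model sizes, the 20-tokens-per-parameter ratio, and the 40% utilization are assumptions, not Metalware's figures. 312 TFLOP/s is the A100's dense BF16 peak.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the C ~ 6 * N * D rule of thumb."""
    return 6 * params * tokens

def gpu_hours(flops: float, peak_flops_per_s: float, utilization: float) -> float:
    """Convert total FLOPs to GPU-hours at a given fraction of peak throughput."""
    return flops / (peak_flops_per_s * utilization) / 3600

A100_BF16_PEAK = 312e12  # dense BF16 peak FLOP/s of an NVIDIA A100

# Hypothetical models: 1B vs. 100B parameters, each trained on 20 tokens per parameter.
for n in (1e9, 100e9):
    c = training_flops(n, 20 * n)
    hours = gpu_hours(c, A100_BF16_PEAK, utilization=0.4)
    print(f"{n:.0e} params: {c:.1e} FLOPs, ~{hours:,.0f} A100-hours")
```

Because D is scaled with N here, a 100x larger model costs about 10,000x more compute, which is why constraining the task and the model size matters so much.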

Guide Labs (17m40s)

  • Guide Labs is building an interpretable foundation model that can explain how it arrives at its predictions.
  • The discussion covers when it is worth investing in a custom model versus fine-tuning an open-source one.
  • Expertise in AI might be overrated, as smart individuals who are willing to read research papers can achieve similar results.
  • YC can provide credits to offset some of the compute costs.
  • The key differentiator lies in finding high-quality data, even if it's not a giant dataset.

Phind (19m29s)

  • Phind is a company that created a co-pilot for software.
  • They used synthetic data from programming competitions to train their model.
  • Synthetic data was initially controversial because it seemed implausible that a model could learn from data it generated itself.
  • It appears to work because LLMs can reason well enough to generate useful training data and improve their own models.
  • Other AI systems, such as self-driving car models, are also trained on massive amounts of simulation data.
  • Sora, the video model, may likewise use footage generated by game engines like Unreal Engine or Unity, which have full physics simulators.
  • This would let Sora render scenes from multiple camera angles and simulate the real world.
  • The implications of this technology go beyond entertainment, as it can be used for weather prediction, scientific simulations, and more.
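Synthetic data for code is especially practical because generated examples can be checked automatically. A minimal sketch of that filtering step follows: candidate solutions (hard-coded strings standing in for model output) are run against test cases, and only the passing ones are kept. The names `Candidate`, `passes_tests`, and `filter_synthetic` are hypothetical, and this is not Phind's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    problem: str
    source: str  # generated solution code that should define `solve`

def passes_tests(source: str, tests: list[tuple[tuple, object]]) -> bool:
    """Run a candidate solution against known (args, expected) test cases.

    Note: exec() is not a sandbox; a real pipeline would isolate untrusted code.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        solve = namespace["solve"]
        return all(solve(*args) == expected for args, expected in tests)
    except Exception:
        # Crashes, missing `solve`, or runtime errors all count as failures.
        return False

def filter_synthetic(candidates: list[Candidate], tests) -> list[Candidate]:
    """Keep only generated examples whose solutions are verifiably correct."""
    return [c for c in candidates if passes_tests(c.source, tests)]

# Hypothetical generated solutions to "return the sum of a list".
tests = [(([1, 2, 3],), 6), (([],), 0)]
candidates = [
    Candidate("sum a list", "def solve(xs):\n    return sum(xs)"),
    Candidate("sum a list", "def solve(xs):\n    return sum(xs[1:])"),  # wrong answer
    Candidate("sum a list", "def solve(xs):\n    return xs[0] + solve(xs[1:])"),  # no base case
]
kept = filter_synthetic(candidates, tests)
print(len(kept))  # 1
```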

Diffuse Bio (24m21s)

  • Diffuse Bio applies foundation models to biology to create new molecules for drugs and gene therapies.
  • The founder has a background in biology and has published papers in Nature.
  • The team built custom kernels to speed up model training and reduce resource requirements.

Piramidal (25m36s)

  • Piramidal builds a foundation model for the human brain to predict EEG signals.
  • EEG data resembles video: it records electrical activity across many electrodes over time.
  • Grouping the data into spacetime chunks shortens the sequence the model attends over; because attention cost grows quadratically with sequence length, this cuts compute sharply.
  • The model can be trained with only about 800 GPU-hours of compute.
  • EEG data is an unexpected application area for foundation models.
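The savings from chunking follow from attention's quadratic cost: if chunking shrinks the sequence by a factor k, the number of pairwise attention scores shrinks by k². The electrode count, sampling rate, chunk size, and the one-token-per-sample baseline below are assumptions for illustration, not Piramidal's configuration.

```python
def attention_pairs(seq_len: int) -> int:
    """Full self-attention scores every token against every other: n^2 pairs."""
    return seq_len * seq_len

# Hypothetical recording: 64 electrodes sampled at 256 Hz for 10 seconds.
channels, samples = 64, 256 * 10

# Naive baseline: one token per (electrode, sample) value.
raw_tokens = channels * samples

# Spacetime chunks of 4 electrodes x 64 samples each become a single token.
chunk_c, chunk_t = 4, 64
chunked_tokens = (channels // chunk_c) * (samples // chunk_t)

shrink = raw_tokens // chunked_tokens
speedup = attention_pairs(raw_tokens) // attention_pairs(chunked_tokens)
print(raw_tokens, chunked_tokens, shrink, speedup)  # 163840 640 256 65536
```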

K-Scale Labs (27m15s)

  • K-Scale Labs is developing consumer humanoid robots.
  • The founder previously built the robotics foundation model for Tesla's Optimus humanoid robot.
  • Advances in foundation models, such as models that act as physics simulators of the world, are enabling breakthroughs in robotics.

DraftAid (28m58s)

  • DraftAid is building AI models for CAD design.
  • Traditional CAD software relies on old geometry kernels written in Fortran, which are expensive to use.
  • DraftAid is using AI models to replace some of these kernels, making the process faster and cheaper.

Playground (30m38s)

  • Playground is a YC company that has developed an AI model that can generate images.
  • The model is open-source and outperforms Stable Diffusion in many cases.
  • Playground was able to achieve this on far less money than Stability AI and other teams in the space.
  • Suhail Doshi, the founder of Playground, taught himself AI in a month by reading papers and meeting with experts in the field.
  • This highlights the fact that the AI field is still new and that it is possible to become an expert in a relatively short amount of time.
  • Companies can compete with OpenAI and other large AI companies by training their own models for specific verticals and use cases.

Outro (33m20s)

  • There are many incredible things being done in AI by people who are likely not that different from the viewers.
  • Many notable figures in AI, such as Sam Altman and Dario Amodei, had to start somewhere, and YC could be that starting point for aspiring founders.

Overwhelmed by Endless Content?