Perplexity CEO Aravind Srinivas on the rush toward an AI-curated web | TechCrunch Disrupt 2024
31 Oct 2024
Plagiarism and Attribution in Perplexity
- Perplexity's definition of plagiarism is not explicitly stated, but the company emphasizes the importance of citing sources and providing references, similar to academic and journalistic practices, to avoid any claims of ownership of content and to surface information from the web in a digestible manner (17s).
- The company's approach to avoiding plagiarism involves summarizing content, providing sources, and including citations, similar to how journalists, academics, and students write essays with bibliographies and citation sections (1m3s).
- In cases where news outlets report on the same news story, mentioning the original source is not considered plagiarism, and Perplexity follows a similar approach by citing sources and providing footnotes (1m37s).
- A study by Copyleaks found that Perplexity often uses 8-15 words in a row from an article and cites the article in a source panel or footnote, although a footnote is not always attached to each individual sentence (1m51s).
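The "8-15 words in a row" finding can be illustrated with a minimal sketch that measures the longest run of consecutive words an answer shares with a source article. This is not the method used in the study or by Perplexity; the function names, the tokenization rule, and the 8-word threshold in the example are assumptions made for illustration.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (hypothetical rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(answer: str, source: str) -> int:
    """Return the length, in words, of the longest run of consecutive
    words that appears in both the answer and the source."""
    a, s = words(answer), words(source)
    best = 0
    # prev[j] holds the length of the shared run ending at a[i-2], s[j-1].
    prev = [0] * (len(s) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(s) + 1)
        for j in range(1, len(s) + 1):
            if a[i - 1] == s[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


source = "the quick brown fox jumps over the lazy dog near the river bank"
answer = "Reports say the quick brown fox jumps over the lazy dog, according to the source."

run = longest_shared_run(answer, source)
THRESHOLD = 8  # assumed cutoff, taken from the low end of the 8-15 word range
print(run, run >= THRESHOLD)
```

A check like this only flags verbatim overlap; whether the overlapping passage is properly cited is a separate question.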
Perplexity's Launch and Approach to Citations
- Perplexity launched on December 7, 2022, with a focus on references and citations, setting it apart from other AI products like ChatGPT, which did not provide referencing at the time (2m20s).
- The company has improved its approach to citations and referencing over time, aiming to make it clearer within the body of the text which parts are coming from specific sources (2m53s).
- Perplexity's co-founder and CEO, Aravind Srinivas, comes from an academic background and genuinely believes in the importance of citations and referencing, which is reflected in the company's approach (3m3s).
Preventing Plagiarism in AI Models
- To avoid plagiarism, Perplexity needs to define what plagiarism is and have an overseer model that can detect and prevent copying and pasting of content without proper citation (3m39s).
- AI models are instructed to summarize content without reproducing exact text from any particular source, to take in different perspectives, and to get directly to the user's point. Models follow this instruction better as they improve at instruction following through supervised fine-tuning and reinforcement learning from human feedback (3m49s).
- No model is perfect: any model can be prompted into behaving differently from what its original prompt intended, a technique known as prompt injection (4m34s).
- The product is meant to answer questions, but some people try to use it to summarize articles by pasting URLs, which is not the intended use case and which the company is trying to prevent (4m54s).
- It's challenging to build a guardrail against every particular use case, because people can keep finding new prompt-engineering tricks that bypass the original guardrail (5m18s).
AI-Powered Search and Google's Approach
- The concept of AI-powered, AI-driven, AI-native search has been explored by several companies, including Google, which has been complicating and enshittifying search for over a decade by adding summary widgets and AI (5m45s).
- Google's search engine is fundamentally link-based, and the company makes most of its money from search ads, which incentivizes them to show users as many links as possible (6m26s).
- The primary use case for Google's search engine is not AI summaries, but rather showing links, as most queries don't have AI summaries, and if they did, it would disrupt the revenue model (7m0s).
- The goal is to create a search experience that is not focused on showing links, unlike Google's homepage, which is designed to encourage users to click on links (7m18s).
Perplexity's Search Focus and User Queries
- The median number of words in a Google query is around 2-3, whereas in Perplexity, it's around 10-11 words, indicating that users are more likely to ask direct questions on Perplexity (7m22s).
- Google is often used for instant information, such as the age of a celebrity, live scores, or the weather, with users typing in only a few keywords (7m49s).
- Perplexity aims to serve a different use case, where users can ask more in-depth questions and receive detailed answers (8m10s).
Sports Features and Data Acquisition
- Perplexity has recently added features such as match scores, but only for the NFL, using a data contract with a data provider rather than scraping the information (8m30s).
- The addition of sports-related features is intended to make Perplexity a place where users can ask anything and receive accurate answers (9m11s).
- Initially, Perplexity's user base consisted of scholarly and academic research-oriented individuals, but the goal is to expand to a broader audience (9m37s).
- Sports is an area where Perplexity can offer more than just live scores, including commentary summaries, player and team comparisons, and fine-grained details (10m8s).
New Features and Product Strategy
- Perplexity is releasing a dozen or so new features, but it's unclear whether there's a specific strategy behind this or if it's a "shotgun approach" to see what sticks (10m32s).
- The goal is to find products that are valuable to users by looking at logs of what people are asking for, and providing more than just a wall of text, with a focus on verticals such as finance, sports, and local searches (10m45s).
- Priorities are decided based on user habits and daily use cases, with the aim of covering a range of topics to make it unnecessary for users to go back to traditional search engines (11m37s).
- The company is working to earn user trust by doing the "hard work" of covering daily use cases beyond just helping with software development, research planning, and academic research (12m8s).
- Local searches, such as sports, weather, shopping, and travel, are also a priority (12m21s).
Dow Jones Lawsuit and Publisher Program
- A lawsuit was filed by Dow Jones, which claimed that the company was a "content kleptocracy", but the response was that media companies wish this technology didn't exist because they didn't like the arrangement being offered (12m29s).
- The company claims to have responded to the lawsuit the same day and is open to collaborating and working with publishers (12m58s).
- A program called the Perplexity Publisher Program was launched to address revenue sharing and licensing content, with the goal of finding a mutually beneficial arrangement (13m23s).
Revenue Sharing and Publisher Collaboration
- The program aims to separate out AI companies into two types: those that train large foundation models on all internet data, and those that use content from the web as sources in real-time (13m40s).
- The company is trying to explain to publishers the benefits of working together and finding a revenue-sharing model that works for both parties (14m18s).
- There are two types of companies, and the mindset of the first type, where media companies get paid for licensing their content to train models, does not apply to the second type, which includes companies like Perplexity (14m19s).
- Perplexity proposes a different structure, where revenue is made through advertising, and this revenue will be shared with publishers on a query level basis if their source is cited as part of the answer (14m47s).
- Unlike previous search engines that made a ton of advertising revenue without sharing it with publishers, Perplexity will share its revenue with publishers who choose to work with them (15m7s).
- The growth of Perplexity and effective monetization through ads can be rewarding for publishers who allow their content to be cited as part of Perplexity's answers (15m29s).
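The query-level sharing described above can be sketched as follows. The interview does not specify a share rate or how revenue is divided among several cited sources, so the `share_rate` parameter, the equal split, and the function name `allocate_query_revenue` are all assumptions made for illustration, not Perplexity's actual terms.

```python
def allocate_query_revenue(
    ad_revenue: float,
    cited_publishers: list[str],
    partners: set[str],
    share_rate: float,
) -> dict[str, float]:
    """Split a share of one query's ad revenue among cited partner publishers.

    Assumptions (not from the source): a fixed share_rate and an equal
    split among partner publishers cited in the answer.
    """
    # Deduplicate citations while preserving order; keep only program members.
    eligible = [p for p in dict.fromkeys(cited_publishers) if p in partners]
    if not eligible:
        return {}
    pool = ad_revenue * share_rate
    return {p: pool / len(eligible) for p in eligible}


# One query that earned 1.00 in ad revenue and cited three distinct sources,
# two of which belong to the (hypothetical) partner list.
payout = allocate_query_revenue(
    ad_revenue=1.00,
    cited_publishers=["a.example", "b.example", "a.example", "c.example"],
    partners={"a.example", "c.example"},
    share_rate=0.25,
)
print(payout)
```

Under these assumptions, a publisher earns something only on queries where its content is cited and an ad is shown, which is the link between Perplexity's growth and publisher revenue that the bullets above describe.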
Publisher Concerns and Perplexity's Response
- Publishers may see Perplexity as a threat because when their content is cited in a Perplexity summary, users do not click through to the original article, resulting in lost traffic and revenue (15m51s).
- Perplexity disagrees with the assertion that they are in competition with news products, as users do not come to Perplexity to consume daily news, but rather to make sense of the news and its impact on their lives (16m15s).
- Perplexity is not a news product, and users come to the platform to ask questions about how news affects them, such as whether to buy a particular stock, and then go directly to the source for more information (16m41s).
- Perplexity aims to educate people on the differences between their product and news products, and show how users are using their platform in a way that is distinct from reading news articles (17m7s).
Perplexity's Publisher Program and Benefits
- A publisher program is available for everyone to join, offering ad revenue sharing and assistance in building AI-native assistants for users to search content on their platform (17m55s).
- The program provides API access with a generous amount of API credits, along with the premium Perplexity Enterprise plan for all of a media company's employees, to help them create content more effectively (18m27s).
- The product can aid in fact-checking, research, and streamlining the research process, making it a useful tool for media companies (18m59s).
Copyright, Lawsuits, and IP Law
- The current world of copyright and lawsuits is complicated and unprecedented, with no clear precedents in IP law, making it understandable that mistakes can be made (19m28s).
- The possibility of unintentionally committing a crime due to the complexity of IP law is acknowledged, and the company will defend itself in any lawsuits (19m46s).
- The laws may need to change to allow for more freedom in IP movement, but it's a complex issue with existing laws and precedents around copyright (20m1s).
- The belief is that facts should be universally distributed, and ownership over facts should not restrict the dissemination of knowledge and truth (20m36s).
AI Development Costs and Funding
- Raising a significant amount of money is necessary due to the high expenses associated with AI development and providing facts (21m3s).
- The current high costs of using AI models are due to the expensive nature of GPUs and data centers, which leads to inference costs that companies need to pay to use these models (21m13s).
- However, the cost of using AI models is dropping rapidly, with a trend of roughly 2x reduction every 4-5 months, which could lead to a 10-50x reduction in model costs over the next year or two (21m35s).
- This cost reduction is beneficial for companies that are growing and can focus on scaling up their operations while figuring out a long-term sustainable revenue model (21m53s).
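The arithmetic behind the cost trend can be checked with a short sketch. It assumes the "2x every 4-5 months" rate stays constant and compounds, which the interview presents as a trend rather than a guarantee; the function name is hypothetical.

```python
def cost_reduction_factor(months: float, halving_period_months: float) -> float:
    """Factor by which cost falls after `months`, assuming the cost
    halves once every `halving_period_months` (constant compounding)."""
    return 2 ** (months / halving_period_months)


for horizon in (12, 24):
    slow = cost_reduction_factor(horizon, 5)  # halving every 5 months
    fast = cost_reduction_factor(horizon, 4)  # halving every 4 months
    print(f"{horizon} months: {slow:.1f}x to {fast:.1f}x cheaper")
```

Under this assumption, costs fall roughly 5x to 8x over one year and roughly 28x to 64x over two years, which is broadly in line with the 10-50x range quoted for "the next year or two".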
Monetization and Revenue Models
- There are various ways to monetize AI products, including subscriptions, but also other methods that can be explored, such as usage-based monetization (22m16s).
- The company is working on figuring out a revenue model that can take revenue away from Google, but it's unclear if people will switch to a new platform or if it will hurt the advertising revenue of other companies (22m46s).
- When a new platform emerges, the focus should be on gaining the trust of users and advertisers, rather than over-optimizing for ad revenue, and addressing concerns such as brand risk and hallucinations (23m33s).
Acquisition Offers and Company Interactions
- The company has not received acquisition offers from OpenAI, Microsoft, Google, Amazon, or Notion, but has had some interaction with Meta (24m11s).
- The company is dealing with its own issues related to immigration, which was recently tweeted about (24m48s).
Immigration Challenges and Green Card Process
- Running a billion-dollar company does not guarantee a smooth process for obtaining a green card, as the process can be difficult for everyone, regardless of their position or wealth (24m56s).
- The American immigration system has per-country caps that limit the number of people allowed to become permanent residents each year, contributing to backlogs and slow processing times (25m14s).
- The large number of people from India trying to obtain a green card, combined with existing backlogs, is a primary reason for the slow process (25m28s).
- There is widespread agreement that the immigration system can be improved, and raising awareness about its issues may encourage people to take note and work towards positive change (25m47s).
- The intention behind discussing the challenges of obtaining a green card is to bring attention to the need for improvement in the system, rather than seeking special treatment (25m57s).