Making human music in an AI world | The Vergecast

17 Nov 2024

Intro (0s)

The Vergecast is the flagship podcast of The Verge, and the host is David Pierce, who is currently at Union Station in Washington DC, on his way to New York for planning meetings with other Verge staff members (10s).
David Pierce has been working at The Verge for two and a half years and is about to meet some coworkers in person for the first time, a side effect of working in a post-pandemic world (19s).
This episode is the third and last in a miniseries about the future of music, covering topics such as Trackstar, TikTok, and Autotune (33s).
The guest for this episode is Stanford Professor Ge Wang, who has a background in the music world, academia, and entrepreneurship, having built a company called Smule (38s).
Ge Wang has been involved in the music world for a long time and has been thinking about the implications of AI and virtual reality on music creation (53s).
The conversation with Ge Wang covers AI, virtual reality, and what it means to make music in a world where technology plays a significant role in the process (1m2s).
The episode is supported by Nissan Kicks (1m22s).

Ge Wang's Background and Contributions (1m28s)

Ge Wang co-founded a company called Smule, which is still around and makes popular music-based apps, including a karaoke app and Magic Piano, a piano-playing game similar to Guitar Hero or Beat Saber (2m17s).
Ge Wang also created a programming language called ChucK, which allows users to write code that outputs as music (2m37s).
Ge Wang is a professor at Stanford, teaching in the Center for Computer Research in Music and Acoustics, and has written a book about design (2m46s).
Ge Wang has been teaching about music and technology for years and is also the conductor of Stanford's Laptop Orchestra, a group of students who use laptops and gadgets to create unique music (3m1s).
The Laptop Orchestra creates music that is often unconventional and experimental, with Ge Wang himself sometimes performing using a glove connected to his laptop to change the sound as he moves his hand (3m25s).

Exploring Creativity and Technology (3m36s)

Ge Wang's concerts in space are unique and worth watching, and he has a distinct perspective on the future of music due to his background in music, programming, and engineering (3m38s).
The conversation with Ge Wang about the future of music took an unexpected turn, focusing on what it means to be creative in a time when everything is being optimized and simplified (4m11s).
Ge Wang has a fascination with computers that started when he was a small child, sparked by video games, and he still remembers the first time he saw a video game in an arcade in Beijing (5m6s).
Ge Wang has been obsessed with computer music since he was a child; he makes music using computers, but his approach is distinct from the typical use of computers in music production (5m2s).
Ge Wang has a degree in computer science and a PhD in the same field, but he considers himself a strange computer scientist because he builds things that solve no existing problems and that nobody asked for (6m19s).
Ge Wang created an app called Ocarina, which allows users to play music by blowing into the iPhone's microphone and using multitouch to control the pitch, with the tilt of the phone controlling the vibrato (6m43s).
Ge Wang demonstrated Ocarina by playing a short melody, showcasing its unique features and physical interaction (7m23s).
The app Ocarina was created using a combination of software engineering, software design, interaction design, and signal processing, with all the sound generated live on the phone using the programming language ChucK (7m43s).
ChucK is a programming language for music synthesis that allows users to write code to generate sound and create music through either algorithmic or human interaction methods (8m20s).
The app Ocarina utilizes human interaction, specifically physical interactions, to control the sound generated, often combining automation with human input to create an interesting musical experience (8m43s).
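The mapping described above (breath for loudness, covered holes for pitch, tilt for vibrato) can be illustrated with a rough sketch. This is not Ocarina's actual code, which the episode says runs in ChucK on the phone; it is a minimal Python approximation, and the fingering table, the four-hole layout, the 5 Hz vibrato rate, and the function names are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 44100  # samples per second

# Hypothetical fingering table: which of four on-screen holes are covered -> MIDI note.
FINGERINGS = {
    (True, True, True, True): 60,      # C4
    (True, True, True, False): 62,     # D4
    (True, True, False, False): 64,    # E4
    (True, False, False, False): 67,   # G4
    (False, False, False, False): 72,  # C5
}


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render(holes, breath, tilt, seconds=0.5):
    """Synthesize one held note as a list of float samples.

    holes:  tuple of four bools (True = hole covered), selects the pitch
    breath: 0.0..1.0, stands in for microphone level, sets the loudness
    tilt:   0.0..1.0, stands in for phone tilt, sets the vibrato depth
    """
    base = midi_to_hz(FINGERINGS[holes])
    vibrato_rate = 5.0           # Hz, an assumed value
    vibrato_depth = 0.02 * tilt  # up to 2% frequency deviation at full tilt
    phase = 0.0
    samples = []
    for n in range(int(seconds * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        freq = base * (1 + vibrato_depth * math.sin(2 * math.pi * vibrato_rate * t))
        phase += 2 * math.pi * freq / SAMPLE_RATE  # accumulate phase so pitch bends smoothly
        samples.append(breath * math.sin(phase))
    return samples
```

Calling `render((True, True, True, True), breath=0.8, tilt=0.5)` yields half a second of a C4 sine tone with gentle vibrato; a real instrument would read these three inputs continuously from the microphone, touchscreen, and accelerometer instead of taking them as fixed arguments.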
The development of Ocarina was not driven by a practical need or problem to be solved, but rather by a creative desire to build something new and unique (9m11s).
The process of creating technology without a specific practical use or utility can be a different and interesting kind of creative process, one that involves thinking about what can be made rather than what problem needs to be solved (10m21s).
The idea of critical tool building involves questioning why a particular technology is being created, and considering the motivations and goals behind its development (10m53s).
The creator of Ocarina, who also teaches and writes about this topic, has refined the process of building technology in this way, and emphasizes the importance of considering the "why" behind a project (10m36s).
The book "Artful Design: Technology in Search of the Sublime" explores this idea of critical tool building and the creative process involved in building technology without a specific practical use or utility (10m40s).
This kind of design is not need-based; it is value space design, where the focus is on something the designer deeply believes in rather than on a clear practical need (11m30s).
Value space design is driven by core values, such as the belief that music making is good for a person and has inherent value, regardless of the end product (12m7s).
The process of playing and learning an instrument can be satisfying and gratifying, much like playing a well-made video game, as it allows for self-expression and the overcoming of challenges (12m43s).
However, in today's world of convenience and efficiency, the idea of doing something for the joy and pleasure of it, rather than for a practical purpose, feels anachronistic (13m14s).
This approach to design and creativity is seen as out of time and not aligned with the optimization-driven and competition-driven society we live in (13m36s).
Despite this, it is believed that building things that are playful and interesting for their own sake is what makes us human and is essential to our identity (13m55s).

AI and the Future of Music (13m59s)

People often pursue activities that bring them joy and make them feel like themselves, even if they're not practical or optimized, and these activities can be considered passionate hobbies (14m6s).
Engaging in activities for their own sake, rather than for optimization or convenience, can bring a sense of fulfillment and connection to one's identity (14m29s).
Examples of such activities include cooking for personal enjoyment, where the process of creating something can bring joy, regardless of the outcome (14m47s).
In a world where tools are primarily designed for optimization and convenience, there's a risk that people may become alienated from their true selves (15m27s).
The development of AI-generated music and art raises questions about the value and meaning of creative work (15m38s).
AI can generate impressive music and images, but the ease of creation may lead to a lack of attachment or investment in the final product (16m17s).
The "bubble gum effect" refers to the fleeting nature of interest in AI-generated creations, which can be exciting at first but quickly lose appeal and become disposable (16m32s).
The ease of creating content with AI may lead to a focus on sharing and social functions, rather than meaningful engagement with the creative process (17m45s).
The concept of "bubble-gumification" refers to the push for automation to reduce labor costs to zero, making things easily consumable and disposable, much like chewing a piece of gum and then discarding it (17m59s).
The process and provenance of creative work, including the story behind it and the person who made it, are essential and connected to the thing itself, even if it's hard to quantify (18m30s).
The use of AI in creative work raises questions about what we want from AI, whether it's to make things that sound like existing artists or to create something entirely new and different (19m2s).
AI can be used as a tool to create new and unique sounds, rather than just replicating existing ones, and this approach can lead to the creation of something different and valuable (19m23s).
The idea of "pluralism" is essential, allowing for a diversity of values, aesthetics, and social norms, and providing room for humans to flourish and explore new ideas (20m23s).
Exploring the use of AI to create new and untapped sounds is important, but it's also crucial to consider the meaning and value we ascribe to these creations, which can be considered art (20m56s).
The concept of a "civil society" requires the capacity for pluralism, allowing for the coexistence of different values and aesthetics, and enabling humans to flourish (20m38s).
The idea of computer music involves finding new ways to interact with technology and computers, such as using a glove as a new way of using a computer, which is a more explicit, playful, and exploratory approach to technology (22m11s).
This approach is different from trying to figure out how to use an app effectively to get work done, and it involves a shift from learning how to use technology to figuring out what's possible with it (22m39s).
Teaching students to "play" with AI involves building interactive tools with AI that can be deployed into everyday life, often involving human interaction, such as building instruments that track hand movements to generate sound (23m15s).
These interactive tools require learning how to use them, and they can be playful and interactive, such as an instrument that changes pitch based on hand movements (23m31s).
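As a rough illustration of an instrument that changes pitch with hand movement, the sketch below maps a hand height to a note. It is not taken from the course described in the episode: it assumes some tracker already supplies a normalized height between 0.0 and 1.0, and the pentatonic scale, the two-octave range, and the function names are hypothetical choices.

```python
# Hypothetical scale: major pentatonic, as semitone offsets from the root note.
PENTATONIC = [0, 2, 4, 7, 9]


def hand_to_midi(height, root=60, octaves=2):
    """Map a normalized hand height (0.0 = bottom, 1.0 = top) to a MIDI note on the scale."""
    height = min(max(height, 0.0), 1.0)  # clamp noisy tracker output into range
    steps = len(PENTATONIC) * octaves    # number of playable notes
    index = min(int(height * steps), steps - 1)
    octave, degree = divmod(index, len(PENTATONIC))
    return root + 12 * octave + PENTATONIC[degree]


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)
```

Snapping to a scale is one design choice among many: it makes any hand position sound consonant, at the cost of the continuous glide a theremin-style mapping would give.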
Another example of a playful AI tool is "Autois," a computer vision system that plays cheesy seductive music when it detects a seductive look (24m8s).
The definition of play is something that is not about the outcome, but about the process, and it's the opposite of work, which requires a productive outcome (24m45s).
There's a difference between thinking about technologies like AI as tools versus thinking about them as instruments, and AI is often talked about as a tool, but it can also be thought of as an instrument with a set of capabilities that can be explored and combined in new ways (25m19s).
Figuring out what to do with AI's capabilities and how they mix together can lead to new and unexpected combinations that turn into something interesting (25m50s).
The concept of a Venn diagram is used to illustrate the intersection of human capabilities and AI capabilities, with the intersection representing tasks that both humans and AI can do well, such as making music or playing Ping Pong (26m9s).
The current approach to AI development is often focused on replicating human tasks, with progress measured by how indistinguishable AI-generated output is from human-generated output, a concept referred to as the "Turing trap" (27m6s).
This approach overlooks the potential for AI to excel in areas that do not intersect with human capabilities, and exploring this uncharted territory requires imagination, play, and creativity (27m32s).
Removing the pressure to be useful or competitive can lead to more innovative and expressive AI-generated output, as seen in courses where students are encouraged to explore the unexplored space of AI capabilities (27m59s).
The goal is to achieve a beneficial amalgamation of human and AI capabilities, where AI does things that humans are not doing, and humans do things that AI is not doing, rather than simply overlapping or colliding (28m31s).
This approach keeps human curation, intention, and wisdom in the loop, ensuring that the creative process is not solely driven by AI, but rather a collaborative effort between humans and machines (28m54s).
The role of art is to help humans understand their emotions, and the provenance of a piece of art, including its story and context, is essential to its value and meaning (29m17s).
The importance of provenance is fundamental, as it provides insight into what the artist is trying to communicate, and this is particularly relevant when AI is involved in the creative process (29m46s).
Art can be seen as a lens through which people view the world, themselves, and their emotions, and good art invites the experiencer to see things from a new perspective, making it a human-centric process (29m51s).
A good piece of art, regardless of the medium, serves as a lens to understand and interpret the world, requiring human involvement and emotions (30m19s).
The human element is essential in art, as it allows for the expression and understanding of emotions, which is something that AI may struggle to replicate (30m38s).
AI-generated art may undermine the human-centric nature of art, as it is based on the lowest common denominator of its training data, resulting in a homogenized output that lacks originality (30m50s).
The vast amounts of information available to AI may actually hinder its ability to create great new art, instead producing average or unoriginal work that is a combination of existing pieces (31m27s).
Having access to every song ever made, for example, may not be beneficial for creating new and innovative music, as it may lead to the creation of songs that are simply an average of existing ones (31m31s).

Philosophical Reflections (31m45s)

Artists generally do not want AI to replace them entirely, but rather to assist with certain parts of their work, as the core creative process is what gives the activity meaning and value (31m57s).
The prevalent thinking in AI is focused on optimization and outperforming humans without considering the potential consequences or what is truly desired from AI (32m39s).
The use of AI in tasks such as navigating government bureaucracy and filling out forms is generally seen as a positive application of technology, as it can make these processes more efficient (34m27s).
There is a "messy middle" where tasks, such as writing emails, have a meaningful human element, but AI tools are being developed to automate these tasks, raising questions about the role of humans in these processes (34m52s).
The spectrum of tasks can be divided into three categories: art, life, and nonsense, with people being happy to give up nonsense tasks to technology, reluctant to give up art, and case-by-case for life tasks (35m14s).
The challenge lies in drawing the lines between these categories and deciding where it is worth outsourcing and embracing AI, and where it is not (35m23s).
The concept of protecting valuable things in a spectrum of human creation is discussed, with the example of email being used to illustrate the point (35m35s).
The first experience with email is shared, dating back to 1996 when it was seen as a convenient and fun way to communicate, but now it is a necessary tool for work (35m41s).
The convenience of email has led to an increase in work hours, rather than a decrease, as people have more time to do more work (36m47s).
Studies have shown that increased efficiencies in technology have not led to a reduction in work hours, but rather an increase in workload (37m5s).
The analogy of email is applied to AI, with the suggestion that if AI saves time on tasks like email, it may not lead to more free time, but rather more time to do more work (37m30s).
The possibility of AI replacing human writers, such as scriptwriters for TV shows, is discussed, with the concern that it could lead to a loss of jobs and creativity (38m12s).
The traditional writing room for a TV show, where writers would collaborate with the production team and actors, is contrasted with the potential future where AI-generated writing is used, and human writers are no longer needed (38m29s).
The concern is raised that AI-generated writing could lead to a loss of imagination and creativity in the writing process (38m48s).
The potential for AI to replace human writers is seen as a threat to the creative industry, with the possibility of one person being able to do the work of many writers (39m11s).
The output of AI systems might not be good, but it can be good enough to make entertainment that people will pay for, resulting in a huge loss of livelihood and cultural loss in terms of the kind of art being made (39m18s).
The fear is that humans will be replaced by machines that produce generic and less interesting content, which is acceptable to those in power because it can make them a lot of money (41m29s).
The fear is not just about livelihood loss, but also about the loss of unique and interesting art in favor of something more generic and less interesting (41m46s).
The use of AI tools could lead to a decrease in standards, making people not care about the quality of art anymore, and potentially leaving no room for people to pursue unique and difficult art (41m59s).
The approach to these AI tools could be to learn to use them as an instrument to expand the possibilities of art, rather than pulling everything down to a lowest common denominator (42m22s).
The hope is that the use of AI tools will bring a spirit of play and creativity to everyone in all disciplines, allowing them to do things they couldn't do before (42m53s).
However, there are challenges to this scenario, including human nature, which requires an investment of time and effort to be playful and creative (43m15s).
Creating meaningful and worthwhile things often requires effort, frustration, and confusion, which are part of the process of learning and improving, whether it's playing a difficult video game, learning something new, or getting better at an instrument (43m21s).
Having a supportive environment with time, will, desire, and motivation is necessary to overcome challenges, but life is getting harder for most people, making it increasingly a privilege to have time to do things just for personal fulfillment (43m48s).
Time to pursue personal goals is hard to come by, and it's getting harder for people to find the time and resources to pursue their passions (44m5s).
New technologies, such as instruments that use AI to track hand movements, require a different way of thinking about human-computer interaction and may need alternative approaches to prompt-based engineering (44m24s).
There should be room for different ways of working with AI that consider the physical and emotional aspects of human interaction, rather than just focusing on extraction and profit (44m44s).
Unfortunately, there is evidence that AI is headed in a direction of extraction rather than inclusion, with the goal of maximizing profit by removing humans from loops and automating jobs (45m10s).
The focus on profit and automation can lead to a lack of consideration for the cultural and social implications of AI, and the potential consequences for human well-being and flourishing (45m30s).
Startup founders and business leaders have a social responsibility to consider the impact of their creations on people's lives, communities, and families, and to prioritize social accountability alongside profit and growth (46m17s).
The use of AI raises questions about social responsibility and accountability, particularly when it comes to the potential consequences of widespread adoption and the impact on people's lives and communities (46m51s).
The culture we're living in is changing, and it's essential to critically question what we do, considering aesthetic, social, and cultural dimensions, in addition to other aspects (47m3s).
Play and expression are crucial, as they can help humans feel more like themselves, and if technology can facilitate this, it's a victory (47m24s).
Using technology as a tool that helps people be more themselves and feel understood is a hopeful goal (48m1s).
To achieve this, people need to feel included, safe, and free to be themselves, and technology can be a means to facilitate this (48m10s).
There's still much work to be done, and many challenges lie ahead, but it's worth working towards this goal (48m25s).
The Ocarina app, the Laptop Orchestra, and other projects are examples of using technology to facilitate human expression and creativity (48m45s).
The future of music and other topics will be explored in more depth in a series at theverge.com (49m1s).
The Vergecast is produced by Liam James, Will Poor, and Eric Gomez, and is part of the Vox Media Podcast Network (49m27s).
Support for the Vergecast is brought to you by Nissan, specifically the Nissan Kicks, a city-size crossover redefined for urban adventures (49m59s).
