Generally AI Episode 2: AI-Generated Speech and Music
12 Feb 2024
AI-Generated Voices
- Stephen Hawking used a voice synthesizer called the CallText 5010, whose voice was based on that of Dennis Klatt.
- Apple is introducing a new feature called "Personal Voice" in iOS, which allows users to create their own synthetic voice.
- Artificially generated voices can be used for various purposes, including assisting individuals with speech disabilities, impersonating others for malicious intent, and editing audio content.
- Meta's Voicebox model can generate synthetic voices, but access to the model is currently limited.
- Reputable AI voice generation tools typically require explicit consent from the voice owner before creating an artificial model of their voice.
- Malicious use of AI-generated voices includes impersonating celebrities or individuals for financial gain or spreading misinformation.
- Celebrities are offering services to record personalized voice messages for a fee, raising ethical concerns about consent and authenticity.
- Protecting oneself from voice theft involves limiting publicly available recordings, being cautious of unusual requests (e.g., asking for gift cards), and verifying personal relationships through unique questions.
Ethical Considerations
- The ethical use of AI-generated voices should prioritize entertainment value and beneficial purposes, while considering potential malicious uses.
- Deepfake technology, including AI-generated voices, poses legal challenges regarding copyright, ownership, and impersonation.
Music Generation
- In the 1980s, hip-hop acts like Afrika Bambaataa used synthesized sounds to replace real instruments, made possible by the development of MIDI (Musical Instrument Digital Interface).
- Generative AI models like OpenAI's MuseNet and Google's Music Transformer can generate sequences of MIDI notes, allowing for the creation of new music.
- Diffusion models, commonly used for image generation, have also been applied to music generation.
- Google's Noise2Music model takes audio noise and progressively denoises it, guided by a text prompt.
- Spectrograms, which represent sound as images, can be generated and modified using fine-tuned diffusion models.
- Recent techniques for music generation at the audio level include Meta's MusicGen and Google's MusicLM, which output audio tokens instead of text tokens.
- Meta's AI can generate 12-second audio clips with one bar per second, while Google's AI cannot generate audio.
- Meta's AI generated a blues riff that was better than the first two clips generated by other AIs.
- The Riffusion AI generated a continuous stream of music that was not well-received.
- Diffusion models such as Stable Diffusion have no built-in grammar rules or music theory; they generate music starting from random noise.
- There is a potential market for AI-generated music, especially for street performers who can use it as a backing band.
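To make two of the representations above concrete, here is a minimal sketch of a note sequence expressed as MIDI note numbers and of the spectrogram "image" computed from the resulting audio. It is an illustration only, not how any of the models mentioned work internally. The sample rate, frame size, hop length, and the helper names `midi_to_hz`, `render_notes`, and `spectrogram` are assumptions chosen for the example; only NumPy is used.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (assumed; kept low so the example stays small)


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to a frequency in Hz (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def render_notes(notes, note_seconds=0.5):
    """Render MIDI note numbers as consecutive sine tones of equal length."""
    t = np.arange(int(SAMPLE_RATE * note_seconds)) / SAMPLE_RATE
    return np.concatenate([np.sin(2 * np.pi * midi_to_hz(n) * t) for n in notes])


def spectrogram(signal, frame_size=512, hop=256):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_size)
    starts = range(0, len(signal) - frame_size + 1, hop)
    # Slice the audio into overlapping windowed frames, then take the FFT of each frame.
    frames = np.stack([signal[s:s + frame_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T


notes = [57, 60, 64]          # A3, C4, E4: an A minor arpeggio
audio = render_notes(notes)   # 1.5 seconds of audio samples
image = spectrogram(audio)    # 2-D array that could be saved or viewed as an image

# The brightest row in each column tracks the pitch of the note playing at that time.
hz_per_bin = SAMPLE_RATE / 512
print("first frame peak (Hz):", image[:, 0].argmax() * hz_per_bin)
print("last frame peak (Hz):", image[:, -1].argmax() * hz_per_bin)
```

A MIDI-level model works with the short list of integers in `notes`, while a spectrogram-based diffusion model works with the much larger 2-D array in `image`, which must then be converted back to audio.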
Moog Synthesizer
- The speaker owns a record player and found a record with the sounds of the Moog synthesizer when it was new.
- Moog is a synthesizer company based in North Carolina.
- Moog holds an annual festival in Durham, North Carolina.
- The festival is expensive to attend.
- Attendees do not receive a free synthesizer for attending the festival.