AI, ML, and Data Engineering InfoQ Trends Report—September 2023

07 Oct 2024

QCon San Francisco and InfoQ Podcast

The InfoQ International Software Development Conference, QCon, will be held in San Francisco from October 2nd through 6th, featuring real-world technical talks from innovative senior software development practitioners on applying emerging patterns and practices to address current challenges (10s).
The podcast is part of the 2023 Trends report, discussing what's happening in the AI, ML, and data engineering space, with a panel of subject matter experts and practitioners (42s).
Srini Penchikala, lead editor for the AI, ML, and data engineering community at InfoQ, is facilitating the conversation (51s).
The panelists include Sherin Thomas, a staff engineer at Chime, who has been building data platforms and data infrastructure and has a keen interest in streaming, data discoverability, governance, operations, and the role data plays in new advancements in AI (1m16s).
Roland Meertens, a data scientist at Bumble, is working with computer vision and has a background in computer science (2m15s).
Daniel, an engineer with experience in software development, is an AWS Community Builder on machine learning and is currently developing artificial intelligence and machine learning products for different industries (2m30s).
Anthony Alford, director of development at Genesys, has worked on customer experience-related AI projects and has a background in robotics and intelligent robot software control (2m53s).
The podcast aims to discuss what's happening in the AI, ML, and data engineering space, what's not happening, and what's coming up that listeners should be aware of (3m26s).
The Trends report has two major components: the podcast and a written article (3m45s).
The written article, available on the InfoQ website, includes a trends graph that shows the different phases of technology adoption and gives more detail on individual technologies (3m58s).
The trends graph is a valuable resource that offers insights into the adoption phases of various technologies and provides updates on new and updated technologies since the previous year's report (4m5s).

Generative AI Overview

Generative AI, also known as Gen AI, has been gaining significant attention since the announcement of GPT-3 a couple of years ago, and especially with the release of GPT-4 and ChatGPT earlier this year (4m48s).
ChatGPT was used by over 17 million people in the first two months of its release, making it one of the fastest-adopted technologies (4m58s).
Major players in the space, including OpenAI, Google, and Meta AI, have announced their plans for Gen AI, with OpenAI releasing ChatGPT, Google announcing Bard, and Meta AI releasing Llama 1 and Llama 2 (5m19s).
The discussion aims to highlight the value and benefits of Gen AI technologies, beyond the hype surrounding them (5m24s).
Generative AI is defined as AI based on autocompleting a certain prompt, allowing for zero-shot learning and the ability to turn data into actionable written language (6m32s).
The concept of generative AI is not new, with generative adversarial network (GAN) models for image generation having been around for almost 10 years, but language models have gained significant attention since 2019 with the release of GPT-2 (7m1s).
The field of generative AI has gained significant traction since 2020, with the release of GPT-3 and other image generators (7m31s).
Stable diffusion and generative AI for images and text are becoming increasingly popular, with people using them together to create stories and illustrations, potentially cutting humans out of the loop (7m38s).
There have been significant improvements in text generation, with GPT 3.5 and GPT 4 showing remarkable advancements, attributed to larger networks, more data, and higher-quality data (8m9s).
The usability experience of GPT has made it accessible to a wider audience, including non-technical individuals, with many people adopting it for various purposes (8m42s).
Midjourney is considered the clear winner in image generation, despite its limited usability, with users paying $35 a month for a subscription and working with it through a Discord bot (9m2s).
Once the hurdle of usability is overcome, generative AI is expected to revolutionize the industry, making it accessible to the general public (9m34s).
ChatGPT has become synonymous with generative AI, with many companies and individuals using it for various applications, including internal projects and client solutions (10m15s).
A new cottage industry of companies is emerging, built on top of generative AI, offering services such as homework assignments and code generation (10m47s).
Large companies are creating their own vision and roadmaps for generative AI, indicating a growing interest and investment in the technology (11m20s).
Generative AI is expected to see increased adoption, with many companies applying it in various ways, and its integration into products is relatively easy (11m26s).
The initial value proposition of GPT-3 was that it didn't require fine-tuning, but now people are fine-tuning large language models, especially smaller and open-source ones like LLaMA, for specific domains (12m12s).
Fine-tuning these models is becoming easier, but it still requires significant hardware and a substantial dataset, although models are starting to shrink in size (12m23s).
Companies like Expedia, Morgan Stanley, Stripe, Microsoft, and Slack are already using generative AI, and adoption is increasing (13m11s).
The concept of "Chain of Thought" prompting has been developed, where researchers found that telling language models to explain their thoughts step-by-step leads to better results (13m42s).
The quality of results from language models and image generation can vary greatly depending on the prompt used, highlighting the importance of prompt engineering (14m1s).
Prompt engineering is becoming a discipline in its own right, and it's unclear how the responsibility for prompt engineering will be integrated into traditional software development processes (14m18s).
It's unlikely that dedicated prompt engineers will be needed in every project, as people will learn to use these tools effectively on their own (14m52s).
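
The chain-of-thought idea mentioned in this section can be illustrated with a small prompt builder. This is only a sketch: the function name `build_prompt` and the exact wording of the step-by-step cue are assumptions made for illustration, not something specified in the discussion, and no model is actually called.

```python
def build_prompt(question: str, chain_of_thought: bool = False) -> str:
    """Return a plain-text prompt for a question.

    With chain_of_thought=True, a step-by-step cue is appended. The wording
    of the cue is an assumption; any similar instruction could be used.
    """
    prompt = f"Q: {question.strip()}\nA:"
    if chain_of_thought:
        # Ask the model to spell out intermediate reasoning before answering.
        prompt += " Let's think step by step."
    return prompt


question = "A shop sells pens in packs of 12. How many pens are in 7 packs?"
print(build_prompt(question))
print(build_prompt(question, chain_of_thought=True))
```

The two printed prompts differ only in the trailing cue, which makes it easy to compare a model's answers with and without the step-by-step instruction.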

Smaller Language Models and Fine-tuning

The use of language models like LLaMA from Meta and Bard from Google, as well as Amazon's bet on generative AI, is expected to continue growing, with new products like Claude and Amazon Bedrock emerging in the market (15m32s).
Smaller models, such as those being developed by open-source communities, are becoming increasingly popular, with people fine-tuning and distilling models like LLaMA to create new models like Vicuna and Alpaca (15m59s).
The proliferation of these smaller models is expected to continue, allowing companies to alleviate concerns about using closed APIs and sending data to unknown sources (16m53s).
Research into increasing sequence length, which is the amount of history that can be put into a chat, is ongoing, with some models supporting up to 32,000 tokens and potentially up to a million tokens in the future (17m1s).
The ability to support longer sequence lengths is expected to enable new applications, such as in-context learning, where a model can be given a large amount of text, such as a book or a company's knowledge base, and summarize or answer questions about it (17m40s).
The trend of using language models for summarization is expected to continue, with applications in industries such as law, where firms are using models to summarize legal documents, and science, where researchers are using models to summarize and extract information from large numbers of papers (18m20s).
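
When a document is longer than a model's sequence length, one common workaround is to split it into chunks that each fit the limit and summarize or query them separately. The sketch below assumes whitespace-separated words as a stand-in for tokens, which real tokenizers do not match exactly; the function name `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Words approximate tokens here (an assumption). `overlap` repeats the last
    `overlap` words of one chunk at the start of the next to keep some context.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once the current chunk already reaches the end of the text.
        if start + max_tokens >= len(words):
            break
    return chunks
```

Longer sequence lengths reduce the need for this kind of splitting, which is one reason the research mentioned above matters for summarization use cases.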

Speech Synthesis and Recognition

Alongside the trend of summarizing papers and documents, another trend is speech synthesis and the analysis of speech data (18m45s).
Google and Meta are working on speech synthesis, releasing several models this year, including multilingual models, with Google focusing on speech-to-speech translation (19m0s).
OpenAI released Whisper, an open-source speech recognition model, at the end of last year, and it is quite good at transcribing speech (19m11s).
Meta released Voicebox, a model that can take speech audio and replace bits of it, similar to in-painting in images, but the company is cautious about releasing it due to potential abuse (19m54s).
The development of speech synthesis models raises ethical and responsible AI consequences, which will be discussed later (20m32s).
Google's Universal Speech Model is part of its 1,000 Languages Initiative, which will be huge for products involved with speech, such as Google products or Alexa (20m44s).
The implementation of these models on companies' own hardware and products will lead to a new wave of products with improved speech recognition and synthesis capabilities (21m4s).

LLM Ops and the Operationalization of LLMs

With the adoption of large language models, teams will need to own and maintain the operations side, leading to the emergence of a new term, LLM Ops (21m47s).
MLOps brings rigor to the process of building and launching machine learning models, and it will be essential for LLM-based applications (21m57s).
Operational requirements for large language models (LLMs) are similar to those for traditional machine learning pipelines, but there are nuances that make operationalizing LLMs more challenging, such as collecting human feedback for reinforcement learning and prompt engineering (22m4s).
Performance metrics for LLMs are different and constantly evolving, making it difficult to predict how they will develop in the future (22m30s).
The LLM development life cycle consists of data ingestion, data prep, prompt engineering, and potentially complex tasks like chaining LLM calls and external calls to a knowledge base (22m46s).
The entire life cycle requires rigor, and LLM Ops might become its own distinct field, with MLOps being a subset of it (23m5s).
As more applications use LLMs, the importance of LLM Ops will increase, requiring teams to continually work on data, prompt engineering, and server architecture to ensure the AI works correctly (23m15s).
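
The "chaining LLM calls and external calls to a knowledge base" step from the life cycle above can be sketched as a function that first retrieves passages and then passes them to a model inside the prompt. Everything here is a hypothetical stand-in so the example runs without any external service: `KNOWLEDGE_BASE`, `keyword_retrieve`, and `echo_llm` are made up for illustration, and a real system would replace them with an actual store and model client.

```python
from typing import Callable


def answer_with_context(
    question: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> str:
    """Chain two steps: look up passages, then ask the model with them in the prompt."""
    passages = retrieve(question)
    context = "\n".join(f"- {p}" for p in passages)
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(prompt)


# Hypothetical knowledge base; the entries are invented for this sketch.
KNOWLEDGE_BASE = {
    "refund": "Refunds are processed within 5 business days.",
    "shipping": "Standard shipping takes 3 to 7 days.",
}


def keyword_retrieve(question: str) -> list[str]:
    """Return knowledge-base entries whose key appears in the question."""
    q = question.lower()
    return [text for key, text in KNOWLEDGE_BASE.items() if key in q]


def echo_llm(prompt: str) -> str:
    """Fake model: returns the first context line, or a fallback if there is none."""
    lines = [ln[2:] for ln in prompt.splitlines() if ln.startswith("- ")]
    return lines[0] if lines else "I don't know."
```

Passing the retriever and the model in as callables keeps each step separately testable and swappable, which is the kind of rigor the operational discussion above calls for.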

Vector Databases

Vector database technology, including embedding stores, is gaining attention, with use cases such as using sentence embeddings to create observability solutions for generative applications (24m18s).
Vector search databases are needed because large language models have limited history, and companies may want to store and query large amounts of data, such as summaries of documents, as feature vectors (24m39s).
This technology will become more important as companies use it to store and query large amounts of data, such as legal documents or Wikipedia articles (25m3s).
Vector databases are becoming increasingly important, allowing for efficient similarity searches in large datasets by storing feature vectors instead of raw data, and can be used with technologies such as Pinecone or Milvus (25m36s).
There has been a significant increase in funding for vector database technologies, but a slower adoption rate among developers, which is expected to change in the coming year (26m6s).
Chroma is another player in the vector database space, offering an open-source solution (26m49s).
The choice of vector database technology depends on the specific use case and the characteristics of the data being searched (27m1s).
Feature stores have become a crucial part of machine learning solutions, and vector databases are likely to follow a similar trend (27m19s).
Similarity search applications are on the rise, with use cases such as searching for weather phenomena in satellite images (27m31s).
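
The similarity search that vector databases provide can be illustrated with a tiny in-memory store that ranks stored vectors by cosine similarity to a query. This is a brute-force sketch with a hypothetical class name (`TinyVectorStore`) and hand-written vectors; it does not reflect the API of Pinecone, Milvus, or Chroma, and production systems typically rely on approximate indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """In-memory store doing exact (brute-force) nearest-neighbour search."""

    def __init__(self) -> None:
        self._items: dict[str, list[float]] = {}

    def add(self, key: str, vector: list[float]) -> None:
        self._items[key] = vector

    def search(self, query: list[float], k: int = 1) -> list[tuple[str, float]]:
        """Return the k stored keys most similar to the query, best first."""
        scored = [(key, cosine_similarity(query, vec)) for key, vec in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("cat", [1.0, 0.0])
store.add("dog", [0.9, 0.1])
store.add("car", [0.0, 1.0])
print(store.search([1.0, 0.0], k=2))
```

In practice the vectors would come from an embedding model applied to documents, images, or summaries rather than being written by hand.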

Robotics and Drones

Robotics and drone technologies are also advancing despite a trend of decreasing investment, with promising developments such as the establishment of an AI Institute by Boston Dynamics (28m37s).
Cheaper and more accessible remote control technologies are becoming available, which could further accelerate the development of robotics and drone technologies (28m58s).
Robots are becoming more viable and affordable, with the cheapest legged balancing robots available for around $1,500, making it possible for individuals to buy a robot as a platform and integrate their own hardware on top of it (29m8s).
The Robot Operating System (ROS) is still seen as the leading software, with growing adoption of ROS 2, and companies like Viam are building middleware to easily add and configure plugins (29m35s).
Google is publishing research on using language models to control robots, using them as a user interface to perform tasks, and integrating sensor data into this process (30m1s).
This development is interesting, especially considering how hard it was to achieve in the past, and it has the potential to revolutionize the way robots are controlled and interact with their environment (30m30s).
The use of large language models and computer vision models can enable robots to recognize objects and locations, and understand natural language commands, making it easier to program and interact with them (31m7s).
The manufacturing industry can benefit from the integration of robotics and virtual manufacturing platforms, allowing for the simulation and testing of processes that are too costly or unsafe to perform in the physical world (31m36s).
As robot technology advances, it is expected that robots will become more approachable as products, rather than just research projects, with companies like Tesla working on robots like the Optimus robot (32m13s).
The future of robotics is exciting, with the potential for robots to go places they have not gone before, and the possibility of seeing the first robot on the market in the near future (32m26s).

Ethical Considerations and AI Regulation

The increasing power of AI, ML, and data engineering technologies brings a responsibility to be ethical and fair, highlighting the importance of considering the ethical dimension of these technologies (32m28s).
AI ethics is crucial in addressing issues such as bias, discrimination, and the potential risks to personal data and privacy, as well as ensuring that AI systems make ethical decisions and provide accurate information (33m5s).
The adoption of AI technologies also raises concerns about unemployment and economic impact, including the potential displacement of jobs, and the need to consider sustainability and environmental implications (33m34s).
Governments around the world are developing their own solutions to regulate AI, but it is unclear whether these individual approaches will be effective, given the global impact of AI on humanity (33m53s).
A unified approach to regulating AI, potentially led by the United Nations, may be necessary to address the global implications of AI on humanity (34m27s).
The development of AI regulations is an ongoing process, with governments and experts continuing to explore the best approaches to balancing innovation with responsibility (34m41s).
There is a growing recognition of the importance of safety in generative AI, but this must be balanced against the need to avoid overly restrictive regulations that could stifle innovation (35m5s).
The trade-off between safety and functionality in AI systems is a topic of ongoing debate, with some users expressing frustration at the limitations imposed on AI models to ensure safety (35m10s).
There is a growing trend of people discussing the potential risks and downsides of AI and language models, such as ChatGPT, which can sometimes produce false or hurtful information, and companies are trying to implement safeguards but are unsure how to guarantee their effectiveness (35m51s).
Research has shown that it is possible to automatically generate attacks that can "jailbreak" language models, including ChatGPT, by creating prompts that can manipulate the model's output, highlighting the need for white hat hacking to identify and address these vulnerabilities (36m35s).
The importance of considering all possible edge cases and both the positive and negative consequences of using language models like ChatGPT is emphasized, as well as the need to remind users that the output is generated by a large language model and may not always be accurate or reliable (37m6s).
Discrimination, even against a single demographic or user, is not acceptable, and it is crucial to consider the potential risks and downsides of using language models (38m18s).

Explainable AI and Data Governance

Explainable AI is a way to explain how a model came to a result or conclusion, and it is becoming increasingly important as governments make regulations and discuss new laws related to AI ethics (38m27s).
Explainable AI can help address concerns about AI models becoming "dumber" and provide insights into why a model is making certain decisions, and it may play a significant role in the future of AI development (38m56s).
The increasing importance of AI explainability, data discovery, lineage, labeling, and operation is driven by the need for transparency and accountability in AI decision-making, as well as the emergence of regulations such as GDPR and CCPA (39m22s).
The use of AI models can lead to biased outcomes if trained on biased data, as seen in the example of Amazon's hiring model that disproportionately selected men due to being trained on a dataset of mostly men's resumes (39m44s).
In the new world of AI, good model development practices, data discovery, lineage, labeling, and operation will become crucial to ensure AI systems are fair, transparent, and reliable (40m0s).
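
One way to make "why is the model deciding this" concrete is permutation importance: shuffle one input feature and measure how much the model's error grows. The panel did not name a specific technique, so this is offered only as one common, model-agnostic example; the function names and the toy model in the usage note are assumptions for illustration.

```python
import random
from typing import Callable, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[list[float]], float],
    rows: list[list[float]],
    targets: list[float],
    feature: int,
    seed: int = 0,
) -> float:
    """Increase in error after shuffling one feature column across rows."""
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    column = [r[feature] for r in rows]
    random.Random(seed).shuffle(column)  # seeded so repeated runs agree
    # Rebuild each row with the shuffled value in place of the original one.
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    permuted = mean_squared_error(targets, [predict(r) for r in shuffled])
    return permuted - baseline
```

For a toy model such as `lambda row: 2 * row[0]`, shuffling feature 0 should raise the error while shuffling an unused feature leaves it unchanged, which is the signal an explanation of this kind relies on.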

Data Engineering Trends

The data engineering space is undergoing significant developments and innovations, with emerging trends and patterns that leaders should be aware of in their data management projects (40m22s).
There is a growing emphasis on speed and low latency in data engineering, with unified batch and streaming platforms becoming more popular, and architectures like Kappa gaining adoption (40m49s).
Data mesh has become a buzzword as organizations become more complex, and it is no longer sufficient for a central data team to manage everyone's use cases (41m9s).
Data contracts are gaining importance as a way to measure data quality, ensure data observability, and guarantee that systems and data products behave as expected (41m26s).
Data observability is becoming a main pattern on the data side: observability at the systems and infrastructure level alone is no longer sufficient, and it is also needed at different abstraction layers (41m42s).
Companies like Monte Carlo and Anomalo, and experts like Chad Sanderson, are emerging in the data observability and data contracts space (42m53s).
The need for data observability is increasing with AI, and it's not just about systems, but also about the type of data and its distribution, with past AI failures like the Zillow debacle and Amazon recruitment model serving as examples (43m7s).
Data disciplines and AI are two sides of the same coin, requiring end-to-end discipline to manage data and machine learning models (43m47s).
Sherin predicts that data mesh adoption will increase in the coming years, as companies feel that data teams are becoming a bottleneck, and explainability will also see more adoption (44m32s).
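
The data contract idea discussed in this section can be sketched as a small set of field rules that every record from a producer is checked against. The rule format, the `ORDER_CONTRACT` fields, and the function name `violations` are all hypothetical; real data contract tooling usually covers far more (schemas, freshness, distributions, ownership).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldRule:
    """One field of a hypothetical data contract."""
    name: str
    type_: type
    required: bool = True
    min_value: Optional[float] = None


def violations(record: dict[str, Any], contract: list[FieldRule]) -> list[str]:
    """Return human-readable contract violations for one record (empty if it conforms)."""
    problems = []
    for rule in contract:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                problems.append(f"{rule.name}: missing")
            continue
        if not isinstance(value, rule.type_):
            problems.append(f"{rule.name}: expected {rule.type_.__name__}")
            continue
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{rule.name}: below {rule.min_value}")
    return problems


# Example contract for an invented "orders" data product.
ORDER_CONTRACT = [
    FieldRule("order_id", str),
    FieldRule("amount", float, min_value=0.0),
    FieldRule("coupon", str, required=False),
]
```

Counting violations per batch gives a simple data quality metric that can feed the observability layer described above.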

Predictions and Future Outlook

Anthony predicts that artificial general intelligence (AGI) will not be achieved in the next year, and possibly not even in his lifetime (44m50s).
Daniel believes that AI is here to stay, and with new products and technologies emerging, such as ChatGPT, AI will become more accessible to ordinary people, not just researchers (45m27s).
Daniel also mentions that Elon Musk will be working on AI in his companies, making AI more approachable for everyone (45m36s).
The field of autonomous agents is an area of excitement, where agents can independently come up with ideas, such as creating a product to sell, without needing to be fed prompts or connected to other APIs (46m15s).
Autonomous agents may be able to perform tasks such as marketing products, emailing factories, and making reservations at restaurants, making life easier for individuals, especially in situations like planning a romantic evening while traveling to a new city (46m29s).
These agents could potentially be used for tasks like buying Valentine's gifts, making them useful for people like Roland, who is in the dating app development business (47m8s).
It is predicted that Large Language Models (LLMs) will become more accessible to the community, with open-source frameworks like LangChain making solutions more available, and LLMs will not be limited to closed-source solutions (47m33s).
The development of plugins for chatbots like ChatGPT is expected to continue, with potential plugins including those that can remember things for users, similar to the "Remembrall" from Harry Potter, and plugins that can interact with users through voice commands (47m56s).
Potential plugins for ChatGPT include a restaurant plugin, a voice-based plugin that can answer questions and send emails, and a plugin that can make decisions for users (48m40s).
Other desired plugins include ones that can tell users what they don't know, and ones that can run users' lives for them, making decisions on their behalf (49m21s).
The future of AI and its impact on humanity will depend on how the technology develops and is utilized, with the potential for both positive and negative outcomes (49m57s).
AI is a new way to explore previously unexplored things, making technology more approachable to everyone (50m14s).
The use of AI at the end of the day will be determined by humanity, and its impact will be shaped by human decisions (50m27s).

Conclusion

InfoQ will continue to provide updates on trends and new developments in the field of AI, ML, and data engineering (50m31s).
The conversation will be continued in the future, with a follow-up discussion to assess the accuracy of predictions made and the progress that has been achieved (51m3s).
The participants, including Daniel, Anthony, Roland, and Sherin, concluded the discussion by thanking the listeners and looking forward to the next conversation (51m8s).