Susan Shu Chang on Bridging Foundational Machine Learning and Generative AI

06 Jan 2025 (6 months ago)

Introduction to Machine Learning for Software Engineers

The AI and Machine Learning for Software Engineers track at CuCon San Francisco covered various topics in foundational machine learning, including recommender systems, reinforcement learning, and machine learning in production (51s).
The track aimed to provide engineers with concrete examples and insights into how machine learning works behind the scenes, as many engineers are working with companies that use machine learning or have been adjacent to it (2m36s).
The track included talks from companies such as Netflix, which discussed how to get better recommendations and tailor them to specific people, and Meta, which presented an idea on using long-term interaction as a reinforcement learning problem (2m53s).
The importance of machine learning in production was also highlighted, as having a fancy machine learning model is not enough if it cannot be run in production and serve users (1m52s).

Getting Started with Machine Learning

Knowledge graphs and large language models were also discussed, and how they can be merged to build a better experience (2m11s).
For those looking to get into machine learning, PyTorch and TensorFlow are still recommended as valid tools to play around with and get started, with many tutorials available online to build an initial end-to-end app (3m31s).
Building a simple, lightweight web app is essential to connect machine learning models to the UI, which is familiar to software engineers, and is a common practice in the industry (3m51s).

Challenges and Considerations in Machine Learning

Deploying machine learning models into production can be challenging, and common pitfalls include finding the right problem, communicating with management, and finding the right solutions (4m23s).
As a principal data scientist, it's crucial to determine what is overkill and what is practical, and sometimes starting with rules-based approaches is more effective than machine learning models (4m38s).
Rules-based approaches provide a baseline to compare machine learning models, and if the model performs worse, it may indicate that machine learning is not the best solution or needs significant improvement (4m50s).
Focusing on business logic and rules in the early stages is essential to avoid wasting resources and ensure a good return on investment (5m17s).

Working with AI and Machine Learning Tools and APIs

With the availability of tools and APIs like GT4, it's now more open for everyone to work with AI and machine learning, even without prior knowledge of how the models work (5m43s).
However, having prior knowledge of AI and machine learning can still be beneficial, especially when it comes to fine-tuning and evaluating models, or when working with complex scenarios that require opening the "black box" (5m53s).
Engineers can build a lot of things without knowing the exact internals of machine learning models, and tools like Open AI and Anthropics can be integrated into products using API keys (6m24s).
The example of building an AI assistant using Open AI's API key demonstrates how engineers can build an MVP completely by themselves, and then collaborate with experts for improvements and fine-tuning (6m59s).
Building a minimum viable product (MVP) is easy, but debugging and tweaking it can be difficult unless one understands how the tools and techniques work, which is where knowing the fundamentals is important (8m7s).
Not understanding what's happening under the hood can set people up for failure, especially when working with complex concepts like embeddings and generative AI (7m49s).

Elastic Common Schema and AI Assistant Development

The Elastic Common Schema allows for the creation of supervised models that can be trained on data without having access to the customer's data, as the schema ensures that the data will be in a consistent format (9m22s).
The Elastic team ships these models downstream to customers, who can then use them in their environment without any involvement from the team (9m39s).
The team also works on the Aumen AI assistant piece, which involves building out AI assistants and conducting tasks like evaluation, research, and R&D to add new functionality (9m56s).
To prove that a new feature or functionality works, the team conducts R&D projects, such as using large language models for session summarization, and evaluates the results (10m25s).
One example of a successful R&D project involved using large language models to summarize logs ingested into Elastic Search, which was able to understand the user's actions and commands from the logs (11m6s).
The team used real logs, pre-processed the data, and fed it to a large language model to see if it could summarize the user's session, which was a successful proof of concept (11m31s).
To improve the performance of language models, it is essential to have a large context window, and the team had to narrow down the content being fed into the models by extracting relevant fields and consolidating repeated data (11m45s).
The team used GPT-40 and Gemini 1, which had larger context windows, but found that it didn't necessarily mean better performance, and they had to go back and extract relevant fields (11m57s).
To evaluate the performance of the language model, the team used subject matter expertise, having a security researcher summarize the relevant information in natural language, and compared it to the response given by the large language model (12m56s).
The project is still in the R&D phase and has not been merged into the capability, but the team may revisit it in the future, and the lessons learned from this project have been applied to subsequent projects (13m37s).

Building MVPs and Prototypes with Streamlit

The team uses a template for evaluation, which has been found to be repeatable, and the AI tools and methods are constantly evolving, making it challenging to know what the next big thing is or how to adapt the database to make search work faster or better (14m17s).
For people starting with machine learning, it is recommended to start with tools like Streamlit, which provides a front-end for Python and allows users to create interactive chatbots with a few lines of code (14m57s).
Streamlit provides a way to create a website with a drop-down for basic responses, built-in templates, and an input box, making it easy to create MVPs (Minimum Viable Products) (15m10s).
Industry professionals in machine learning use tools like Streamlit to quickly spin up prototypes and demonstrate ideas to others, such as product people, as it is more tangible and convincing than showing JSON output or requiring others to run a Python script with specific dependencies (15m42s).
Streamlit is a useful tool for learning and starting small projects, but not for building large-scale end-to-end projects (16m24s).

Other Recommended Tools and Building Simple Apps

Other recommended tools for learning include Django, Fast API, and Sanic, which allow users to build simple apps and understand machine learning inputs and outputs (17m2s).
Building a simple app with these tools helps learners understand how machine learning fits into the entire infrastructure and can be used to create a Python function as a service or API (17m25s).

Machine Learning Interviews Book and Time Management

A book titled "Machine Learning Interviews" was written to help new graduates or career transitioners prepare for machine learning interviews, and it came about after the author taught a well-received online course on the topic for O'Reilly (17m45s).
The book proposal was initially declined, but the author eventually created a proposal that was selected by the publisher, and the book took a year to write in the author's spare time (18m40s).
Effective time management is crucial, with a recommended 3-5 hours of work per week, and a preference for a slow and steady approach to avoid stress and crunch time (19m23s).

Skills and Preparation for Machine Learning Jobs

The skills required for a machine learning job are highly dependent on the specific job and company, and it's essential to prioritize the skills mentioned in the job description (20m1s).
A general machine learning life cycle and workflow can be broken down into stages such as raw data, data pipelines, model development, and model deployment, and jobs will correspond to one or more of these responsibilities (20m11s).
Job seekers should not try to prepare for every possible topic, but instead focus on the specific areas mentioned in the job description and prioritize those questions (21m13s).
The book covers topics such as how to select what topics to prioritize, main machine learning algorithms, general Ops skills and tooling, and the responsibilities of various roles (21m27s).

Evolving Field and Staying Up-to-Date

Experienced machine learning professionals may still find the book helpful, even if they are already working in the field, as they may need to transition between different roles or areas of focus (21m57s).
The machine learning field is rapidly evolving, making it challenging to learn the necessary skills, and it's essential to stay up-to-date with the latest developments (22m42s).

Common Mistakes in Machine Learning Interviews

One common mistake made by job seekers is focusing too much on specific tooling, which may not add significant value to the interview process (23m11s).
When interviewing candidates for machine learning positions, it's essential to focus on generic skills rather than specific tooling, as this can limit the pool of potential candidates unnecessarily (24m22s).

Setting Up Effective Machine Learning Interviews

Companies setting up machine learning interviews should first identify the skills they want to hire for, such as data engineering, model training, or machine learning operations (MLOps) (24m55s).
The interview structure should be tailored to the specific role, asking questions relevant to the primary responsibilities of the position, whether it's data engineering, model training, or MLOps (25m21s).
A common mistake is not defining the role and its place in the company, leading to a disorganized interview process and a search for a "unicorn" candidate without a clear understanding of where they fit in the company (25m51s).
When hiring for more senior positions, it's essential to assess not only technical skills but also the ability to mentor, lead projects, and have insight into other responsibilities, such as MLOps (26m33s).
Senior candidates should demonstrate a broader understanding of the machine learning pipeline, including MLOps, and interview questions should aim to capture this breadth of skills (27m6s).
The interview process should be designed to evaluate the candidate's ability to contribute to the specific role and the company, rather than just focusing on narrow technical skills (26m16s).

T-Shaped Skills and Specialization

Having a T-shaped skill set, with a broad understanding of various aspects of machine learning and a deep understanding of a specific area, is beneficial for scaling up in a career, as it improves communication skills and the ability to work end-to-end (27m23s).
Specialization is becoming more common in larger companies, allowing individuals to focus on specific areas, such as model training, but having a broader understanding of the entire process can still be beneficial (27m57s).
Knowing a bit more about the deployment process and what comes before and after can differentiate a candidate from others with the same skills (28m16s).

Company Needs and Candidate Fit

Companies with a clear vision of what they want to hire may reject candidates who interview well but don't fit their specific needs, highlighting the importance of understanding the company's requirements (28m40s).
The ideal candidate for a role can change over time, depending on the company's current needs, making it essential for candidates to align their skills with the company's requirements (29m10s).
Luck can play a role in the interview process, as the fit between the candidate's skills and the company's needs may not be immediately clear, especially in machine learning jobs (29m57s).
The lack of specialization in machine learning jobs can make it challenging to determine the best fit for a role, but having a broad understanding of various aspects of machine learning can be beneficial (30m6s).