Stanford Seminar - Responsible AI (h)as a Learning and Design Problem

14 Dec 2024

Introduction: The Problem of Bias in Generative AI and Existing Resources (Lines 1-5)

  • Generative AI can lead to various harms, such as amplifying or introducing biases in society, as demonstrated by newspaper headlines and research from experts like Safiya Noble, Ziad Obermeyer, and Joy Buolamwini. (31s)
  • The issue of biases in algorithms is not new and dates back to the 1990s, with work from researchers like Batya Friedman and Helen Nissenbaum on bias in computer systems. (1m12s)
  • To address these issues, various responsible AI resources have been developed, including software toolkits for fairness analyses, transparent reporting resources like model cards and data sheets, and organizational processes for responsible AI. (1m22s)
  • Companies like Microsoft have developed responsible AI standards and processes, such as the Microsoft Responsible AI Standard, which outlines goals for fairness and other principles in responsible AI. (1m41s)
  • Public sector organizations, like the US National Institute of Standards and Technology (NIST), have also developed responsible AI processes, including the AI Risk Management Framework to govern, map, measure, and manage various risks in AI development. (1m58s)

The Need for Better Support and Guidance for AI Developers (Lines 6-7)

  • Research has shown gaps in AI developers' knowledge and skills when working on fairness in AI, including a study in which 10 AI product teams struggled to complete a fairness evaluation process because they lacked an understanding of whom to evaluate fairness for. (2m44s)
  • This lack of understanding highlights the need for better support and guidance for AI developers in addressing fairness and bias in their work. (3m46s)

Gaps in AI Developers' Knowledge and Training in Responsible AI (Lines 8-12)

  • AI developers may not be trained for the new forms of work involved in responsible AI, and evidence from several studies, including one published in 2022, supports this (4m30s).
  • Many universities are developing tech ethics courses, such as the Embedded Ethics program, to address this issue, and researchers like Casey Fiesler at the University of Colorado Boulder are surveying these courses to understand who is teaching ethics and how it is integrated with computer science concepts (5m18s).
  • However, working AI developers may not have taken these courses during their training and may not have the time or inclination to take them after work, leading to a gap in their knowledge about responsible AI (5m51s).
  • AI practitioners are taking on extra work to educate their team members about responsible AI, with some even teaching their teammates about harms they saw in user studies, despite it not being part of their job description (6m13s).
  • Research has found that cross-functional collaboration between teams from different roles can be challenging due to a lack of shared language and understanding of concepts like fairness, leading to the need for education and clarification (6m52s).

A Study on AI Practitioners' Learning and Challenges in Responsible AI (Lines 13-15)

  • A study was conducted to investigate what and how AI practitioners are currently learning about responsible AI, as well as the goals and aspirations of AI practitioners and responsible AI educators, and the challenges that get in the way of these goals (7m32s).
  • Semi-structured interviews were conducted with 40 participants from 16 companies, including AI practitioners and responsible AI educators, to gather data for the study (8m3s).
  • The study aims to understand the current state of knowledge and education about responsible AI among AI practitioners and educators, and to identify areas for improvement (8m2s).

Target Audiences for Responsible AI Training and the Scope of Current Resources (Lines 16-20)

  • AI practitioners, including developers of machine learning or AI models and developers of the applications those models are embedded into, are the primary focus of responsible AI learning resources and trainings, but there is a larger set of potential audiences, including developers of third-party AI applications, marketing and PR teams, corporate leadership, and broader audiences such as community organizations, civil society, public sector policymakers, and universities (8m58s).
  • A study was conducted to understand how AI practitioners learn and teach others about responsible AI, including the settings, reasons, effectiveness, topics, and skills involved, as well as how educators develop trainings and resources (9m51s).
  • The study found that practitioners are learning responsible AI concepts such as recalling company principles and defining dimensions like fairness, along with procedural knowledge such as computing fairness metrics and creating data cards or model cards (a minimal model card sketch follows this list) (10m18s).
  • However, the study also found that fewer resources are designed to help develop skills to identify new potential harms or proactively design generative AI to avoid those harms for new use cases (11m12s).
  • Many learning objectives focus on understanding and recall, rather than applying knowledge to new skills or use cases, according to Bloom's taxonomy of cognition (11m34s).
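
To make the procedural knowledge mentioned above concrete, here is a minimal illustrative sketch of a model card represented as structured data. The field names loosely follow the sections proposed in the "Model Cards for Model Reporting" paper (Mitchell et al.); the class and the specific values are hypothetical, not an example given in the talk.

```python
# Illustrative only: a minimal model card skeleton as structured data.
# Field names loosely follow Mitchell et al., "Model Cards for Model Reporting";
# the concrete values below are hypothetical.
from dataclasses import dataclass


@dataclass
class ModelCard:
    model_details: str              # who built it, version, architecture
    intended_use: str               # in-scope and out-of-scope uses
    factors: list[str]              # groups/conditions the evaluation is sliced by
    metrics: dict[str, float]       # reported performance and fairness metrics
    evaluation_data: str            # datasets used for evaluation
    training_data: str              # what is known about the training data
    ethical_considerations: str     # risks, potential harms, mitigations
    caveats_and_recommendations: str = ""


card = ModelCard(
    model_details="Toxicity classifier v0.3, fine-tuned transformer",
    intended_use="Flagging comments for human review; not for automated removal",
    factors=["language variety", "topic"],
    metrics={"auc": 0.91, "false_positive_rate_gap_across_varieties": 0.07},
    evaluation_data="Held-out comments stratified by language variety",
    training_data="Public forum comments; labeling guidelines documented separately",
    ethical_considerations="Higher false-positive rates observed for some dialects",
)
print(card.ethical_considerations)
```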

Learning Modalities and Pathways for Responsible AI (Lines 21-26)

  • Developers are learning about responsible AI through various modalities, including text documentation, notebooks, toolkits, training videos, YouTube, and collaborative learning settings like discussion groups (11m58s).
  • The study identified different pathways for learning, including adapting knowledge from other domains, such as participants who learned about ethics in courses outside of computer science (12m21s).
  • Many developers may not have the skill to apply what they learned in education or other fields to their work in building AI, and this application is a skill that needs to be fostered and cultivated (12m49s).
  • Practitioners are often searching for responsible AI (RAI) resources internally and externally, including on social media, and are creating their own curricula through self-study and online resources (13m10s).
  • This self-directed approach can be challenging: learners who find resources about ethics and responsibility on social media often struggle to verify the legitimacy of those resources or the credibility of their creators (14m3s).

Collaborative Learning and Informal Knowledge Sharing (Lines 27-29)

  • Learners are also learning from their coworkers in structured and unstructured settings, such as code reviews and conversations (14m31s).
  • A follow-up study led by Jaemarie Solyst explored how informal learning sites, such as code reviews and reading groups, can facilitate collaborative learning and raise awareness about AI risks (14m37s).
  • The study found that learners are often exposed to AI risks and ethics through personal experiences, such as family members protesting AI-related issues on social media (15m12s).

Framings of Responsible AI and the Computational Approach (Lines 30-32)

  • The study identified two main framings of responsible AI: a computational framing, which treats it as a technical problem, and a procedural framing, which treats it as a matter of following processes (discussed below) (15m37s).
  • The computational framing is a technical approach that focuses on fairness metrics and software toolkits, but may overlook the human impact of AI systems or model failures (15m43s).
  • This computational orientation can affect every part of an educator's choices, including learning objectives, goals, and assessment methods (16m18s).

Challenges in Assessing Qualitative Concepts and the Need for Training (Lines 33-35)

  • Participants discussed the challenges of assessing the impact of AI systems on communities, with some suggesting that methods from the humanities and social sciences could be used to evaluate less quantifiable concepts (17m1s).
  • Many participants were aware of the importance of assessing qualitative concepts, but felt they lacked the training to develop such assessments, having primarily received computer science training (17m29s).
  • Educators recognized the need for a "train the trainers" approach to address the lack of training in assessing qualitative concepts and understanding which communities might experience fairness harms (18m18s).

Shifting Perspectives on Responsible AI and the Need for a Comprehensive Approach (Lines 36-38)

  • Initially, participants viewed responsible AI as a problem to be solved by someone else, but later recognized the scope of the issue and the need for a more comprehensive approach (18m24s).
  • There was a recognition that teaching easily quantifiable concepts, such as fairness metrics, was not the same as understanding the underlying issues and communicating them effectively (18m44s).
  • Educators knew that broader topics, such as scrutinizing training data and producing model cards, were important, but struggled to make time to teach or upskill in these areas (19m6s).

Procedural Orientation vs. Outcome-Oriented Approach in Responsible AI Training (Lines 39-40)

  • A procedural orientation was observed in many trainings and learning resources, where participants felt they were teaching company policies or toolkits rather than focusing on outcomes (19m25s).
  • This procedural orientation was described as a "process objective" rather than an "outcome objective", with some ambivalence about the need for culture change (19m42s).

Proactive Fairness Evaluation and the Challenges of a Procedural Approach (Lines 41-43)

  • Proactive playbooks for fairness evaluation involve asking questions at different stages, including problem formulation and model training, to address potential issues (a simple stage-keyed sketch follows this list) (20m2s).
  • Participants were ambivalent about requiring processes for fairness evaluation, as it might lead to a sanitized version of ethics that avoids normative questions about what should or should not be designed (20m16s).
  • Some participants felt that following company policies or processes might not help raise larger questions about the underlying root issues, such as military applications or policing technologies (20m51s).
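
As a concrete illustration of the stage-keyed playbook idea in the first bullet above, the sketch below maps lifecycle stages to guiding questions. The stages and questions are illustrative placeholders, not the speaker's or any company's actual playbook.

```python
# Illustrative only: a proactive fairness "playbook" as a simple mapping from
# lifecycle stage to guiding questions. These stages and questions are
# placeholders, not any specific published process.
FAIRNESS_PLAYBOOK = {
    "problem formulation": [
        "Who are the intended users, and who else could be affected?",
        "Should this system be built at all for this use case?",
    ],
    "data collection": [
        "Which groups are under- or over-represented in the data?",
    ],
    "model training": [
        "Which fairness metrics and group definitions are appropriate here?",
    ],
    "deployment and monitoring": [
        "How will fairness-related harms be reported and triaged after launch?",
    ],
}

for stage, questions in FAIRNESS_PLAYBOOK.items():
    print(stage.upper())
    for question in questions:
        print(f"  - {question}")
```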

The Need for Social and Cultural Considerations in Responsible AI (Lines 44-47)

  • There is a desire to move beyond technical approaches and focus on social and cultural elements of responsible AI, including sociological and anthropological angles (21m10s).
  • Recent reports from the Data & Society Research Institute and the Center for Democracy and Technology suggest that AI governance needs sociological approaches alongside technical expertise (21m32s).
  • Participants wanted to know how to engage with communities to identify potential harms, understand what harms mean for them, and co-design AI systems with them (21m55s).
  • Many developers knew that engaging with communities was a good thing to do, but they didn't feel like they had the skills to do it (22m16s).

Applying Responsible AI Principles to Specific Products and Cultural Contexts (Lines 48-51)

  • A recent paper with collaborators from Microsoft explored the impacts of language technology on speakers of different language varieties (22m19s).
  • Participants aspired to build capacity for responsible AI by applying what they learned to their AI applications, but felt they lacked the skills to apply high-level principles to specific products and cultural contexts (22m48s).
  • Practitioners wanted to apply AI principles, such as fairness and transparency, to their specific products, but felt they needed more guidance on how to do so in different cultural contexts (23m26s).
  • Participants expressed a desire for customized trainings on responsible AI and fairness, tailored to specific contexts such as South Asian or Latin American settings, but technology companies felt they lacked the time, resources, or knowledge to develop these trainings (23m59s).

The Need for Context-Specific Training and Practical Examples (Lines 52-53)

  • There was a need for more case studies, scenarios, and examples of responsible AI in practice, as current examples were often irrelevant or overly reused (24m26s).
  • The application of learning about responsible AI is particularly challenging for generative AI models, which can be used in multiple contexts, putting more responsibility on product teams to consider potential harms (24m45s).

Organizational Pressures and the Tension Between Scalability and Depth in Learning (Lines 54-57)

  • Organizational pressures and tensions impacted the aspirations of educators, including the need to scale learning resources and trainings, which often prioritized self-study courses over more effective collaborative learning methods (25m46s).
  • Educators wanted to develop longitudinal curricula with increasing depth, but learners felt pressure to quickly ship products, leading to a preference for brief, superficial learning experiences (26m11s).
  • The lack of structured pathways for continued learning in depth was a significant issue, with few resources available for learners to engage in more in-depth exploration of responsible AI (26m48s).
  • Educators aimed to foster mindsets rather than provide prescriptive guidance, but organizational pressures often incentivized the latter, with learners seeking clear, actionable advice to navigate AI review processes (26m56s).

Moving Beyond Checklists and Compliance to Foster Responsible AI Mindsets (Lines 58-59)

  • There is no one-size-fits-all approach to responsible AI (RAI), and it cannot be reduced to a simple checklist of six things to ensure fairness (27m28s).
  • A compliance-oriented approach to RAI may not be effective; instead, educators want to foster mindsets that allow people to reflect on their own practices (27m47s).

The Importance of Learning Environments and Integrating Social and Technical Skills (Lines 60-62)

  • The learning environments for RAI, including the sites where learning happens, shape what is learned and how, with large tech companies and open-source platforms taking different approaches (28m0s).
  • Informal sites for learning, community-based learning, and integrating social and technical skills and concepts can help avoid a procedural approach to RAI (28m16s).
  • Designing technical learning opportunities that integrate social and technical skills and concepts is crucial, and this can be achieved by shifting professional norms and the identity of what it means to be an AI developer (28m37s).

Shifting the Culture of AI Development Towards Responsibility (Lines 63-65)

  • The hierarchy of knowledge in learning about RAI needs to be resisted, and the culture of AI development needs to be shifted towards responsibility (28m52s).
  • Developing pedagogical provocations to destabilize hegemonic values in AI development is necessary, but this can be challenging, especially in corporate contexts (29m34s).
  • Supporting learning in situ or in context, and fostering mindsets that apply RAI concepts in development, is essential for meeting learners where they are (29m54s).

Supporting Responsible AI During Prototyping with Large Language Models (Lines 66-67)

  • A study on supporting responsible AI during prototyping processes, led by Jay Wang, explored how to develop AI applications using large language models (LLMs) in a responsible manner (30m14s).
  • The study found that developing AI applications using LLMs is now easy, but it raises questions about who is an AI practitioner and how to ensure responsible AI practices (30m30s).

Late-Stage Fairness Evaluations and the Need for Proactive Design (Lines 68-72)

  • A survey of 300 machine learning practitioners found that nearly half encountered fairness issues in their products, with 99% of those issues discovered after model deployment (31m26s).
  • The same survey found that many machine learning practitioners believe they have to release their model and then address fairness issues if someone raises concerns (31m16s).
  • A study analyzed 30 responsible AI or AI ethics toolkits and found that most focus on downstream development phases, such as model training, testing, deployment, and monitoring, rather than design phases (32m8s).
  • The study aimed to support responsible AI by fostering a mindset during the design phase, moving away from development and deployment towards ideation and prototyping phases (32m45s).
  • Most learning resources focus on development phases, with few focusing on design phases or design skills, which motivated the focus on proactive design for generative AI (33m0s).

The Farsight Tool for Proactive Harm Ideation (Lines 73-78)

  • A formative study and codesign study were conducted with AI prototypers using Google's AI Studio to generate and critique design ideas for proactive ideation about potential harms (33m21s).
  • The study used the API for the AI incident database, which contains news articles about AI incidents, to extract embeddings and compute similarity between prompts and headlines (33m49s).
  • The resulting tool, Farsight, analyzes the cosine similarity between a given system prompt and AI incident headlines from the database and, based on those embeddings, suggests potential use cases and misuses, helping designers proactively identify potential harms (a minimal sketch of this kind of embedding-similarity retrieval follows this list) (34m46s).
  • The tool also includes an interactive tree visualization that suggests use cases, stakeholder groups, and potential harms based on a harms taxonomy from Renee Shelby (35m40s).
  • The goal of Farsight is to foster ideation about potential harms early in the prototyping phase, before a product has been built or launched (35m24s).
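
The bullets above describe retrieving related AI incident headlines by embedding similarity. The sketch below shows the general technique (embed the prompt and the headlines, rank by cosine similarity) using a stand-in embed() function; it is not Farsight's actual implementation, and the headlines are placeholders rather than real entries from the AI Incident Database.

```python
# Illustrative sketch of embedding-similarity retrieval in the spirit of the
# approach described above. embed() is a placeholder for whatever embedding
# model or API is available.
import zlib
import numpy as np


def embed(text: str) -> np.ndarray:
    """Placeholder embedding: deterministic random unit vector per text."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=256)
    return v / np.linalg.norm(v)


def most_similar_incidents(prompt: str, headlines: list[str], k: int = 3):
    """Rank incident headlines by cosine similarity to the system prompt."""
    p = embed(prompt)
    scored = [(float(np.dot(p, embed(h))), h) for h in headlines]  # unit vectors
    return sorted(scored, reverse=True)[:k]


headlines = [
    "Chatbot gives unsafe medical advice",
    "Resume screener found to disadvantage some applicants",
    "Translation system misgenders speakers of certain languages",
]
prompt = "You are a helpful medical triage assistant."
for score, headline in most_similar_incidents(prompt, headlines):
    print(f"{score:.2f}  {headline}")
```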

Evaluating the Effectiveness of Farsight in Harm Ideation (Lines 79-82)

  • A study was conducted with 42 participants to understand how Farsight changes the approach to identifying harms, and to see if it can augment the harm ideation process (36m38s).
  • The study had three conditions: the full version of Farsight, a lighter version called Farsight Lite, and a control condition using a PDF of a harms taxonomy (37m7s).
  • The study found that after using Farsight, users were able to envision more harms independently, and focused more on users and use cases than the control group (38m28s).
  • The tool helped users think more about potential harms, and the study suggests that Farsight can be an effective tool for augmenting the harm ideation process (38m45s).

Addressing Longer-Term Harms and Mitigations (Lines 83-84)

  • Longer-term or second-order harms, such as cascading failures, can occur when one event leads to another, resulting in unforeseen consequences (38m47s).
  • The tool Farsight does not provide mitigations for these harms, but users who utilized it considered more possible mitigations during the prototyping phase (39m8s).

The Need for Further Research on Farsight's Impact and In-Situ Interventions (Lines 85-86)

  • While participants found Farsight useful and usable, it is essential to conduct studies on its effectiveness in changing development practices or outcomes in ecologically valid settings (39m42s).
  • There is a need for in-situ interventions to motivate learners during development, especially for practitioners who may not seek out training or learning resources independently (40m1s).

The Role of Subjectivity and Community Engagement in Harm Assessment (Lines 87-90)

  • The study raises trade-offs between automation and human agency: there is a risk that developers might offload harm identification to the tool, undermining its purpose of supporting reflexivity and independent ideation (40m27s).
  • The study highlights the role of subjectivity, including people's unique positionality, backgrounds, lived experiences, and domain expertise, in identifying harms and evaluating their likelihood and severity (40m52s).
  • The low interrater reliability among third-party raters who evaluated the likelihood and severity of harms suggests that subjectivity and lived experiences play a significant role in these assessments (a sketch of one way to quantify such agreement follows this list) (41m33s).
  • Engagement with communities impacted by AI systems through participatory design or co-design is encouraged to foster mitigation and address the limitations of technical mitigations (42m11s).
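
The low interrater reliability noted above can be quantified in several ways. The snippet below is a minimal sketch using Cohen's kappa from scikit-learn on made-up severity ratings from two raters; the talk does not specify which agreement statistic the study used.

```python
# Minimal sketch: agreement between two raters' harm-severity ratings using
# quadratically weighted Cohen's kappa (severity is treated as ordinal).
# The ratings are fabricated for illustration only.
from sklearn.metrics import cohen_kappa_score

# Severity of 8 candidate harms on a 1 (minor) to 4 (severe) scale.
rater_a = [1, 2, 4, 3, 2, 4, 1, 3]
rater_b = [2, 2, 3, 1, 3, 4, 2, 1]

kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"Weighted kappa: {kappa:.2f}")  # values near 0 indicate weak agreement
```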

Design Choices as Mitigation Strategies and the Importance of Early Intervention (Lines 91-93)

  • Design choices can be a mitigation strategy, but this aspect was not supported by the tool, and further exploration is needed to understand its potential (42m20s).
  • Different design choices at the beginning of a project could potentially avoid some of the harms associated with AI, and intervening in the prototyping phase is crucial (42m40s).
  • The focus should be on designing the right thing, rather than just designing the thing right, as emphasized by a Bill Buxton quote (42m50s).

Integrating Social and Technical Topics and Assessing Sociotechnical Skills (Lines 94-95)

  • Integrating social and technical topics in responsible AI is essential, rather than siloing them in learning objectives, assessment, or resources (43m9s).
  • Engineers working in AI need to move towards higher-order cognition, including application, analysis, evaluation, and creation, and assessing these sociotechnical skills is crucial (43m28s).

Balancing Scalability and Collaboration in Responsible AI Learning (Lines 96-97)

  • Balancing scalability and collaborative learning approaches is necessary, and determining what all developers need versus different roles is essential (43m47s).
  • Designing learning environments that incorporate responsible AI principles is vital, and moving this work upstream from pre-launch review to early-stage problem formulation or design is necessary (44m1s).

Adapting Participatory AI Methods for Pre-trained Foundation Models (Lines 98-100)

  • Participatory AI methods, such as codesign and value-sensitive design, need to be adapted for the current paradigm of pre-trained foundation models, which can impede responsible AI work (44m54s).
  • Research has shown that nearly all papers on participatory AI focus on the user interface, rather than broader design questions, highlighting the need for more methodological approaches (44m30s).
  • Emerging policy and regulatory work, such as the EU AI Act and US NIST, calls out human factors in risk management, emphasizing the importance of design methods in responsible AI work (46m4s).

The Learning and Design Problem of Responsible AI and Open Questions (Lines 101-102)

  • The field of responsible AI has a learning and design problem, and it is essential to conceptualize it as such to address the challenges it poses (46m19s).
  • There are open questions about how to support the work of responsible AI and how to evaluate its effectiveness in actual products (46m16s).

Evaluating the Impact of Policy and Responsible AI on Product Development (Lines 103-109)

  • A question was raised about the impact of policy and responsible AI (RAI) processes on the actual product being built, and whether each stage of the process is assessed or evaluated for its effectiveness on the product (46m59s).
  • Another question was asked about whether users should be aware of biased models or problems with the current system and if they should be warned or educated about these issues (47m14s).
  • Evaluations are happening at each stage of the responsible AI process, including model-level and product-level evaluations, but more cross-pollination across these stages is needed (47m50s).
  • Laura Weidinger and her collaborators at Google DeepMind proposed a framework for sociotechnical evaluations that covers risks any product application built on a base or pre-trained model might encounter (48m0s).
  • Evaluations should cover model-level risks, human-interaction evaluations, and broader societal or systemic evaluations to ensure that responsible AI is effective and improves the user experience (48m31s).
  • There is a need for more mechanisms to surface risks or harms encountered during user studies or product development and to develop more effective evaluations that improve the user experience (49m24s).
  • The effectiveness of evaluations in improving the user experience is a distinct question that requires further exploration (49m45s).

Supporting User Awareness and Education about Responsible AI (Lines 110-112)

  • In the US, the NIST framework is not binding legislation or regulation, but federal agencies such as the Federal Trade Commission and the Equal Employment Opportunity Commission are working to enforce violations within their specific purview (49m59s).
  • There is a need for more support for users in learning about responsible AI harms and guidance on how to use it responsibly, but users should not be solely responsible for using systems responsibly, especially if they were not developed responsibly (50m31s).
  • Communicating the limitations of language models, such as hallucinations, through broader education or AI literacy could inform users on the appropriate use cases for AI (51m0s).

Resources and Toolkits for Community Organizations and the Broader Public (Lines 113-114)

  • Resources and toolkits, such as the Algorithmic Equity Toolkit led by P. M. Krafft, Michael Katell, and Meg Young, are available for community organizations to advocate for responsible AI (51m33s).
  • There is a lack of AI ethics toolkits geared towards community organizations and the broader public, with most being targeted towards developers (52m0s).

The "Ship It Now" Mentality and the Challenges of Bias and Fairness (Lines 115-117)

  • The "ship it now" mentality in the industry can lead to models being deployed without proper consideration for bias and fairness, resulting in issues being discovered later on (52m38s).
  • The design phase for models is still an art and science, and figuring out the data distribution to ensure fairness is an unsolved problem (53m31s).
  • Bias, fairness, and responsibility are unsolved problems not only in generative AI but also in supervised learning and classical machine learning (53m55s).

Resource Availability and Organizational Incentives in Responsible AI (Lines 118-119)

  • There are more resources available to support responsible AI, but their effectiveness is uncertain, and organizational incentives often prioritize shipping products over responsible AI practices (54m7s).
  • Many teams focus on data collection and processing, which provides opportunities for intervention, but pre-trained models pose a challenge as they are often trained by different companies or teams on undisclosed data sets (54m20s).

Challenges with Pre-trained Models and Data Transparency (Lines 120-121)

  • Third-party companies using pre-trained models through APIs may not know what data was used to train the base model, making it difficult to evaluate, remediate, or retrain the model (55m12s).
  • In high-stakes settings, traditional machine learning models may be more widespread due to their auditability and traceability, but there is still room for improvement in responsible AI practices (55m41s).

The Importance of Cross-functional Collaboration and Shared Language (Lines 122-129)

  • Differences in how teams think about implementing responsible AI and where they begin from can impact the learning resources provided, highlighting the need for a shared language and understanding of terms like fairness (56m31s).
  • Cross-functional collaboration is essential for responsible AI, but teams from different disciplines may have different knowledge and perspectives on the issue, making it challenging to develop a shared approach (57m6s).
  • A lack of shared language and the use of "suitcase terms" like fairness can hinder collaboration and responsible AI practices, emphasizing the need for clearer definitions and meanings (57m43s).
  • Different professionals, such as data scientists and UX researchers or social scientists, may interpret the term "fairness" differently, with data scientists possibly referring to equalized odds or demographic parity while UX researchers or social scientists may mean something broader (a sketch contrasting two such metrics follows this list) (57m55s).
  • Interdisciplinary collaboration can be challenging due to differences in terminology and understanding, highlighting the need for more learning resources to facilitate effective communication across disciplines (58m10s).
  • There is a need for both role-specific and general trainings to address the complexities of responsible AI, as well as to enable professionals to speak across disciplinary boundaries and understand the basics of model training and design (58m19s).
  • Every role involved in AI development should have a basic understanding of AI concepts, such as model training and fairness, to facilitate effective communication and collaboration across disciplines (58m41s).
  • Having a shared understanding of AI concepts can help identify potential interventions or design opportunities, ultimately contributing to the development of more responsible AI systems (58m58s).
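
To make the "suitcase term" problem concrete, the sketch below computes two common formalizations a data scientist might mean by fairness: demographic parity (equal positive-prediction rates across groups) and equalized odds (equal true- and false-positive rates across groups). A UX researcher or social scientist may mean something broader than either. The data is synthetic and purely illustrative.

```python
# Illustrative sketch: two technical meanings of "fairness" on synthetic data.
# Demographic parity compares positive-prediction rates across groups;
# equalized odds compares true-positive and false-positive rates.
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)    # protected attribute (0 or 1)
y_true = rng.integers(0, 2, size=1000)   # ground-truth labels
y_pred = rng.integers(0, 2, size=1000)   # model predictions


def positive_rate(mask: np.ndarray) -> float:
    """Fraction of positive predictions within the masked subset."""
    return float(y_pred[mask].mean()) if mask.any() else float("nan")


# Demographic parity gap: difference in P(y_pred = 1) across groups.
dp_gap = abs(positive_rate(group == 0) - positive_rate(group == 1))

# Equalized odds gaps: differences in TPR and FPR across groups.
tpr_gap = abs(positive_rate((group == 0) & (y_true == 1))
              - positive_rate((group == 1) & (y_true == 1)))
fpr_gap = abs(positive_rate((group == 0) & (y_true == 0))
              - positive_rate((group == 1) & (y_true == 0)))

print(f"Demographic parity gap: {dp_gap:.3f}")
print(f"Equalized odds gaps: TPR {tpr_gap:.3f}, FPR {fpr_gap:.3f}")
```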
