Stanford CS149 I Parallel Computing I 2023 I Lecture 10 - Efficiently Evaluating DNNs on GPUs

21 Sep 2024
Rendering Overlapping Circles

Parallel Algorithm Design

Deep Neural Networks: Structure and Operations

Convolution in Deep Neural Networks

Deep Neural Network Architectures

Deep Neural Network Efficiency and Optimization

Matrix Multiplication in Deep Neural Networks

Matrix Multiplication Optimization Techniques

Memory Management and Optimization

Implicit Matrix Multiplication

Deep Learning Libraries and Optimization

Implicit GEMM and Operation Fusion

Attention Operation Optimization

Automated Optimization Frameworks

  • Optimizations such as fusing batch normalization, or resizing and padding, into matrix multiplication were initially done by hand but are increasingly automated by frameworks like JAX, which analyze tensor loop nests to generate optimized code. (01:12:41)
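
To make the batch normalization case concrete, here is a minimal sketch of the manual version of that fusion at inference time. It assumes the batch norm statistics are fixed, so the normalization reduces to a per-output-channel scale and shift that can be folded into the weights and bias of the preceding matrix multiplication. All function and variable names (`fold_batchnorm`, `fused_linear`, and the sample values) are hypothetical and chosen for illustration. This is not how JAX or any particular library implements it, only the algebra that such a fusion relies on.

```python
import math

def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(row[p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]
            for row in x]

def linear_then_batchnorm(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: the matmul result is written out, then re-read for batch norm."""
    y = matmul(x, w)
    return [[gamma[j] * (row[j] + b[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
             for j in range(len(row))] for row in y]

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the per-channel scale and shift into the weights and bias, once, ahead of time."""
    scale = [gamma[j] / math.sqrt(var[j] + eps) for j in range(len(b))]
    w_fused = [[w[p][j] * scale[j] for j in range(len(b))] for p in range(len(w))]
    b_fused = [(b[j] - mean[j]) * scale[j] + beta[j] for j in range(len(b))]
    return w_fused, b_fused

def fused_linear(x, w_fused, b_fused):
    """Fused path: a single matmul plus bias, with no separate normalization pass."""
    y = matmul(x, w_fused)
    return [[row[j] + b_fused[j] for j in range(len(row))] for row in y]

# Illustrative values only (2 inputs, 3 features, 2 output channels).
X = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]
B = [0.1, -0.2]
GAMMA, BETA = [1.5, 0.8], [0.05, -0.3]
MEAN, VAR = [0.3, 0.1], [0.9, 1.2]

if __name__ == "__main__":
    w_f, b_f = fold_batchnorm(W, B, GAMMA, BETA, MEAN, VAR)
    print(linear_then_batchnorm(X, W, B, GAMMA, BETA, MEAN, VAR))
    print(fused_linear(X, w_f, b_f))
```

The two printed results should agree up to floating-point rounding. The benefit of the fused form is that the intermediate matmul output never has to be written to memory and read back for a second pass, which is the kind of saving an automated framework can look for when it analyzes the loop nests.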

GPUs and Deep Neural Network Computation
