At NVIDIA, I serve as a Tech Lead for AI GPU performance engineering. Key responsibilities include:
- Lead CUDA, Triton, CUTLASS, CuTe, cuTile, and cuDNN kernel optimization for Hopper-, Blackwell-, and Rubin-class GPUs.
- Improve AI and LLM training/inference performance by eliminating bottlenecks, fusing kernels, and optimizing memory-bandwidth utilization.
- Enable PyTorch and JAX integration through XLA, MLIR, TorchInductor, and torch.compile.
- Use Nsight Compute and Nsight Systems for profiling and performance analysis.
- Work on fused GEMMs, attention kernels, distributed training, and inference optimization.
- Drive co-design feedback for future GPU software and hardware.
At AMD, I worked as a GPU performance engineer and technical lead for AI workloads. Key responsibilities included:
- Optimized training and inference on the MI300- and MI200-series platforms.
- Improved throughput, memory efficiency, and scaling for transformer workloads.
- Collaborated with framework, architecture, and distributed systems teams.
- Contributed to Composable Kernel and ROCm-based AI libraries.
- Optimized attention, GEMM, batching, KV-cache, and multi-GPU communication.
- Helped establish and lead the AMD Center of Excellence in AI at UW Seattle.
"Nothing in life is to be feared, it is only to be understood." - Marie Curie