This comprehensive tutorial explores the growing importance of low-rank approximation techniques in large language models (LLMs). The session covers theoretical foundations, empirical observations, and practical applications of low-rank structures for improving the efficiency, interpretability, and robustness of LLMs.
Tutorial Sections
The tutorial is organized into the following sections, based on the slide content, covering low-rank techniques from theoretical foundations to cutting-edge applications.
1. Introduction & Foundations
Pursuing low-dimensional structures in high-dimensional data: a universal quest spanning centuries, from sparsity to low rank (a minimal SVD sketch follows the topic list below).
Key Topics: Universal Quest, Sparsity to Low Rank, SVD & Rank Theory, Curse & Blessing of Dimensionality, Low-Rank Simplicity Bias
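To make the "SVD & Rank Theory" topic concrete, here is a minimal NumPy sketch of truncated-SVD low-rank approximation (the Eckart-Young construction). The matrix size, rank, and noise level are illustrative assumptions, not values from the tutorial slides.

```python
# A minimal sketch: the best rank-r approximation of a matrix via truncated SVD
# (Eckart-Young). Matrix size, rank, and noise level are arbitrary choices.
import numpy as np

def low_rank_approx(W: np.ndarray, r: int) -> np.ndarray:
    """Return the best rank-r approximation of W in the Frobenius norm."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
# A 256x256 matrix that is approximately rank 16, plus small noise.
W = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 256)) \
    + 0.01 * rng.standard_normal((256, 256))
W16 = low_rank_approx(W, r=16)
print("relative error:", np.linalg.norm(W - W16) / np.linalg.norm(W))
```

The small relative error illustrates the point of this section: data (and, later, gradients and weights) that look high-dimensional are often well captured by a few singular directions.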
2. Low Rank Gradients: LoRA, GaLore, and More
Low-rank gradients (or weight updates) in foundation models, with a focus on memory-efficient training techniques (a GaLore-style projection sketch follows the topic list below).
Key Topics: Memory Requirements, System vs. Algorithm Level, GaLore Projection, Q-GaLore, APOLLO Optimizer
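The "GaLore Projection" idea can be illustrated with a short sketch: project each gradient onto its top-r left singular subspace, keep optimizer state in that subspace, and project updates back to full size. This is a simplified NumPy illustration under assumed names (LowRankProjector, rank, update_gap), not the released GaLore code.

```python
# Simplified sketch of GaLore-style gradient low-rank projection; names and
# hyperparameters are illustrative assumptions, not the library's API.
import numpy as np

class LowRankProjector:
    def __init__(self, rank: int, update_gap: int = 200):
        self.rank, self.update_gap, self.step, self.P = rank, update_gap, 0, None

    def project(self, grad: np.ndarray) -> np.ndarray:
        # Periodically refresh the projection from the gradient's top-r left
        # singular vectors, then map the full gradient into the r-dim subspace.
        if self.P is None or self.step % self.update_gap == 0:
            U, _, _ = np.linalg.svd(grad, full_matrices=False)
            self.P = U[:, : self.rank]                 # (m, r)
        self.step += 1
        return self.P.T @ grad                         # (r, n) low-rank gradient

    def project_back(self, low_rank_update: np.ndarray) -> np.ndarray:
        # Map the optimizer update computed in the subspace back to full size.
        return self.P @ low_rank_update                # (m, n)

# Usage: optimizer states (e.g., Adam moments) live in the (r, n) space,
# shrinking their memory from m*n to r*n per weight matrix.
proj = LowRankProjector(rank=8)
G = np.random.default_rng(0).standard_normal((1024, 256))   # a full gradient
g_low = proj.project(G)                                      # optimizer step runs on this
W_update = proj.project_back(-1e-3 * g_low)                  # e.g., a plain SGD step
print(g_low.shape, W_update.shape)
```

Q-GaLore and APOLLO, also covered in this section, build on this basic projection idea; the sketch does not attempt to reproduce them.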
3. Low Rank Weights: Novel Model Compression & PEFT
How low-rank gradients lead to low-rank weights, enabling novel model compression and parameter-efficient fine-tuning (a LoRA-style adapter sketch follows the topic list below).
Key Topics: Gradient-Weight Connection, Model Compression, PEFT Applications, Weight Structure Analysis
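As a companion to the "PEFT Applications" topic, below is a minimal PyTorch sketch of a LoRA-style adapter: the pretrained weight is frozen and a trainable low-rank update scaled by alpha/r is added. Layer sizes, rank, and the LoRALinear name are illustrative assumptions rather than code from the tutorial.

```python
# A minimal LoRA-style adapter sketch: freeze the pretrained linear layer and
# learn only a rank-r update (alpha / r) * B @ A on top of it.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                  # freeze the pretrained layer
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scale * x A^T B^T  (B @ A is never materialized)
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
y = layer(torch.randn(4, 768))                            # (4, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print("trainable parameters:", trainable)                 # 2 * 8 * 768 = 12288
```

With rank 8, the adapter trains roughly 12K parameters per layer instead of the ~590K in the frozen 768x768 weight, which is the core of the compression and PEFT story in this section.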
4. Low Rank Attention: Efficiency and Safety Applications
Low-rank attention and neuron subspaces, with applications to efficiency and safety in large language models (a Linformer-style attention sketch follows the topic list below).
Key Topics: Attention Bottleneck, Approximate Low-Rank, Efficiency Applications, Safety Implications
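To ground the "Attention Bottleneck" and "Approximate Low-Rank" topics, here is a minimal single-head sketch in the spirit of Linformer: keys and values are projected along the sequence dimension from length n down to k, so the attention map costs O(nk) rather than O(n^2). Shapes, names, and the random projections are assumptions for illustration only.

```python
# Minimal sketch of Linformer-style low-rank attention for a single head.
import torch
import torch.nn.functional as F

def low_rank_attention(Q, K, V, E, Fproj):
    # Q, K, V: (batch, n, d); E, Fproj: (k, n) sequence-length projections.
    K_low = E @ K                                  # (batch, k, d)
    V_low = Fproj @ V                              # (batch, k, d)
    scores = Q @ K_low.transpose(-2, -1) / K.shape[-1] ** 0.5   # (batch, n, k)
    return F.softmax(scores, dim=-1) @ V_low                    # (batch, n, d)

batch, n, d, k = 2, 1024, 64, 128
Q, K, V = (torch.randn(batch, n, d) for _ in range(3))
E, Fproj = torch.randn(k, n) / n ** 0.5, torch.randn(k, n) / n ** 0.5
out = low_rank_attention(Q, K, V, E, Fproj)
print(out.shape)                                   # torch.Size([2, 1024, 64])
```

The same observation, that attention and neuron activity concentrate in low-dimensional subspaces, underlies the safety results discussed in this section.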
5. Going Beyond Low Rank: From Vector Space to Measure Space
Moving beyond traditional low-rank approaches: from vector space to measure space, and toward understanding reasoning.
Key Topics: Beyond Low-Rank, Vector to Measure Space, Reasoning Understanding, Advanced Applications
Key Techniques Covered
Key references from the tutorial slides, covering both foundational texts and cutting-edge research.
Wright & Ma. High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications. Cambridge University Press, 2022.
Balzano et al. An Overview of Low-Rank Structures in the Training and Adaptation of Large Models. Submitted to IEEE Signal Processing Magazine, 2025.
Huh et al. The Low-Rank Simplicity Bias in Deep Networks. TMLR, 2023.
Hu et al. LoRA: Low-Rank Adaptation of Large Language Models. ICLR, 2022.
Zhao et al. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. ICML, 2024.
Zhu et al. APOLLO: SGD-like Memory, AdamW-level Performance. MLSys, 2025.
Jaiswal et al. From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications. ICML, 2025.
Wang et al. Linformer: Self-Attention with Linear Complexity. arXiv, 2020.
Wei et al. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications. ICML, 2024.
Perin et al. LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning. COLM, 2025.
Wang & Wang. Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning. NeuS, 2025.
Acknowledgements
Special thanks to the following researchers for sharing slides and materials reused in this tutorial:
Gabriel J. Perin
University of São Paulo