Low Rank Strikes Back in Modern Foundation Models

Prof. Zhangyang "Atlas" Wang
Temple Foundation Endowed Associate Professor #7
VITA Group, University of Texas at Austin
Research Director, XTX Markets
Date: July 2025
Location: Porto, Portugal
Duration: 5 hours

A comprehensive tutorial on the growing importance of low-rank approximation techniques in large language models (LLMs). The session covers theoretical foundations, empirical observations, and practical applications of low-rank structures for improving the efficiency, interpretability, and robustness of LLMs.


Tutorial Sections

The tutorial is organized into the following five parts, based on the slide content, covering low-rank techniques from theoretical foundations to cutting-edge applications.

1. Introduction & Foundations

Pursuing low-dimensional structures in high-dimensional data: a universal quest over centuries, from sparsity to low rank (a minimal SVD example follows below).

Key Topics: Universal Quest · Sparsity to Low Rank · SVD & Rank Theory · Curse & Blessing of Dimensionality · Low-Rank Simplicity Bias
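
To make the SVD and rank-theory starting point concrete, here is a minimal NumPy sketch (not taken from the slides) of truncated SVD as a rank-k approximation; the matrix sizes and noise level are illustrative only.

```python
import numpy as np

# Build a matrix that is approximately low rank: a rank-5 signal plus small noise.
rng = np.random.default_rng(0)
n, d, true_rank = 200, 100, 5
A = rng.standard_normal((n, true_rank)) @ rng.standard_normal((true_rank, d))
A += 0.01 * rng.standard_normal((n, d))

def truncated_svd(M, k):
    """Best rank-k approximation of M in the Frobenius/spectral norms (Eckart-Young-Mirsky)."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

for k in (1, 5, 20):
    A_k = truncated_svd(A, k)
    rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
    print(f"rank {k:2d}: relative Frobenius error = {rel_err:.4f}")
# The error collapses once k reaches the underlying rank (5 here),
# which is the basic reason low-rank structure is exploitable.
```
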
2. Low Rank Gradients
LoRA, GaLore, and More

Low-rank gradients (or weight updates) in foundation models, focusing on memory-efficient training techniques (a GaLore-style sketch follows below).

Key Topics: Memory Requirements · System vs. Algorithm Level · GaLore Projection · Q-GaLore · APOLLO Optimizer
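
As a rough illustration of the idea behind GaLore-style gradient projection, the following PyTorch sketch keeps optimizer state in a low-rank subspace of the gradient; the objective, shapes, and hyperparameters are placeholders, not the actual GaLore implementation.

```python
import torch

torch.manual_seed(0)
W = torch.randn(1024, 512, requires_grad=True)   # one weight matrix of a toy model
x = torch.randn(64, 512)                         # dummy batch, just to produce gradients

rank, refresh_every, lr, beta = 8, 200, 1e-3, 0.9
P = None                        # low-rank projection basis, refreshed periodically
m = torch.zeros(rank, 512)      # momentum lives in the rank-8 subspace, not the full 1024 x 512

for step in range(400):
    loss = (x @ W.t()).pow(2).mean()             # placeholder objective
    loss.backward()
    G = W.grad

    # Refresh the projector from the SVD of the current gradient.
    if P is None or step % refresh_every == 0:
        U, _, _ = torch.linalg.svd(G, full_matrices=False)
        P = U[:, :rank]                          # (1024, rank)

    g_low = P.t() @ G                            # project gradient: (rank, 512)
    m = beta * m + (1 - beta) * g_low            # optimizer state stays low-rank
    with torch.no_grad():
        W -= lr * (P @ m)                        # project the update back to full size
    W.grad = None
```

The memory saving comes from the optimizer state being rank × n rather than m × n, which is the gap GaLore-style methods target for large weight matrices.
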
3. Low Rank Weight
Novel Model Compression & PEFT

How low-rank gradients lead to low-rank weights and enable novel model compression and parameter-efficient fine-tuning (a weight-factorization sketch follows below).

Key Topics: Gradient-Weight Connection · Model Compression · PEFT Applications · Weight Structure Analysis
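
One simple instance of low-rank weight compression is factorizing a dense linear layer into two thin ones via truncated SVD of its weight matrix. The sketch below assumes a plain nn.Linear and an illustrative rank; it is not the specific method presented in the tutorial slides.

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a dense linear layer with two thin ones via truncated SVD of its weight."""
    W = layer.weight.data                                   # (out_features, in_features)
    U, S, Vt = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = torch.diag(S[:rank].sqrt()) @ Vt[:rank]     # (rank, in_features)
    second.weight.data = U[:, :rank] @ torch.diag(S[:rank].sqrt())  # (out_features, rank)
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

dense = nn.Linear(4096, 4096)
compressed = factorize_linear(dense, rank=256)
dense_params = sum(p.numel() for p in dense.parameters())
comp_params = sum(p.numel() for p in compressed.parameters())
print(f"parameters: {dense_params:,} -> {comp_params:,}")  # ~16.8M -> ~2.1M at rank 256
```
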
4. Low Rank Attention
Efficiency and Safety Applications

Low-rank attention and neuron subspaces, with applications to efficiency and safety in large language models (a Linformer-style sketch follows below).

Key Topics: Attention Bottleneck · Approximate Low-Rank · Efficiency Applications · Safety Implications
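
To illustrate the attention bottleneck and one low-rank workaround, the sketch below contrasts full attention with a Linformer-style variant that projects keys and values along the sequence dimension; the random projections stand in for the learned ones used in practice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, k = 1024, 64, 128            # sequence length, head dim, projected length (k << n)
Q, K, V = (torch.randn(n, d) for _ in range(3))

# Full attention: the n x n score matrix is the quadratic bottleneck.
full_out = F.softmax(Q @ K.t() / d ** 0.5, dim=-1) @ V                       # O(n^2 d)

# Linformer-style attention: project keys/values along the sequence axis to length k,
# so the score matrix is only n x k. E_proj and F_proj are learned in the real method;
# random matrices are used here purely for illustration.
E_proj = torch.randn(k, n) / n ** 0.5
F_proj = torch.randn(k, n) / n ** 0.5
low_out = F.softmax(Q @ (E_proj @ K).t() / d ** 0.5, dim=-1) @ (F_proj @ V)  # O(n k d)

print(full_out.shape, low_out.shape)   # both (n, d); only the cost and memory differ
```
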
5. Going Beyond Low Rank
Vector Space to Measure Space

Moving beyond traditional low-rank approaches: from vector space to measure space, and toward understanding reasoning.

Key Topics: Beyond Low-Rank · Vector to Measure Space · Reasoning Understanding · Advanced Applications

Key Techniques Covered

Key references from the tutorial slides, covering foundational texts and cutting-edge research.

📚 Overview

High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications
Wright & Ma, Cambridge University Press, 2022

An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
Balzano et al., submitted to IEEE Signal Processing Magazine, 2025

The Low-Rank Simplicity Bias in Deep Networks
Huh et al., TMLR 2023

🔄 Part I: Gradients

LoRA: Low-Rank Adaptation of Large Language Models
Hu et al., ICLR 2022

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Zhao et al., ICML 2024

APOLLO: SGD-like Memory, AdamW-level Performance
Zhu et al., MLSys 2025

⚖️ Part II: Weights

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Jaiswal et al., ICML 2025

🎯 Part III: Attention

Linformer: Self-Attention with Linear Complexity
Wang et al., arXiv 2020

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Wei et al., ICML 2024

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Perin et al., COLM 2025

⚖️ Part IV: Beyond Low Rank

Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning
Wang & Wang, NeuS 2025

Acknowledgements

Special thanks to the following researchers for sharing slides and materials reused in this tutorial:

Ajay Jaiswal (UT Austin)
Gabriel J. Perin (University of São Paulo)
Peihao Wang (UT Austin)
Yifan Wang (Purdue)
Boyi Wei (Princeton)
John Wright (Columbia)
Jiawei Zhao (Meta)
Zhenyu Zhang (UT Austin)

150+ total slides · 5 main sections · 5 hours of content · 10+ key references