TL;DR We theoretically investigate density control in 3DGS. As training via gradient descent progresses, many Gaussian primitives are observed to become stationary while failing to reconstruct the regions they cover.
From an optimization-theoretic perspective, we reveal that these primitives are trapped at saddle points: regions of the loss landscape where gradients are insufficient to further reduce the loss, leaving the parameters locally sub-optimal.
To address this, our SteepGS efficiently identifies Gaussian primitives stuck in saddle areas, splits each of them into two offspring, and displaces the new primitives along the steepest-descent directions.
By escaping the saddle area, this restores the effectiveness of subsequent gradient-based updates. The figure on the right illustrates this process.
To fully understand the underlying mechanism, we begin with a simple mathematical setup. Suppose the scene is represented by a single Gaussian function with parameters $\theta = (p, \Sigma, o)$ (omitting color for simplicity), defined as $\sigma(x; \theta) = o \exp\left(-\frac{1}{2}(x - p)^\top \Sigma^{-1} (x - p)\right)$. We consider splitting this Gaussian into $m$ offspring, where offspring $j$ is assigned new parameters $\vartheta_j$ and an opacity weight $w_j$, for $j = 1, \dots, m$; we write $\vartheta = \{ \vartheta_j \}$ and $w = \{ w_j \}$. The question then becomes how to choose $m$, $w$, and the offspring parameters $\vartheta$ (equivalently, the displacements $\delta$ defined below).
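To make this setup concrete, here is a minimal NumPy sketch that evaluates $\sigma(x; \theta)$ for a toy 2D Gaussian; the specific values are illustrative only.

```python
import numpy as np

def gaussian_density(x, p, Sigma, o):
    """Evaluate sigma(x; theta) = o * exp(-0.5 * (x - p)^T Sigma^{-1} (x - p))."""
    d = x - p
    return o * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# Toy 2D example: a single Gaussian primitive.
p = np.zeros(2)                  # position
Sigma = np.diag([1.0, 0.25])     # covariance
o = 0.8                          # opacity
print(gaussian_density(np.array([0.5, 0.1]), p, Sigma, o))
```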
Let $\mathcal{L}(\theta)$ and $\mathcal{L}(\vartheta, w)$ denote the photometric loss before and after splitting, respectively. First, we prove the following approximation, which characterizes the impact of densification on the loss:
$$
\mathcal{L}(\vartheta, w) \approx \textcolor{blue}{\underbrace{\mathcal{L}(\theta) + \nabla \mathcal{L}(\theta)^\top \mu + \frac{1}{2} \mu^\top \nabla^2 \mathcal{L}(\theta) \mu}_{\Gamma}} + \textcolor{red}{\underbrace{\frac{1}{2} \sum_{j=1}^{m} w_j \delta_j^\top S(\theta) \delta_j}_{\Delta}},
$$
where $\mu = \sum_j w_j \vartheta_j - \theta$ is the mean displacement of the offspring, $\delta_j = \vartheta_j - \theta - \mu$ is the deviation of each offspring from the mean, and $S(\theta)$ is the splitting matrix, which is central to quantifying the effect of splitting a Gaussian:
$$
S(\theta) = \mathbb{E}_{x} \left[\frac{\partial \mathcal{L}}{\partial \sigma(x; \theta)} \nabla^2_{\theta} \sigma(x; \theta) \right].
$$
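To see what computing $S(\theta)$ could look like in practice, the sketch below averages the per-sample term $\frac{\partial \mathcal{L}}{\partial \sigma} \nabla^2_{\theta} \sigma$ over a set of sample points. For brevity it restricts $\theta$ to the position $p$, where the Hessian of $\sigma$ has a simple closed form; this restriction and the function name are our simplifications for illustration, not the exact implementation in the paper.

```python
import numpy as np

def splitting_matrix_position(xs, dL_dsigma, p, Sigma, o):
    """Estimate the position block of S(theta) = E_x[ dL/dsigma * Hess_theta sigma ].

    xs        : (N, d) array of sample points x
    dL_dsigma : (N,) per-sample gradients dL/dsigma(x; theta)
    Uses the closed-form position Hessian of a Gaussian:
        Hess_p sigma = sigma(x) * (Sinv d d^T Sinv - Sinv),  with d = x - p.
    """
    Sinv = np.linalg.inv(Sigma)
    S = np.zeros((len(p), len(p)))
    for x, g in zip(xs, dL_dsigma):
        d = x - p
        sigma = o * np.exp(-0.5 * d @ Sinv @ d)
        Sd = Sinv @ d
        S += g * sigma * (np.outer(Sd, Sd) - Sinv)
    return S / len(xs)
```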
By examining the loss after splitting $\mathcal{L}(\vartheta, w)$, we can draw the following conclusions:
- Why split Gaussians? The term $\textcolor{blue}{\Gamma}$ can be optimized via gradient descent. However, when a Gaussian is trapped at a saddle point, gradients alone may provide no effective update. In contrast, splitting introduces the additional term $\textcolor{red}{\Delta}$, which can further decrease the loss and steer the parameters away from the saddle.
- When to split a Gaussian? For splitting to effectively reduce the loss, the term $\textcolor{red}{\Delta}$ must be negative. Taking a closer look, $\textcolor{red}{\Delta}$ is a quadratic form in the deviations $\delta_j$, defined by the splitting matrix $S(\theta)$. A necessary condition for $\textcolor{red}{\Delta}$ to be negative is that $S(\theta)$ is NOT positive semi-definite; otherwise, splitting can backfire!
- How to split a Gaussian? More is true: the density control strategy achieving the steepest descent on the loss can be given analytically. By the properties of quadratic forms, setting $m = 2$ with $\delta_1 = v_{\min}(S(\theta))$ and $\delta_2 = -v_{\min}(S(\theta))$ is sufficient to attain a minimizer of $\textcolor{red}{\Delta}$, where $v_{\min}(S(\theta))$ is the eigenvector associated with the smallest eigenvalue of the splitting matrix $S(\theta)$ (a sketch follows below this list).
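Putting the last two points together, a minimal sketch of the resulting split rule is given below. The step size `eps` and the equal weights $w_1 = w_2 = \tfrac{1}{2}$ (which halve each offspring's opacity) are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def steepest_split(p, Sigma, o, S, eps=1e-2):
    """Split a Gaussian along the steepest-descent direction when S(theta)
    is not positive semi-definite; otherwise leave it unchanged.

    Returns a list of (position, covariance, opacity) tuples.
    """
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    if eigvals[0] >= 0:                    # S is PSD: splitting cannot reduce the loss
        return [(p, Sigma, o)]
    v_min = eigvecs[:, 0]                  # eigenvector of the smallest eigenvalue
    # Two offspring displaced along +/- v_min with equal weights w1 = w2 = 1/2,
    # i.e. each offspring keeps half of the parent's opacity.
    return [(p + eps * v_min, Sigma, 0.5 * o),
            (p - eps * v_min, Sigma, 0.5 * o)]
```

In particular, a primitive whose splitting matrix is positive semi-definite is left unsplit, since splitting it cannot reduce the loss.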
All these results generalize well to the scenario with multiple Gaussians. Please see our paper for more details.