TL;DR We theoretically investigate density control in 3DGS. As training via gradient descent progresses, many Gaussian primitives are observed to become stationary while failing to reconstruct the regions they cover.
From an optimization-theoretic perspective, we reveal that these primitives are trapped at saddle points: regions of the loss landscape where gradients are insufficient to further reduce the loss, leaving the parameters locally sub-optimal.
To address this, our SteepGS efficiently identifies Gaussian primitives stuck in saddle areas, splits each of them into two offspring, and displaces the new primitives along the steepest-descent directions.
By escaping the saddle area, this restores the effectiveness of subsequent gradient-based updates. The figure on the right illustrates this process.
To fully understand the underlying mechanism, we begin with a simple mathematical setup. Suppose the scene is represented by a single Gaussian function with parameters $\theta = (p, \Sigma, o)$ (omitting color for simplicity), defined as $\sigma(x; \theta) = o \exp\left(-\frac{1}{2}(x - p)^\top \Sigma^{-1} (x - p)\right)$. We consider splitting this Gaussian into $m$ offspring, where offspring $j$ is assigned new parameters $\vartheta_j$ and an opacity weight $w_j$, for $j = 1, \dots, m$; we write $\vartheta = \{ \vartheta_j \}$ and $w = \{ w_j \}$. The question then becomes how to choose $m$, $w$, and the offspring parameters $\vartheta$ (equivalently, the displacements $\delta$ defined below).
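To make this setup concrete, here is a minimal NumPy sketch that evaluates $\sigma(x; \theta)$ for a toy 2D Gaussian; the specific values are illustrative only.

```python
import numpy as np

def gaussian_density(x, p, Sigma, o):
    """Evaluate sigma(x; theta) = o * exp(-0.5 * (x - p)^T Sigma^{-1} (x - p))."""
    d = x - p
    return o * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# Toy 2D example: a single Gaussian primitive.
p = np.zeros(2)                  # position
Sigma = np.diag([1.0, 0.25])     # covariance
o = 0.8                          # opacity
print(gaussian_density(np.array([0.5, 0.1]), p, Sigma, o))
```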
Let $\mathcal{L}(\theta)$ and $\mathcal{L}(\vartheta, w)$ denote the photometric loss before and after splitting, respectively. First, we prove the following approximation, which characterizes the impact of densification on the loss:
$$
\mathcal{L}(\vartheta, w) \approx \textcolor{blue}{\underbrace{\mathcal{L}(\theta) + \nabla \mathcal{L}(\theta)^\top \mu + \frac{1}{2} \mu^\top \nabla^2 \mathcal{L}(\theta) \mu}_{\Gamma}} + \textcolor{red}{\underbrace{\frac{1}{2} \sum_{j=1}^{m} w_j \delta_j^\top S(\theta) \delta_j}_{\Delta}},
$$
where $\mu = \sum_j w_j \vartheta_j - \theta$ is the mean displacement of the offspring, $\delta_j = \vartheta_j - \theta - \mu$ is the deviation of each offspring from the mean, and $S(\theta)$ is the splitting matrix, which is central to quantifying the effect of splitting a Gaussian:
$$
S(\theta) = \mathbb{E}_{x} \left[\frac{\partial \mathcal{L}}{\partial \sigma(x; \theta)} \nabla^2_{\theta} \sigma(x; \theta) \right].
$$
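To see what computing $S(\theta)$ could look like in practice, the sketch below averages the per-sample term $\frac{\partial \mathcal{L}}{\partial \sigma} \nabla^2_{\theta} \sigma$ over a set of sample points. For brevity it restricts $\theta$ to the position $p$, where the Hessian of $\sigma$ has a simple closed form; this restriction and the function name are our simplifications for illustration, not the exact implementation in the paper.

```python
import numpy as np

def splitting_matrix_position(xs, dL_dsigma, p, Sigma, o):
    """Estimate the position block of S(theta) = E_x[ dL/dsigma * Hess_theta sigma ].

    xs        : (N, d) array of sample points x
    dL_dsigma : (N,) per-sample gradients dL/dsigma(x; theta)
    Uses the closed-form position Hessian of a Gaussian:
        Hess_p sigma = sigma(x) * (Sinv d d^T Sinv - Sinv),  with d = x - p.
    """
    Sinv = np.linalg.inv(Sigma)
    S = np.zeros((len(p), len(p)))
    for x, g in zip(xs, dL_dsigma):
        d = x - p
        sigma = o * np.exp(-0.5 * d @ Sinv @ d)
        Sd = Sinv @ d
        S += g * sigma * (np.outer(Sd, Sd) - Sinv)
    return S / len(xs)
```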
By examining the loss after splitting $\mathcal{L}(\vartheta, w)$, we can draw the following conclusions:
- Why split Gaussians? The term $\textcolor{blue}{\Gamma}$ can be optimized via gradient descent. However, when a Gaussian is trapped at a saddle point, gradients alone may provide no effective update. In contrast, splitting introduces the additional term $\textcolor{red}{\Delta}$, which can further decrease the loss and steer the parameters away from the saddle.
- When to split a Gaussian? For splitting to effectively reduce the loss, the term $\textcolor{red}{\Delta}$ must be negative. Taking a closer look, $\textcolor{red}{\Delta}$ is a quadratic form in the deviations $\delta_j$, defined by the splitting matrix $S(\theta)$. A necessary condition for $\textcolor{red}{\Delta}$ to be negative is that $S(\theta)$ is NOT positive semi-definite; otherwise, splitting can backfire!
- How to split a Gaussian? More is true: the density control strategy achieving the steepest descent on the loss can be given analytically. By the properties of quadratic forms, setting $m = 2$ with $\delta_1 = v_{\min}(S(\theta))$ and $\delta_2 = -v_{\min}(S(\theta))$ is sufficient to attain a minimizer of $\textcolor{red}{\Delta}$, where $v_{\min}(S(\theta))$ is the eigenvector associated with the smallest eigenvalue of the splitting matrix $S(\theta)$ (a sketch follows below this list).
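Putting the last two points together, a minimal sketch of the resulting split rule is given below. The step size `eps` and the equal weights $w_1 = w_2 = \tfrac{1}{2}$ (which halve each offspring's opacity) are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def steepest_split(p, Sigma, o, S, eps=1e-2):
    """Split a Gaussian along the steepest-descent direction when S(theta)
    is not positive semi-definite; otherwise leave it unchanged.

    Returns a list of (position, covariance, opacity) tuples.
    """
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    if eigvals[0] >= 0:                    # S is PSD: splitting cannot reduce the loss
        return [(p, Sigma, o)]
    v_min = eigvecs[:, 0]                  # eigenvector of the smallest eigenvalue
    # Two offspring displaced along +/- v_min with equal weights w1 = w2 = 1/2,
    # i.e. each offspring keeps half of the parent's opacity.
    return [(p + eps * v_min, Sigma, 0.5 * o),
            (p - eps * v_min, Sigma, 0.5 * o)]
```

In particular, a primitive whose splitting matrix is positive semi-definite is left unsplit, since splitting it cannot reduce the loss.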
All these results generalize well to the scenario with multiple Gaussians. Please see our paper for more details.