A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lecture 17: Stability of Backward Substitution¶
Recall the convention in lecture notes: $\frac{1}{2}\epsilon_\text{M}$ in these notes corresponds to the machine epsilon $\epsilon_{\text{machine}}$ in Trefethen & Bau.
Triangular Systems¶
We have seen that a system of equations $Ax=b$ can be reduced to an upper-triangular system $R x = y$ (with $y = Q^* b$) by QR factorization. Such an upper-triangular system can then be solved easily by backward substitution (or back substitution).
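As a quick illustration of this pipeline (ours, not from the text), the following NumPy/SciPy sketch reduces a random square system to triangular form and solves it; `scipy.linalg.solve_triangular` performs the back substitution that Algorithm 17.1 below spells out.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Reduce A x = b to an upper-triangular system R x = y via QR:
# A = QR with Q unitary, so A x = b becomes R x = Q^* b =: y.
rng = np.random.default_rng(0)
m = 5
A = rng.standard_normal((m, m))
b = rng.standard_normal(m)

Q, R = np.linalg.qr(A)        # for full-rank A, R is upper triangular and nonsingular
y = Q.conj().T @ b            # y = Q^* b
x = solve_triangular(R, y)    # back substitution (upper triangular is the default)

print(np.allclose(A @ x, b))  # True, up to rounding
```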
The situation at hand looks like
$$ \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1 m} \\ & r_{22} & \cdots & r_{2m} \\ & & \ddots & \vdots \\ & & & r_{m m} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} $$with $r_{jj}\neq 0$ for $j=1,\ldots,m$, where $x\in\mathbb{C}^m$ is the unknown.
We can solve this by starting at the bottom: solve for $x_m$, substitute it into the next equation up to solve for $x_{m-1}$, and continue working upward until we reach $x_1$.
Algorithm 17.1: Back Substitution
$$ \begin{aligned} x_m & =b_m / r_{m m} \\ x_{m-1} & =\left(b_{m-1}-x_m r_{m-1, m}\right) / r_{m-1, m-1} \\ x_{m-2} & =\left(b_{m-2}-x_{m-1} r_{m-2, m-1}-x_m r_{m-2, m}\right) / r_{m-2, m-2} \\ & \vdots \\ x_j & =\left(b_j-\sum_{k=j+1}^m x_k r_{j k}\right) / r_{j j} \end{aligned} $$The step that computes $x_j$ requires $m-j$ multiplications, $m-j$ subtractions, and one division, so the total operation count is roughly $m^2$ flops.
We assume that the subtractions are carried out from left to right.
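Algorithm 17.1 translates directly into code. Below is a minimal NumPy sketch (the routine and its name are ours for illustration, not from the text); the inner loop accumulates the subtractions left to right, matching the convention just stated.

```python
import numpy as np

def back_substitution(R, b):
    """Solve R x = b for nonsingular upper-triangular R by Algorithm 17.1.

    The inner subtractions are accumulated left to right, matching the
    convention assumed in the stability analysis below.
    """
    m = len(b)
    x = np.zeros(m, dtype=np.result_type(R, b))
    for j in range(m - 1, -1, -1):      # j = m-1, ..., 0 (0-based indexing)
        s = b[j]
        for k in range(j + 1, m):       # left-to-right accumulation
            s = s - x[k] * R[j, k]
        x[j] = s / R[j, j]
    return x
```

The two nested loops run over roughly $m^2/2$ index pairs, each contributing one multiplication and one subtraction, consistent with the $\sim m^2$ flop count.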
Theorem:¶
Let Algorithm 17.1 be applied to a nonsingular upper-triangular system $Rx = b$ of floating point numbers on a computer satisfying Axiom (13.7) (Axiom B in our Lectures 14-15). This algorithm is backward stable in the sense that the computed solution $\tilde{x} \in \mathbb{C}^m$ satisfies
$$ (R+\delta R) \tilde{x}=b $$for some upper-triangular $\delta R \in \mathbb{C}^{m \times m}$ with
$$ \frac{\|\delta R\|}{\|R\|}=O\left(\epsilon_{\text{M}}\right) . $$Specifically, for each $i$, $j$,
$$ \frac{\left|\delta r_{i j}\right|}{\left|r_{i j}\right|} \leq m \frac{1}{2}\epsilon_{\text{M}}+O(\epsilon_{\text{M}}^2). $$Proof:¶
We prove this theorem by induction on $m$.
When $m=1$, the computed solution is
$$ \tilde{x}_1 = b_1 \oslash r_{11} = \frac{b_1}{r_{11}}(1+\epsilon) $$for some $\epsilon$ with $|\epsilon|\leq \frac{1}{2}\epsilon_\text{M}$. Next,
$$ \frac{b_1}{r_{11}}(1+\epsilon) = \frac{b_1}{r_{11} (1+\epsilon)^{-1}}. $$Since
$$ \frac{1}{1-s} = 1 + s + O(s^2),\quad s\to 0, $$we have
$$ r_{11} (1+\epsilon)^{-1} = r_{11}(1 + O(\epsilon)) = r_{11}(1 + O(\epsilon_\text{M})) =: R+ \delta R. $$This completes the base case of the induction.
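The base case can also be checked empirically: in IEEE double precision, a single division commits a relative error of at most $\frac{1}{2}\epsilon_\text{M}$. A small sketch (ours; note that NumPy's `np.finfo(np.float64).eps` equals $\epsilon_\text{M} = 2^{-52}$ in the convention of these notes):

```python
import numpy as np
from fractions import Fraction

# Unit roundoff u = (1/2) eps_M; NumPy's float64 eps is eps_M = 2**-52.
u = np.finfo(np.float64).eps / 2

rng = np.random.default_rng(1)
worst = Fraction(0)
for _ in range(10_000):
    b1, r11 = (float(t) for t in rng.standard_normal(2))
    computed = np.float64(b1) / np.float64(r11)   # the flop b1 ⊘ r11
    exact = Fraction(b1) / Fraction(r11)          # exact rational quotient
    worst = max(worst, abs(Fraction(float(computed)) - exact) / abs(exact))

print(float(worst) <= u)  # True: the relative error ε satisfies |ε| ≤ (1/2) eps_M
```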
Now, suppose that $R$ is an $(m-1)\times(m-1)$ nonsingular upper-triangular matrix for which the claim holds: the computed solution $\tilde{\mathbf{x}}$ satisfies
$$ (R+\delta R)\tilde{\mathbf{x}} = \mathbf{b} $$for some upper-triangular $\delta R\in\mathbb{C}^{(m-1)\times(m-1)}$ satisfying $\frac{\|\delta R\|}{\|R\|}=O(\epsilon_\text{M})$. This is the bottom block of the $m\times m$ system we want to solve:
$$ \begin{bmatrix} r_{11} & \mathbf{r}^* \\ 0 & R \end{bmatrix} \begin{bmatrix} x_1 \\ \mathbf{x} \end{bmatrix} = \begin{bmatrix} b_1 \\ \mathbf{b} \end{bmatrix}, $$where
$$ \mathbf{x} = \begin{bmatrix} x_2 \\ x_3 \\ x_4 \\ \vdots\\ x_m \end{bmatrix} \quad\text{and} \quad \mathbf{b} = \begin{bmatrix} b_2 \\ b_3 \\ b_4 \\ \vdots\\ b_m \end{bmatrix} $$and the row vector $\mathbf{r}^*$ is given: $$ \mathbf{r}^* = \begin{bmatrix} r_{12} & r_{13} & \cdots & r_{1 m} \end{bmatrix}. $$
So here $\tilde{\mathbf{x}}$, with entries $\tilde{x}_2, \ldots, \tilde{x}_m$, is the computed solution for $\mathbf{x}$.
At this stage, we employ the final step of the backward substitution to obtain
$$ \tilde{x}_1 =\left[ ( b_1 \ominus (\tilde{x}_2 \otimes r_{12}) ) \ominus (\tilde{x}_3 \otimes r_{13}) \ominus \cdots \ominus (\tilde{x}_m \otimes r_{1m})\right] \oslash r_{11}. $$We convert the $\otimes$ operations into mathematical multiplications by introducing relative perturbations and obtain
$$ \tilde{x}_1 =\left[ ( b_1 \ominus \tilde{x}_2 r_{12}(1+\epsilon_2) ) \ominus \tilde{x}_3 r_{13}(1+\epsilon_3) \ominus \cdots \ominus \tilde{x}_m r_{1m}(1+\epsilon_m)\right] \oslash r_{11} $$for some $\epsilon_j$ with $|\epsilon_j|\leq \frac{1}{2}\epsilon_\text{M}$, for $j=2,\ldots,m$.
Now note the simple identity:
$$ 1 + \frac{-\epsilon}{1+\epsilon} = \frac{1}{1+\epsilon}. $$Taking reciprocals, we have
$$ 1+\epsilon = \frac{1}{1+\hat{\epsilon}},\qquad\text{with}~\hat{\epsilon} := \frac{-\epsilon}{1+\epsilon} $$and we note that as $\epsilon\to 0$, we have
$$ \hat{\epsilon} = -\epsilon(1 + O(\epsilon)) = O(\epsilon), $$which implies that
$$ \frac{1}{1+\hat{\epsilon}} = 1 + O(\epsilon) $$as well.
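A quick symbolic sanity check of this identity and of the estimate $\hat{\epsilon} = O(\epsilon)$, using SymPy (illustrative only):

```python
import sympy as sp

eps = sp.symbols('epsilon')
eps_hat = -eps / (1 + eps)

# 1 + ε = 1/(1 + ε̂): the product (1 + ε)(1 + ε̂) simplifies to 1.
print(sp.simplify((1 + eps) * (1 + eps_hat)))  # 1

# ε̂ = -ε + ε² - ... = O(ε) as ε → 0:
print(sp.series(eps_hat, eps, 0, 3))           # -ε + ε² + O(ε³)
```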
Next, we convert the $\ominus$ operations into mathematical subtractions (successively) by introducing relative perturbations of size at most $\frac{1}{2}\epsilon_\text{M}$ and then shift them to the denominator using the identity above:
$$ b_1 \ominus \tilde{x}_2 r_{12}(1+\epsilon_2) = \left(b_1 - \tilde{x}_2 r_{12}(1+\epsilon_2)\right)(1+\epsilon_2') = \frac{b_1 - \tilde{x}_2 r_{12}(1+\epsilon_2)}{1+\hat{\epsilon}_2}=:\frac{b_1 - d_2}{1+\hat{\epsilon}_2}, $$$$ \frac{b_1 - d_2}{1+\hat{\epsilon}_2} \ominus \tilde{x}_3 r_{13}(1+\epsilon_3) = \left[\frac{b_1 - d_2 - \tilde{x}_3 r_{13}(1+\hat{\epsilon}_2)(1+\epsilon_3)}{1+\hat{\epsilon}_2}\right](1+\epsilon_3')= \frac{b_1 - d_2 - \tilde{x}_3 r_{13}(1+\hat{\epsilon}_2)(1+\epsilon_3)}{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)}=:\frac{b_1 - d_2 - d_3}{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)} $$$$ \vdots $$$$ \begin{aligned} \frac{b_1 - d_2 - d_3 - \cdots - d_{m-1}}{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})} \ominus \tilde{x}_m r_{1m}(1+\epsilon_m) &= \left(\frac{b_1 - d_2 - d_3 - \cdots - d_{m-1} - \tilde{x}_m r_{1m}(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\epsilon_m) }{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})} \right)(1+\epsilon_m') \\ &= \frac{b_1 - d_2 - d_3 - \cdots - d_{m-1} - \tilde{x}_m r_{1m}(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\epsilon_m) }{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\hat{\epsilon}_m)}\\ &=:\frac{b_1 - d_2 - d_3 - \cdots - d_{m-1} - d_{m} }{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\hat{\epsilon}_m)} \end{aligned} $$Note that these calculations result in
$$ \tilde{x}_1 = \left[ \frac{b_1 - d_2 - d_3 - \cdots - d_{m-1} - d_{m} }{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\hat{\epsilon}_m)} \right]\oslash r_{11}. $$We now convert the flop $\oslash$ into mathematical division by introducing a relative perturbation of size at most $\frac{1}{2}\epsilon_\text{M}$ and obtain
$$ \begin{aligned} \tilde{x}_1 &= \left[ \frac{b_1 - d_2 - d_3 - \cdots - d_{m-1} - d_{m} }{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\hat{\epsilon}_m)r_{11}} \right](1+\epsilon_{m+1}')\\ &= \left[ \frac{b_1 - d_2 - d_3 - \cdots - d_{m-1} - d_{m} }{(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\hat{\epsilon}_m)(1+\hat{\epsilon}_{m+1})r_{11}} \right] \end{aligned} $$Finally, we observe that for each intermediate quantity $d_k$ we have introduced, we have
$$ d_k = \tilde{x}_k r_{1k}(1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{k-1})(1+\epsilon_k) = \tilde{x}_k r_{1k}(1+ O(\epsilon_\text{M})), $$and similarly for the product that accumulated in the denominator:
$$ (1+\hat{\epsilon}_2)(1+\hat{\epsilon}_3)\cdots (1+\hat{\epsilon}_{m-1})(1+\hat{\epsilon}_m)(1+\hat{\epsilon}_{m+1})r_{11} = r_{11}(1+ O(\epsilon_\text{M})). $$Therefore,
$$ \tilde{x}_1 = \frac{b_1 - \tilde{x}_2 r_{12}(1+ O(\epsilon_\text{M})) - \tilde{x}_3 r_{13}(1+ O(\epsilon_\text{M})) - \cdots - \tilde{x}_m r_{1m}(1+ O(\epsilon_\text{M}))}{r_{11}(1+ O(\epsilon_\text{M}))}. $$All perturbations land on the entries $r_{11}, r_{12}, \ldots, r_{1m}$ of the first row. Hence, for some perturbation $\delta\mathbf{r}^*$ of size $\dfrac{\|\delta \mathbf{r}^*\|}{\| \mathbf{r}^*\|} = O(\epsilon_\text{M})$ and some $\delta r_{11}$ with $\dfrac{|\delta r_{11}|}{|r_{11}|} = O(\epsilon_\text{M})$, the entries of the perturbed row vector
$$ \tilde{\mathbf{r}}^* = \mathbf{r}^* + \delta \mathbf{r}^* = [\tilde{r}_{12} \quad \tilde{r}_{13}\quad \cdots \quad \tilde{r}_{1m}] $$together with the perturbed diagonal entry $\tilde{r}_{11} := r_{11} + \delta r_{11}$ exactly satisfy
$$ \tilde{x}_1 = \frac{b_1 - \tilde{x}_2 \tilde{r}_{12} - \tilde{x}_3 \tilde{r}_{13} - \cdots - \tilde{x}_m \tilde{r}_{1m}}{\tilde{r}_{11}}. $$Since the induction hypothesis gives
$$ (R+\delta R) \begin{bmatrix} \tilde{x}_2 \\ \tilde{x}_3 \\ \vdots \\ \tilde{x}_m \end{bmatrix}= \begin{bmatrix} b_2 \\ b_3 \\ \vdots \\ b_m \end{bmatrix} $$for $\delta R$ satisfying $\frac{\|\delta R\|}{\| R\|} = O(\epsilon_\text{M})$, we conclude that
$$ \begin{bmatrix} r_{11} + \delta r_{11} & \mathbf{r}^* + \delta \mathbf{r}^* \\ 0 & R+\delta R \end{bmatrix} \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \tilde{x}_3 \\ \vdots \\ \tilde{x}_m \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{bmatrix} $$holds exactly. Keeping count of the perturbations shows that each entry of the first row accumulates at most $m$ factors $1+\epsilon$ with $|\epsilon| \leq \frac{1}{2}\epsilon_\text{M} + O(\epsilon_\text{M}^2)$; expanding the product yields the entrywise bound $\frac{|\delta r_{ij}|}{|r_{ij}|} \leq m \frac{1}{2}\epsilon_\text{M}+O(\epsilon_\text{M}^2)$ stated in the theorem. This finishes the proof.
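As a sanity check of the bookkeeping in the proof, the folding of the $\ominus$ perturbations into the denominator can be verified symbolically for $m=3$. In this SymPy sketch (ours), `e2`/`e3` model the $\otimes$ perturbations and `e2p`/`e3p` the primed $\ominus$ perturbations:

```python
import sympy as sp

b1, x2, x3, r12, r13 = sp.symbols('b1 x2 x3 r12 r13')
e2, e3, e2p, e3p = sp.symbols('e2 e3 e2p e3p')

# Left-to-right floating-point evaluation of b1 ⊖ x2⊗r12 ⊖ x3⊗r13,
# with each flop modeled as the exact operation times (1 + ε):
fl = ((b1 - x2*r12*(1 + e2))*(1 + e2p) - x3*r13*(1 + e3))*(1 + e3p)

# The closed form derived in the proof, with ε̂_j := -ε_j'/(1 + ε_j'):
e2h, e3h = -e2p/(1 + e2p), -e3p/(1 + e3p)
d2 = x2*r12*(1 + e2)
d3 = x3*r13*(1 + e2h)*(1 + e3)
closed = (b1 - d2 - d3)/((1 + e2h)*(1 + e3h))

print(sp.simplify(fl - closed))  # 0: the two expressions agree identically
```

Finally, backward stability has a measurable consequence: from $(R+\delta R)\tilde{x} = b$ we get $\|R\tilde{x} - b\| = \|\delta R\, \tilde{x}\| \leq \|\delta R\|\|\tilde{x}\|$, so the relative residual is $O(\epsilon_\text{M})$. An empirical check (ours; the matrix construction is an arbitrary well-conditioned choice):

```python
import numpy as np

def back_substitution(R, b):
    # Algorithm 17.1, as in the sketch following the algorithm above.
    m = len(b)
    x = np.zeros(m)
    for j in range(m - 1, -1, -1):
        s = b[j]
        for k in range(j + 1, m):
            s = s - x[k] * R[j, k]
        x[j] = s / R[j, j]
    return x

rng = np.random.default_rng(2)
m = 200
R = np.triu(rng.standard_normal((m, m))) + 10.0 * np.eye(m)  # safely nonsingular
b = rng.standard_normal(m)
x = back_substitution(R, b)

# Relative residual, which backward stability forces to be O(eps_M):
resid = np.linalg.norm(R @ x - b) / (np.linalg.norm(R, 2) * np.linalg.norm(x))
print(resid / np.finfo(np.float64).eps)  # a modest O(1) multiple of eps_M
```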