A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lectures 14–15: Stability¶
Recall the convention in these lecture notes: $\frac{1}{2}\epsilon_\text{M}$ here corresponds to the machine epsilon $\epsilon_{\text{machine}}$ in Trefethen & Bau.
We denoted a problem by $f\colon X \to Y$.
We denote an algorithm to solve this problem by $\tilde{f}\colon X \to Y$.
Suppose that a computer has the property:
For all $x, y \in \mathbb{F}$, there exists $\epsilon$ with $|\epsilon| \leq \frac{1}{2}\epsilon_{\text{M}}$ such that
$$ x \circledast y=(x * y)(1+\epsilon) . $$In words, every operation of floating point arithmetic is exact up to a relative error of size at most $\frac{1}{2}\epsilon_{\text{M}}$.
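As a quick sanity check of this axiom (a sketch, not part of the formal development): in IEEE double precision $\epsilon_\text{M} = 2^{-52}$, and Python's `fractions.Fraction` can supply exact reference values. The loop below tests the bound for multiplication on randomly chosen inputs; the variable names and the sample count are just illustrative.

```python
from fractions import Fraction

import numpy as np

# Check |(x ⊛ y) - (x * y)| / |x * y| <= (1/2) eps_M for floating point
# multiplication, assuming IEEE double precision (eps_M = 2**-52).
eps_M = Fraction(2) ** -52
rng = np.random.default_rng(0)

worst = Fraction(0)
for _ in range(10_000):
    x, y = (float(v) for v in rng.uniform(-1.0, 1.0, size=2))
    exact = Fraction(x) * Fraction(y)   # exact product of the two doubles
    if exact == 0:
        continue
    computed = Fraction(x * y)          # the floating point product x ⊛ y
    worst = max(worst, abs(computed - exact) / abs(exact))

print(worst <= eps_M / 2)  # expected: True
```

The same experiment can be run for $\oplus$, $\ominus$, and $\oslash$, since IEEE arithmetic rounds each of these operations correctly.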
Given data $x \in X$, let this data be rounded to floating point and then supplied as input to the computer program. Now, run the program. The result is a collection of floating point numbers that belong to the vector space $Y$. Let this computed result be called $\tilde{f}(x)$.
Several things may introduce errors in $\tilde{f}(x)$: the rounding of the input data to floating point, the rounding incurred by each floating point operation carried out by the program, and, for some algorithms, other effects such as convergence tolerances.
Big-Oh Notation¶
We say $g(z)=O(h(z))$ as $z\to z_0$ if there exist constants $M>0$ and $\delta>0$ such that for all $z$ satisfying $0<|z-z_0|<\delta$, we have $$ |g(z)| \leq M |h(z)|. $$
Example: As $\epsilon \to 0$, we have $\sin(\epsilon) = O(1)$. We also have $\sin(\epsilon) = O(\epsilon)$. But $\sin(\epsilon) \neq O(\epsilon^2)$, for example.
Accuracy¶
In general, $\tilde{f}$ cannot be continuous, since its output is a collection of floating point numbers. Nevertheless, we expect a good algorithm to approximate the associated problem $f$ well.
We define the absolute error of a computation by
$$ \| \tilde{f}(x) - f(x) \|. $$We define the relative error of a computation by
$$ \frac{\| \tilde{f}(x) - f(x) \|}{\|f(x)\|}. $$As mentioned in the previous lecture, our main concern will be the relative quantities. So we will work with the relative error.
We may say that an algorithm $\tilde{f}$ for a problem $f$ is accurate if for each $x\in X$,
$$ \frac{\| \tilde{f}(x) - f(x) \|}{\|f(x)\|} = O(\epsilon_{\text{M}}). $$If $f(x)=0$, then we use
$$ \| \tilde{f}(x) - f(x) \| = O(\epsilon_{\text{M}}) \| f(x)\|. $$If the problem $f$ at hand is ill-conditioned, it is unreasonable to expect a property like the one above to hold for all $x\in X$. Rounding of the input data is unavoidable on a digital computer, and even if all the subsequent computations could be carried out perfectly, this perturbation alone might lead to a significant change in the result. Instead of aiming for accuracy in all cases, we introduce the following notion, which is the appropriate aim in these situations.
Stability¶
We say that an algorithm $\tilde{f}$ for a problem $f$ is (forward) stable if for each $x\in X$,
$$ \frac{\|\tilde{f}(x)-f(\tilde{x})\|}{\|f(\tilde{x})\|}=O(\epsilon_\text{M}) $$for some $\tilde{x}$ with
$$ \frac{\|\tilde{x}-x\|}{\|x\|}= O(\epsilon_\text{M}). $$A stable algorithm gives nearly the right answer $\tilde{f}(x)$ to nearly the right question $f(\tilde{x})$.
In problems arising in different contexts (such as solving PDEs), $O(\epsilon_\text{M})$ may be too restrictive.
We say that an algorithm $\tilde{f}$ for a problem $f$ is backward stable if for each $x$
$$ \tilde{f}(x)= f(\tilde{x}) $$for some $\tilde{x}$ with $\dfrac{\|x-\tilde{x}\|}{\|x\|}=O(\epsilon_\text{M})$.
A backward stable algorithm gives exactly the right answer to nearly the right question.
This means that there is a perturbation of the input of the problem $f$ that results in exactly the output of $\tilde{f}$.
Recall that in our finite-dimensional setting all norms are equivalent. Therefore, the definitions above involving $O(\epsilon_\text{M})$ are independent of the choice of norm --- the constants produced by the equivalence of norms are absorbed into the big-Oh.
Example: Backward Stability of Subtraction
The problem of subtraction is computing $f(x_1, x_2)=x_1-x_2$. The algorithm we consider can be written as
$$ \tilde{f}(x_1, x_2)=\mathrm{fl}(x_1) \ominus \mathrm{fl}(x_2). $$From Lecture 13, we know that
$$ \mathrm{fl}(x_1)=x_1(1+\epsilon_1), \quad \mathrm{fl}(x_2)=x_2(1+\epsilon_2) $$for some $\left|\epsilon_1\right|,\left|\epsilon_2\right| \leq \frac{1}{2}\epsilon_{\text{M}}$. We also have that
$$ \mathrm{fl}\left(x_1\right) \ominus \mathrm{fl}\left(x_2\right)=\left(\mathrm{fl}\left(x_1\right)-\mathrm{fl}\left(x_2\right)\right)\left(1+\epsilon_3\right) $$for some $|\epsilon_3| \leq \frac{1}{2}\epsilon_{\text{M}}$. Combining these gives
$$ \begin{aligned} \mathrm{fl}\left(x_1\right) \ominus \mathrm{fl}\left(x_2\right) & =\left[x_1\left(1+\epsilon_1\right)-x_2\left(1+\epsilon_2\right)\right]\left(1+\epsilon_3\right) \\ & =x_1\left(1+\epsilon_1\right)\left(1+\epsilon_3\right)-x_2\left(1+\epsilon_2\right)\left(1+\epsilon_3\right) \\ & =x_1\left(1+\epsilon_4\right)-x_2\left(1+\epsilon_5\right) \end{aligned} $$for some $|\epsilon_4|, |\epsilon_5| \leq \epsilon_\text{M}+ O(\epsilon_\text{M}^2)$.
Now set $\tilde{x}_1 = x_1\left(1+\epsilon_4\right)$ and $\tilde{x}_2 = x_2\left(1+\epsilon_5\right)$, and let $\tilde{x}=(\tilde{x}_1,\tilde{x}_2)$. Then
$$ \frac{\| x - \tilde{x} \|}{\|x\|} = \frac{\| (\epsilon_4 x_1, \epsilon_5 x_2)\|}{\| x\|} \leq \max\{|\epsilon_4|,|\epsilon_5|\} \frac{\|x\|}{\|x\|} \leq \epsilon_\text{M} + O(\epsilon_\text{M}^2) = O(\epsilon_\text{M}). $$Moreover, the computed result $\tilde{f}(x_1,x_2)=\mathrm{fl}\left(x_1\right) \ominus \mathrm{fl}\left(x_2\right)$ is exactly equal to the difference $f(\tilde{x}_1,\tilde{x}_2) = \tilde{x}_1- \tilde{x}_2$. Hence subtraction is backward stable.
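The construction above can also be replayed numerically. The following sketch (with illustrative data $x_1 = 1/3$, $x_2 = 2/7$, and exact rational arithmetic via Python's `fractions.Fraction`) recovers the perturbations $\epsilon_1,\epsilon_2,\epsilon_3$ and checks that $\tilde{x}_1-\tilde{x}_2$ reproduces the computed result exactly while $\tilde{x}$ stays within $O(\epsilon_\text{M})$ of $x$.

```python
from fractions import Fraction

# Illustrative exact data, to be rounded to floating point.
x1, x2 = Fraction(1, 3), Fraction(2, 7)

fx1, fx2 = float(x1), float(x2)   # fl(x1), fl(x2): correctly rounded
r = fx1 - fx2                     # fl(x1) ⊖ fl(x2), the computed result

# Recover the epsilons from the argument above, exactly, in rational arithmetic.
eps1 = Fraction(fx1) / x1 - 1
eps2 = Fraction(fx2) / x2 - 1
eps3 = Fraction(r) / (Fraction(fx1) - Fraction(fx2)) - 1

# The perturbed data from the argument: x_tilde_i = x_i (1 + eps_i)(1 + eps3).
x1_tilde = x1 * (1 + eps1) * (1 + eps3)
x2_tilde = x2 * (1 + eps2) * (1 + eps3)

eps_M = Fraction(2) ** -52        # eps_M for IEEE double precision

print(x1_tilde - x2_tilde == Fraction(r))                    # exactly the computed result: True
print(abs(x1_tilde - x1) / abs(x1) <= eps_M + eps_M ** 2)    # backward error is O(eps_M): True
print(abs(x2_tilde - x2) / abs(x2) <= eps_M + eps_M ** 2)    # True
```

Essentially the same rational-arithmetic check works for $\oplus$, $\otimes$, and $\oslash$, which is one way to see the claim in the next paragraph.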
Importantly, the axioms we stated in Lecture 13 for floating point arithmetic imply that all of the floating point operations $\oplus$, $\ominus$, $\otimes$, and $\oslash$ are backward stable.
Example: Backward Stability of Division
We can proceed similarly:
$$ \mathrm{fl}(x_1) \oslash \mathrm{fl}(x_2) = \frac{x_1(1+\epsilon_1)}{x_2(1+\epsilon_2)}(1+\epsilon_3) = \frac{x_1(1+\epsilon_1)(1+\epsilon_3)}{x_2(1+\epsilon_2)} = \frac{x_1(1+\epsilon_1 + \epsilon_3 + \epsilon_1 \epsilon_3)}{x_2(1+\epsilon_2)} $$for some $|\epsilon_1|,|\epsilon_2|,|\epsilon_3|\leq \frac{1}{2}\epsilon_\text{M}$.
Then for
$$ \tilde{x} = \begin{bmatrix} x_1(1+\epsilon_1 + \epsilon_3 + \epsilon_1 \epsilon_3) \\ x_2(1+\epsilon_2)\end{bmatrix}, $$we see that
$$ \tilde{x} - x = \begin{bmatrix} x_1 (\epsilon_1 + \epsilon_3 + \epsilon_1 \epsilon_3) \\ x_2 \epsilon_2 \end{bmatrix} $$and we have
$$ \frac{\|\tilde{x}-x\|}{\|x\|}=O(\epsilon_{\text {M}}) $$and $\tilde{f}(x) = f(\tilde{x})$, so division is backward stable as well.
Further Examples¶
Inner Product. Backward stable.
Outer Product. Not backward stable. The goal is to compute $A=x y^*$ for vectors $x\in\mathbb{C}^m$ and $y\in\mathbb{C}^n$. The obvious algorithm is to compute the pairwise products with $\otimes$ and collect them into a matrix $\tilde{A}$. This algorithm is (forward) stable, but it is not backward stable.
The reason is that the matrix $\tilde{A}$ is very unlikely to have rank exactly 1. Therefore, it cannot generally be written in the form $(x+\delta x)(y+\delta y)^*$.
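Here is a small sketch of this rank obstruction (the vectors below are illustrative, and the check is done in exact rational arithmetic): a nonzero $2\times 2$ minor of the computed matrix $\tilde{A}$ certifies that its rank is at least $2$, so $\tilde{A}$ cannot be of the form $(x+\delta x)(y+\delta y)^*$.

```python
from fractions import Fraction

import numpy as np

# Illustrative real data; each entry of the computed outer product is the
# product x_i * y_j rounded once to double precision.
x = np.array([1.0 / 3.0, 2.0 / 7.0, 5.0 / 11.0])
y = np.array([1.0 / 9.0, 3.0 / 13.0, 7.0 / 17.0])
A_tilde = np.outer(x, y)

# Exact rank check: convert the stored doubles to rationals (exactly) and
# evaluate a 2x2 minor with no further rounding.
F = [[Fraction(float(v)) for v in row] for row in A_tilde]
minor = F[0][0] * F[1][1] - F[0][1] * F[1][0]

print(minor == 0)   # typically False: the computed matrix is not exactly rank one
```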
Adding $1$ to a Number. Given $x\in\mathbb{C}$, consider the problem of computing $x+1$ with $\oplus$. This algorithm is stable, but not backward stable.
Note that $1$ is exactly represented in the floating point system. Therefore,
$$ \tilde{f}(x) = \mathrm{fl}(x)\oplus 1 = \left (x(1+\epsilon_1)+1 \right)(1+\epsilon_2) $$for some $|\epsilon_1|, |\epsilon_2|\leq \frac{1}{2}\epsilon_\text{M}$. Then
$$ \tilde{f}(x) = \mathrm{fl}(x)\oplus 1 = x(1+\epsilon_1 + \epsilon_2 + \epsilon_1 \epsilon_2) +\epsilon_2 + 1 $$For $\tilde{x}=x(1+\epsilon_1 + \epsilon_2 + \epsilon_1 \epsilon_2) +\epsilon_2$, one would have $\tilde{f}(x) = f(\tilde{x})$.
However,
$$ \frac{\|\tilde{x}-x\|}{\|x\|}=\frac{\left\|x\left(\epsilon_1+\epsilon_2+\epsilon_1 \epsilon_2\right)+\epsilon_2\right\|}{\|x\|}, $$and this becomes much larger than $O(\epsilon_\text{M})$ for $x\approx 0$, since the term $\epsilon_2$ in the numerator does not scale with $x$.
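Concretely (a sketch with an illustrative value of $x$): for a sufficiently small input, the computed result $\mathrm{fl}(x)\oplus 1$ is exactly $1$, so the only $\tilde{x}$ near $x$ with $f(\tilde{x})=\tilde{f}(x)$ is $\tilde{x}=0$, and the backward relative error is $1$ rather than $O(\epsilon_\text{M})$.

```python
# Illustrative failure of backward stability for the problem f(x) = x + 1.
x = 1e-20                   # far smaller than eps_M
computed = x + 1.0          # fl(x) ⊕ 1: the tiny addend is rounded away
print(computed == 1.0)      # True

# Backward error: the only x_tilde with x_tilde + 1 exactly equal to 1.0
# is x_tilde = 0, so the relative backward error is 1, not O(eps_M).
x_tilde = computed - 1.0
print(x_tilde)                      # 0.0
print(abs(x_tilde - x) / abs(x))    # 1.0
```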
This example indicates that backward stability is a rather special property.
Note that if the problem had been to compute $x+y$ for data $x$ and $y$, then the algorithm would have been backward stable.
Example: Another Failure of Backward Stability
Let $h\colon \mathbb{C}\to \mathbb{C}$ be a differentiable function and suppose that $h'(z)=0$ at some point $z$. Consider the problem of evaluating (computing) $h(z)$.
Show that this is not backward stable (use the Mean Value Theorem).
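As a concrete (hypothetical) instance to keep in mind while working the exercise, take $h(z)=\cos z$, which has $h'(0)=0$:

```python
import math

# h(z) = cos(z) has h'(0) = 0.  Evaluate it at a small point x near 0.
x = 1e-9
computed = math.cos(x)     # the correctly rounded value of cos(1e-9) is exactly 1.0
print(computed == 1.0)     # True with a standard IEEE double precision libm

# Backward stability would require some x_tilde with cos(x_tilde) exactly
# equal to the computed value 1.0 and |x_tilde - x| / |x| = O(eps_M).
# But cos(t) = 1 only at t = 2*pi*k; the nearest such point to x is 0,
# so the backward relative error is |0 - x| / |x| = 1.
```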
An Unstable Algorithm¶
We now leave the toy examples above aside.
Consider the problem of computing the eigenvalues of a given square matrix $A$. Since the eigenvalues are the roots of the characteristic polynomial $p(z) = \det(z\mathbb{I}-A)$, one might try to obtain them by finding the roots of this polynomial. This suggests the following two-step method:
1. Find the coefficients of the characteristic polynomial;
2. Find its roots.
This algorithm is not only backward unstable, but unstable. It should not be used. Even in cases where extracting eigenvalues is a well-conditioned problem, it may produce answers that have relative errors vastly larger than $\epsilon_{\text{M}}$.
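A numerical sketch of this instability (assuming NumPy; `np.poly` returns the coefficients of the characteristic polynomial of a matrix and `np.roots` computes the roots of a polynomial from its coefficients): the matrix $A=\operatorname{diag}(1,2,\ldots,20)$ is symmetric, so its eigenvalue problem is well conditioned, yet the coefficient-then-roots algorithm loses many digits.

```python
import numpy as np

true_eigs = np.arange(1.0, 21.0)
A = np.diag(true_eigs)              # symmetric, eigenvalues 1, 2, ..., 20

# Algorithm 1: a standard eigenvalue solver.
eigs_direct = np.sort(np.linalg.eigvalsh(A))

# Algorithm 2: form the characteristic polynomial, then find its roots.
coeffs = np.poly(A)                              # coefficients of det(z I - A)
eigs_via_poly = np.sort(np.roots(coeffs).real)   # some roots may even come out complex

print(np.max(np.abs(eigs_direct - true_eigs)))    # tiny, comparable to eps_M
print(np.max(np.abs(eigs_via_poly - true_eigs)))  # many orders of magnitude larger
```

The coefficients of this characteristic polynomial are enormous (up to $20! \approx 2.4\times 10^{18}$), and its roots are extremely sensitive to relative perturbations of the coefficients; this is Wilkinson's famous example.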
When Does Backward Stability Imply Accuracy? Does It?¶
Theorem:¶
Suppose a backward stable algorithm is applied to solve a problem $f: X \rightarrow Y$ with condition number $\kappa$ on a computer satisfying the axioms in Lecture 13 (See (13.5) and (13.7) from the book). Then the relative errors satisfy
$$ \frac{\|\tilde{f}(x)-f(x)\|}{\|f(x)\|}=O\left(\kappa(x) \epsilon_{\text{M}}\right). $$Proof:¶
Since the algorithm is backward stable, we know that $\tilde{f}(x)=f(\tilde{x})$ for some $\tilde{x}\in X$ satisfying
$$ \frac{\|\tilde{x}-x\|}{\|x\|}=O\left(\epsilon_{\text{M}}\right) . $$On the other hand, by definition of the (relative) condition number:
$$ \kappa(x) \geq \displaystyle \frac{\displaystyle\frac{\|f(x+\delta x)-f(x)\|}{\|f(x)\|}}{\displaystyle\frac{\|\delta x\|}{\| x\|}} \quad \text{(approximately, for small } \delta x\text{)}. $$More precisely,
$$ \displaystyle \frac{\displaystyle\frac{\|f(x+\delta x)-f(x)\|}{\|f(x)\|}}{\displaystyle\frac{\|\delta x\|}{\| x\|}} \leq \kappa(x) + o(1), $$where $o(1)$ denotes a quantity that converges to zero as $\epsilon_\text{M}\to 0$.
Or, with $\delta x$ chosen such that $\tilde{x}=x+\delta x$,
$$ \displaystyle \frac{\displaystyle\frac{\|f(\tilde{x})-f(x)\|}{\|f(x)\|}}{\displaystyle\frac{\|x-\tilde{x}\|}{\| x\|}} \leq \kappa(x) + o(1) $$and this implies that
$$ \frac{\|\tilde{f}(x)-f(x)\|}{\|f(x)\|} \leq(\kappa(x)+o(1)) \frac{\|\tilde{x}-x\|}{\|x\|} = O\left(\kappa(x) \epsilon_{\text{M}}\right). $$This shows the following. Suppose we have a backward stable algorithm $\tilde{f}$ for a problem $f$ that has a condition number $\kappa(x)$. If $\kappa(x)$ is small, the results will be accurate in the relative sense. But if $\kappa(x)$ is large, the accuracy will suffer proportionately.
So a well-conditioned problem with a backward stable algorithm implies accuracy!
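A rough numerical illustration of the theorem (a sketch, not a proof; it assumes NumPy and uses `np.linalg.solve`, whose underlying LU factorization with partial pivoting behaves as a backward stable algorithm in practice): the forward relative error of the computed solution of $Ax=b$ tracks $\kappa(A)\,\epsilon_\text{M}$ as the condition number is varied.

```python
import numpy as np

eps_M = np.finfo(float).eps              # eps_M for IEEE double precision
rng = np.random.default_rng(0)
m = 50

for target_kappa in [1e2, 1e6, 1e10]:
    # Build A with (roughly) prescribed condition number from a chosen SVD.
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))
    V, _ = np.linalg.qr(rng.standard_normal((m, m)))
    s = np.logspace(0.0, -np.log10(target_kappa), m)   # singular values from 1 down to 1/kappa
    A = U @ np.diag(s) @ V.T

    x_exact = rng.standard_normal(m)
    b = A @ x_exact                      # note: forming b already incurs rounding
    x_comp = np.linalg.solve(A, b)

    rel_err = np.linalg.norm(x_comp - x_exact) / np.linalg.norm(x_exact)
    print(f"kappa ~ {target_kappa:.0e}   rel. error ~ {rel_err:.1e}   kappa * eps_M ~ {target_kappa * eps_M:.1e}")
```

The measured errors will not match $\kappa(x)\epsilon_\text{M}$ exactly, but they grow roughly in proportion to the condition number, which is the content of the theorem.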
Concluding Remarks from Trefethen & Bau¶
The process we have just carried out in proving the Theorem above is known as backward error analysis.
We obtained an accuracy estimate in two steps.
One step was to investigate the condition of the problem.
The other step was to investigate the stability of the algorithm.
Our conclusion was that if the algorithm is backward stable, then the final accuracy reflects the condition number of the problem.
In general, the first idea that comes to mind for analyzing an algorithm is forward error analysis: the rounding errors introduced at each step of the calculation are estimated, and somehow a total is maintained of how they may compound from step to step.
This is in practice harder to carry out.
In short, it is an established fact that the best algorithms for most problems do no better, in general, than to compute exact solutions for slightly perturbed data. Backward error analysis is a method of reasoning fitted neatly to this backward reality.