A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lecture 22: Stability of Gaussian Elimination¶
Quoting Trefethen & Bau: "Gaussian elimination with partial pivoting is explosively unstable for certain matrices, yet stable in practice."
This seemingly strange situation has a statistical explanation.
The main idea is the following. By the textbook result, Gaussian elimination with partial pivoting (PALU factorization, $PA=LU$) is backward stable. However, there is a growth factor $\rho$ involved, which may be huge: on the order of $2^{m-1}$ for an $m\times m$ matrix. Although the backward stability result gives a uniform bound $O(\epsilon_{\mathrm{M}})$, the constant in that bound depends on $\rho$, which may overwhelm the machine epsilon, so the actual quantity can easily get out of hand.
Where is $\rho$ coming from?
Since each pivot selection involves maximization over a column, this algorithm produces a matrix $L$ with entries of absolute value $\leq 1$ everywhere below the diagonal. This implies $\|L\|=O(1)$ in any norm. Therefore, the backward stability condition for Gaussian elimination with partial pivoting reduces to $\|\delta A\| /\|U\|=O(\epsilon_{\mathrm{M}})$. We conclude that the algorithm is backward stable provided $\|U\|=O(\|A\|)$.
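A quick empirical check of this bound on the entries of $L$ (a minimal sketch in Julia; `lu` computes the PALU factorization):
using LinearAlgebra
F = lu(randn(6, 6))       # PALU factorization of a random matrix
all(abs.(F.L) .<= 1)      # true: partial pivoting keeps |ℓᵢⱼ| ≤ 1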
There is a standard reformulation of this conclusion that is perhaps more vivid.
Gaussian elimination reduces a full matrix $A$ to an upper-triangular matrix $U$. We have just seen that the key question for stability is whether amplification of the entries takes place during this reduction. In particular, let the growth factor for $A$ be defined as the ratio
$$ \rho=\dfrac{\max_{i, j}\left|U_{i j}\right|}{\max_{i, j}\left|A_{i j}\right|}. $$

If $\rho$ is of order 1, not much growth has taken place, and the elimination process is stable. If $\rho$ is bigger than this, we must expect instability. Specifically, since $\|L\|=O(1)$, we then have $\|U\|=O(\rho\|A\|)$, so the backward error estimate becomes $\|\delta A\| / \|A\|=O(\rho\,\epsilon_{\mathrm{M}})$.
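The growth factor is easy to compute directly from a PALU factorization. Here is a small helper (the name growth is ours, for illustration); for a random matrix, $\rho$ is typically modest:
using LinearAlgebra
growth(A) = maximum(abs, lu(A).U) / maximum(abs, A)   # ρ = max |u_ij| / max |a_ij|
growth(randn(100, 100))                               # typically O(10), nowhere near 2^99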
We have the following theorem.
Theorem¶
Let the factorization $P A=L U$ of a matrix $A \in \mathbb{C}^{m \times m}$ be computed by Gaussian elimination with partial pivoting (Algorithm 21.1) on a computer satisfying the axioms A and B from Lecture 13. Then the computed matrices $\tilde{P}, \tilde{L}$, and $\tilde{U}$ satisfy
$$ \tilde{L} \tilde{U}=\tilde{P} A+\delta A, \quad \frac{\|\delta A\|}{\|A\|}=O(\rho\, \epsilon_{\mathrm{M}}) $$

for some $\delta A \in \mathbb{C}^{m \times m}$, where $\rho$ is the growth factor for $A$. If $\left|\ell_{i j}\right|<1$ for each $i>j$, implying that there are no ties in the selection of pivots in exact arithmetic, then $\tilde{P}=P$ for all sufficiently small $\epsilon_{\mathrm{M}}$.
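As a quick numerical illustration of the theorem (a sketch; opnorm is the matrix 2-norm, and F.p is the pivot permutation as a vector):
using LinearAlgebra
B = randn(50, 50)
F = lu(B)
ρ = maximum(abs, F.U) / maximum(abs, B)   # growth factor of B
# The backward error ‖L̃Ũ - P̃B‖/‖B‖ should be of order ρ ε_M:
opnorm(F.L * F.U - B[F.p, :]) / opnorm(B), ρ * eps()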
A Worst Case¶
using LinearAlgebra
m = 8;
A = tril(-ones(m, m), -1) + Matrix{Float64}(I, m, m);  # -1s strictly below the diagonal, 1s on it
A[:, m] .= 1;                                          # last column: all 1s
A
8×8 Matrix{Float64}:
  1.0   0.0   0.0   0.0   0.0   0.0   0.0  1.0
 -1.0   1.0   0.0   0.0   0.0   0.0   0.0  1.0
 -1.0  -1.0   1.0   0.0   0.0   0.0   0.0  1.0
 -1.0  -1.0  -1.0   1.0   0.0   0.0   0.0  1.0
 -1.0  -1.0  -1.0  -1.0   1.0   0.0   0.0  1.0
 -1.0  -1.0  -1.0  -1.0  -1.0   1.0   0.0  1.0
 -1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  1.0
 -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  1.0
(L,U) = lu(A)
LU{Float64, Matrix{Float64}, Vector{Int64}}
L factor:
8×8 Matrix{Float64}:
  1.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0
 -1.0   1.0   0.0   0.0   0.0   0.0   0.0  0.0
 -1.0  -1.0   1.0   0.0   0.0   0.0   0.0  0.0
 -1.0  -1.0  -1.0   1.0   0.0   0.0   0.0  0.0
 -1.0  -1.0  -1.0  -1.0   1.0   0.0   0.0  0.0
 -1.0  -1.0  -1.0  -1.0  -1.0   1.0   0.0  0.0
 -1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  0.0
 -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  1.0
U factor:
8×8 Matrix{Float64}:
 1.0  0.0  0.0  0.0  0.0  0.0  0.0    1.0
 0.0  1.0  0.0  0.0  0.0  0.0  0.0    2.0
 0.0  0.0  1.0  0.0  0.0  0.0  0.0    4.0
 0.0  0.0  0.0  1.0  0.0  0.0  0.0    8.0
 0.0  0.0  0.0  0.0  1.0  0.0  0.0   16.0
 0.0  0.0  0.0  0.0  0.0  1.0  0.0   32.0
 0.0  0.0  0.0  0.0  0.0  0.0  1.0   64.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  128.0
Since $m=8$, we see the amplification factor $2^{m-1}= 2^7 = 128$ in the last entry of $U$.
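We can confirm this with the growth helper sketched earlier:
growth(A)   # 128.0 = 2^(m-1)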
Now, the machine epsilon is $2^{-52}$. So if $m>50$, the product $\rho\,\epsilon_{\mathrm{M}} \approx 2^{m-53}$ is no longer small, and solving $Ax=b$ via PALU factorization (Gaussian elimination with partial pivoting) will be quite problematic!
# Let's try!
m = 53
A = tril(-ones(m,m),-1) + Matrix{Float64}(I, m, m);
A[:,m] .= 1;
# Define the true solution
x = (1:m)/m
# x = collect((1:m)/m)
b = A*x
norm(x - A\b)/norm(x)
0.01686801433899681
This computation is performed with 16-digit arithmetic, and the resulting relative error exceeds 1%! Here we remind ourselves that for square general matrices the default backslash operator \ uses Gaussian elimination with partial pivoting.
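Indeed, solving through an explicit PALU factorization gives the same answer as backslash here (a quick sanity check; for a square general matrix, \ factorizes with lu under the hood):
F = lu(A)
norm(F\b - A\b)   # expect 0.0 (or round-off level): both routes use PALU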
What happens for larger $m$?
# Let's try!
m = 56
A = tril(-ones(m,m),-1) + Matrix{Float64}(I, m, m);
A[:,m] .= 1;
# Define the true solution
x = (1:m)/m
# x = collect((1:m)/m)
b = A*x
norm(x - A\b)/norm(x)
0.3157400680202919
About 30% error! By the way, there is nothing wrong with the condition number of the matrix:
cond(A)
25.00425603327097
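If conditioning were the whole story, the usual rule of thumb for a backward stable solver would predict a relative error of order $\kappa(A)\,\epsilon_{\mathrm{M}}$, i.e. roughly 15 accurate digits here:
cond(A) * eps()   # ≈ 5.6e-15, yet the observed relative error was ≈ 0.32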
Accordingly, the QR factorization (computed with Householder reflections by default) takes care of $Ax=b$ perfectly well.
(Q,R) = qr(A);
# A = QR, so QR x = b is being solved.
# Solve R x = Q'b.
norm(x - R\(Q'*b)) / norm(x)
5.46420983563484e-15
A natural question is why the PALU algorithm is chosen as the default for solving linear systems while the fully reliable QR algorithm is waiting in line...
The matrices that cause this type of worst-case scenario are known to be rare, so there is "stability in practice." Trefethen & Bau state:
In fifty years of computing, no matrix problems that excite an explosive instability are known to have arisen under natural circumstances.
The factor of 2 in the number of flops required by QR versus PALU factorization ($\sim \frac{4}{3}m^3$ versus $\sim \frac{2}{3}m^3$) is certainly one reason.
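This factor of roughly 2 also shows up in timings (a sketch; actual numbers depend on the BLAS, threading, and hardware):
using LinearAlgebra
n = 2000;
B = randn(n, n);
lu(B); qr(B);    # warm-up so that compilation time is not measured
@time lu(B);     # ~ (2/3) n^3 flops
@time qr(B);     # ~ (4/3) n^3 flops: roughly twice the work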
Thinking more about it: since

$$ P A = LU \implies U = L^{-1} P A, $$

and $P$ and $A$ have moderate entries, $U$ having very large entries (exponentially large, say) implies that $L^{-1}$ has very large entries.
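We can observe exactly this for the worst-case matrix above (the $56\times 56$ matrix from the last experiment is still bound to A):
F = lu(A)                # A is the 56×56 worst-case matrix from above
maximum(abs, inv(F.L))   # 2^(m-2) = 2^54 ≈ 1.8e16: exponentially large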
We know that $L$ has a diagonal of $1$s and all subdiagonal entries with absolute values $\leq 1$. Interestingly, random triangular matrices are known to have huge inverses; in particular, random matrices of the form delivered by Gaussian elimination may have inverses with very large entries.
So how come $L^{-1}$ is very rarely large in practice? There must be correlations induced by Gaussian elimination, so that $L$ is no longer random even when we apply PALU to a random matrix $A$.
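Here is a quick comparison illustrating the point: a "random" unit lower-triangular matrix with subdiagonal entries drawn uniformly from $[-1, 1]$, versus the $L$ that PALU actually delivers for a random matrix (a sketch; the variable names are ours):
k = 100
Lrand = tril(2 .* rand(k, k) .- 1, -1) + I   # random unit lower-triangular, |entries| ≤ 1
Lgepp = lu(randn(k, k)).L                    # L delivered by PALU applied to a random matrix
opnorm(inv(Lrand)), opnorm(inv(Lgepp))       # typically huge vs. modest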
More will be said on this soon!
From Trefethen & Bau:
The answer lies in the consideration of column spaces. Since $U$ is upper-triangular and $P A=L U$, the column spaces of $P A$ and $L$ are the same. By this we mean that the first column of $P A$ spans the same space as the first column of $L$, the first two columns of $P A$ span the same space as the first two columns of $L$, and so on. If $A$ is random, its column spaces are randomly oriented, and it follows that the same must be true of the column spaces of $P^{-1} L$. However, this condition is incompatible with $L^{-1}$ being large. It can be shown that if $L^{-1}$ is large, then the column spaces of $L$, or of any permutation $P^{-1} L$, must be skewed in a fashion that is very far from random.
Read the numerical evidence for this in Lecture 22 of the book.