A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lecture 23: Cholesky Factorization¶
It is a minor delight when we are dealing with a system $Ax = b$ where $A$ is a Hermitian (or symmetric, if real) positive definite matrix.
Hermitian positive definite matrices can be decomposed into triangular factors twice as quickly as general matrices. The standard algorithm for this is Cholesky factorization. It operates on the matrix both from left and right at once, making use of the symmetry.
Hermitian Positive Definite Matrices¶
We list a bunch of important and useful properties.
A real matrix $A \in \mathbb{R}^{m \times m}$ is symmetric if $A_{i j}=A_{j i}$ for all $i, j$, hence $A=A^T$. Such a matrix satisfies $x^T A y=y^T A x$ for all vectors $x, y \in \mathbb{R}^m$.
For a complex matrix $A \in \mathbb{C}^{m \times m}$, the analogous property is that $A$ is Hermitian: $A=A^*$.
A Hermitian matrix $A$ satisfies $x^* A y=\overline{y^* A x}$ for all $x, y \in \mathbb{C}^m$. This means in particular that for any $x \in \mathbb{C}^m, x^* A x$ is real. If in addition $x^* A x>0$ for all $x \neq 0$, then $A$ is said to be Hermitian positive definite (or sometimes just positive definite). Many matrices that arise in physical systems are Hermitian positive definite because of fundamental physical laws.
If $A$ is an $m \times m$ Hermitian positive definite matrix and $X$ is an $m \times n$ matrix of full rank with $m \geq n$, then the matrix $X^* A X$ is also Hermitian positive definite.
Exercise: Show that $X^* A X$ defined in this way is indeed positive definite.
Fact: A Hermitian matrix is positive definite if and only if all of its eigenvalues are positive.
Eigenvectors that correspond to distinct eigenvalues of a Hermitian matrix are orthogonal. (Hermitian matrices are normal.)
To see this, suppose $A x_1=\lambda_1 x_1$ and $A x_2=\lambda_2 x_2$ with $\lambda_1 \neq \lambda_2$. Then
$$ \lambda_2 x_1^* x_2=x_1^* A x_2=\overline{x_2^* A x_1}=\overline{\lambda_1 x_2^* x_1}=\lambda_1 x_1^* x_2 $$
so $\left(\lambda_1-\lambda_2\right) x_1^* x_2=0$. Since $\lambda_1 \neq \lambda_2$, we have $x_1^* x_2=0$.
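These facts are easy to verify numerically. Here is a minimal sketch (the random $5\times 5$ test matrix is our own choice), using a real symmetric positive definite matrix:
using LinearAlgebra

# Build a random symmetric positive definite matrix: X'X is symmetric positive
# semidefinite, and adding the identity makes it positive definite.
X = randn(5, 5)
Apd = X'X + I
λ, V = eigen(Symmetric(Apd))   # real eigenvalues, orthonormal eigenvectors
all(λ .> 0) |> display         # true: all eigenvalues are positive
norm(V'V - I) |> display       # ≈ 0: eigenvectors are mutually orthogonal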
Symmetric Gaussian Elimination¶
Again the goal is to factor an HPD (Hermitian positive definite) matrix into triangular factors.
Idea: When the subdiagonal of column $k$ is transformed to zeros at step $k$, also transform the $k$th row's remaining entries to zeros to preserve the symmetry.
Applying one step of Gaussian elimination to the following Hermitian matrix, whose upper-left entry is normalized to $1$:
$$ A=\begin{bmatrix} 1 & w^* \\ w & K \end{bmatrix}. $$
We obtain
$$ A=\begin{bmatrix} 1 & w^* \\ w & K \end{bmatrix} =\begin{bmatrix} 1 & 0 \\ w & I \end{bmatrix}\begin{bmatrix} 1 & w^* \\ 0 & K-w w^* \end{bmatrix} $$
Before proceeding, Cholesky factorization now introduces zeros in the first row to match the zeros just introduced in the first column. We can do this by a right upper-triangular operation that subtracts multiples of the first column from the subsequent ones:
$$ \begin{bmatrix} 1 & w^* \\ 0 & K-w w^* \end{bmatrix} \begin{bmatrix} 1 & -w^* \\ 0 & I \end{bmatrix} =\begin{bmatrix} 1 & 0 \\ 0 & K-w w^* \end{bmatrix}\implies \begin{bmatrix} 1 & w^* \\ 0 & K-w w^* \end{bmatrix}=\begin{bmatrix} 1 & 0 \\ 0 & K-w w^* \end{bmatrix}\begin{bmatrix} 1 & w^* \\ 0 & I \end{bmatrix} $$
Thus, we arrive at a three-factor factorization:
$$ A= \begin{bmatrix} 1 & w^* \\ w & K \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ w & I \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & K-w w^* \end{bmatrix} \begin{bmatrix} 1 & w^* \\ 0 & I \end{bmatrix} . $$
Cholesky Factorization¶
In general, we proceed as follows:
$$ \begin{aligned} A & =\begin{bmatrix} a_{11} & w^* \\ w & K \end{bmatrix} \\ & =\begin{bmatrix} \alpha & 0 \\ w / \alpha & I \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & K-w w^* / a_{11} \end{bmatrix}\begin{bmatrix} \alpha & w^* / \alpha \\ 0 & I \end{bmatrix}=R_1^* A_1 R_1, \end{aligned} $$
where $\alpha=\sqrt{a_{11}}>0$.
We repeat this step until the diagonal of the central factor is exhausted, that is, until the central factor becomes the identity. If the upper-left entry of the submatrix $K-w w^* / a_{11}$ is positive, the same formula can be used to factor it; we then have $A_1=R_2^* A_2 R_2$ and thus $A=R_1^* R_2^* A_2 R_2 R_1$. The process is continued down to the bottom-right corner. At the end, we arrive at
$$ A=\underbrace{R_1^* R_2^* \cdots R_m^*}_{R^*} \underbrace{R_m \cdots R_2 R_1}_R . $$
This equation has the form
$$ A=R^* R, \qquad r_{j j}>0, $$
and $R$ is upper-triangular (so $R^*$ is lower-triangular).
This type of reduction of an HPD matrix is called a Cholesky factorization.
OK, but how do we know whether the upper-left entry of the submatrix $K-w w^* / a_{11}$ is positive?
It must be positive because $K-w w^* / a_{11}$ is positive definite, since it is the $(m-1) \times(m-1)$ lower-right principal submatrix of the positive definite matrix $R_1^{-*} A R_1^{-1}$.
By induction, the process doesn't break down.
Theorem¶
Every hermitian positive definite matrix $A \in \mathbb{C}^{m \times m}$ has a unique Cholesky factorization $A = R^* R$ with $r_{jj}>0$.
Algorithm 23.1 Cholesky Factorization¶
- $R=A$
- for $k=1$ to $m$
  - for $j=k+1$ to $m$
    - $R_{j, j: m}=R_{j, j: m}-R_{k, j: m} \bar{R}_{k j} / R_{k k}$
  - $R_{k, k: m}=R_{k, k: m} / \sqrt{R_{k k}}$
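The following is a direct Julia transcription of Algorithm 23.1, a minimal sketch rather than an optimized routine (the function name `chol_factor` and the random test matrix are our own):
using LinearAlgebra

# Sketch of Algorithm 23.1. Overwrite a floating-point copy of A so that its
# upper triangle becomes the Cholesky factor R with A ≈ R'R.
function chol_factor(A)
    m = size(A, 1)
    R = float.(A)              # elementwise copy, promoted to floating point
    for k in 1:m
        for j in k+1:m
            R[j, j:m] -= R[k, j:m] * conj(R[k, j]) / R[k, k]
        end
        R[k, k:m] /= sqrt(R[k, k])
    end
    return triu(R)             # entries below the diagonal are never used
end

# Quick check on a random symmetric positive definite matrix.
X = randn(5, 5)
Atest = X'X + I
R = chol_factor(Atest)
norm(R'R - Atest) |> display   # ≈ 0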
Operation Count¶
The arithmetic done in Cholesky factorization is dominated by the inner loop.
A single execution of the line
$$R_{j, j: m}=R_{j, j: m}-R_{k, j: m} \bar{R}_{k j} / R_{k k}$$
requires one division, $m-j+1$ multiplications, and $m-j+1$ subtractions, for a total of $\sim 2(m-j)$ flops.
This calculation is repeated once for each $j$ from $k+1$ to $m$, and that loop is repeated for each $k$ from 1 to $m$. So the total is:
$$ \sum_{k=1}^m \sum_{j=k+1}^m 2(m-j) \sim 2 \sum_{k=1}^m \sum_{j=1}^k j \sim \sum_{k=1}^m k^2 \sim \frac{1}{3} m^3 \text { flops } . $$
Cholesky factorization requires only half as many operations as Gaussian elimination.
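As a quick sanity check on this count (purely illustrative), we can sum the flops of the inner line directly and compare with $m^3/3$:
# Sum 2(m - j) over the double loop and compare with m^3/3.
# The ratio approaches 1 as the dimension grows.
mtest = 500
flops = sum(2*(mtest - j) for k in 1:mtest for j in k+1:mtest; init = 0)
flops / (mtest^3 / 3)   # ≈ 0.99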
Stability¶
All of the subtleties of the stability analysis of Gaussian elimination vanish for Cholesky factorization. This algorithm is always stable!
The reason is that the factors $R$ can never grow large. In the 2-norm, for example, we have $\|R\|=\left\|R^*\right\|=\|A\|^{1 / 2}$ (proof: SVD), and in other $p$-norms with $1 \leq p \leq \infty,\|R\|$ cannot differ from $\|A\|^{1 / 2}$ by more than a factor of $\sqrt{m}$.
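A quick numerical illustration of the 2-norm identity (a sketch with an arbitrary SPD test matrix):
using LinearAlgebra

# If A = R'R, the squared singular values of R are the eigenvalues of A,
# so opnorm(R) = opnorm(A)^(1/2) up to rounding.
X = randn(100, 100)
Aex = X'X + I
R = cholesky(Symmetric(Aex)).U
opnorm(R) - sqrt(opnorm(Aex)) |> display   # ≈ 0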
Theorem (Backward Stability)¶
Let $A \in \mathbb{C}^{m \times m}$ be hermitian positive definite, and let a Cholesky factorization of $A$ be computed by Algorithm 23.1 on a computer satisfying (13.5) and (13.7). For all sufficiently small $\epsilon_{\text {machine }}$, this process is guaranteed to run to completion (i.e., no zero or negative corner entries $r_{k k}$ will arise), generating a computed factor $\tilde{R}$ that satisfies
$$ \tilde{R}^* \tilde{R}=A+\delta A, \quad \frac{\|\delta A\|}{\|A\|}=O\left(\epsilon_{\text {machine }}\right) $$
for some $\delta A \in \mathbb{C}^{m \times m}$.
Solution of $Ax=b$ for Hermitian Positive Definite $A$¶
If $A$ is hermitian positive definite, the standard way to solve a system of equations $A x=b$ is by Cholesky factorization. Algorithm 23.1 reduces the system to
$$R^* R x=b.$$
We then solve two triangular systems in succession: first
$$R^* y=b$$
for the unknown $y$, then
$$ R x=y $$
for the unknown $x$.
Each triangular solution requires just $\sim m^2$ flops, so the total work is again $\sim \frac{1}{3} m^3$ flops.
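In Julia, for example, this can be done with the built-in `cholesky` routine followed by two triangular solves; a minimal sketch with a made-up SPD system:
using LinearAlgebra

# Solve A x = b for a symmetric positive definite A via Cholesky factorization.
X = randn(6, 6)
Asys = X'X + I                   # SPD test matrix
b = randn(6)

F = cholesky(Symmetric(Asys))    # Asys = R'R, with F.U == R upper-triangular
y = F.U' \ b                     # forward substitution:  R* y = b
x = F.U \ y                      # back substitution:     R x = y

norm(Asys*x - b) |> display      # residual ≈ 0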
Theorem¶
The solution of hermitian positive definite systems $A x=b$ via Cholesky factorization (Algorithm 23.1) is backward stable, generating a computed solution $\tilde{x}$ that satisfies
$$ (A+\Delta A) \tilde{x}=b, \quad \frac{\|\Delta A\|}{\|A\|}=O\left(\epsilon_{\text {machine }}\right) $$
for some $\Delta A \in \mathbb{C}^{m \times m}$.
Implementation and Examples¶
using LinearAlgebra
# Start with some symmetric matrix.
A = [
24 18 4 12
18 -33 17 13
4 17 51 9
12 13 9 13
];
norm(A-A')|>display
eigvals(A)|>display
m = size(A)[1]
L1 = Matrix{Float64}(I, m, m);
L1[2:m,1] = -A[2:m,1]/A[1,1];
A1 = L1*A
display(A1)
0.0
4-element Vector{Float64}:
 -41.966645637229355
   4.631407078287072
  31.41206807719446
  60.92317048174786
4×4 Matrix{Float64}:
 24.0   18.0   4.0     12.0
  0.0  -46.5  14.0      4.0
  0.0   14.0  50.3333   7.0
  0.0    4.0   7.0      7.0
This didn't change the first row of $A$. But if we take the Hermitian transpose of $A_1$ and apply the same matrix on the left again, it effectively zeros out the rest of the first row as well.
L1*A1'
4×4 Matrix{Float64}:
 24.0    0.0   0.0     0.0
  0.0  -46.5  14.0     4.0
  0.0   14.0  50.3333  7.0
  0.0    4.0   7.0     7.0
# We could have simply done
A1 = L1*A*L1'
4×4 Matrix{Float64}:
 24.0    0.0   0.0     0.0
  0.0  -46.5  14.0     4.0
  0.0   14.0  50.3333  7.0
  0.0    4.0   7.0     7.0
# Go on:
L2 = Matrix{Float64}(I, m, m);
L2[3:4,2] = -A1[3:4,2]/A1[2,2];
A2 = L2*A1*L2'
4×4 Matrix{Float64}:
 24.0    0.0   0.0      0.0
  0.0  -46.5   0.0      0.0
  0.0    0.0  54.5484   8.2043
  0.0    0.0   8.2043   7.34409
L3 = Matrix{Float64}(I, m, m);
L3[4,3] = -A2[4,3]/A2[3,3];
A3 = L3*A2*L3'
4×4 Matrix{Float64}:
 24.0    0.0   0.0      0.0
  0.0  -46.5   0.0      0.0
  0.0    0.0  54.5484   0.0
  0.0    0.0   0.0      6.11013
L = inv(L1)*inv(L2)*inv(L3)
L|>display
D = diagm(diag(A3))
D|>display
4×4 Matrix{Float64}:
 1.0        0.0         0.0       0.0
 0.75       1.0         0.0       0.0
 0.166667  -0.301075    1.0       0.0
 0.5       -0.0860215   0.150404  1.0
4×4 Matrix{Float64}:
 24.0    0.0   0.0      0.0
  0.0  -46.5   0.0      0.0
  0.0    0.0  54.5484   0.0
  0.0    0.0   0.0      6.11013
norm( A - L*D*L' )
1.7763568394002505e-15
We haven't used a positive definite matrix in the example above (note the negative eigenvalue), and in the end the central diagonal factor did not reduce to the identity; what we obtained is an $LDL^*$ factorization. The Cholesky factorization algorithm instead distributes the diagonal entries symmetrically into the two triangular factors: when $A$ is positive definite, $D$ has a positive diagonal and $A = L D L^* = (L D^{1/2})(D^{1/2} L^*) = R^* R$ with $R = D^{1/2} L^*$. See below.
# Work with a positive definite Hermitian Matrix
B = A'*A
4×4 Matrix{Int64}:
 1060    62   714  714
   62  1871   495  109
  714   495  2987  845
  714   109   845  563
# Maybe normalize for sanity.
B = B/100
4×4 Matrix{Float64}:
 10.6    0.62   7.14  7.14
  0.62  18.71   4.95  1.09
  7.14   4.95  29.87  8.45
  7.14   1.09   8.45  5.63
eigvals(B)
4-element Vector{Float64}:
  0.21449931524807814
  9.867180208862997
 17.611993460407803
 37.11632701548113
r = sqrt(B[1,1]);
L1 = Matrix{Float64}(I,m,m)
L1[1,1] = 1/r;
L1[2:4,1] = -B[2:4,1]/r^2
3-element Vector{Float64}:
 -0.05849056603773585
 -0.6735849056603773
 -0.6735849056603773
B1 = L1*B*L1'
4×4 Matrix{Float64}:
 1.0  2.77556e-17   4.44089e-16  4.44089e-16
 0.0  18.6737       4.53238      0.672377
 0.0   4.53238     25.0606       3.6406
 0.0   0.672377     3.6406       0.820604
r = sqrt(B1[2,2]);
L2 = Matrix{Float64}(I,m,m);
L2[2,2] = 1/r;
L2[3:4,2] = -B1[3:4,2]/r^2;
B2 = L2*B1*L2'
4×4 Matrix{Float64}:
 1.0  6.42295e-18   4.37353e-16   4.4309e-16
 0.0  1.0           0.0          -2.77556e-17
 0.0  0.0          23.9605        3.47741
 0.0  0.0           3.47741       0.796394
r = sqrt(B2[3,3]);
L3 = Matrix{Float64}(I,m,m);
L3[3,3] = 1/r;
L3[4:4,3] = -B2[4:4,3]/r^2;
B3 = L3*B2*L3'
4×4 Matrix{Float64}:
 1.0  6.42295e-18  8.93477e-17   3.79617e-16
 0.0  1.0          0.0          -2.77556e-17
 0.0  0.0          1.0           1.11022e-16
 0.0  0.0          9.0724e-17    0.291715
L4 = Matrix{Float64}(I,m,m);
L4[4,4] = 1/sqrt(B3[4,4]);
B4 = L4*B3*L4'
4×4 Matrix{Float64}:
 1.0  6.42295e-18  8.93477e-17   7.02855e-16
 0.0  1.0          0.0          -5.13891e-17
 0.0  0.0          1.0           2.05556e-16
 0.0  0.0          1.67974e-16   1.0
inv(L1)*inv(L2)*inv(L3)*inv(L4)
4×4 Matrix{Float64}:
 3.25576   0.0       0.0       0.0
 0.190431  4.32131   0.0       0.0
 2.19303   1.04884   4.89495   0.0
 2.19303   0.155596  0.710407  0.540107
L1
4×4 Matrix{Float64}:
  0.307148   0.0  0.0  0.0
 -0.0584906  1.0  0.0  0.0
 -0.673585   0.0  1.0  0.0
 -0.673585   0.0  0.0  1.0
inv(L1)
4×4 Matrix{Float64}:
 3.25576   0.0  0.0  0.0
 0.190431  1.0  0.0  0.0
 2.19303   0.0  1.0  0.0
 2.19303   0.0  0.0  1.0
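For comparison, by the uniqueness theorem the built-in `cholesky` routine applied to `B` should reproduce (up to rounding) the lower-triangular factor accumulated above, assuming the cells above have been run:
# inv(L1)*inv(L2)*inv(L3)*inv(L4) computed above is R*; compare it with the
# factor returned by the library routine.
Rstar = inv(L1)*inv(L2)*inv(L3)*inv(L4)
F = cholesky(Hermitian(B))
norm(Rstar - F.L) |> display         # ≈ 0
norm(B - Rstar*Rstar') |> display    # ≈ 0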