A Course on Numerical Linear Algebra¶

Instructor: Deniz Bilman

Textbook: Numerical Linear Algebra, by Trefethen and Bau.

Lecture 8: Gram-Schmidt Orthogonalization¶

We have already mentioned that the classical Gram-Schmidt iteration is numerically unstable.

At each step, the classical Gram-Schmidt iteration computes a single orthogonal projection:

Let $P_j$ denote the projection $$ P_j = \mathbb{I} - \hat{Q}_{j-1} \hat{Q}_{j-1}^*,\qquad \hat{Q}_{j-1} := \begin{bmatrix} q_1 & q_2 & \cdots & q_{j-1} \end{bmatrix}, $$ where the columns $q_1, q_2,\ldots$ are orthonormal. $P_j$ is a projector of rank $m-(j-1)$: it projects onto the orthogonal complement of $\operatorname{span}(\{q_1,\ldots,q_{j-1}\})$. At each step, we compute $v_j := P_j a_j$ for $a_j \in \mathbb{C}^m$, that is, we project $a_j$ onto the orthogonal complement of the span of all the previous $q_k$'s. Here $P_1=\mathbb{I}$.

In [1]:
using LinearAlgebra
m = 8
j = 4
# Quick way to get a random Qj with j orthonormal columns.
Qj = Matrix(qr(rand(m, j)).Q) 
Qj |> display

# Form the projector onto the orthogonal complement of the columns of Qj
Pj = Matrix(I, m, m) - Qj * Qj'
Pj |> display
# Note that the singular values of Pj are all 1s (or 0s).
σ = svd(Pj).S
8×4 Matrix{Float64}:
 -0.213047   -0.202534   -0.245694   -0.447233
 -0.151908   -0.519449   -0.211781    0.146222
 -0.0896999  -0.579809   -0.285335    0.043158
 -0.429645   -0.0615816   0.0251773  -0.436842
 -0.676201    0.285642   -0.340633    0.173449
 -0.34281    -0.157327    0.771063   -0.260595
 -0.398918    0.230798    0.11058     0.484822
 -0.0706536  -0.435399    0.299895    0.502854
8×8 Matrix{Float64}:
  0.653208   -0.124208   -0.187345     …  -0.032       0.205753    0.19534
 -0.124208    0.640865   -0.381546         0.0676021   0.0118156  -0.246917
 -0.187345   -0.381546    0.572497         0.109289    0.108664   -0.194917
 -0.293191   -0.0280471  -0.0482074       -0.290227    0.0518259   0.154948
 -0.0923295  -0.051846    0.000282129      0.12098    -0.382099    0.0915265
 -0.032       0.0676021   0.109289     …   0.195282   -0.0593648  -0.192917
  0.205753    0.0118156   0.108664        -0.0593648   0.540316   -0.204653
  0.19534    -0.246917   -0.194917        -0.192917   -0.204653    0.462637
8-element Vector{Float64}:
 1.0000000000000002
 1.0000000000000002
 1.0
 1.0
 4.505279385120622e-16
 3.9003634431939383e-16
 3.487356037184413e-16
 1.1660350749431453e-16
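
As a quick check of the claim that $P_j$ projects onto the orthogonal complement of the columns of Qj, we can apply the Pj computed above to a random vector and verify that the result has no components along the columns of Qj. (This reuses m, Qj, and Pj from the previous cell.)

In [ ]:
# Apply Pj to a random vector and check orthogonality against the columns of Qj.
aj = rand(m)
vj = Pj*aj
# Components of vj along the columns of Qj should be at roundoff level:
Qj'*vj |> display
# Pj is a projector, so applying it twice changes nothing:
norm(Pj*vj - vj)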

Modified Gram-Schmidt Iteration¶

We can express the projection $P_j$ as the product

$$ P_j = P_{\perp q_{j-1}} \cdots P_{\perp q_2} P_{\perp q_1}, $$

where $P_{\perp q} := \mathbb{I} - q q^*$ is the rank $m-1$ projector onto the orthogonal complement of the unit vector $q$.
In [2]:
# Initialize the product to be the identity matrix
Xj = Matrix(I,m,m)
# Take the product of the projectors to form Pj as given above.
for k = 1:j
    Xj = (Matrix(I,m,m) - Qj[:,k]*Qj[:,k]')*Xj   # left-multiply by I - q_k q_k^*
end
norm(Xj-Pj)
2.1698319048434277e-16

This works simply because the $q_k$'s are orthonormal: when you expand the product, all the cross-terms vanish.
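
For instance, with two orthonormal vectors $q_1, q_2$,

$$ P_{\perp q_2} P_{\perp q_1} = (\mathbb{I} - q_2 q_2^*)(\mathbb{I} - q_1 q_1^*) = \mathbb{I} - q_1 q_1^* - q_2 q_2^* + q_2 (q_2^* q_1) q_1^* = \mathbb{I} - q_1 q_1^* - q_2 q_2^*, $$

and the cross-term drops out because $q_2^* q_1 = 0$. The result is exactly $\mathbb{I} - \hat{Q}_2 \hat{Q}_2^* = P_3$.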

The modified Gram-Schmidt iteration (MGS) consists, in principle, of the same computations as the classical Gram-Schmidt iteration (CGS), but it performs them in a different order.

In CGS we orthogonalize a column $a_j$ against the columns $q_k$ ($k<j$) by projecting the original column $a_j$ onto each $q_k$ and subtracting:

\begin{align*} v_j &:= a_j - \sum_{k=1}^{j-1} (q_k^* a_j)q_k\\ q_j &:= \frac{v_j}{\|v_j\|_2}. \end{align*}

In MGS, as soon as $q_k$ is computed, we immediately subtract the component along $q_k$ from all of the remaining columns $a_i$ to the right of column $k$ ($i>k$).

This means that the orthogonalization of column $a_j$ against $q_k$ is carried out by subtracting from the vector $v_j^{(k)}$ its component along $q_k$, where

\begin{align*} v_{j}^{(1)}&:=a_j\\ v_{j}^{(2)}&:=P_{\perp q_1}v_{j}^{(1)} = v_{j}^{(1)} - \left(q_1^* v_{j}^{(1)}\right)q_1 \\ v_{j}^{(3)}&:=P_{\perp q_2}v_{j}^{(2)} = v_{j}^{(2)} - \left(q_2^* v_{j}^{(2)}\right)q_2 \\ &\cdots \\ v_j := v_{j}^{(j)}&:=P_{\perp q_{j-1}}v_{j}^{(j-1)} = v_{j}^{(j-1)} - \left(q_{j-1}^* v_{j}^{(j-1)}\right)q_{j-1}. \end{align*}

Why is the roundoff error smaller in MGS?

Note that the computed $q_k$ has an error component that lies in the subspace $\operatorname{span}(\{q_1,q_2,\ldots, q_{k-1}\})$. The column $a_j$ may have large components in $\operatorname{span}(\{q_1,q_2,\ldots, q_{k-1}\})$, whereas $v_j^{(k)}$ has only error-size components in $\operatorname{span}(\{q_1,q_2,\ldots, q_{k-1}\})$ by design in MGS. Therefore, the inner product $q_k^* a_j$ results in larger errors than the inner product $q_k^* v_j^{(k)}$.

Let's see this in action.

In [3]:
# Implementing Classical Gram-Schmidt (CGS). Only Q is returned; R is built along the way.
function classical_gram_schmidt(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    R[1,1] = norm(A[:,1])
    Q[:,1] = A[:,1]/R[1,1]

    for j = 2:n
        # Project the *original* column a_j onto all previously computed q_k's at once
        R[1:j-1,j] = Q[:,1:j-1]'*A[:,j]
        # and subtract those components.
        v = A[:,j] - Q[:,1:j-1]*R[1:j-1,j]
        R[j,j] = norm(v)
        Q[:,j] = v/R[j,j]
    end
    return Q
end

A = round.(10*rand(8,8))

Qcgs = classical_gram_schmidt(A)

# Just check orthogonality
m = size(Qcgs, 1)
Matrix(I, m, m)-Qcgs*Qcgs'
8×8 Matrix{Float64}:
  1.19904e-14  -4.2805e-15   -3.28134e-15  …  -3.89261e-15  -3.64241e-15
 -4.2805e-15   -6.88338e-15  -4.74382e-15      6.76964e-16  -3.67212e-15
 -3.28134e-15  -4.74382e-15  -3.10862e-15      1.0424e-15   -8.6227e-16
 -2.97893e-15  -1.4617e-16   -8.12949e-16      1.09602e-15   1.02259e-15
  7.31927e-15   6.41607e-15   5.29492e-15     -2.09818e-15   2.12093e-15
  7.62787e-15  -1.53469e-15  -1.87155e-15  …  -2.74933e-15  -1.49055e-15
 -3.89261e-15   6.76964e-16   1.0424e-15       1.33227e-15   1.07084e-15
 -3.64241e-15  -3.67212e-15  -8.6227e-16       1.07084e-15  -4.44089e-16
In [5]:
# Modified Gram-Schmidt (MGS)
function modified_gram_schmidt(A)
    m, n = size(A)
    Q = copy(A)
    for k in 1:n
        Q[:, k] /= norm(Q[:, k])  # Normalize
        for j in k+1:n
            Q[:, j] -= (Q[:, k]' * Q[:, j]) * Q[:, k]  # Orthogonalization step
        end
    end
    return Q
end

A = round.(10*rand(8,8))

Qmgs = modified_gram_schmidt(A)

# Just check orthogonality
m = size(Qmgs, 1)
Matrix(I, m, m)-Qmgs*Qmgs'
8×8 Matrix{Float64}:
  0.0           1.70104e-16   5.26442e-16  …  -3.24987e-16   5.1595e-16
  1.70104e-16   7.77156e-16  -3.24783e-16      4.42414e-18   1.00732e-15
  5.26442e-16  -3.24783e-16   1.11022e-16      2.68324e-16  -5.42544e-16
  5.24078e-16   2.70703e-16  -4.0924e-18       2.76845e-16   4.70332e-16
  4.3916e-16    1.46378e-16  -3.08433e-16      2.29891e-16  -3.24684e-16
  5.83465e-16  -1.15728e-15  -2.75138e-16  …   4.24259e-16  -1.80613e-15
 -3.24987e-16   4.42414e-18   2.68324e-16     -2.22045e-16   5.43335e-16
  5.1595e-16    1.00732e-15  -5.42544e-16      5.43335e-16   8.88178e-16
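
The function above returns only the orthogonal factor. If we also want the upper-triangular factor $R$ (so that $A = QR$), the same MGS sweep can record the coefficients it computes. Here is a minimal sketch along those lines; the name modified_gram_schmidt_qr is only for this illustration.

In [ ]:
# A sketch of MGS that also assembles R, so that A ≈ Q*R.
function modified_gram_schmidt_qr(A)
    m, n = size(A)
    Q = float(copy(A))
    R = zeros(n, n)
    for k in 1:n
        R[k,k] = norm(Q[:,k])
        Q[:,k] /= R[k,k]                # normalize v_k^{(k)} to obtain q_k
        for j in k+1:n
            R[k,j] = Q[:,k]'*Q[:,j]     # component of v_j^{(k)} along q_k
            Q[:,j] -= R[k,j]*Q[:,k]     # subtract it immediately
        end
    end
    return Q, R
end

Q, R = modified_gram_schmidt_qr(A)
# Residual of the factorization and departure from orthogonality:
norm(A - Q*R), norm(Matrix(I, size(A,1), size(A,1)) - Q'*Q)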

We want $\mathbb{I} - Q Q^* = 0$. We will check how far we are from this for matrices of growing size, using (regularized) Hilbert matrices, which are notoriously ill-conditioned (homework: read about them).

In [6]:
# Let's compare for matrices of growing sizes.

# Regularized Hilbert matrix: H_ij = 1/(i+j-1) plus a small multiple of the identity.
function hilbert_matrix_reg(n)
    H = [1.0/(i+j-1) for i=1:n, j=1:n] + 0.001 * I
    return H
end

function eye(n)
    Matrix{Float64}(I, n, n)
end
hilbert_matrix_reg(10)  # quick sanity check of the constructor

err_cgs = Float64[]
err_mgs = Float64[]
p_end = 10
for p=1:p_end
    H = hilbert_matrix_reg(2^p)
    id = eye(2^p)
    Qcgs = classical_gram_schmidt(H)
    Qmgs = modified_gram_schmidt(H)
    append!(err_cgs, norm(id - Qcgs*Qcgs', Inf))
    append!(err_mgs, norm(id - Qmgs*Qmgs', Inf))
end
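
The loss of orthogonality in Gram-Schmidt grows with the conditioning of the input matrix, so it is instructive to look at the condition numbers of the same regularized Hilbert matrices; the cell below is a quick check along those lines.

In [ ]:
# Condition numbers of the regularized Hilbert matrices used in the experiment above.
conds = [cond(hilbert_matrix_reg(2^p)) for p = 1:p_end]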
In [7]:
using Plots
mach_eps = eps(Float64)  # machine epsilon, for reference
plist = [2^p for p=1:p_end]

plot(plist, err_cgs, label="CGS Error", lw=2, marker=:o, ylabel="Error", xlabel="n", yscale=:log10)
plot!(plist, err_mgs, label="MGS Error", lw=2, marker=:s, linestyle=:dash, yscale=:log10)
hline!([mach_eps], label="machine epsilon", linestyle=:dot)