A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lecture 10: Householder Triangularization¶
Householder triangularization is another approach to computing the QR factorization of a matrix.
Let's recall the reduced QR factorization obtained via the Gram-Schmidt iteration. The Gram-Schmidt iteration applies a succession of elementary upper-triangular matrices $R_k$ on the right of $A$ so that the resulting matrix $\hat{Q}$ has orthonormal columns:
$$ A \underbrace{R_1 R_2 \cdots R_n}_{\hat{R}^{-1}}=\hat{Q} $$
Fact: The inverse of an invertible upper-triangular matrix is upper-triangular.
Fact: The product of two upper-triangular matrices is upper-triangular.
So the product $\hat{R} = R_n^{-1} \cdots R_2^{-1} R_1^{-1}$ is also upper-triangular, and $A = \hat{Q}\hat{R}$ is a reduced QR factorization of $A$.
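These two facts are easy to verify numerically. Here is a minimal sketch in Julia (my own illustration using randomly generated upper-triangular matrices, not part of the lecture):

using LinearAlgebra

# Check the two facts above numerically: the inverse of an invertible
# upper-triangular matrix is upper-triangular, and a product of
# upper-triangular matrices is upper-triangular.
R1 = triu(rand(5, 5)) + 5I      # upper-triangular with a safely nonzero diagonal
R2 = triu(rand(5, 5)) + 5I

norm(tril(inv(R1), -1))         # ≈ 0: the inverse has no strictly lower-triangular part
norm(tril(R1 * R2, -1))         # ≈ 0: neither does the product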
Householder¶
The Householder algorithm works the other way around: instead of applying triangular matrices to orthogonalize $A$, it constructs $Q$ as a product of unitary matrices. It applies a succession of elementary unitary matrices $Q_k$ on the left of $A$ so that the resulting matrix is upper-triangular:
$$ \underbrace{Q_n \cdots Q_2 Q_1}_{Q^*} A=R $$
The product $Q=Q_1^* Q_2^* \cdots Q_n^*$ is unitary as well (a product of unitary matrices is unitary), and therefore $A=Q R$ is a full QR factorization of $A$.
So the perspective is the following.
Gram-Schmidt: triangular orthogonalization,
Householder: orthogonal triangularization.
Householder Reflectors¶
How do we construct the unitary matrices $Q_k$ to introduce zeros in the lower-triangular part of the matrix to arrive at an upper-triangular matrix?
$Q_1$ is a unitary matrix that maps the first column of $A$ to a vector along the first coordinate axis.
$Q_2$ leaves the first row unchanged (acts as the identity there), and performs a similar operation on the vector consisting of the remaining entries of the second column of $Q_1 A$.
$Q_3$ leaves the first two rows unchanged (acts as the identity there), and performs a similar operation on the vector consisting of the remaining entries of the third column of $Q_2 Q_1 A$.
And so on...
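Schematically, for a $5\times 3$ matrix the process looks as follows (a sketch in the spirit of the textbook's illustration, where $\times$ denotes an entry that is in general nonzero):
$$
\begin{bmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \end{bmatrix}
\xrightarrow{\;Q_1\;}
\begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{bmatrix}
\xrightarrow{\;Q_2\;}
\begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \\ 0 & 0 & \times \\ 0 & 0 & \times \end{bmatrix}
\xrightarrow{\;Q_3\;}
\begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = R.
$$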
The unitary matrices $Q_k$ are constructed using Householder reflectors.
Each $Q_k$ is an $m\times m$ matrix that is of the form
$$ Q_k = \begin{bmatrix} \mathbb{I}_{(k-1)\times (k-1)} & \mathbf{0} \\ \mathbf{0} & F\end{bmatrix}, $$
where $F$ is an $(m-k+1)\times (m-k+1)$ matrix that introduces zeros into the $k$th column as described above. The Householder algorithm chooses $F$ to be a particular matrix called a Householder reflector.
Given the vector $x\in \mathbb{C}^{m-k+1}$ consisting of entries $k,\dots,m$ of the $k$th column, $F$ acts on it as
$$ F x = \begin{bmatrix} \|x\|_2 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \|x\|_2 e_1, $$
with $e_1 \in \mathbb{C}^{m-k+1}$ the first standard basis vector. Note that $F^* F = \mathbb{I}$.
Let's understand the role of $F$ better. Given a vector $x$, let
$$ v = \|x\|_2 e_1 - x. $$
$F$ reflects the space $\mathbb{C}^{m-k+1}$ across the hyperplane $H$ orthogonal to $v = \|x\|_2 e_1 - x$.
When the reflector is applied, every point on one side of the hyperplane $H$ is mapped to its mirror image on the other side. In particular, $x$ is mapped to $\|x\|_2 e_1$.
We know that for any vector $y$
$$ P y=\left(\mathbb{I}-\frac{v v^*}{v^* v}\right) y=y-\left(\frac{v^* y}{v^* v}\right)v $$
is the orthogonal projection of $y$ onto the hyperplane $H$. But to reflect $y$ across $H$, we must not stop at the point obtained by subtracting $\left(\frac{v^* y}{v^* v}\right)v$; we must subtract one more copy of that vector. Therefore, the reflection $Fy$ should be given by
$$ F y=\left(\mathbb{I}-2 \frac{v v^*}{v^* v}\right) y=y-2 \left(\frac{v^* y}{v^* v}\right)v $$
We have already seen that
$$ F = \mathbb{I} - 2 \frac{v v^*}{v^* v} $$
is a unitary matrix (hence of full rank). Note that the orthogonal projector $P$ onto $H$ is not of full rank, and it differs from $F$ only in that the rank-one term $\frac{v v^*}{v^* v}$ is subtracted once rather than twice.
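Here is a minimal numerical check (my own illustration, not from the text) of these properties: the reflector built from $v = \|x\|_2 e_1 - x$ sends $x$ to $\|x\|_2 e_1$ and is unitary, while the projector $P$ is rank-deficient.

using LinearAlgebra

# Build the reflector F and the projector P from a random x and verify
# the claims above numerically.
x  = randn(6)
e1 = [1.0; zeros(5)]
v  = norm(x) * e1 - x
F  = I - 2 * (v * v') / (v' * v)
P  = I - (v * v') / (v' * v)

norm(F * x - norm(x) * e1)   # ≈ 0: x is reflected onto the e1 axis
norm(F' * F - I)             # ≈ 0: F is unitary
rank(P)                      # 5 = 6 - 1: P projects onto the hyperplane H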
The Better of Two Reflectors¶
Well, we have glossed over a choice so far. As can be seen in the figure above, even in the real case, there exist two such reflectors. The vector $x$ can be reflected to $\pm \|x\|_2 e_1$, i.e. reflected across the hyperplane $H^+$ that is the set of vectors orthogonal to $\|x\|_2 e_1 -x$, or across the hyperplane $H^-$ that is the set of vectors orthogonal to $-\|x\|_2 e_1 -x$. In the complex case, there is an infinitude of such choices! The vector $x$ can be reflected to $z \|x\|_2 e_1$, where $z$ is any scalar with $|z|=1$ --- a circle of possible reflections.
For numerical stability, it is desirable to reflect $x$ to the vector $z\|x\|_2 e_1$ that is not too close to $x$ itself, so we may choose $z=-\operatorname{sign}(x_1)$, where $x_1$ denotes the first component of $x$. The reflection vector then becomes $v=-\operatorname{sign}(x_1)\|x\|_2 e_1 -x$. Clearing the factors of $-1$ (which do not affect the reflector), this reads
$$ v=\operatorname{sign}(x_1)\|x\|_2 e_1 + x. $$
We also impose the convention that $\operatorname{sign}(x_1) = 1$ if $x_1=0$.
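In code, this convention requires a small adjustment, since Julia's built-in `sign` returns `0` at zero. A minimal helper (the name `hsign` is mine, not from the lecture) could look like this:

# sign with the convention sign(0) = 1, as imposed above
hsign(a) = a == 0 ? one(a) : sign(a)

hsign(-3.0), hsign(0.0)   # (-1.0, 1.0)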
Why is doing this better? In what sense is it better?¶
Look again at the figure (in the real case). There, $x$ is close to a positive multiple of $e_1$, so the angle between $H^+$ and the $e_1$ axis is very small compared to the angle between $H^-$ and the $e_1$ axis. The vector $v=\|x\|_2 e_1 -x$ defining $H^+$ is therefore much smaller than $x$ or $\|x\|_2 e_1$, and computing it involves subtracting nearby quantities, so it suffers from cancellation errors. If we pick the other sign, we make sure that $\|v\|_2$ is never smaller than $\|x\|_2$, and no such cancellation occurs.
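A small experiment (my own illustration, assuming $x$ is close to a positive multiple of $e_1$ as in the figure) makes the cancellation visible:

using LinearAlgebra

# x is nearly a positive multiple of e1, so v = norm(x)*e1 - x subtracts
# nearly equal quantities, while v = sign(x[1])*norm(x)*e1 + x does not.
reflector(v) = I - 2 * (v * v') / (v' * v)

x  = [1.0; 1e-9 * randn(5)]
e1 = [1.0; zeros(5)]

v_bad  = norm(x) * e1 - x                  # suffers from cancellation
v_good = sign(x[1]) * norm(x) * e1 + x     # norm(v_good) >= norm(x), no cancellation

# Size of the entries that should vanish after reflecting x onto the e1 axis:
norm((reflector(v_bad) * x)[2:end])        # noticeably large: accuracy lost
norm((reflector(v_good) * x)[2:end])       # tiny: essentially exact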
An Implementation¶
using LinearAlgebra

function householder_qr(A)
    m, n = size(A)
    R = float(copy(A))
    Q = Matrix{Float64}(I, m, m)
    for k in 1:min(m-1, n)
        x = R[k:m, k]                    # entries k:m of the kth column
        norm(x) == 0 && continue         # nothing to eliminate in this column
        e1 = zeros(length(x))
        e1[1] = norm(x)
        # reflection vector v = sign(x_1)*norm(x)*e_1 + x, with the convention sign(0) = 1
        s = x[1] == 0 ? 1.0 : sign(x[1])
        v = x + s * e1
        v = v / norm(v)
        # embed the reflector F = I - 2*v*v' into the m-by-m matrix Q_k
        Hk = Matrix{Float64}(I, m, m)
        Hk[k:m, k:m] -= 2 * (v * v')
        R = Hk * R                       # introduce zeros below the diagonal in column k
        Q = Q * Hk'                      # accumulate Q = Q_1' * Q_2' * ...
    end
    return Q[:, 1:n], R[1:n, :]          # return the reduced (thin) factors
end
# An alternative implementation that applies each reflector directly to the
# relevant columns instead of forming the full m-by-m matrix Hk:
#
# function householder_qr(A)
#     m, n = size(A)
#     Qstar = diagm(ones(m))
#     R = float(copy(A))
#     for k in 1:n
#         z = R[k:m, k]
#         w = [ -sign(z[1])*norm(z) - z[1]; -z[2:end] ]
#         nrmw = norm(w)
#         if nrmw < eps(); continue; end   # skip this iteration
#         v = w / nrmw
#         # Apply the reflection to each relevant column of R and Qstar
#         for j in 1:n
#             R[k:m, j] -= v * ( 2 * (v' * R[k:m, j]) )
#         end
#         for j in 1:m
#             Qstar[k:m, j] -= v * ( 2 * (v' * Qstar[k:m, j]) )
#         end
#     end
#     return Qstar', triu(R)
# end
householder_qr (generic function with 1 method)
A = rand(float(1:9),8,5)
m,n = size(A)
A|>display
Q, R = householder_qr(A)
Q|>display
R|>display
8×5 Matrix{Float64}:
 4.0  6.0  5.0  2.0  5.0
 5.0  8.0  1.0  1.0  5.0
 6.0  5.0  2.0  3.0  2.0
 2.0  6.0  7.0  4.0  4.0
 7.0  1.0  6.0  6.0  5.0
 3.0  5.0  8.0  8.0  4.0
 7.0  8.0  3.0  4.0  5.0
 8.0  6.0  7.0  8.0  8.0
8×5 Matrix{Float64}:
 -0.251976  -0.270475    0.131998  -0.652293   -0.0517987
 -0.31497   -0.396449   -0.439054  -0.0386867  -0.348074
 -0.377964   0.0611347  -0.273822   0.0538295   0.73109
 -0.125988  -0.485372    0.461738  -0.267347    0.155994
 -0.440959   0.63543     0.262003  -0.295208    0.0364161
 -0.188982  -0.261212    0.555793   0.554219    0.151175
 -0.440959  -0.181552   -0.322123   0.188621    0.111356
 -0.503953   0.159321    0.142525   0.262222   -0.529862
5×5 Matrix{Float64}:
 -15.8745       -14.6146      -12.2209      -12.4098     -13.2918
   2.57988e-16   -8.56812      -2.73068      -0.424238    -2.65473
  -2.02691e-17    1.74121e-17   8.95512       6.72049      2.82679
   6.83208e-16    4.03504e-18   2.84441e-16   3.26359     -0.634911
   8.08768e-17   -2.20948e-16  -2.88384e-17   3.12092e-16  -2.80854
norm(A - Q*R)
1.608751089108294e-14
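As a further sanity check (not part of the original notebook), we can compare against Julia's built-in `qr` from LinearAlgebra, which is also based on Householder reflectors; for a full-rank $A$ the reduced factors should agree up to signs.

# Built-in Householder-based QR for comparison; Matrix(fact.Q) is the thin Q factor.
fact = qr(A)
norm(Matrix(fact.Q) * fact.R - A)      # ≈ 0 up to roundoff
norm(abs.(Matrix(fact.Q)) - abs.(Q))   # ≈ 0: Q agrees up to column signs
norm(abs.(fact.R) - abs.(R))           # ≈ 0: R agrees up to row signs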