A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lecture 27: Rayleigh Quotient, Inverse Iteration¶
We continue by presenting classical algorithms such as inverse iteration, the standard method for finding an eigenvector when the corresponding eigenvalue is known.
Restriction to Real Symmetric Matrices¶
Most algorithms apply to matrices regardless of whether they are Hermitian. But the advantages (such as speed-ups) gained from tricks that exploit Hermitian structure can be sizeable. From this perspective, one can treat Hermitian problems and non-Hermitian problems as two entirely different classes of problems.
The Setup¶
In this lecture, we assume that $A$ is an $m\times m$ real symmetric matrix. So
$$ A^{\top} = A. $$
We also restrict to real vector spaces: $x\in\mathbb{R}^m$, so that $x^* = x^\top$ and $\|x\| = \sqrt{x^\top x}$.
We know that $A$ has
$$ \text{real eigenvalues:}\quad \lambda_1, \ldots, \lambda_m, $$ with $$ \text{orthonormal eigenvectors:}\quad q_1, \ldots, q_m,\qquad \left\|q_j\right\|=1. $$
Rayleigh Quotient¶
Suppose we have the real eigenvalue and real eigenvector pair:
$$ A q = \lambda q. $$
Multiplying on the left by $q^\top$ allows us to express $\lambda$ in terms of its eigenvector:
$$ q^\top A q = \lambda q^\top q \implies \lambda = \frac{q^\top A q}{q^\top q}. $$
This motivates us to define the Rayleigh quotient $r(x)$ associated with $A$ for general $x\in\mathbb{R}^m$:
$$ r(x) := \frac{x^\top A x}{x^\top x}. $$
If $x$ is an eigenvector, then $r(x)=\lambda$ is the corresponding eigenvalue.
One way in which the Rayleigh quotient $r(x)$ arises is particularly motivating for this chapter.
Consider the following question:
Given a vector $x\in\mathbb{R}^m$, what scalar $\alpha$ acts most like an eigenvalue associated with $x$, in the sense of minimizing $\| Ax - \alpha x\|_2$?
This gives us an $m\times 1$ least squares problem
$$ x\alpha \approx A x, $$
for the unknown $\alpha$ with the right-hand side $Ax$.
Writing the normal equation for this problem (see (11.9) in the book), $x^\top x \,\alpha = x^\top A x$, one arrives at
$$ \alpha = r(x). $$
So, $r(x)$ is a natural eigenvalue estimate to consider if $x$ is close to an eigenvector.
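As a quick check of this least-squares interpretation, here is a minimal Julia sketch (the symmetric matrix `B` and the vector `y` below are arbitrary illustrative choices, not part of the lecture): solving the $m\times 1$ least squares problem with backslash reproduces the Rayleigh quotient.

using LinearAlgebra
# Illustrative symmetric matrix and a generic (non-eigenvector) test vector
B = [2.0 1.0; 1.0 3.0]
y = [1.0, 2.0]
# Solve the 2×1 least squares problem  y*α ≈ B*y  (y treated as a 2×1 matrix)
α = (reshape(y, :, 1) \ (B * y))[1]
# Rayleigh quotient r(y) = yᵀBy / yᵀy
r_y = dot(y, B * y) / dot(y, y)
α ≈ r_y   # true: the least squares minimizer equals r(y)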
Rayleigh Quotient as a Function on $\mathbb{R}^m$.¶
Now we look at $r\colon \mathbb{R}^m \to \mathbb{R}$. How does $r(x)$ behave when $x$ is near an eigenvector? Let's look at the derivatives. Recall that $A^{\top}=A$. We have
$$ \begin{aligned} \frac{\partial r(x)}{\partial x_j} &=\frac{\dfrac{\partial}{\partial x_j}\left(x^T A x\right)}{x^T x}-\frac{\left(x^T A x\right) \dfrac{\partial}{\partial x_j}\left(x^T x\right)}{\left(x^T x\right)^2}\\ &= \frac{2(A x)_j}{x^T x}-\frac{\left(x^T A x\right) 2 x_j}{\left(x^T x\right)^2}=\frac{2}{x^T x}(A x-r(x) x)_j \end{aligned} $$ for each $j=1,2,\ldots,m$.
So we find that the gradient of $r(x)$ with respect to $x$ is
$$ \nabla r(x)=\frac{2}{x^T x}(A x-r(x) x) . $$
This shows that, for $x \neq 0$, $\nabla r(x) = 0$ if and only if $x$ is an eigenvector of $A$ with eigenvalue $r(x)$. Therefore, the eigenvectors of $A$ are precisely the stationary points of the Rayleigh quotient $r(x)$.
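We can sanity-check the gradient formula numerically. Here is a minimal sketch (the matrix `B` and the point `y` are arbitrary illustrative choices) comparing the formula against a centered finite-difference approximation:

using LinearAlgebra
# Illustrative symmetric matrix and a generic point
B = [4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0]
y = [1.0, -0.5, 2.0]
r(z) = dot(z, B * z) / dot(z, z)
# Gradient from the formula above
g = (2 / dot(y, y)) * (B * y - r(y) * y)
# Centered finite-difference approximation of ∇r(y)
h = 1e-6
g_fd = [(r(y + h * e) - r(y - h * e)) / (2h) for e in eachcol(Matrix(1.0I, 3, 3))]
norm(g - g_fd)   # small: the two gradients agree up to finite-difference error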
Expanding locally near an eigenvector $q_J$, we find
$$ r(x) - r(q_J) = O(\| x- q_J\|^2),\qquad x\to q_J. $$
This shows that the Rayleigh quotient is a quadratically accurate estimate of an eigenvalue.
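Here is a minimal numerical illustration of this quadratic accuracy (again with an arbitrary illustrative matrix `B` and perturbation direction `d`): perturbing an eigenvector by $\epsilon$ moves the Rayleigh quotient away from the eigenvalue by only $O(\epsilon^2)$.

using LinearAlgebra
# Illustrative symmetric matrix; take one of its eigenpairs
B = [4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0]
λs, Q = eigen(B)
λ, q = λs[1], Q[:, 1]
rq(z) = dot(z, B * z) / dot(z, z)
# Perturb q in a fixed direction; the error in the Rayleigh quotient scales like ε²
d = normalize([1.0, 1.0, 1.0])
for ε in (1e-1, 1e-2, 1e-3, 1e-4)
    println(ε, "   ", abs(rq(q + ε * d) - λ))
end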
We continue the lecture with three different iterations.
Power Iteration¶
We mentioned the power iteration in Lecture 25 and said that it was not a good idea on its own. Nevertheless, we cover it here and give some details.
Let $v^{(0)}$ be a vector with $\left\|v^{(0)}\right\|=1$. The power iteration generates a sequence $v^{(k)}$ that is expected to converge to an eigenvector corresponding to the largest eigenvalue of $A$ in absolute value.
Algorithm 27.1: Power Iteration¶
- Pick $v^{(0)}$ to be some vector with $\left\|v^{(0)}\right\|=1$.
- for $k=1,2,\ldots$ (until termination, left out in the algorithm)
- Apply $A$ to define: $w=A v^{(k-1)}$
- Normalize to define: $v^{(k)}=w/\|w\|$
- Compute the Rayleigh quotient $\lambda^{(k)}=(v^{(k)})^{\top} A v^{(k)}$
Here is why this generates a sequence that converges to an eigenvector associated with the largest eigenvalue (in absolute value).
Write $v^{(0)}$ as a linear combination of the orthonormal eigenvectors $q_i$ :
$$ v^{(0)}=a_1 q_1+a_2 q_2+\cdots+a_m q_m $$
Since $v^{(k)}$ is a multiple of $A^k v^{(0)}$, we have for some constants $c_k$:
$$ \begin{aligned} v^{(k)} & =c_k A^k v^{(0)} \\ & =c_k\left(a_1 \lambda_1^k q_1+a_2 \lambda_2^k q_2+\cdots+a_m \lambda_m^k q_m\right) \\ & =c_k \lambda_1^k\left(a_1 q_1+a_2\left(\lambda_2 / \lambda_1\right)^k q_2+\cdots+a_m\left(\lambda_m / \lambda_1\right)^k q_m\right) \end{aligned} $$
Theorem¶
Suppose $\left|\lambda_1\right|>\left|\lambda_2\right| \geq \cdots \geq\left|\lambda_m\right| \geq 0$ and $q_1^T v^{(0)} \neq 0$. Then the iterates of Algorithm 27.1 satisfy
$$ \left\|v^{(k)}-\left( \pm q_1\right)\right\|=O\left(\left|\frac{\lambda_2}{\lambda_1}\right|^k\right), \quad\left|\lambda^{(k)}-\lambda_1\right|=O\left(\left|\frac{\lambda_2}{\lambda_1}\right|^{2 k}\right) $$
as $k \rightarrow \infty$. The $\pm$ sign means that at each step $k$, one or the other choice of sign is to be taken, and then the indicated bound holds.
One can avoid the $\pm$ issue involved in convergence to $ \pm q_1$ by just thinking of convergence of the subspace $\langle v^{(k)}\rangle$ to the subspace $\langle q_1 \rangle$, but we do not cover this approach.
As you can see, power iteration is of limited use.
- It can only find the eigenvector corresponding to the largest eigenvalue.
- The convergence is linear: error is reduced by a factor of $|\lambda_2|/|\lambda_1|$ at each iteration.
- The error reduction is practical only if the largest eigenvalue $\lambda_1$ is significantly larger in magnitude than the others. If the two largest eigenvalues are close in magnitude, the convergence will be very slow.
using LinearAlgebra
A = [5 1 1 1; 1 4 1 1; 1 1 3 1; 1 1 1 6]
4×4 Matrix{Int64}:
 5  1  1  1
 1  4  1  1
 1  1  3  1
 1  1  1  6
# Returns a struct
F = eigen(A)
Λ = F.values;
X = F.vectors;
display(Λ)
display(X)
4-element Vector{Float64}:
 2.2960896453121187
 3.392275290272984
 4.5077487053636425
 7.803886359051246
4×4 Matrix{Float64}:
 -0.157381    0.517536  -0.671404  -0.506561
 -0.380963   -0.801782  -0.226102  -0.401113
  0.905684   -0.225903  -0.135941  -0.332002
 -0.0991762   0.19563    0.692542  -0.687225
# Eigenvector for the largest eigenvalue
x = X[:,4]
x /= norm(x);
x |> display
4-element Vector{Float64}:
 -0.5065613134846444
 -0.40111308352595576
 -0.33200196406021837
 -0.6872253093165073
# Implement 20 iterates of power iteration
v = rand(4)
error = Float64[]
for k in 1:20
v /= norm(v)
push!(error, min(norm(v - x), norm(v + x)))
v = A * v
end
using Plots
using LaTeXStrings
# y-axis in log scale: linear convergence appears as a straight line
scatter(
1:20, error,
yscale = :log10,
xlabel = L"k",
ylabel = L"\min(\|v - x\|, \|v + x\|)",
title = "Convergence Error",
legend = false
)
# Eigenvalue estimate
λ_estimate = (v' * A * v)/(v' * v);
abs(λ_estimate - Λ[4])
1.503028812521734e-10
Inverse Iteration¶
Suppose that $\mu\in\mathbb{R}$ is not an eigenvalue of $A$. The quantity
$$ (A-\mu \mathbb{I})^{-1} $$
is called the resolvent of $A$. The set of values $\mu$ for which this inverse exists (i.e., for which $A-\mu \mathbb{I}$ is invertible) is called the resolvent set of $A$, and it is precisely the complement of the spectrum of $A$.
Importantly, the resolvent $(A - \mu \mathbb{I})^{-1}$ has the same eigenvectors as $A$, but with associated eigenvalues given by $\left\{\left(\lambda_j-\mu\right)^{-1}\right\}$, where $\{\lambda_j\}$ are the eigenvalues of $A$.
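This relation is easy to verify numerically with the matrix `A` and its eigenvalues `Λ` computed earlier in this notebook (a minimal sketch; the shift $\mu = 5.0$ is an arbitrary value that is not an eigenvalue of $A$):

using LinearAlgebra
# Eigenvalues of the resolvent (A - μI)⁻¹ are 1/(λⱼ - μ); μ = 5.0 is an
# illustrative shift, not an eigenvalue of A
μ = 5.0
R = Symmetric(inv(A - μ * I))
sort(eigvals(R)) ≈ sort(1 ./ (Λ .- μ))   # true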
Now suppose that $\mu$ is close to an eigenvalue $\lambda_J$ of $A$. Then $(\lambda_J - \mu)^{-1}$ may be much larger in absolute value than $(\lambda_j - \mu)^{-1}$ for all $j\neq J$. This motivates the following idea: if we apply power iteration to the resolvent $(A-\mu \mathbb{I})^{-1}$ instead of the matrix $A$ itself, the process will converge rapidly to $q_J$. This procedure is called inverse iteration.
Algorithm 27.2: Inverse Iteration¶
- Pick $v^{(0)}$ to be some vector with $\left\|v^{(0)}\right\|=1$.
- for $k=1,2,\ldots$ (until termination, left out in the algorithm)
- Solve $(A-\mu \mathbb{I})w = v^{(k-1)}$ for $w$ (i.e. apply $(A-\mu \mathbb{I})^{-1}$)
- Normalize to define: $v^{(k)}=w/\|w\|$
- Compute the Rayleigh quotient $\lambda^{(k)}=(v^{(k)})^{\top} A v^{(k)}$
What are some possible issues?
What if $\mu$ is an eigenvalue of $A$? Then $A-\mu \mathbb{I}$ is singular to begin with. And if $\mu$ is merely close to an eigenvalue, $A-\mu \mathbb{I}$ is so ill-conditioned that we cannot expect an accurate solution $w$ in the solve step. It turns out that these situations cause no trouble: the error in the computed $w$ is concentrated in the direction of the desired eigenvector, so after normalization $v^{(k)}$ is still an excellent eigenvector estimate. To see this, work through Exercise 27.5.
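The following minimal sketch illustrates the point, using the matrix `A`, eigenvalues `Λ`, and eigenvector `x` from earlier in the notebook (the offset $10^{-12}$ added to `Λ[4]` is an arbitrary illustrative choice): even when $\mu$ is essentially equal to an eigenvalue, a single step with the ill-conditioned solve already yields an excellent eigenvector.

using LinearAlgebra
# Take μ extremely close to the eigenvalue Λ[4], so A - μI is very ill-conditioned.
# The computed w is a poor solution of the linear system, but its direction is an
# excellent approximation of the eigenvector x (cf. Exercise 27.5).
μ_close = Λ[4] + 1e-12
v = rand(4); v /= norm(v)
w = (A - μ_close * I) \ v
v = w / norm(w)
min(norm(v - x), norm(v + x))   # already tiny after a single step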
Theorem¶
Suppose $\lambda_J$ is the closest eigenvalue to $\mu$ and $\lambda_K$ is the second closest, that is, $\left|\mu-\lambda_J\right|<\left|\mu-\lambda_K\right| \leq\left|\mu-\lambda_j\right|$ for each $j \neq J$. Furthermore, suppose $q_J^T v^{(0)} \neq 0$. Then the iterates of Algorithm 27.2 satisfy
$$ \left\|v^{(k)}-\left( \pm q_J\right)\right\|=O\left(\left|\frac{\mu-\lambda_J}{\mu-\lambda_K}\right|^k\right), \quad\left|\lambda^{(k)}-\lambda_J\right|=O\left(\left|\frac{\mu-\lambda_J}{\mu-\lambda_K}\right|^{2 k}\right) $$
as $k \rightarrow \infty$, where the $\pm$ sign has the same meaning as in the previous theorem (Theorem 27.1 in the book).
Like power iteration, inverse iteration also exhibits only linear convergence, reducing the error by a constant factor at each step.
But unlike power iteration, the rate of convergence can be controlled through the choice of $\mu$: if $\mu$ is much closer to one eigenvalue of $A$ than to the others, then the largest (in absolute value) eigenvalue of $(A-\mu \mathbb{I})^{-1}$ will be much larger than the rest, and convergence will be rapid.
Inverse iteration is one of the most valuable tools of numerical linear algebra, for it is the standard method of calculating one or more eigenvectors of a matrix if the eigenvalues are already known.
using LinearAlgebra
using Plots
# Setup
μ_values = [6.0, 6.5, 7.5]
n_iter = 20
# Preallocate storage for errors
all_errors = Dict{Float64, Vector{Float64}}()
# Inverse iteration for each μ
for μ in μ_values
LU = lu(A - μ * I)
v = rand(4)
errors = zeros(n_iter)
for k in 1:n_iter
v /= norm(v)
errors[k] = min(norm(v - x), norm(v + x))
v = LU \ v
end
all_errors[μ] = errors
end
plot(yscale = :log10, xlabel = L"k", ylabel = L"\min(\|v - x\|, \|v + x\|)",
title = "Inverse Iteration", legend = :topright)
for μ in μ_values
plot!(1:n_iter, all_errors[μ], marker = :circle, label = L"\mu = %$μ")
end
display(current())
Rayleigh Quotient Iteration¶
The Rayleigh quotient gives us an eigenvalue estimate from an eigenvector estimate.
Inverse iteration gives us an eigenvector estimate from an eigenvalue estimate.
Why not combine these two?
Algorithm 27.3: Rayleigh Quotient Iteration¶
- Pick $v^{(0)}$ to be some vector with $\left\|v^{(0)}\right\|=1$.
- Compute the corresponding Rayleigh quotient: $\lambda^{(0)}=\left(v^{(0)}\right)^T A v^{(0)}$.
- for $k=1,2,\ldots$ (until termination, left out in the algorithm)
- Solve $(A-\lambda^{(k-1)} \mathbb{I})w = v^{(k-1)}$ for $w$ (i.e. apply $(A-\lambda^{(k-1)} \mathbb{I})^{-1}$)
- Normalize to define: $v^{(k)}=w/\|w\|$
- Compute the Rayleigh quotient $\lambda^{(k)}=(v^{(k)})^{\top} A v^{(k)}$
We quote Trefethen & Bau:
The convergence of this algorithm is spectacular: each iteration triples the number of digits of accuracy.
Theorem:¶
Rayleigh quotient iteration converges to an eigenvalue/eigenvector pair for all except a set of measure zero of starting vectors $v^{(0)}$. When it converges, the convergence is ultimately cubic in the sense that if $\lambda_J$ is an eigenvalue of $A$ and $v^{(0)}$ is sufficiently close to the eigenvector $q_J$, then
$$ \left\|v^{(k+1)}-\left( \pm q_J\right)\right\|=O\left(\left\|v^{(k)}-\left( \pm q_J\right)\right\|^3\right) $$
and
$$ \left|\lambda^{(k+1)}-\lambda_J\right|=O\left(\left|\lambda^{(k)}-\lambda_J\right|^3\right) $$
as $k \rightarrow \infty$. The $\pm$ signs are not necessarily the same on the two sides of (27.6).
The idea here can be thought of as a feedback loop.
A better shift $\mu$ results in a better eigenvector estimate. A better eigenvector estimate results in a better eigenvalue estimate (a better shift). That results in a better eigenvector estimate, etc. That's qualitatively the reason behind the cubic convergence.
# Better to have more precision to observe the convergence --- it is very rapid!
A_big = BigFloat.(A)
x_big = BigFloat.(x)
4-element Vector{BigFloat}:
 -0.50656131348464439323464603148750029504299163818359375
 -0.4011130835259557603222901889239437878131866455078125
 -0.332001964060218368590682302965433336794376373291015625
 -0.68722530931650727126225319807417690753936767578125
# Julia's I is a lazy identity matrix: in A_big - μ*I below, it automatically takes the appropriate size.
v = BigFloat.(rand(4))
μ = BigFloat[]
# Iterative Rayleigh quotient method
for k in 1:5
# Normalize the vector v
v /= norm(v)
# Compute and store the Rayleigh quotient
push!(μ, dot(v, A_big * v))
# Update v by solving (A_big - μ[end] * I) w = v; it is normalized at the start of the next pass
v = (A_big - μ[end] * I) \ v
end
# Print error in eigenvalue estimates (comparing each estimate with the final one)
println("Error in eigenvalue estimates:")
for i in 1:4
println(μ[i] - μ[end])
end
Error in eigenvalue estimates:
-0.6498497833709197707063776703870181420934137634455395312399897774409225151786669
-0.01919389327267850588220529337077868023113645526837679885418044878131471943337762
-3.624055079513136659602342161111081609151613830145280289983461689412442833915233e-07
-2.736053156334650557409033097180810357851558195634254199376829457909258680403401e-21
Note the exponent: 21 digits! ("Spectacular")