A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lecture 12: Conditioning and Condition Numbers¶
When we think of a problem, it is useful to think of it as a function $f$ from a normed vector space $X$ into a normed vector space $Y$, $f\colon X \to Y$. This is an abstraction of the notion of a data-to-solution map.
$f$ is usually nonlinear (even for problems arising in linear algebra). Often we are interested in the behavior of a problem at a particular data point $x\in X$. Strictly speaking, this is an instance of the problem at the data point $x$, but following the convention of Trefethen & Bau we simply refer to it as the problem.
A well-conditioned problem (instance) is one with the property that all small perturbations of $x$ lead to only small changes in $f(x)$.
An ill-conditioned problem is one with the property that some small perturbation of $x$ leads to a large change in $f(x)$.
You may think that this idea smells like some sort of a derivative of $f$ (if $f$ is differentiable!), and you would not be wrong.
Absolute Condition Number¶
Let $\delta x$ denote a small perturbation of $x$, and introduce
$$\delta f=f(x+\delta x)-f(x).$$The absolute condition number $\hat{\kappa}=\hat{\kappa}(x)$ of the problem $f$ at $x$ is defined as
$$ \hat{\kappa}(x) = \lim _{\delta \rightarrow 0} \sup _{\|\delta x\| \leq \delta} \frac{\|\delta f\|}{\|\delta x\|} . $$In practice, we may think of the limit of the supremum above as the supremum over all infinitesimal perturbations $\delta x$, so we sometimes use the shorthand notation
$$ \hat{\kappa}(x)=\sup _{\delta x} \frac{\|\delta f\|}{\|\delta x\|} $$with the understanding that $\delta x$ is an infinitesimal quantity.
What if $f$ is differentiable?¶
Let $J(x)$ denote the Jacobian (matrix) of $f$ at $x$, where
$$ J_{ij}(x) = \frac{\partial f_i}{\partial x_j} $$By the definition of the derivative, we have, to first order, $\delta f \approx J(x) \delta x$, with equality in the limit $\|\delta x\| \rightarrow 0$.
Then the absolute condition number becomes
$$ \hat{\kappa}=\|J(x)\|, $$where $\|J(x)\|$ represents the norm of $J(x)$ induced by the norms on $X$ and $Y$.
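To make this concrete, here is a minimal numerical sketch of $\hat{\kappa}=\|J(x)\|$. It assumes the ForwardDiff.jl package is available, and the map $f$ below is an arbitrary example of ours:
using LinearAlgebra, ForwardDiff
# An arbitrary smooth example map f: R^2 -> R^2
f(x) = [x[1]*x[2], x[1] + sin(x[2])]
x0 = [1.0, 2.0]
J = ForwardDiff.jacobian(f, x0)
# Absolute condition number at x0: the induced 2-norm of the Jacobian
κ_hat = opnorm(J)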
Relative Condition Number¶
Sometimes we are concerned with relative changes: perturbations measured as a fraction (a percentage) of the data.
In this case we are dealing with the relative condition number, defined as:
$$\kappa=\kappa(x)=\lim _{\delta \rightarrow 0} \sup _{\|\delta x\| \leq \delta}\left(\frac{\displaystyle \frac{\|\delta f\|}{\|f(x)\|}}{\displaystyle\frac{\|\delta x\|}{\|x\|}}\right) $$As before, we use the shorthand notation
$$ \kappa=\sup _{\delta x} \left(\frac{\displaystyle \frac{\|\delta f\|}{\|f(x)\|}}{\displaystyle\frac{\|\delta x\|}{\|x\|}}\right) $$with the understanding that $\delta x$ and $\delta f$ are infinitesimal quantities.
Again, if $f$ happens to be differentiable, we have
$$ \kappa = \frac{\|J(x)\|}{\displaystyle\frac{\|f(x)\|}{\|x\|}} $$Both absolute and relative condition numbers have their uses, but the latter are more important in numerical analysis. One reason for this is that the floating point arithmetic used by computers introduces relative errors rather than absolute ones; see the next lecture. Moreover, the relative condition number expresses accuracy in dimensionless terms (numbers of significant digits).
A problem is well-conditioned if $\kappa$ is small, e.g. $1$, $10$, $10^2$.
A problem is ill-conditioned if $\kappa$ is large, e.g. $10^6$, $10^{12}$.
Examples¶
Example: Consider the problem of obtaining the scalar $x / 2$ from $x \in \mathbb{C}$.
The Jacobian of the function $f: x \mapsto x / 2$ is just the derivative $J=f^{\prime}=1 / 2$, so
$$ \kappa=\frac{\|J\|}{\|f(x)\| /\|x\|}=\frac{1 / 2}{(x / 2) / x}=1 . $$A well-conditioned problem.
Example: Consider the problem of obtaining the scalar $f(x)=x_1-x_2$ from the vector $x=\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{C}^2$. For simplicity, we use the $\infty$-norm on the data space $\mathbb{C}^2$. The Jacobian of $f$ at $x$ is
$$ J(x)=\begin{bmatrix} \displaystyle \frac{\partial f}{\partial x_1} &\displaystyle \frac{\partial f}{\partial x_2} \end{bmatrix} =\begin{bmatrix} 1 & -1 \end{bmatrix}. $$Then $\|J\|_{\infty} = 2$ (the maximum absolute row sum). So we find that
$$ \kappa=\frac{\|J\|_{\infty}}{\|f(x)\| /\|x\|}=\frac{2}{\displaystyle \frac{|x_1-x_2|}{\max \left\{\left|x_1\right|,\left|x_2\right|\right\}}} = \frac{2 \max \left\{\left|x_1\right|,\left|x_2\right|\right\}}{\displaystyle |x_1-x_2|} $$This quantity is large when $|x_1-x_2|\approx 0$, so the problem is ill-conditioned for $x$ with $x_1\approx x_2$. This makes sense when we think of the cancellation errors that occur when we subtract two quantities that are close.
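As a quick numerical illustration (the values below are an arbitrary choice of ours), the formula predicts $\kappa \approx 2\times 10^{12}$ for two inputs agreeing to about 12 digits, so roughly 12 of the 16 significant digits of Float64 are lost in the subtraction:
# x1 and x2 agree to about 12 digits, so κ ≈ 2×10^12:
x1 = 1.0 + 1e-12
x2 = 1.0
# The exact difference is 1e-12; rounding in the data (relative size ~1e-16)
# is amplified by κ, leaving only a few correct digits:
rel_err = abs((x1 - x2) - 1e-12) / 1e-12   # ≈ κ * machine epsilon ≈ 1e-4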
Classic Example of an Ill-Conditioned Problem: Finding roots of a polynomial, given the coefficients.
We can start with the following polynomial:
$$ p(x) = (x-1)^2 = x^2 -2x + 1, $$which has a double root at $x=1$.
Consider the perturbed polynomial $\tilde{p}(x)=x^2 -2x + 0.9999$, which factors as
$$ \tilde{p}(x) = (x-0.99)(x-1.01). $$So a perturbation of size $10^{-4}$ in a coefficient moved the roots by $10^{-2}$: a small perturbation in the coefficients may lead to a much larger change in the roots.
In fact, it is easy to calculate that the roots change in proportion to the square root of the change in the coefficients. So, in this case the Jacobian is infinite (the problem is not differentiable), and $\kappa=\infty$.
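We can observe this square-root scaling directly; a small sketch (the perturbation size $10^{-8}$ is an arbitrary choice):
using Polynomials
# Perturb the constant coefficient of (x-1)^2 by -1e-8.
# Polynomial takes coefficients in increasing degree: a0, a1, a2.
q = Polynomial([1 - 1e-8, -2.0, 1.0])
roots(q)   # roots are 1 ± 1e-4: a coefficient perturbation of size 1e-8 moved the roots by 1e-4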
Even when we are not in a double-root scenario, the problem is typically ill-conditioned.
Let $a_n$ denote the coefficient of $x^n$ in a polynomial $p$ and perturb it by an infinitesimal quantity $\delta a_n$. Then the resulting perturbation of the $j$th root $x_j$ is
$$ \delta x_j = -\frac{(\delta a_n) x_j^n}{p'(x_j)}, $$where $p'$ denotes the derivative of $p$. (To see this, expand $0 = p(x_j+\delta x_j) + \delta a_n (x_j+\delta x_j)^n$ to first order in the perturbations and use $p(x_j)=0$.)
So we can compute the condition number of $x_j$ with respect to perturbations of the single coefficient $a_n$:
$$ \kappa =\frac{\displaystyle \frac{|\delta x_j|}{|x_j|} }{\displaystyle \frac{|\delta a_n|}{|a_n|}} = \frac{\left|a_n x_j^{n-1}\right|}{\left|p^{\prime}(x_j)\right|} $$This number is often very large.
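Even for well-separated roots this number exceeds $1$. As a quick sanity check of the formula on an arbitrary cubic of ours:
using Polynomials
p = fromroots([1.0, 2.0, 3.0])   # x^3 - 6x^2 + 11x - 6
dp = derivative(p)
a2 = coeffs(p)[3]                # coefficient of x^2 (coeffs are ordered a0, a1, ...)
# Condition number of the root x_j = 2 with respect to a_2, per the formula above:
κ_root = abs(a2 * 2.0^(2-1)) / abs(dp(2.0))   # = |a_2|·2 / |p'(2)| = 12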
Here is the standard example:
Consider the Wilkinson polynomial
$$ p(x)=\prod_{i=1}^{20}(x-i)=a_0+a_1 x+\cdots+a_{19} x^{19}+x^{20} $$It is known that (and you are encouraged to investigate this!) the most sensitive root of this polynomial is $x=15$ and it is most sensitive to changes in the coefficient
$$ a_{15} =- 1672280820 \approx - 1.67 \times 10^9. $$The condition number is: $$ \kappa \approx \frac{1.67 \times 10^9 \cdot 15^{14}}{5!14!} \approx 5.1 \times 10^{13} . $$ This is a huge number!
# In Julia we can easily construct a polynomial object just by specifying the roots:
# Here is a simple example.
using Polynomials
r_0 = [1,1,2,3,0.25,5]
m = length(r_0)
p = fromroots(r_0)
p|>display
p(4)|>display
-67.5
# Perturb the coefficients randomly by a small amount.
# Note that we are doing relative perturbations
p_coeffs = coeffs(p)
# Amount I am perturbing (multiplicatively) by:
pert = ones(m+1)+1e-9*randn(m+1)
pert |> display
# Perturbed coefficients
q_coeffs = pert .* p_coeffs
# Construct the polynomial with the perturbed coefficients.
q = Polynomial(q_coeffs)
roots(q) |> display
7-element Vector{Float64}:
 0.9999999996990405
 0.9999999998107522
 1.00000000162732
 0.9999999998728104
 1.0000000006462326
 1.0000000007934227
 0.9999999991438795
6-element Vector{Float64}:
 0.25000000096785924
 0.9998013464884086
 1.0001987568742
 1.9999997912522753
 3.0000000919042957
 5.000000032719857
Note that the double root at $1$ moved by about $2\times 10^{-4}$, while the simple roots moved by about $10^{-7}$ or less: this is the square-root sensitivity at work. Let's now implement this experiment for the Wilkinson polynomial.
using Polynomials
using Plots, ColorSchemes
r = collect(1.0:20)
m = length(r)
p = fromroots(r)
scatter([],[], xlab="Re", ylab="Im", legend=false,aspect_ratio=1) # Initialize an empty plot
# Now perturb the coefficients randomly. Construct 100 randomly perturbed polynomials.
for k = 1:100
q = Polynomial(coeffs(p) .* (ones(m+1) .+ 1e-9 * randn(m+1)))
r_q = roots(q)
color = get(ColorSchemes.Purples, k/100)
scatter!(real(r_q), imag(r_q), color=color, markersize=2)
end
scatter!(real.(r), imag.(r), xlab="Re", ylab="Im", legend=false, marker=:circle, markersize=4, markercolor=:gold)
display(current()) # Show the final plot
Example: The problem of computing the eigenvalues of a nonsymmetric matrix is often ill-conditioned.
$$ \operatorname{spec}\left( \begin{bmatrix} 1 & 1000 \\ 0 & 1 \end{bmatrix} \right) = \{1,1\}, $$but $$ \operatorname{spec}\left( \begin{bmatrix} 1 & 1000 \\ 0.001 & 1 \end{bmatrix} \right) = \{0,2\}. $$
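A quick check of these spectra with `eigvals` from LinearAlgebra:
using LinearAlgebra
eigvals([1.0 1000.0; 0.0 1.0])     # both eigenvalues equal 1
eigvals([1.0 1000.0; 0.001 1.0])   # eigenvalues 0 and 2 (up to rounding): a perturbation of size 10^-3 moved them by 1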
On the other hand, if a matrix $A$ is symmetric (or more generally, normal), then the problem is well-conditioned. Moreover, if $\lambda$ is an eigenvalue of $A$ and $\tilde{\lambda}$ is the corresponding eigenvalue of $A+\delta A$, then for $\delta \lambda = \tilde{\lambda}-\lambda$ one can show that (exercise!)
$$ |\delta \lambda| \leq\|\delta A\|_2 $$and the equality is achieved if $\delta A$ is a multiple of $\mathbb{I}$. Therefore, the absolute condition number for the symmetric eigenvalue problem is
$$ \hat{\kappa} = \sup_{\delta A} \frac{|\delta \lambda|}{\|\delta A\|_2} = 1, $$if we measure the perturbations in the $2$-norm.
One can compute that the relative condition number is
$$ \kappa = \frac{\|A\|_2}{|\lambda|}. $$
Condition of Matrix-Vector Multiplication¶
Fix $A \in \mathbb{C}^{m \times n}$ and consider the problem of computing $A x$ from input $x$.
So we are interested in the condition number corresponding to perturbations of $x$ --- not $A$.
Let $\| \cdot \|$ denote an arbitrary vector norm on $\mathbb{C}^n$ and on $\mathbb{C}^m$, as well as the matrix norm they induce.
We have
$$ \kappa=\sup _{\delta x} \left(\frac{\displaystyle \frac{\|\delta f\|}{\|f(x)\|}}{\displaystyle\frac{\|\delta x\|}{\|x\|}}\right) = \sup _{\delta x} \left(\frac{\displaystyle \frac{\|A(x+\delta x)-Ax\|}{\|Ax\|}}{\displaystyle\frac{\|\delta x\|}{\|x\|}}\right) = \sup _{\delta x} \left(\frac{\displaystyle \frac{\|A(\delta x)\|}{\|Ax\|}}{\displaystyle\frac{\|\delta x\|}{\|x\|}}\right) = \sup _{\delta x} \left(\frac{\displaystyle \frac{\|A(\delta x)\|}{\|\delta x\|}}{\displaystyle\frac{\|A x\|}{\|x\|}}\right). $$Therefore, taking the $\sup$ over all infinitesimal quantities $\delta x$ (recall the limit in the definition), we arrive at
$$ \kappa = \kappa(x) =\|A\| \frac{\|x\|}{\|A x\|}. $$Note that this is a special case of the general formula for differentiable $f$: $$ \kappa=\frac{\|J(x)\|}{\frac{\|f(x)\|}{\|x\|}} $$
The formula we have for the problem $x\mapsto A x$ is an exact formula. Now assume further that $A$ is a square matrix that is invertible.
Then $x = A^{-1}(A x)$ and we obtain
$$ \| x \| \leq \|A^{-1}\| \|Ax\| \implies \frac{\|x\|}{\|A x\|} \leq \|A^{-1}\|. $$Using this, we obtain an upper bound on the relative condition number $\kappa$ that is independent of $x$:
$$ \kappa \leq \|A\|\|A^{-1}\|. $$So we may write the exact formula for the condition number in terms of this bound as
$$ \kappa = \kappa(x) =\alpha \|A\|\|A^{-1}\|,\qquad\text{where}\quad \alpha:= \frac{\frac{\|x\|}{\|Ax\|}}{\|A^{-1}\|}. $$Exercise: Show that one can find $x$ such that $\kappa(x) =1$. If we choose to work with $\| \cdot \|_2$, then $\kappa(x) =1$ occurs whenever $x$ is a multiple of a right singular vector of $A$ corresponding to the maximal singular value $\sigma_1$.
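Here is a small numerical check of this claim in the $2$-norm (a sketch of ours with a random matrix; `svd` returns the right singular vectors as the columns of $V$, ordered by decreasing singular value):
using LinearAlgebra
A = randn(5, 5)
U, σ, V = svd(A)
κ_mv(x) = opnorm(A) * norm(x) / norm(A*x)   # κ(x) = ‖A‖‖x‖/‖Ax‖
κ_mv(V[:, 1])     # x along the first right singular vector: κ(x) = 1
κ_mv(V[:, end])   # x along the last one: κ(x) = ‖A‖‖A⁻¹‖, the upper bound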
Actually, $A$ need not be a square matrix. If $A\in\mathbb{C}^{m\times n}$, $m\geq n$, is of full rank, then the formula for $\kappa$ holds with $A^{-1}$ replaced with the pseudoinverse $A^+$.
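For instance, with a random tall matrix (a sketch; full rank with probability $1$), the exact $\kappa(x)$ stays below the $x$-independent bound $\|A\|\|A^{+}\|$:
using LinearAlgebra
A = randn(8, 3)
x = randn(3)
κx = opnorm(A) * norm(x) / norm(A*x)    # exact κ(x) for this data
κx ≤ opnorm(A) * opnorm(pinv(A))        # the bound holds (returns true)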
How About the Inverse Problem?¶
A related problem: given $A$, compute $A^{-1}b$ from input $b$ (that is, solve $Ax=b$ for $x$). This problem is identical to the one we studied above with $A$ replaced by $A^{-1}$ (and $x$ replaced by $b$).
Our findings can be summarized as the following theorem.
Theorem:¶
Let $A \in \mathbb{C}^{m \times m}$ be nonsingular and consider the equation $A x=b$.
The problem of computing $b$, given $x$, has condition number
$$ \kappa=\|A\| \frac{\|x\|}{\|b\|} \leq\|A\|\left\|A^{-1}\right\| $$with respect to perturbations of $x$.
The problem of computing $x$, given $b$, has condition number
$$ \kappa=\left\|A^{-1}\right\| \frac{\|b\|}{\|x\|} \leq\|A\|\left\|A^{-1}\right\| $$with respect to perturbations of $b$.
If $\|\cdot\|=\|\cdot\|_2$, then equality holds for the first problem if $x$ is a multiple of a right singular vector of $A$ corresponding to the minimal singular value $\sigma_m$. Equality holds for the second problem if $b$ is a multiple of a left singular vector of $A$ corresponding to the maximal singular value $\sigma_1$.
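The second statement is also easy to verify numerically; a sketch of ours with a random matrix:
using LinearAlgebra
A = randn(5, 5)
U, σ, V = svd(A)
b = U[:, 1]                  # left singular vector for the maximal singular value σ₁
x = A \ b
κ = opnorm(inv(A)) * norm(b) / norm(x)
κ, cond(A)                   # equal up to rounding: the upper bound is attained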
Condition Number of a Matrix¶
The product $\|A\|\left\|A^{-1}\right\|$ appearing in the upper bounds for the relative condition numbers above is called the matrix condition number. Given an invertible matrix $A$, we define $$ \kappa(A) := \|A\|\left\|A^{-1}\right\|. $$
Recall that this number bounds the relative condition number for solving $Ax=b$ for given $b$ (so, with respect to perturbations of $b$).
Hilbert Matrix Strikes Again!¶
Let's compute the condition number of the Hilbert matrix $H$, given by $$ H_{ij} = \frac{1}{i+j},\qquad 1\leq i,j\leq n. $$
Even for $n=6$, the condition number of $H$ is huge.
using LinearAlgebra, Plots
# Sleek way to define a function in Julia
hilbert_matrix(n) = [ 1.0/(i+j) for i=1:n, j=1:n]
H = hilbert_matrix(6)
κ = cond(H)
5.109816297946132e7
# Range of n values
n_values = 1:20
# Compute condition numbers
κ_values = [cond(hilbert_matrix(n)) for n in n_values]
# Plot. Note that y-axis is in the log-scale!
plot(n_values, κ_values, marker=:o, lw=2, label="Condition Number",
xlabel="Matrix Size (n)", ylabel="Condition Number κ(H)",
yscale=:log10, title="Condition Number of Hilbert Matrices",legend=:bottomright)
Now recall the above computation where we found that $$ \kappa(H) \approx 5.109816297946132 \times 10^{7} $$ for the $6\times 6$ Hilbert matrix $H$. The approximate sign is there because this is a number computed on the computer with floating point arithmetic.
Since $\kappa(H)$ is (importantly) the relative condition number, it reports dimensionless information (significant digits). Our finding tells us that in solving $Ax =b$ for $x$, or equivalently, in computing $b\mapsto A^{-1}b$, we may lose up to around $\log_{10} \kappa(H) \approx 8$ digits of accuracy.
Let's illustrate this by an actual computation in the code-block below.
Before moving on to the computation, note that by submultiplicativity of the matrix norm, $$ \|\delta x\|=\left\|A^{-1} \delta b\right\| \implies \|\delta x\| \leq\left\|A^{-1}\right\|\|\delta b\| . $$ Dividing by $\|x\|$ and noting that $x=A^{-1} b$, so $\|x\|=\left\|A^{-1} b\right\|$, we get $$ \frac{\|\delta x\|}{\|x\|} \leq \frac{\left\|A^{-1}\right\|\|\delta b\|}{\left\|A^{-1} b\right\|} = \left\|A^{-1}\right\| \frac{\|\delta b\|}{\|b\|} \frac{\|b\|}{\left\|A^{-1} b\right\|} . $$ Since $\|b\|=\|A x\| \leq\|A\|\|x\| = \|A\|\left\|A^{-1} b\right\|$, we have $$ \frac{\|b\|}{\left\|A^{-1} b\right\|} \leq\|A\|, $$ which gives $$ \frac{\|\delta x\|}{\|x\|} \leq\left\|A^{-1}\right\|\|A\| \frac{\|\delta b\|}{\|b\|} . $$
So the bound on the relative error made in computing $x$ from given $A$ and $b$ is
$$ \frac{\|\delta x\|}{\|x\|} \leq \kappa(A) \frac{\|\delta b\|}{\|b\|}. $$
using Printf
# Pick the true solution x we seek
n = 6
x = collect((1:n) .+ 0.25)
println("x=", x)
# A = n by n Hilbert matrix
A = hilbert_matrix(n)
# Compute the corresponding data point b for the inversion problem
b = A*x
# Pick a perturbation size and perturb b relatively.
ϵ = 1e-10
b_pert = b.*(1.0.+ϵ*rand(n))
δb = b_pert - b;
bound = cond(A)*norm(δb)/norm(b)
# This is the upper bound we have for the relative error made in computing x
@printf("%.4e\n", bound)
rel_error = norm(A\b_pert -x )/norm(x)
print("relative error is in the computed x: ",rel_error)
x=[1.25, 2.25, 3.25, 4.25, 5.25, 6.25]
2.5050e-03
relative error in the computed x: 0.0006748569738110332
# Make a bunch of random perturbations and see how much error you are making
# Pick the true solution x we seek
n = 6
x = collect((1:n) .+ 0.25)
println("x=", x)
# A = n by n Hilbert matrix
A = hilbert_matrix(n)
# Compute the corresponding data point b for the inversion problem
b = A*x
# Pick a perturbation size and perturb b relatively.
ϵ = 1e-10
for k = 1:8
b_pert = b.*(1.0.+ϵ*rand(n))
δb = b_pert - b;
bound = cond(A)*norm(δb)/norm(b)
@printf("Relative error in x = %.4e\n", norm(A\b_pert -x )/norm(x))
@printf("The bound on the relative error in x = %.4e\n", bound)
println(" ")
end
x=[1.25, 2.25, 3.25, 4.25, 5.25, 6.25]
Relative error in x = 1.3133e-04
The bound on the relative error in x = 2.4139e-03

Relative error in x = 3.5690e-04
The bound on the relative error in x = 3.0536e-03

Relative error in x = 2.2007e-04
The bound on the relative error in x = 3.6409e-03

Relative error in x = 6.1132e-06
The bound on the relative error in x = 2.8572e-03

Relative error in x = 2.7515e-04
The bound on the relative error in x = 2.9217e-03

Relative error in x = 2.8125e-04
The bound on the relative error in x = 2.8331e-03

Relative error in x = 1.2914e-04
The bound on the relative error in x = 3.0197e-03

Relative error in x = 1.7552e-04
The bound on the relative error in x = 1.8293e-03
If $A$ is singular, we write $\kappa(A)=\infty$.
Note that if $\|\cdot\|=\|\cdot\|_2$, then $\|A\|=\sigma_1$ and $\left\|A^{-1}\right\|=1 / \sigma_m$. So with the $2$-norm in use, we have
$$ \kappa(A)=\frac{\sigma_1}{\sigma_m}. $$This formula is generally used for computing $2$-norm condition numbers of matrices.
This is nothing but the eccentricity of the hyperellipse that is the image of the unit sphere of $\mathbb{C}^m$ under the action of $A\in\mathbb{C}^{m\times m}$.
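A one-line check that `cond` agrees with the singular value ratio (random matrix):
using LinearAlgebra
A = randn(6, 6)
σ = svdvals(A)                # singular values in decreasing order
cond(A), σ[1] / σ[end]        # cond uses the 2-norm by default, so these agree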
For a rectangular matrix $A \in \mathbb{C}^{m \times n}$ of full rank, $m \geq n$, the condition number is defined in terms of the pseudoinverse: $\kappa(A)=\|A\|\left\|A^{+}\right\|$.
Since $A^{+}$ is motivated by least squares problems, this definition is most useful in the case $\|\cdot\|=\|\cdot\|_2$. With the $2$-norm, we have
$$ \kappa(A)=\frac{\sigma_1}{\sigma_n}. $$
Condition of a System of Equations¶
How about perturbing $A$? This amounts to perturbing the coefficients of the system of equations $Ax=b$. We keep $b$ fixed; $x$ reacts to the perturbation, and we have
$$ (A+\delta A)(x+\delta x)=b. $$Here we of course assume that $A$ is invertible. Expanding, we have
$$ A x + (\delta A) x + A(\delta x) + (\delta A)(\delta x) = b \implies (\delta A) x + A(\delta x) + (\delta A)(\delta x) = 0. $$Dropping the quadratic term $(\delta A)(\delta x)$ for the infinitesimals, we have
$$ (\delta A) x + A(\delta x) = 0 \implies (\delta x) = -A^{-1} (\delta A) x. $$This implies (by submultiplicativity) that
$$ \|\delta x\| \leq\left\|A^{-1}\right\|\|\delta A\|\|x\|. $$Equivalently, we have for the relative errors: $$ \frac{\displaystyle \frac{\|\delta x\|}{\|x\|}}{\displaystyle \frac{\|\delta A\|}{\|A\|}} \leq \left\|A^{-1}\right\|\|A\|=\kappa(A). $$
Does equality actually hold for some perturbation? It does: equality holds if $\delta A$ is chosen such that
$$ \left\|A^{-1}(\delta A) x\right\|=\left\|A^{-1}\right\|\|\delta A\|\|x\|. $$It is an exercise to show that for any $A$, $b$, and given norm $\| \cdot \|$, such a perturbation $\delta A$ exists by the use of dual norms.
This proves the following result.
Theorem:¶
Let $b$ be fixed and consider the problem of computing $x=$ $A^{-1} b$, where $A$ is square and nonsingular. The condition number of this problem with respect to perturbations in $A$ is
$$ \kappa=\|A\|\left\|A^{-1}\right\|=\kappa(A) . $$
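As a sanity check, here is a minimal random experiment (our own sketch) confirming that the relative change in $x$, divided by the relative change in $A$, stays below $\kappa(A)$:
using LinearAlgebra
A = randn(5, 5); x = randn(5); b = A*x
δA = 1e-8 * randn(5, 5)          # a small perturbation of A
x_pert = (A + δA) \ b            # solve the perturbed system
ratio = (norm(x_pert - x)/norm(x)) / (opnorm(δA)/opnorm(A))
ratio ≤ cond(A)                  # the bound holds (to first order in δA)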