A Course on Numerical Linear Algebra¶
Instructor: Deniz Bilman
Textbook: Numerical Linear Algebra, by Trefethen and Bau.
Lecture 4 (Continued): Straightforward Applications of Singular Value Decomposition (SVD)¶
using LinearAlgebra
using Plots
# using Pkg
# Pkg.add(["Images", "ImageIO"])
using Images, ImageIO
# Load an image from a file
img = load("figs/hubbard-bn.jpg");
# Let's display the image
display(img)
We should convert this image object into a matrix. We will also convert it to grayscale so that each pixel in the image corresponds to the entry of a matrix, and there is only one channel (gray) to encode.
You can do this with RGB channels if you'd like.
# We should convert the image (sadly) to a grayscale matrix.
gray_img = Gray.(img) # Converts to grayscale if it's not already
display(gray_img) # Display the grayscale image
gray_matrix = Float64.(gray_img) # Converts pixel values to Float64
900×900 Matrix{Float64}: 0.890196 0.886275 0.886275 0.886275 … 0.854902 0.862745 0.870588 0.901961 0.898039 0.894118 0.894118 0.870588 0.87451 0.882353 0.917647 0.913725 0.913725 0.913725 0.847059 0.85098 0.854902 0.937255 0.933333 0.929412 0.929412 0.831373 0.831373 0.835294 0.952941 0.94902 0.945098 0.941176 0.862745 0.858824 0.858824 0.960784 0.956863 0.952941 0.94902 … 0.882353 0.878431 0.87451 0.968627 0.964706 0.956863 0.952941 0.886275 0.878431 0.87451 0.968627 0.964706 0.956863 0.952941 0.898039 0.890196 0.886275 0.945098 0.945098 0.945098 0.945098 0.870588 0.878431 0.886275 0.94902 0.945098 0.945098 0.945098 0.862745 0.866667 0.87451 ⋮ ⋱ 0.980392 0.972549 0.964706 0.952941 0.835294 0.831373 0.831373 0.980392 0.968627 0.94902 0.937255 0.815686 0.823529 0.835294 0.972549 0.956863 0.929412 0.905882 0.776471 0.788235 0.796078 0.968627 0.941176 0.901961 0.87451 0.74902 0.745098 0.737255 0.960784 0.933333 0.886275 0.85098 … 0.745098 0.721569 0.701961 0.956863 0.945098 0.909804 0.882353 0.764706 0.72549 0.682353 0.933333 0.913725 0.886275 0.862745 0.784314 0.772549 0.760784 0.890196 0.87451 0.85098 0.835294 0.905882 0.917647 0.929412 0.854902 0.843137 0.827451 0.819608 0.952941 0.964706 0.972549
Supplementary Code: Color Channels¶
If you'd like to keep track of the color channels, do the following.
# Extract RGB channels
red_channel = Float64.(channelview(img)[1, :, :])
green_channel = Float64.(channelview(img)[2, :, :])
blue_channel = Float64.(channelview(img)[3, :, :])
# Display the red channel as a heatmap
heatmap(red_channel, color=:reds, aspect_ratio=:equal, title="Red Channel", yflip=true)
# Display the blue channel as a heatmap
heatmap(red_channel, color=:blues, aspect_ratio=:equal, title="Blue Channel", yflip=true)
Back to the example & SVD¶
# Making sure the type is Float64
println(typeof(gray_matrix))
size(gray_matrix)|>display
U, Σ, V = svd(gray_matrix);
println("Size of U: ", size(U))
println("Size of Σ: ", size(Σ))
println("Size of V: ", size(V))
scatter(Σ)
(900, 900)
Matrix{Float64} Size of U: (900, 900) Size of Σ: (900,) Size of V: (900, 900)
# Reconstruction test
norm(gray_matrix - (U * Diagonal(Σ) * V'))
3.940418380586986e-12
# Let's write a routine to reconstruct the image using the top k singular values
function low_rank_approximation(U, Σ, V, k)
# U_k = U[:, 1:k] # Top k columns of U
# Σ_k = Diagonal(Σ[1:k]) # Top k singular values
# V_k = V[1:k, :] # Top k rows of Vt
return U[:, 1:k] * Diagonal(Σ[1:k]) * (V[:, 1:k]') # Reconstructed matrix
end
low_rank_approximation (generic function with 1 method)
# Choose rank k for approximation
k = 2 # You can experiment with different values of k
approx_matrix = low_rank_approximation(U, Σ, V, k)
approx_img = Gray.(approx_matrix)
display(gray_img)
display(approx_img)
So... rank-$1$ approximation is not that great. Check out the singular values.
# Choose rank k for approximation
k = 20 # You can experiment with different values of k
approx_matrix = low_rank_approximation(U, Σ, V, k)
approx_img = Gray.(approx_matrix)
display(gray_img)
display(approx_img)
scatter(Σ, yaxis=:log)
# Choose rank k for approximation
k = 60 # You can experiment with different values of k
approx_matrix = low_rank_approximation(U, Σ, V, k)
approx_img = Gray.(approx_matrix)
display(gray_img)
display(approx_img)
# Choose rank k for approximation
k = 150 # You can experiment with different values of k
approx_matrix = low_rank_approximation(U, Σ, V, k)
approx_img = Gray.(approx_matrix)
display(gray_img)
display(approx_img)
It is an easy coding exercise to do this with the colored version of the image. Just split to RGB channesl, run SVD on them, and combine aprroximated matrices for the channels back.
Example 2: A Portrait of Keith Jarrett¶
Here is a Promotional photo of Keith Jarrett in Los Angeles, August 1975, for ABC/Impulse! Records.
# Load an image from a file
img = load("figs/jarrett-portrait.jpeg");
# Let's display the image
display(img)
# The image is already in grayscale but let's convert to have the correct data structure
gray_img = Gray.(img) # Converts to grayscale if it's not already
# display(gray_img) # Display the grayscale image
gray_matrix = Float64.(gray_img) # Converts pixel values to Float64
1106×935 Matrix{Float64}: 0.552941 0.568627 0.564706 0.552941 … 0.317647 0.294118 0.301961 0.560784 0.556863 0.552941 0.556863 0.313725 0.301961 0.298039 0.54902 0.537255 0.533333 0.545098 0.305882 0.294118 0.278431 0.552941 0.54902 0.552941 0.568627 0.294118 0.278431 0.270588 0.541176 0.537255 0.541176 0.556863 0.290196 0.278431 0.286275 0.541176 0.533333 0.52549 0.529412 … 0.301961 0.298039 0.317647 0.545098 0.54902 0.541176 0.541176 0.309804 0.313725 0.329412 0.533333 0.533333 0.529412 0.537255 0.305882 0.305882 0.309804 0.541176 0.533333 0.529412 0.556863 0.305882 0.298039 0.294118 0.537255 0.545098 0.517647 0.521569 0.286275 0.305882 0.313725 ⋮ ⋱ 0.960784 0.956863 0.960784 0.960784 1.0 1.0 1.0 0.964706 0.960784 0.960784 0.964706 1.0 1.0 1.0 0.968627 0.964706 0.964706 0.964706 1.0 1.0 1.0 0.968627 0.964706 0.964706 0.960784 … 1.0 1.0 1.0 0.968627 0.964706 0.960784 0.956863 1.0 1.0 1.0 0.972549 0.968627 0.960784 0.960784 1.0 1.0 1.0 0.968627 0.968627 0.964706 0.968627 1.0 1.0 1.0 0.968627 0.968627 0.968627 0.968627 1.0 1.0 1.0 0.968627 0.968627 0.968627 0.964706 … 1.0 1.0 1.0
U, Σ, V = svd(gray_matrix);
println("Size of U: ", size(U))
println("Size of Σ: ", size(Σ))
println("Size of V: ", size(V))
scatter(Σ)
Size of U: (1106, 935) Size of Σ: (935,) Size of V: (935, 935)
scatter(Σ, yaxis=:log)
# Choose rank k for approximation
display(length(Σ))
k = 20 # You can experiment with different values of k
display(k)
approx_matrix = low_rank_approximation(U, Σ, V, k)
approx_img = Gray.(approx_matrix)
display(gray_img)
display(approx_img)
935
20
# Choose rank k for approximation
display(length(Σ))
k = 80 # You can experiment with different values of k
display(k)
approx_matrix = low_rank_approximation(U, Σ, V, k)
approx_img = Gray.(approx_matrix)
display(gray_img)
display(approx_img)
935
80
Thinking & Reading Exercise¶
How to choose $k$ in general?
Audio Compression¶
One can apply a similar idea to compress songs, although at first sight audio files are not matrices at first sight.
using WAV, DSP
# Load the audio file
signal, fs = wavread("audios/sample-song.wav")
size(signal)
(9778440, 2)
# Generate the time vector based on the sampling frequency
time = collect(0:length(signal)-1) / fs;
# Downsample for plotting
# factor = 100 # Downsample factor (adjust as needed)
# signal_downsampled = signal[1:factor:end, :];
# time_downsampled = time[1:factor:end];
# Smaller signal for plotting
time_small = time[50000:55000];
signal_small = signal[50000:55000,:];
size(signal_small)
(5001, 2)
# Plot the signal
plot(time_small, signal_small[:, 1], label="Left Channel", xlabel="Time (s)", ylabel="Amplitude", title="Waveform of Audio File")
plot!(time_small, signal_small[:, 2], label="Right Channel", xlabel="Time (s)", ylabel="Amplitude", title="Waveform of Audio File")
# display(p)
The idea is to create a spectogram of the song. This shows the amplitude packed in each frequency over time.
# If stereo, take one channel (for simplicity)
if size(signal, 2) == 2
signal_ch1 = signal[:, 1]
signal_ch2 = signal[:, 2]
end
9778440-element Vector{Float64}: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ⋮ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
# Define parameters for the STFT
window_length = 1024
hop_size = 512
# Use a rectangular window
rectangular_window = ones(window_length)
# Compute the STFT
spectrogram_ch1_matrix = stft(signal_ch1, window_length, hop_size, window=rectangular_window)
spectrogram_ch2_matrix = stft(signal_ch2, window_length, hop_size, window=rectangular_window)
# Compute frequency axis in Hz
frequencies = (0:(window_length ÷ 2)) .* (fs / window_length)
# Compute the magnitude
mag_spec_ch1 = abs.(spectrogram_ch1_matrix)
mag_spec_ch2 = abs.(spectrogram_ch2_matrix)
# Convert magnitude to dB (adding a small value to avoid log(0))
mag_spec_ch1_db = 20 .* log10.(mag_spec_ch1 .+ 1e-6)
mag_spec_ch2_db = 20 .* log10.(mag_spec_ch2 .+ 1e-6)
513×19097 Matrix{Float64}: -120.0 -120.0 -120.0 -120.0 -120.0 … -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 … -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 ⋮ ⋱ ⋮ -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 … -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 … -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0 -120.0
ch=1
p1 = heatmap(
1:size(mag_spec_ch1_db, 2), # Time bins (x-axis)
frequencies, # Frequency bins in Hz (y-axis)
mag_spec_ch1_db,
xlabel="Time (frames)",
ylabel="Frequency (Hz)",
title="Spectrogram (dB) - Channel $ch",
color=:viridis,
clims=(-80, 0)
)
ch=2
p1 = heatmap(
1:size(mag_spec_ch2_db, 2), # Time bins (x-axis)
frequencies, # Frequency bins in Hz (y-axis)
mag_spec_ch1_db,
xlabel="Time (frames)",
ylabel="Frequency (Hz)",
title="Spectrogram (dB) - Channel $ch",
color=:viridis,
clims=(-80, 0)
)
mag_spec_ch1_db|>size
(513, 14237)
You see, the spectogram is a matrix $X$. A column $X_{:,j}$ denotes the amplitudes of each of the frequencies at time $t=j$. The plot you see is the amplitude values converted to decibels for better interpretability.
One can perform SVD on the spectogram (on the original amplitude spectorgram).
# Apply SVD
U1, Σ1, V1 = svd(mag_spec_ch1)
U2, Σ2, V2 = svd(mag_spec_ch2)
SVD{Float64, Float64, Matrix{Float64}, Vector{Float64}} U factor: 513×513 Matrix{Float64}: -0.125943 -0.138535 0.0299977 … -2.57188e-6 -2.28305e-6 -0.44015 -0.776157 -0.387917 2.31554e-7 -3.73632e-7 -0.312963 -0.266814 0.747508 -1.18208e-6 7.14409e-7 -0.276924 0.0760494 0.379091 1.18434e-6 -2.04225e-6 -0.219252 0.0837878 0.0853054 2.3926e-7 2.15157e-6 -0.210736 0.109135 0.0165572 … 8.90096e-7 -5.72513e-7 -0.181855 0.121533 -0.0171551 -4.20407e-7 1.67382e-7 -0.163883 0.0931894 -0.0275654 -5.82415e-7 -3.25962e-7 -0.201392 0.131407 -0.0776384 1.82151e-7 -1.66859e-7 -0.163593 0.111007 -0.0464829 1.36377e-6 -5.82807e-7 ⋮ ⋱ -0.00112445 0.000238444 0.000160814 0.113854 -0.0436777 -0.00112318 0.000237159 0.000161231 … 0.169877 -0.0871073 -0.00112378 0.000239329 0.000161367 0.0203574 -0.0841357 -0.00112399 0.00024085 0.000163119 0.0241568 -0.081486 -0.00112486 0.000241284 0.000164178 -0.0681521 -0.0380256 -0.00112275 0.000240752 0.000161929 -0.141956 -0.000158255 -0.00112416 0.000239285 0.000162263 … -0.134698 0.0187845 -0.00112359 0.00024004 0.000162557 -0.263298 0.0570776 -0.00112289 0.000237295 0.000158454 -0.058247 0.00833338 singular values: 513-element Vector{Float64}: 14288.579244992086 7798.471982984138 5375.94532786679 4004.233277757783 3344.622271753184 3020.450569988774 2899.286693532435 2614.6313652991767 2465.2499415645325 2242.7869333583108 ⋮ 0.27286517068905786 0.26983830811877085 0.25976819092899195 0.24919227450594086 0.23041657367351037 0.211089043391825 0.189411832239089 0.18425648155784624 0.1469558006070568 Vt factor: 513×14237 Matrix{Float64}: -4.16323e-6 -6.80311e-5 -0.00626231 … -4.89296e-5 -5.16184e-5 4.97337e-6 4.58278e-5 0.000807556 -4.98648e-5 -5.72056e-5 -9.74168e-8 1.38457e-5 0.00618152 3.83958e-5 2.59599e-5 -9.58853e-6 -0.000152718 -0.00564376 -2.99495e-5 -2.97526e-5 4.57818e-6 9.51004e-5 0.00194206 2.22664e-5 9.72614e-6 -1.93464e-5 -0.000248141 0.0124601 … 5.52566e-5 5.80719e-5 2.68262e-6 2.3101e-5 0.000683161 -4.81168e-5 -3.08088e-5 -3.91379e-6 -2.38008e-5 -0.00208149 6.55913e-7 3.37201e-6 1.01144e-6 -4.25419e-5 -0.00101979 2.1123e-5 1.30588e-6 3.02512e-7 1.08376e-5 -0.00298823 -1.25123e-5 -3.87185e-5 ⋮ ⋱ ⋮ -0.000171249 -0.00065375 0.00441791 0.00039735 -0.000415834 -9.14279e-6 -1.08774e-6 0.00356611 … -0.000393637 -0.00050148 0.00067917 -0.00124366 -0.00187179 0.000874936 0.00088496 -0.00104631 -0.000893794 -0.00366621 -0.000309179 0.0011227 0.000659634 -0.000645145 0.000881857 -2.45433e-5 3.17528e-5 0.000101824 0.00140672 -0.00393283 -0.00040379 -0.000553249 -0.000182931 0.000336498 -0.00506533 … 0.000369143 0.0015894 -0.000705653 0.000200808 -0.00441646 -0.00108947 0.000950259 -0.00155294 0.000377063 -0.00258338 6.45444e-5 0.0024139
# Check reconstruction
norm(mag_spec_ch1 - (U1*Diagonal(Σ1)*V1'))|>display
norm(mag_spec_ch2 - (U2*Diagonal(Σ2)*V2'))|>display
1.2478581880675971e-10
1.8788573655851484e-10
scatter(Σ1)
# Choose rank k for approximation
display(length(Σ))
k = 200 # You can experiment with different values of k
display(k)
approx_matrix_ch1 = low_rank_approximation(U1, Σ1, V1, k)
approx_matrix_ch2 = low_rank_approximation(U2, Σ2, V2, k)
935
200
513×14237 Matrix{Float64}: 0.00283331 0.241185 20.7069 28.2846 … 0.318758 0.16976 0.00403869 0.215964 22.394 56.3362 0.528413 0.634764 0.00446821 0.148347 33.5927 150.54 0.329872 0.341527 0.00538502 0.175066 48.6376 228.444 0.467405 0.397562 0.00550176 0.201721 56.0723 9.78594 0.179187 0.208917 0.00607168 0.0948565 45.8744 44.0422 … 0.107496 0.217965 0.00648448 0.0905741 23.1877 28.8119 0.122 0.0290338 0.00681259 0.124919 9.33198 39.3163 0.104432 0.0668273 0.00737487 0.146693 14.1109 28.2205 0.0508207 0.117869 0.00798255 0.12157 12.2272 27.112 0.0487166 0.0818487 ⋮ ⋱ ⋮ 0.00123378 0.00875555 0.104764 0.23844 0.000624025 0.00064167 0.00123578 0.00880708 0.104354 0.238665 … 0.000625061 0.000641757 0.00123929 0.00877021 0.105429 0.237857 0.000623739 0.000641299 0.00123492 0.00877966 0.103722 0.238421 0.000620088 0.00064096 0.00123857 0.00878005 0.105254 0.238274 0.000621297 0.000640669 0.00123611 0.00875183 0.104342 0.238151 0.00062167 0.000639449 0.00122917 0.00878595 0.104174 0.237879 … 0.000621869 0.000640826 0.00123693 0.00878088 0.104994 0.238786 0.000623753 0.00064173 0.00124383 0.00880041 0.103948 0.236722 0.000622207 0.000640899
Now we have a lower rank approximation of the amplitude spectogram for the left channel and the right channel. We do not have the phase information for the spectogram --- remember, the output of the Fourier transform (FFT) is complex-valued and we took the modulus to work with amplitudes of the frequencies.
There is a well-known algorithm called Griffin-Lim Algorith to reconstruct the phase information from the amplitude spectogram. It is an iterative algorithm.
Once we have a good approximation of the phase information for the approximate spectograms, we can compute the inverse STFT (short-time Fourier transform) to reconstruct the signal (i.e., the song's left and right stereo channels).
Exercise: Do this and have a listen. Can your ear detect a lower quality? Play with the values of $k$.
Movie Recommendation¶
This is an important example of the general problem of matrix completion (see "The Netflix Problem"). SVD plays a role in this! I will provide more details later, but basically one deals with a matrix of users
vs. movies
where each entry of the matrix corresponds to the rating of a user for a movie. Of course not every user has rated every movie, hence the problem of completion.
Completion of the matrix means that you can guess which movies a user would rank highly and recommend those movies to the user.
The actual algorithms at play are complicated. We will provide a straightforward and simple approach later. What we do is called collaborative filtering, where users' past experience is the only data taken into account in order to design a recommender. Movie date, actors, genre, production year, keywords in the plot, etc. are not taken into account.
Below is a toy rating matrix.
# Toy user-movie rating matrix (missing entries are NaN)
ratings = [5.0 4.0 NaN 1.0;
4.0 NaN NaN 1.0;
1.0 1.0 NaN 5.0;
NaN 4.0 5.0 NaN]
Natural Language Processing¶
This usually goes by the name Latent Semantic Analysis.
We will explain how SVD helps uncover latent relationships between words and documents in a term-document matrix, improving text similarity measures.
A term-document matrix (TDM) is a mathematical representation of a collection of documents using a two-dimensional matrix. The rows of the matrix represent documents, and the columns represent terms (or vice versa). The values in the cells of the matrix indicate how often a term appears in a document.
One can use this framework to extract topics from a collection of news articles, for example.
Anomaly & Outlier Detection¶
This lies in the general problem of detecting low-rank structures in data. A straightforward (and maybe too simple) approach using directly SVD oprates under the assumption that anomalies (e.g., outliers or noise) manifest themselves in the small singular values of a data matrix.
using LinearAlgebra, Random, Statistics
# Generate a synthetic dataset (rows: samples, columns: features)
Random.seed!(42)
data = vcat(andn(200, 8), randn(2, 8) .+ 30) # Add anomalies as outliers
202×8 Matrix{Float64}: -0.363357 -0.601298 2.3835 … 1.45804 -0.608658 -0.477566 0.251737 0.0701926 0.00537831 2.56047 -0.219474 0.479317 -0.314988 -1.78484 0.604078 -0.493555 -1.6605 -1.37539 -0.311252 -0.166235 1.32398 -1.2237 1.14809 0.118811 0.816307 -1.74545 -0.804142 0.142367 0.481794 0.24747 0.476738 -0.980311 -1.16551 … 0.41168 -0.834277 -0.143855 -0.859555 1.16886 -0.924623 -0.746196 0.00665397 -2.19211 -1.46929 0.920525 2.30765 -0.15078 -0.184968 1.06116 -2.11433 0.460454 1.02755 0.387472 0.193728 0.677708 0.0437817 0.253296 -1.61711 0.910684 -0.429355 0.913604 ⋮ ⋱ ⋮ -1.11375 -1.19793 -0.570334 0.770933 0.324815 0.639617 -0.807109 -1.09495 0.322534 0.941876 0.0392932 -1.4832 -0.80858 0.137801 -2.20824 … -1.34342 -0.632243 0.0311515 1.79954 0.873208 0.0941845 -1.03675 -0.944743 -0.55857 -0.844068 0.0397047 0.489248 -0.925045 -0.182594 -0.643691 -1.52932 -0.415943 -1.55404 -1.70447 -0.423348 -0.301803 -0.195099 0.203785 -0.721853 -1.07804 1.60744 0.721603 31.0189 30.4272 30.5409 … 30.4282 30.6282 29.4739 29.1471 30.4199 31.1656 31.2442 29.7624 29.6225