5 Eigenvectors and eigenvalues
5.1 Motivation
5.1.1 Introduction
Generally, dealing with \(n\) dimensions is hard once \(n\) gets to be too large to visualise. It is usually nice to try to work in as few dimensions as possible. For example, diagonal matrices are much easier to deal with than general matrices; general \(n\times n\)-matrices involve visualising in \(n\) dimensions, whereas problems involving diagonal \(n\times n\)-matrices can often be viewed as \(n\) separate \(1\)-dimensional problems.
In general, multiplying a vector \(\mathbf{u}\) by a matrix \(A\) produces another vector \(\mathbf{v}\). We say that ‘\(\mathbf{u}\) is mapped to \(\mathbf{v}\) by \(A\)’. In order to restrict attention to 1 dimension, it would be useful if \(\mathbf{v}\) were to be a multiple of the vector \(\mathbf{u}\) which we started with. Then any problem involving \(A\) could be viewed just “in the direction of \(\mathbf{u}\)”, and we could think of the problem in this dimension only, and hope that this would help with the general problem.
So non-zero vectors \(\mathbf{u}\) with the property that \[A\mathbf{u}=\lambda\mathbf{u}\] where \(\lambda\) is a scalar are particularly interesting. For such vectors, \(A\) maps \(\mathbf{u}\) onto a multiple of itself. The vector \(\mathbf{u}\) is called an eigenvector of \(A\), and \(\lambda\) is the corresponding eigenvalue.
Eigenvalues and eigenvectors arise in many areas of mathematics, e.g., simultaneous ODEs, mechanical vibrations, numerical analysis, quantum mechanics, data science, and many others.
For example, when studying waves and vibrations, the eigenvalues and eigenvectors describe the frequencies and modes of vibration; in mechanics they represent principal stresses and principal axes of stress; we will see an application later, to the solution of systems of differential equations; in data science, the spread of multivariate data is measured by a variance matrix, and the eigenvectors and eigenvalues of this matrix give a good measure of the “shape” of the data.
5.1.2 Google
How might we measure the importance of a web site? We might decide that the importance of a web site is proportional to the number of web sites that link to it. But this isn’t a particularly good measure: some web sites are linked to by only a few other sites, all of them important, while other web sites are linked to by many other sites, all of them unimportant.
Instead, we’ll regard a site as important if other important web sites link to it. Suppose \(x_1\), \(x_2\),…,\(x_n\) are the importances of all \(n\) of the web sites in the world. We want the importance of each web site to be proportional to the sum of the importances of all of the sites that link to it. That’s a system of equations that might look like this: \[\begin{aligned} x_1&=\kappa(x_{12}+x_{233}+x_{3245353})\\ x_2&=\kappa(x_1+x_{1545}+x_{95246}+x_{1467372})\\ x_3&=\cdots\end{aligned}\] Now we can imagine a huge \(n\times n\)-matrix \(A\) whose \(ij\)th entry is \(1\) if web site \(j\) links to web site \(i\), and is otherwise \(0\). Then the equations above become \[x_i=\kappa\sum_{j=1}^na_{ij}x_j\] for \(i=1,\ldots,n\), or, in matrix terms, \(\mathbf{x}=\kappa A\mathbf{x}\), where \(\mathbf{x}=\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix}\). So this is also an eigenvector problem – the vector of importances is an eigenvector of the matrix \(A\). Since all the importances might be taken to be positive, we could look for all eigenvectors of \(A\), and choose one with all components positive. Then the most important web site would be the one with the largest entry in that eigenvector, etc.
This is the main idea of how Google measures the importance of a web site, using the links to it from other sites; it forms a huge matrix with one row and column for every web page in existence, and writes a \(1\) in the \(i\)th row and \(j\)th column if there is a link from the \(j\)th web page to the \(i\)th, and \(0\) otherwise. This gives a square matrix with many billions of rows and columns. There is an eigenvector of this matrix which measures the importance of each page.
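As an illustration of the idea (not of Google’s actual algorithm), here is a minimal Python/NumPy sketch: the five-page link matrix below is invented for this example, and the dominant eigenvector is found by repeatedly applying the matrix and rescaling (“power iteration”), which converges to the importance vector provided the dominant eigenvalue is simple.

```python
import numpy as np

# Hypothetical tiny "web": links[i, j] = 1 if page j links to page i.
links = np.array([[0, 1, 1, 0, 1],
                  [1, 0, 0, 1, 0],
                  [1, 1, 0, 1, 1],
                  [0, 0, 1, 0, 0],
                  [1, 0, 0, 1, 0]], dtype=float)

x = np.ones(links.shape[0])      # start with all pages equally important
for _ in range(100):             # power iteration: repeatedly apply A and rescale
    x = links @ x
    x /= x.sum()                 # the rescaling plays the role of the constant kappa

print(np.round(x, 3))            # importance scores; the largest entry marks the "top" page
```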
Similar algorithms are used by Amazon, Netflix and others to present ordered results of a search.
5.1.3 League tables
The same sort of thing can be used to predict the ranking of sports teams, for example.
There are several problems with looking at traditional league tables part of the way through the season:
Teams may have played different numbers of games;
Even if two teams have played the same number of games, one may have played teams which seem weaker than the other.
We can get round these problems using similar ideas to above.
If \(x_i\) is the “strength” of the \(i\)th football team (say), we might decide that \(x_i\) should be proportional to the sum of the strengths of all the teams that team \(i\) defeated. Then we get the same sort of equations as above, where \(a_{ij}\) is \(1\) if team \(i\) defeated team \(j\) and \(0\) otherwise. Again, you would solve the equations, and try to look for an eigenvector whose components were all positive. The largest entry would tell you the strongest team, etc. Clearly we can refine this simple \(0/1\) model; we could give extra weight to games won away from home; to wins where the margin of victory is large; minor credit to drawn games, etc.
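For a concrete (and entirely made-up) example, the following NumPy sketch builds the \(0/1\) win matrix for four hypothetical teams and picks out the eigenvector belonging to the dominant eigenvalue, whose entries can all be taken positive.

```python
import numpy as np

# Hypothetical results: wins[i, j] = 1 if team i beat team j.
wins = np.array([[0, 1, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 0, 1],
                 [1, 0, 0, 0]], dtype=float)   # team 4 upset team 1

vals, vecs = np.linalg.eig(wins)
k = int(np.argmax(np.abs(vals)))                 # dominant eigenvalue
strength = vecs[:, k].real
strength = strength * np.sign(strength.sum())    # fix the arbitrary overall sign
print(np.round(strength / strength.max(), 3))    # relative strengths; strongest team scores 1
```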
5.2 Finding eigenvalues and eigenvectors
5.2.1 The characteristic polynomial
As above, we are looking for scalars \(\lambda\) and non-zero vectors \(\mathbf{u}\) with \(A\mathbf{u}=\lambda\mathbf{u}\). How do we find \(\lambda\) and \(\mathbf{u}\)?
We need to solve \(A\mathbf{u}=\lambda\mathbf{u}\). Notice first that the right-hand side can be written \(\lambda.I\mathbf{u}\), where \(I\) is the identity matrix. Then we can rewrite our equation as \[\begin{equation} (A-\lambda I)\mathbf{u}=0. \tag{5.1} \end{equation}\] We need non-zero vectors \(\mathbf{u}\) which solve this. Suppose that \(\mathbf{u}=\begin{pmatrix}u_1\\\vdots\\u_n\end{pmatrix}\). Then, when you think about what the left-hand side of the equation is, it becomes \(u_1\) multiplied by the first column of \(A-\lambda I\), plus \(u_2\) times the second column of \(A-\lambda I\), plus…, plus \(u_n\) times the final column of \(A-\lambda I\). So this is a linear relationship between the columns of \(A-\lambda I\)! For a non-trivial linear relationship to exist, we need that \(A-\lambda I\) should not be invertible, i.e., that its determinant should be \(0\).
Hence for a non-trivial solution we must have \[\begin{equation} \det(A-\lambda I)=0. \tag{5.2} \end{equation}\] So we can find eigenvalues by looking at equation (5.2); for each solution, we can solve equation (5.1) to get the corresponding eigenvector(s).
Example 5.1 Take \(A=\begin{pmatrix} -2&1\\ 0&1\end{pmatrix}\). What are its eigenvalues and eigenvectors?
\[A-\lambda I =\begin{pmatrix} -2&1\\ 0&1 \end{pmatrix}-\lambda\begin{pmatrix} 1&0\\ 0&1 \end{pmatrix}=\begin{pmatrix} -2-\lambda&1\\ 0&1-\lambda \end{pmatrix}\] so we need to solve \[\det(A-\lambda I)=\begin{vmatrix} -2-\lambda&1\\ 0&1-\lambda \end{vmatrix} =(-2-\lambda)(1-\lambda)=0.\] It follows that \((\lambda+2)(\lambda-1)=0\), and so \(\lambda=-2\) or \(1\).
When \(\lambda=1\), \(A-\lambda I=\begin{pmatrix}-3&1\\0&0\end{pmatrix}\), and so \[(A-I)\mathbf{u}=\begin{pmatrix}-3&1\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}\] implies that \(-3x+y=0\) or \(3x=y\). We choose \(x\) arbitrary, \(x=k\) say, and then \(y=3k\). So \(\mathbf{u}=k\begin{pmatrix}1\\3\end{pmatrix}\).
When \(\lambda=-2\), \(A+2I=\begin{pmatrix}0&1\\0&3\end{pmatrix}\), and we need \[(A+2I)\mathbf{u}=\begin{pmatrix}0&1\\0&3\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}\] which happens if \(\mathbf{u}=k\begin{pmatrix}1\\0\end{pmatrix}\).
Note that if \(\mathbf{u}\) is a solution to \(A\mathbf{u}=\lambda\mathbf{u}\), then so is any multiple. So if, say, \(\begin{pmatrix}2\\1\end{pmatrix}\) is an eigenvector of some matrix, so is any multiple, such as \(\begin{pmatrix}4\\2\end{pmatrix}\) and \(\begin{pmatrix}-2\pi\\-\pi\end{pmatrix}\). Normally one writes down one convenient choice of vector, and it is understood that all multiples are also eigenvectors.
However, one could make a particular choice so that the vector has length 1. For \(\begin{pmatrix}2\\1\end{pmatrix}\) above, this normalised choice is \(\begin{pmatrix}2/\sqrt{5}\\1/\sqrt{5}\end{pmatrix}\). There are a number of applications (we shall see some later) where this normalisation is a very useful choice. Mostly, however, the precise scaling factor isn’t important.
You can check that \[\begin{aligned} A\begin{pmatrix}1\\3\end{pmatrix}&=1.\begin{pmatrix}1\\3\end{pmatrix}\\ A\begin{pmatrix}1\\0\end{pmatrix}&=-2.\begin{pmatrix}1\\0\end{pmatrix}.\end{aligned}\]
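If you have Python and NumPy to hand, you can also check this numerically; note that np.linalg.eig returns eigenvectors normalised to length 1, so its columns are scalar multiples of the vectors found above.

```python
import numpy as np

A = np.array([[-2., 1.],
              [ 0., 1.]])

print(A @ np.array([1., 3.]))   # [ 1.  3.]  =  1 * (1, 3)
print(A @ np.array([1., 0.]))   # [-2.  0.]  = -2 * (1, 0)

vals, vecs = np.linalg.eig(A)   # eigenvalues, and unit eigenvectors as columns
print(vals)                     # -2 and 1, in some order
print(vecs)                     # columns proportional to (1, 0) and (1, 3)
```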
For a general \(2\times2\) matrix, \(A=\begin{pmatrix}a&b\\c&d\end{pmatrix}\), the characteristic polynomial is \[\det(A-\lambda I)=\begin{vmatrix} a-\lambda&b\\ c&d-\lambda \end{vmatrix}=(a-\lambda)(d -\lambda)-bc=(ad-bc)-(a+d)\lambda+\lambda^2,\] a quadratic equation in \(\lambda\) (generally, you should convince yourself that the characteristic polynomial of an \(n\times n\)-matrix is of degree \(n\)).
From properties of quadratics we know that \(a+d\) is the sum of the roots. This is the sum of the diagonal elements of \(A\), and is called the trace of \(A\). So the trace of \(A\), as well as being the sum of the entries on the main diagonal of \(A\), is also the sum of the eigenvalues.
Also, \(ad-bc\) is the product of roots, and we recognise this as the determinant \(\det A\). So \(\det A\) is also the product of the eigenvalues, \(\lambda_1\lambda_2\).
These results generalise to \(n\times n\)-matrices \(A\): the trace of \(A\) is the sum of the eigenvalues, and the determinant of \(A\) is the product. The latter is easy to show in general:
Proposition 5.1 Let \(A\) be a square matrix. Then \(\det A\) is the product of the eigenvalues of \(A\).
Indeed, over the complex numbers the characteristic polynomial factorises as \(\det(A-\lambda I)=(\lambda_1-\lambda)(\lambda_2-\lambda)\cdots(\lambda_n-\lambda)\), and setting \(\lambda=0\) gives \(\det A=\lambda_1\lambda_2\cdots\lambda_n\).
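A quick numerical illustration of both facts (a sketch only, using a randomly generated matrix rather than anything from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))         # a generic real 4x4 matrix
vals = np.linalg.eigvals(A)             # its (possibly complex) eigenvalues

print(np.isclose(vals.sum(),  np.trace(A)))        # True: trace(A) = sum of eigenvalues
print(np.isclose(vals.prod(), np.linalg.det(A)))   # True: det(A)   = product of eigenvalues
```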
In general, the characteristic polynomial has \(n\) complex roots, even if \(A\) is a real \(n\times n\)-matrix, and the roots may not necessarily be distinct.
5.2.2 Eigenvalues of diagonal and upper (lower) triangular matrices
For a diagonal matrix, \(\det(A-\lambda I)\) is just the product of the diagonal entries of \(A-\lambda I\), namely \((a_{11}-\lambda)(a_{22}-\lambda)\cdots(a_{nn}-\lambda)\), so the eigenvalues of a diagonal matrix are exactly its diagonal entries. Exactly the same happens for upper or lower triangular matrices, since their determinants are also formed from the product of the diagonal elements.
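A one-line check (the triangular matrix below is chosen arbitrarily for illustration):

```python
import numpy as np

T = np.array([[2., 7., -1.],
              [0., 5.,  3.],
              [0., 0., -4.]])     # upper triangular

print(np.linalg.eigvals(T))       # 2, 5 and -4: the diagonal entries (possibly reordered)
```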
5.2.3 Examples
Example 5.3 Find the eigenvalues of \(A=\begin{pmatrix}0&1\\-1&0\end{pmatrix}\).
We note that the trace of \(A\) is \(0\) and the determinant is \(1\), and as these are the sum and product of the two eigenvalues, we see that the characteristic polynomial is \(\lambda^2+1=0\). (Of course, one should get the same result by expanding \(\det(A-\lambda I)\) directly – and you should confirm this!)
Thus the eigenvalues are \(i\) and \(-i\). Continue by finding the eigenvectors: these are two vectors with complex entries.
Example 5.4 Find the eigenvalues and eigenvectors of \(A=\begin{pmatrix}1&1\\0&1\end{pmatrix}\).
It is easy to see that the characteristic polynomial is \((\lambda-1)^2=0\), so there is a repeated eigenvalue of \(1\).
Something new happens here: if we solve \((A-I)\mathbf{u}=0\), we find only one eigenvector (up to scalar multiples), namely \(\begin{pmatrix}1\\0\end{pmatrix}\), even though the eigenvalue is a double root of the characteristic polynomial.
Example 5.5 Find the eigenvalues and eigenvectors of \(A=\begin{pmatrix} 3&0&0\\ 1&2&6\\ 3&1&1 \end{pmatrix}\).
We have \[\begin{aligned} \det(A-\lambda I)&=\begin{vmatrix} 3-\lambda&0&0\\ 1&2-\lambda&6\\ 3&1&1-\lambda \end{vmatrix}\\ &=(3-\lambda)[(2-\lambda)(1-\lambda)-6]\\ &=(3-\lambda)(\lambda^2-3\lambda-4)=(3-\lambda)(\lambda-4)(\lambda+1)=0,\end{aligned}\] so the eigenvalues are \(\lambda=-1\), \(3\) and \(4\). (Note that \(\mathrm{trace}(A)=3+2+1=6=-1+3+4\), as expected.) One easily verifies (as above) that the corresponding eigenvectors are: \(\begin{pmatrix}0\\-2\\1\end{pmatrix}\) (for \(\lambda=-1\)), \(\begin{pmatrix}-1\\5\\1\end{pmatrix}\) (for \(\lambda=3\)) and \(\begin{pmatrix}0\\3\\1\end{pmatrix}\) (for \(\lambda=4\)).
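As a numerical cross-check of Example 5.5 (NumPy returns unit eigenvectors, so each column is rescaled below to make its last entry equal to 1):

```python
import numpy as np

A = np.array([[3., 0., 0.],
              [1., 2., 6.],
              [3., 1., 1.]])

vals, vecs = np.linalg.eig(A)
print(np.round(vals, 6))                 # -1, 3 and 4, in some order

for lam, v in zip(vals, vecs.T):
    # Rescale so the last entry is 1; this recovers (0,-2,1), (-1,5,1) and (0,3,1).
    print(np.round(lam, 3), np.round(v / v[2], 3))
```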
5.3 Linear independence of eigenvectors
5.3.1 Linear independence again
A key fact, which we shall use without proof, is that eigenvectors corresponding to distinct eigenvalues are linearly independent; in particular, an \(n\times n\)-matrix with \(n\) distinct eigenvalues has \(n\) linearly independent eigenvectors. However, in the case of equal roots we do not necessarily have two linearly independent eigenvectors, as we saw in Example 5.4.
If \(\lambda\) is a (complex) root of the characteristic polynomial of multiplicity \(d\ge1\), then there is at least one eigenvector of \(A\) with eigenvalue \(\lambda\) (as explained above), and the number of linearly independent eigenvectors with eigenvalue \(\lambda\) is at most \(d\) (but may be anything between \(1\) and \(d\)).
Example 5.6 Find the eigenvalues and eigenvectors for \(A=\begin{pmatrix} 1&0&0\\ 0&2&-1\\ 0&-1&2 \end{pmatrix}\).
As usual, consider \(A-\lambda I=\begin{pmatrix} 1-\lambda&0&0\\ 0&2-\lambda&-1\\ 0&-1&2-\lambda \end{pmatrix}\). We need its determinant; expanding down the first column makes this easy, and we get: \[\det(A-\lambda I)=(1-\lambda)[(2-\lambda)^2-1]=(1-\lambda)(3-\lambda)(1-\lambda)=0\] so that the eigenvalues are \(1\) (a double root) and \(3\).
The usual calculation gives \(\begin{pmatrix}0\\1\\-1\end{pmatrix}\) as the eigenvector when \(\lambda=3\). For \(\lambda=1\), we need to look at \[(A-I)\mathbf{v}=\begin{pmatrix} 0&0&0\\ 0&1&-1\\ 0&-1&1 \end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix}\] and it is easy to see that \(x\) is arbitrary and \(y=z\), with \(z\) arbitrary. So the general eigenvector is \[\begin{pmatrix}\alpha\\\beta\\\beta\end{pmatrix}=\alpha\begin{pmatrix}1\\0\\0\end{pmatrix}+\beta\begin{pmatrix}0\\1\\1\end{pmatrix},\] so we get 2 linearly independent eigenvectors for the repeated eigenvalue in this example.
5.3.2 An important consequence of linear independence
We illustrate the idea, which will be developed and used further in future, using an easy example.
Suppose we have a matrix with a full set of linearly independent eigenvectors. For example, we might have an \(n\times n\)-matrix with \(n\) distinct eigenvalues. Then, since we are working with vectors of length \(n\), i.e., in \(\mathbb{R}^n\), we can write any vector as a linear combination of the eigenvectors. For example, if \(A\) is the \(2\times2\)-matrix \(A=\begin{pmatrix}-2&1\\0&1\end{pmatrix}\), we found an eigenvector \(\mathbf{v}_1=\begin{pmatrix}1\\3\end{pmatrix}\) with eigenvalue \(\lambda_1=1\), and an eigenvector \(\mathbf{v}_2=\begin{pmatrix}1\\0\end{pmatrix}\) with eigenvalue \(\lambda_2=-2\).
Consider any vector \(\mathbf{v}=\begin{pmatrix}x\\y\end{pmatrix}\) and write this as a linear combination of \(\mathbf{v}_1\) and \(\mathbf{v}_2\): \[\begin{pmatrix}x\\y\end{pmatrix}=\alpha\begin{pmatrix}1\\3\end{pmatrix}+\beta\begin{pmatrix}1\\0\end{pmatrix};\] this is easily solved by \(\alpha=y/3\) (look at the bottom component) and \(\beta=x-y/3\).
Then we can work out \[\begin{aligned} A\begin{pmatrix}x\\y\end{pmatrix}&=\alpha A\begin{pmatrix}1\\3\end{pmatrix}+\beta A\begin{pmatrix}1\\0\end{pmatrix}\\ &=\alpha\lambda_1\begin{pmatrix}1\\3\end{pmatrix}+\beta\lambda_2\begin{pmatrix}1\\0\end{pmatrix}\\ &=\alpha\begin{pmatrix}1\\3\end{pmatrix}-2\beta\begin{pmatrix}1\\0\end{pmatrix}.\end{aligned}\] For some problems, we can see what happens for each of the eigenvectors – and since the matrix just maps an eigenvector to a multiple of itself, this is hopefully easier – and then combine them to get the general case.
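Here is a short numerical version of this calculation, with an arbitrarily chosen vector \((x,y)=(5,9)\) (the choice is purely illustrative):

```python
import numpy as np

A  = np.array([[-2., 1.],
               [ 0., 1.]])
v1 = np.array([1., 3.])      # eigenvector with eigenvalue  1
v2 = np.array([1., 0.])      # eigenvector with eigenvalue -2

x, y = 5.0, 9.0
alpha, beta = y / 3, x - y / 3                 # coefficients from the decomposition above

direct   = A @ np.array([x, y])
combined = alpha * 1 * v1 + beta * (-2) * v2   # apply A eigenvalue by eigenvalue
print(direct, combined)                        # the same vector both ways
```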
5.4 Matrices with respect to different bases
5.4.1 Change of basis
We’ve already mentioned the concept of linear independence as being the critical factor in determining whether systems of \(n\) equations in \(n\) variables have a unique solution or not – if the \(n\) rows of the matrix of coefficients of the equations are linearly independent, then there is a unique solution, and otherwise there are either no solutions or infinitely many.
A linearly independent set of \(n\) vectors in \(\mathbb{R}^n\) is also known as a basis. The standard basis is given by \(\mathbf{e}_1=\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}\), …, \(\mathbf{e}_n=\begin{pmatrix}0\\\vdots\\0\\1\end{pmatrix}\).
It turns out that every vector in \(\mathbb{R}^n\) can be written in terms of any such linearly independent set of \(n\) vectors. (This is fairly easy to see – the result of a complete Gaussian elimination on such a set consists of the standard basis vectors, which means that each standard basis vector must be expressible in terms of the set, and of course every vector can be written in terms of the standard basis.)
When we write down a matrix, we are really using the standard basis. So the matrix \(\begin{pmatrix}1&3\\5&7\end{pmatrix}\) corresponds to the function \[x\mathbf{e}_1+y\mathbf{e}_2=\begin{pmatrix}x\\y\end{pmatrix}\mapsto\begin{pmatrix}x+3y\\5x+7y\end{pmatrix}=(x+3y)\mathbf{e}_1+(5x+7y)\mathbf{e}_2.\] In particular, \(\mathbf{e}_1\) gets sent to \(\mathbf{e}_1+5\mathbf{e}_2\), and \(\mathbf{e}_2\) gets sent to \(3\mathbf{e}_1+7\mathbf{e}_2\).
But we can “change the basis” and use other vectors instead! Let’s see what happens if we use \(\mathbf{f}_1=\begin{pmatrix}1\\1\end{pmatrix}\) and \(\mathbf{f}_2=\begin{pmatrix}1\\-1\end{pmatrix}\). Then notice that \[\begin{aligned} A\mathbf{f}_1&=\begin{pmatrix}1&3\\5&7\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix}=\begin{pmatrix}4\\12\end{pmatrix}=8\begin{pmatrix}1\\1\end{pmatrix}-4\begin{pmatrix}1\\-1\end{pmatrix}=8\mathbf{f}_1-4\mathbf{f}_2\\ A\mathbf{f}_2&=\begin{pmatrix}1&3\\5&7\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}-2\\-2\end{pmatrix}=-2\begin{pmatrix}1\\1\end{pmatrix}+0\begin{pmatrix}1\\-1\end{pmatrix}=-2\mathbf{f}_1+0\mathbf{f}_2.\end{aligned}\] The statement that \(A=\begin{pmatrix}1&3\\5&7\end{pmatrix}\) means that \[\begin{aligned} A\mathbf{e}_1&=\mathbf{e}_1+5\mathbf{e}_2\\ A\mathbf{e}_2&=3\mathbf{e}_1+7\mathbf{e}_2.\end{aligned}\] But now we have \[\begin{aligned} A\mathbf{f}_1&=8\mathbf{f}_1-4\mathbf{f}_2\\ A\mathbf{f}_2&=-2\mathbf{f}_1+0\mathbf{f}_2,\end{aligned}\] so there is a sense in which \(A\) “is the matrix \(\begin{pmatrix}8&-2\\-4&0\end{pmatrix}\)” if we use the basis \(\mathbf{f}_1\) and \(\mathbf{f}_2\) instead of the standard basis.
Exercise 5.1 Write \(M=\begin{pmatrix}1&1\\1&-1\end{pmatrix}\). Compute \(M^{-1}AM\).
Hopefully you will see that we get the same answer as above! Notice that the columns of \(M\) are just \(\mathbf{f}_1\) and \(\mathbf{f}_2\), and so \(M\mathbf{e}_1=\mathbf{f}_1\) and \(M\mathbf{e}_2=\mathbf{f}_2\). You can view the product \(M^{-1}AM\) as the composition of: first apply \(M\), which takes the co-ordinates \(\mathbf{e}_1\), \(\mathbf{e}_2\) to the co-ordinates \(\mathbf{f}_1\), \(\mathbf{f}_2\); then apply \(A\) to \(\mathbf{f}_1\) and \(\mathbf{f}_2\); then apply \(M^{-1}\) to get back to the original co-ordinates.
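If you want to check the exercise numerically, a short NumPy computation (using matrix inversion, which is fine for a small example like this) gives the matrix claimed above:

```python
import numpy as np

A = np.array([[1., 3.],
              [5., 7.]])
M = np.array([[1.,  1.],
              [1., -1.]])         # columns are f1 and f2

print(np.linalg.inv(M) @ A @ M)   # [[ 8. -2.]
                                  #  [-4.  0.]]
```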
5.4.2 Eigenvectors and diagonalisability
So another answer to the problem of why we want to work with eigenvectors comes with finding basis sets that make a matrix diagonal. That is, we would like to make a matrix \(\begin{pmatrix} a&b\\ c&d \end{pmatrix}\) into a diagonal one, \(\begin{pmatrix} \lambda_1&0\\ 0&\lambda_2 \end{pmatrix}\), if possible.
Suppose that \(A\) is a square \(n\times n\) matrix, and suppose that it has \(n\) linearly independent eigenvectors \(\mathbf{x}_1,\ldots,\mathbf{x}_n\) (recall that we say that these vectors form a basis), and suppose that the eigenvalue corresponding to \(\mathbf{x}_i\) is \(\lambda_i\). Let’s form the \(n\times n\) matrix \(M\) with columns which are exactly \(\mathbf{x}_1,\ldots,\mathbf{x}_n\) (it doesn’t matter whether these are normalised or not).
Now consider \(M^{-1}AM\). This is again an \(n\times n\) matrix. What is it?
Let \(\mathbf{e}_1=\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}\).
Then \(M\mathbf{e}_1=\mathbf{x}_1\), by definition of matrix multiplication. Next, \(AM\mathbf{e}_1=A\mathbf{x}_1=\lambda_1\mathbf{x}_1\), as \(\mathbf{x}_1\) is an eigenvector with eigenvalue \(\lambda_1\).
This means that \(AM\mathbf{e}_1=\lambda_1M\mathbf{e}_1\), and so \(M^{-1}AM\mathbf{e}_1=\lambda_1\mathbf{e}_1\). That is, \[M^{-1}AM\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}=\begin{pmatrix}\lambda_1\\0\\\vdots\\0\end{pmatrix}.\] Thinking about how matrix multiplication works, this means that the first column of \(M^{-1}AM\) is just \(\begin{pmatrix}\lambda_1\\0\\\vdots\\0\end{pmatrix}\).
In the same way, the second column of \(M^{-1}AM\) is \(\begin{pmatrix}0\\\lambda_2\\0\\\vdots\\0\end{pmatrix}\), and so on, and this means that \[M^{-1}AM=\begin{pmatrix}\lambda_1&0&\ldots&0\\0&\lambda_2&\ldots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&\lambda_n\end{pmatrix}.\] So \(M^{-1}AM\) is actually a diagonal matrix whose diagonal entries are exactly the eigenvalues!
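To see this concretely, here is the construction carried out for the matrix of Example 5.1 (eigenvectors \((1,3)\) and \((1,0)\), eigenvalues \(1\) and \(-2\)); the NumPy check is, of course, only an illustration of the general argument.

```python
import numpy as np

A = np.array([[-2., 1.],
              [ 0., 1.]])
M = np.array([[1., 1.],
              [3., 0.]])          # columns are the eigenvectors (1,3) and (1,0)

print(np.linalg.inv(M) @ A @ M)   # [[ 1.  0.]
                                  #  [ 0. -2.]]  -- the eigenvalues on the diagonal
```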
Let’s record this as a theorem: if \(A\) is an \(n\times n\) matrix with \(n\) linearly independent eigenvectors, and \(M\) is the matrix whose columns are these eigenvectors, then \(M^{-1}AM\) is the diagonal matrix whose diagonal entries are the corresponding eigenvalues.
We’ve already mentioned that not every \(n\times n\) matrix has \(n\) linearly independent eigenvectors. For example, \(\begin{pmatrix}1&1\\0&1\end{pmatrix}\) had just one eigenvector (up to multiples). We’ll just mention that in this case, one can do something a bit similar; there is a matrix \(M\) such that \(M^{-1}AM\) has all its entries \(0\), except for the eigenvalues on the main diagonal and also some entries on the diagonal just above the main diagonal which can be chosen to be \(1\). This is called the Jordan canonical form, but we shall say nothing further about it here.
Let’s observe that the characteristic polynomial of a matrix \(M^{-1}AM\) is the same as the characteristic polynomial of \(A\). Indeed, the characteristic polynomial of \(M^{-1}AM\) is given by \[\begin{aligned} \det(M^{-1}AM-\lambda I)&=\det(M^{-1}AM-\lambda M^{-1}M)=\det(M^{-1}(A-\lambda I)M)\\ &=(\det M)^{-1}\det(A-\lambda I)\det M=\det(A-\lambda I),\end{aligned}\] which is the same as the characteristic polynomial of \(A\).
You can think of first multiplying by \(M\) to change from the basis of eigenvectors to the standard basis, then applying \(A\) which gives the linear map with respect to the standard basis, and then applying \(M^{-1}\) to get back to the basis of eigenvectors.
And since the characteristic polynomial is unchanged, so are its roots, and so the eigenvalues don’t depend on the choice of a basis.
5.5 Functions of matrices
5.5.1 Eigenvalues of iterated matrices
If \(A\mathbf{v}=\lambda\mathbf{v}\), then \[A^2\mathbf{v}=\lambda A\mathbf{v}=\lambda^2\mathbf{v}\] and then \[A^3\mathbf{v}=\lambda^2A\mathbf{v}=\lambda^3\mathbf{v}\] and so on. Hence if \(\mathbf{v}\) is an eigenvector of \(A\) with eigenvalue \(\lambda\), then it is also an eigenvector of \(A^k\), but with eigenvalue \(\lambda^k\). Combining these, if \(p\) is any polynomial in \(A\), such as \(p(A)=A^3+2A^2+A+3I\), then \(\mathbf{v}\) is an eigenvector of \(p(A)\) with eigenvalue \(p(\lambda)\).
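A quick check with the matrix of Example 5.1 and the polynomial \(p(t)=t^3+2t^2+t+3\) (so \(p(1)=7\) and \(p(-2)=1\)):

```python
import numpy as np

A = np.array([[-2., 1.],
              [ 0., 1.]])

# p(A) = A^3 + 2A^2 + A + 3I
pA = (np.linalg.matrix_power(A, 3) + 2 * np.linalg.matrix_power(A, 2)
      + A + 3 * np.eye(2))

print(pA @ np.array([1., 0.]))   # [1. 0.]   = p(-2) * (1, 0), since p(-2) = 1
print(pA @ np.array([1., 3.]))   # [ 7. 21.] = p(1)  * (1, 3), since p(1)  = 7
```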
5.5.2 The Cayley-Hamilton Theorem
Let \(A\) be an \(n\times n\) matrix. Then we can form \(A^2=AA\), \(A^3=AA^2\), etc. Then if \(f\) is a polynomial such as \(f(t)=3t^3-2t^2+t-5\), we can write \(f(A)=3A^3-2A^2+A-5I\), where \(I\) is the identity matrix of the same size as \(A\).
If \(A\) is diagonal, it is easy to evaluate this. Indeed, if \[A=\begin{pmatrix}\lambda_1&0&\ldots&0\\0&\lambda_2&\ldots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&\lambda_n\end{pmatrix}\] then \[A^k=\begin{pmatrix}\lambda_1^k&0&\ldots&0\\0&\lambda_2^k&\ldots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&\lambda_n^k\end{pmatrix},\] and so if \(f(t)\) is a polynomial, \[f(A)=\begin{pmatrix}f(\lambda_1)&0&\ldots&0\\0&f(\lambda_2)&\ldots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&f(\lambda_n)\end{pmatrix}.\]
As an example, recall that if \(A\) is diagonal, then the diagonal entries are the eigenvalues. Suppose that \(f(t)\) is the characteristic polynomial. Then \(f(A)=0\), because \(f(A)\) has diagonal entries equal to \(f(\lambda_i)\), but \(f(\lambda_i)=0\) as the eigenvalues \(\lambda_i\) are roots of the characteristic polynomial. Thus \(f(A)=0\).
We conclude the following:
If \(A\) is a diagonal matrix with characteristic polynomial \(p(t)\), then \(p(A)=0\).
We have seen that there are many matrices \(A\) which are diagonalisable; that is, there is a matrix \(M\) such that \(M^{-1}AM\) is diagonal. (In particular, this holds whenever \(A\) has distinct eigenvalues; we will also mention that it holds whenever \(A\) is real and symmetric.)
Suppose that \(M^{-1}AM=D\) is diagonal. Then \(A=MDM^{-1}\). Note that \[A^2=MDM^{-1}.MDM^{-1}=MD^2M^{-1},\] as the middle two matrices cancel. Similarly, \[A^3=MDM^{-1}.MDM^{-1}.MDM^{-1}=MD^3M^{-1}.\] In general, \(A^k=MD^kM^{-1}\). Then \[f(A)=Mf(D)M^{-1}.\] We can work out \(f(D)\) easily, since \(D\) is diagonal, and this allows us to work out \(f(A)\) easily.
Recall that the characteristic polynomial of \(A\) is the same as the characteristic polynomial of \(M^{-1}AM=D\). If \(p(t)\) is this polynomial, then \(p(D)=0\) by the above observation, and as \(p(A)=Mp(D)M^{-1}\), we see that \(p(A)=0\) also.
We therefore see that there is a wide class of matrices which satisfy their own characteristic polynomial! In fact, this result holds for any square matrix, but the proof is a little beyond this course. This is the Cayley-Hamilton theorem: if \(p(t)\) is the characteristic polynomial of a square matrix \(A\), then \(p(A)=0\).
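As an illustration (for the \(2\times2\) matrix used in the change-of-basis section, and using the formula \(\lambda^2-(\mathrm{trace}\,A)\lambda+\det A\) for its characteristic polynomial), one can check the theorem numerically:

```python
import numpy as np

A = np.array([[1., 3.],
              [5., 7.]])

t, d = np.trace(A), np.linalg.det(A)    # characteristic polynomial: x^2 - t x + d
print(A @ A - t * A + d * np.eye(2))    # the zero matrix, up to rounding error
```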
5.6 Coupled linear ordinary differential equations
We can even substitute matrices into “power series”, like the exponential series \[e^t=1+t+\frac{t^2}{2!}+\frac{t^3}{3!}+\cdots,\] and it turns out that substituting a square matrix \(A\) into this series gives a valid answer: \[e^A=I+A+\frac{A^2}{2!}+\frac{A^3}{3!}+\cdots.\] This may not look very promising at first, but it has useful applications.
Remember that the differential equation \(\dot{x}=ax\) has solution \(x=x_0e^{at}\) (where \(x_0=x(0)\)). Often in applications, you have systems involving a number of inputs and outputs which might be related by, for example: \[\begin{aligned} \dot{x}&=ax+by\\ \dot{y}&=cx+dy.\end{aligned}\] We can form the vector \(\mathbf{v}(t)=\begin{pmatrix}x(t)\\ y(t)\end{pmatrix}\), and the matrix \(A=\begin{pmatrix}a&b\\ c&d\end{pmatrix}\), and rewrite the system as \(\dot{\mathbf{v}}=A\mathbf{v}\). This looks just like the differential equation above! And the solution is given by \(\mathbf{v}=e^{At}\mathbf{v}_0\), where \(\mathbf{v}_0=\mathbf{v}(0)=\begin{pmatrix}x(0)\\ y(0)\end{pmatrix}\). So matrix exponentials are surprisingly useful in writing solutions to systems of differential equations.
Consider \[e^{At}=I+At+\frac{1}{2!}A^2t^2+\cdots,\] and notice that, differentiating term by term, \[\frac{d}{dt}(e^{At})=A+\frac{1}{2!}A^2\cdot2t+\cdots=A\left(I+At+\cdots\right)=Ae^{At}.\] So \(e^{At}\) satisfies the matrix differential equation \(\frac{d}{dt}(e^{At})=Ae^{At}\). Some care needs to be taken with matrix exponentials – you may be able to convince yourself that \(e^{A+B}\ne e^Ae^B\) in general. Indeed, remember that \(AB\) and \(BA\) are usually different for matrices, so that \[(A+B)^2=(A+B)(A+B)=A^2+AB+BA+B^2,\] and this is generally different from \(A^2+2AB+B^2\); when you expand \(e^{A+B}\), you will find terms like \(AB+BA\), and when you expand \(e^Ae^B\), you will find terms like \(2AB\).
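The non-commutativity really does matter; for instance (with two arbitrarily chosen non-commuting matrices, and using SciPy's expm for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.],
              [0., 0.]])
B = np.array([[0., 0.],
              [1., 0.]])      # AB != BA for this pair

print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # False: e^(A+B) != e^A e^B here
```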
Let’s focus on the case where \(A\) is an \(n\times n\) square matrix with \(n\) linearly independent eigenvectors. As already mentioned, if \(M\) is the matrix whose columns are the eigenvectors, then \(M^{-1}AM\) is a diagonal matrix whose diagonal entries are the eigenvalues.
Recall that \((M^{-1}AM)^k=M^{-1}A^kM\). In this case, \[\begin{eqnarray*} e^{M^{-1}AM}&=&I+M^{-1}AM+\frac{(M^{-1}AM)^2}{2!}+\cdots\\ &=&M^{-1}M+M^{-1}AM+\frac{M^{-1}A^2M}{2!}+\cdots\\ &=&M^{-1}\left(I+A+\frac{A^2}{2!}+\cdots\right)M\\ &=&M^{-1}e^AM \end{eqnarray*}\] But \(M^{-1}AM=\begin{pmatrix}\lambda_1&0&\ldots&0\\0&\lambda_2&\ldots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&\lambda_n\end{pmatrix}\), and it is easy to see that then \(e^{M^{-1}AM}=\begin{pmatrix}e^{\lambda_1}&0&\ldots&0\\0&e^{\lambda_2}&\ldots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&e^{\lambda_n}\end{pmatrix}\). So if \(A\) is diagonalisable, then so is \(e^A\), using the same matrix \(M\); if \(\lambda\) is an eigenvalue of \(A\), then \(e^\lambda\) is an eigenvalue of \(e^A\), with the same eigenvector.
This makes it easy to compute \(e^{At}\) when \(A\) is diagonalisable. We find the matrix \(M\) of eigenvectors. If \(M^{-1}AM\) is the diagonal matrix \(D\), we know \(M^{-1}e^{At}M=e^{Dt}\), and this is easy to compute – it is diagonal with entries \(e^{\lambda_it}\). Then \(e^{At}=Me^{Dt}M^{-1}\).
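Sketching this recipe for the matrix of Example 5.1 (eigenvalues \(1\) and \(-2\), eigenvector matrix \(M\) as before), and comparing against SciPy's general-purpose expm:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2., 1.],
              [ 0., 1.]])
M = np.array([[1., 1.],
              [3., 0.]])            # columns: eigenvectors for eigenvalues 1 and -2
t = 0.5                             # any fixed time

eDt = np.diag(np.exp(np.array([1., -2.]) * t))   # e^{Dt} for D = diag(1, -2)
eAt = M @ eDt @ np.linalg.inv(M)                 # e^{At} = M e^{Dt} M^{-1}

print(np.allclose(eAt, expm(A * t)))             # True: both routes give the same matrix
```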
Extending this a little further soon leads to the following result: