2 Simultaneous equations
2.1 Solving simultaneous equations
Consider the system of 3 equations in unknowns \(x\), \(y\) and \(z\): \[\begin{array}{rcrcrcrcl} 2x&+&3y&-&2z&=&1&&(1)\\ 4x&+&7y&+&3z&=&3&&(2)\\ 6x&+&3y&-&2z&=&-3&&(3) \end{array}\] Hopefully you all saw how to solve systems of equations like this at school.
Just as a reminder, the standard way to do this is to eliminate variables, and reduce the number of equations. So we could choose to eliminate \(x\) from the last two equations by taking some appropriate multiple of the first equation from the other two. In particular, if we write (2a) for the equation we get by taking two times (1) from (2), and (3a) for what we get when we take three times (1) from (3), then we get equations: \[\begin{array}{rcrcrcrcl} 2x&+&3y&-&2z&=&1&&(1)\\ &&y&+&7z&=&1&&(2a)\\ &-&6y&+&4z&=&-6&&(3a) \end{array}\] Now we try to eliminate \(y\) from (3a), using (2a); we form equation (3b), which is got by adding 6 times (2a) onto (3a), and we get the system: \[\begin{array}{rcrcrcrcl} 2x&+&3y&-&2z&=&1&&(1)\\ &&y&+&7z&=&1&&(2a)\\ &&&&46z&=&0&&(3b) \end{array}\] Now that this system is in triangular form, we can read off the answers easily by back-substitution. So we can read off the value of \(z\) from (3b), to see that \(z=0\), and then substitute this value back into (2a) to see that \(y=1\). We then substitute these into (1) to see that \(x=-1\).
This process of eliminating one variable at a time is called Gaussian elimination.
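If you want to check a computation like this on a computer, here is a minimal sketch (assuming the numpy library is available; `numpy.linalg.solve` uses a variant of exactly this elimination process internally):

```python
import numpy as np

# Coefficients and constants of equations (1)-(3) above.
A = np.array([[2.0, 3.0, -2.0],
              [4.0, 7.0,  3.0],
              [6.0, 3.0, -2.0]])
b = np.array([1.0, 3.0, -3.0])

print(np.linalg.solve(A, b))  # [-1.  1.  0.], i.e. x = -1, y = 1, z = 0
```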
I don’t know about you, but when I write down loads of equations like this, I get a bit frustrated. The idea is simple enough, but I keep writing down \(x\), \(y\) and \(z\), and also \(=\), and these stay the same each time. It would be somehow just as easy to keep track of the coefficients, and to ignore all of these bits that don’t change from one line to the next.
So rather than write the original set of equations (1)–(3), we’ll just keep track of all the numbers: \[\begin{array}{rrr|r} 2&3&-2&1\\ 4&7&3&3\\ 6&3&-2&-3 \end{array}\] Then the steps above on equations can be regarded as operations on these arrays of numbers. Taking two times (1) from (2) corresponds to replacing the second row, \(R_2\) say, with \(R_{2a}=R_2-2R_1\), the second row minus two times the first row. Similarly, we replace \(R_3\) with \(R_{3a}=R_3-3R_1\), to get the new array: \[\begin{array}{rrr|r} 2&3&-2&1\\ 0&1&7&1\\ 0&-6&4&-6 \end{array}\] Notice that this has ensured that only one entry in the first column is non-zero – this is the essential part in eliminating the variable \(x\).
Then the final operation replaces \(R_{3a}\) by \(R_{3b}=R_{3a}+6R_{2a}\), and we get the array: \[\begin{array}{rrr|r} 2&3&-2&1\\ 0&1&7&1\\ 0&0&46&0 \end{array}\] Now we just have to remember that this array is really just a shorthand for three equations, and we can read off the solution directly as above.
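In this notation the whole reduction is just arithmetic on the rows of an array, which is easy to mimic on a computer. A minimal sketch (again assuming numpy):

```python
import numpy as np

# The augmented array: coefficients of x, y, z, then the constant.
M = np.array([[2.0, 3.0, -2.0,  1.0],
              [4.0, 7.0,  3.0,  3.0],
              [6.0, 3.0, -2.0, -3.0]])

M[1] -= 2 * M[0]   # R2a = R2 - 2 R1
M[2] -= 3 * M[0]   # R3a = R3 - 3 R1
M[2] += 6 * M[1]   # R3b = R3a + 6 R2a
print(M)           # final row is (0, 0, 46, 0), i.e. 46z = 0
```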
2.2 Linear independence again
Recall our definition of linear dependence from earlier. Let’s imagine that we have three equations whose coefficient vectors are linearly dependent. In fact, as an example, let’s take the three equations in Example 1.15: \[\begin{aligned} x+y+z&=2\\ 2x+3y-z&=1\\ x+4z&=5.\end{aligned}\] When we write these in augmented matrix form, we get: \[\begin{array}{rrr|r} 1&1&1&2\\ 2&3&-1&1\\ 1&0&4&5 \end{array}\] Now we eliminate \(x\) to get: \[\begin{array}{rrr|r} 1&1&1&2\\ 0&1&-3&-3\\ 0&-1&3&3 \end{array}\] and eliminate \(y\) to get \[\begin{array}{rrr|r} 1&1&1&2\\ 0&1&-3&-3\\ 0&0&0&0 \end{array}\] The bottom line is just the equation \(0x+0y+0z=0\), which is always true. So you can see that we really just have two equations in three variables, and we would expect these to have infinitely many solutions: in general, we can pick any value for one of the variables, treating it as a constant, to get two equations in the other two variables, and two equations in two variables will usually have a unique solution. So we would expect a whole line of solutions.
If we altered one of the constant terms in the original set of equations to, say: \[\begin{aligned} x+y+z&=2\\ 2x+3y-z&=1\\ x+4z&=6,\end{aligned}\] we could continue with the elimination of variables in the same way, to get: \[\begin{array}{rrr|r} 1&1&1&2\\ 0&1&-3&-3\\ 0&-1&3&4 \end{array}\] and eliminate \(y\) to get \[\begin{array}{rrr|r} 1&1&1&2\\ 0&1&-3&-3\\ 0&0&0&1. \end{array}\] Now the last line says \(0x+0y+0z=1\)! This is obviously nonsense; clearly there are no values of \(x\), \(y\) and \(z\) that could make \(0=1\) true.
We can see that whenever the left-hand sides of the three equations are linearly dependent, we should be able to take some combination which makes the left-hand side of one of the equations all \(0\)s. So this equation now becomes \(0x+0y+0z=?\); if the right-hand constant is \(0\), then we basically end up with two equations in three variables, and expect infinitely many solutions, while if the right-hand constant is non-zero, the equation can never hold, and there are no solutions.
On the other hand, if the three equations are linearly independent, we should get something which looks like the system in the previous section, ending with some upper triangular system where we can back-substitute to find a unique solution.
We can summarise what we have seen:

Three simultaneous equations in three variables will have a unique solution precisely when the coefficient vectors of the equations are linearly independent.
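One convenient way to test this on a computer is to count the number of linearly independent rows with and without the constants column; numpy calls this count the `matrix_rank` (we have not defined the rank of a matrix, but for our purposes here it just counts independent rows). A sketch, for the first (consistent) system above:

```python
import numpy as np

A = np.array([[1.0, 1.0,  1.0],
              [2.0, 3.0, -1.0],
              [1.0, 0.0,  4.0]])
b = np.array([[2.0], [1.0], [5.0]])

print(np.linalg.matrix_rank(A))                  # 2: the rows are dependent
print(np.linalg.matrix_rank(np.hstack([A, b])))  # 2: still consistent
# equal and maximal -> unique solution; equal but smaller -> infinitely many;
# augmented rank larger -> no solutions (the 0 = 1 situation)
```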
We’ll now explain a more systematic method of solving simultaneous equations.
2.3 Gaussian elimination
As we’ve already remarked, the method above is called Gaussian elimination. Let’s make it more systematic (those of you doing MAS115 might think about how to program it in Python, say), and we’ll mention some more terminology.
An array of numbers is known as a matrix. Often, if the matrix of coefficients is \(A\) and the matrix of constants is called \(b\), the matrix \((A|b)\) is known as an augmented matrix. Generally, matrices have brackets round them, and we’ll follow this in what follows. We’ll do a lot more about matrices in the next chapter; here, we’ll just use them as a convenient notation for abbreviating systems of equations.
At each step, we eliminate one of the variables. To do this, we select the leftmost non-zero entry of one of the remaining rows (the pivot), and subtract multiples of this row from the rows below so that all the entries below the pivot in its column become \(0\).
Let’s summarise the method (a Python sketch of the whole procedure follows the list):
Write down the augmented matrix \((A|b)\) with rows labelled \(R_1,\ldots,R_n\) corresponding to the individual equations.
Subtract multiples of \(R_1\) from \(R_2,\ldots,R_n\) to reduce the elements below the leading diagonal in the first column to \(0\).
In the matrix obtained, consider all the rows \(R_2,\ldots,R_n\), and find the row whose first non-zero entry is leftmost. Swap this row (if necessary) with \(R_2\), and subtract multiples of (the new) \(R_2\) from (the new) \(R_3,\ldots,R_n\) to reduce the elements below this leftmost entry in the same column to \(0\).
Continue this as far as possible, until \((A|b)\) is reduced to \((H|c)\), where \(H\) is upper triangular (see the remark below).
Solve the equations \((H|c)\) by back-substitution.
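For those of you thinking about the MAS115 suggestion, here is one possible Python sketch of the method; it is an illustration rather than production code (real implementations choose pivots more carefully to control rounding errors):

```python
import numpy as np

def gaussian_eliminate(A, b):
    """Reduce the augmented matrix (A|b) to row echelon form."""
    M = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])
    n = M.shape[0]
    row = 0
    for col in range(A.shape[1]):
        # Find a row at or below `row` with a non-zero entry in this column.
        pivot = next((r for r in range(row, n) if M[r, col] != 0), None)
        if pivot is None:
            continue                       # no pivot here; move to next column
        M[[row, pivot]] = M[[pivot, row]]  # swap rows if necessary
        for r in range(row + 1, n):        # clear the entries below the pivot
            M[r] -= (M[r, col] / M[row, col]) * M[row]
        row += 1
    return M

def back_substitute(M):
    """Solve a reduced system, assuming a unique solution exists."""
    n = M.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:n]) / M[i, i]
    return x

A = np.array([[2, 3, -2], [4, 7, 3], [6, 3, -2]])
b = np.array([1, 3, -3])
print(back_substitute(gaussian_eliminate(A, b)))  # [-1.  1.  0.]
```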
Remark. Generally, the term “upper triangular” is reserved for square matrices. In many examples, we want to consider \(n\) equations in \(n\) variables, and so our matrices will be square. But the idea above will extend to a more general situation. In this case, we say that the resulting matrices are in row echelon form. This simply means that the first non-zero entry in each row lies further to the right than the first non-zero entry in the previous row.
So, above, when we referred to an “upper triangular” matrix when summarising Gaussian elimination, we really mean that \(H\) should be in row echelon form.

Definition 2.2 A matrix is in row echelon form if
all non-zero rows (rows with at least one non-zero element) are above any rows of all zeroes (all zero rows, if any, belong at the bottom of the matrix), and
the leading coefficient (the first non-zero number from the left, also called the pivot) of a non-zero row is always strictly to the right of the leading coefficient of the row above it.
Some people insist that the leading coefficient must be 1, but we won’t.
The two conditions imply that all entries in a column below a leading coefficient are zeros.
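Definition 2.2 translates directly into a check one could program; a sketch (assuming numpy):

```python
import numpy as np

def is_row_echelon(M):
    """Check that each leading entry is strictly right of the one above."""
    last_lead = -1
    for row in M:
        nonzero = np.flatnonzero(row)    # indices of the non-zero entries
        if nonzero.size == 0:
            last_lead = M.shape[1]       # zero row: only zero rows may follow
            continue
        if nonzero[0] <= last_lead:
            return False
        last_lead = nonzero[0]
    return True

print(is_row_echelon(np.array([[2, 3, -2], [0, 1, 7], [0, 0, 46]])))  # True
print(is_row_echelon(np.array([[2, 3, -2], [0, 0, 7], [0, 0, 4]])))   # False
```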
2.4 Examples
Let’s see some examples of how this works in practice.

Example 2.3 Solve \[\begin{array}{rcrcrcrcl} 2x&+&3y&-&2z&=&1&&(1)\\ 4x&+&6y&+&3z&=&2&&(2)\\ 6x&+&3y&-&2z&=&9&&(3) \end{array}\]
We translate the equations into matrix form, as the system \[(A|b)=\begin{array}{l}R_1\\R_2\\R_3\end{array} \left( \begin{array}{rrr|r} \fbox{2}&3&-2&1\\ 4&6&3&2\\ 6&3&-2&9 \end{array} \right)\] Now we pivot on the top left entry, by adding or subtracting appropriate multiples of this row/equation to the remaining rows/equations so that the remaining entries below in the same column are \(0\). We get: \[\begin{array}{l} R_1\\R_{2a}=R_2-2R_1\\{R}_{3a}=R_3-3R_1 \end{array} \left( \begin{array}{rrr|r} 2&3&-2&1\\ 0&0&7&0\\ 0&-6&4&6 \end{array} \right)\] At this stage, we would normally pivot on the second entry of row \(R_{2a}\), but this is \(0\). So we interchange rows \(R_{2a}\) and \(R_{3a}\) (i.e., swap the second and third equations) to get: \[\begin{array}{l} R_1\\{R}_{2b}=R_{3a}\\{R}_{3b}=R_{2a} \end{array} \left( \begin{array}{rrr|r} 2&3&-2&1\\ 0&\fbox{$-6$}&4&6\\ 0&0&7&0 \end{array} \right)\] Now this is in upper triangular form, so we can go back to the point of view of equations, and use back-substitution. The third row is the equation \(7z=0\), so \(z=0\). The second row is \(-6y+4z=6\); as we already know \(z=0\), we read off \(y=-1\), and then it is easy to get \(x=2\) from the top equation.

Example 2.4 Solve \[\begin{array}{rcrcrcrcl} 2x&+&3y&-&2z&=&1&&(1)\\ 4x&+&6y&+&3z&=&2&&(2)\\ 6x&+&9y&-&2z&=&3&&(3) \end{array}\]
We translate the equations into matrix form, as the system \[(A|b)=\begin{array}{l}R_1\\{R}_2\\{R}_3\end{array} \left( \begin{array}{rrr|r} \fbox{2}&3&-2&1\\ 4&6&3&2\\ 6&9&-2&3 \end{array} \right)\] As before, we pivot on the first entry of row \(R_1\) to get: \[\begin{array}{l} R_1\\{R}_{2a}=R_2-2R_1\\{R}_{3a}=R_3-3R_1 \end{array} \left( \begin{array}{rrr|r} 2&3&-2&1\\ 0&0&\fbox{7}&0\\ 0&0&4&0 \end{array} \right)\] This matrix is already upper triangular, so we could use back substitution now, but it is generally preferable, especially with larger systems of equations, to keep going with the elimination process as far as possible.
So now we will pivot on the third entry of \(R_{2a}\): \[\begin{array}{l} R_1\\{R}_{2a}\\{R}_{3b}=R_{3a}-\tfrac{4}{7}R_{2a} \end{array} \left( \begin{array}{rrr|r} 2&3&-2&1\\ 0&0&7&0\\ 0&0&0&0 \end{array} \right)\] Now the third equation is just \(0x+0y+0z=0\), which is always true, so tells us nothing about \(x\), \(y\) or \(z\). However \(R_{2a}\) tells us that \(z=0\), and then \(R_1\) gives \(2x+3y=1\). As we know, this is the equation of a line, and there is not one unique pair \((x,y)\) making this hold. So we see that there is a whole line of solutions. We can put the solutions into parametric form by assigning arbitrary constants to unknowns, and then reading off the rest – for example, we could say \(x=\lambda\), and then read off that \(y=\tfrac{1}{3}(1-2\lambda)\).
Then the general solution is given in the form \[\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}\lambda\\\tfrac{1}{3}(1-2\lambda)\\0\end{pmatrix}=\begin{pmatrix}0\\\tfrac{1}{3}\\0\end{pmatrix}+\lambda\begin{pmatrix}1\\-\frac{2}{3}\\0\end{pmatrix},\] a line in the direction of \(\begin{pmatrix}1\\-\tfrac{2}{3}\\0\end{pmatrix}\) passing through \(\begin{pmatrix}0\\\tfrac{1}{3}\\0\end{pmatrix}\).

Note that in this example, row \(R_{3b}\) became \(0\). You should notice that this is because the original rows \(R_1\), \(R_2\) and \(R_3\) are linearly dependent. Indeed, \[0=R_{3b}=R_{3a}-\tfrac{4}{7}R_{2a}=(R_3-3R_1)-\tfrac{4}{7}(R_2-2R_1)=-\tfrac{13}{7}R_1-\tfrac{4}{7}R_2+R_3;\] i.e., \(-13R_1-4R_2+7R_3=0\). This gives us the relationship between the three coefficient vectors of the original equations.
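It is worth checking a relation like this numerically; a quick sketch (assuming numpy, and using the full rows of the augmented matrix):

```python
import numpy as np

R1 = np.array([2, 3, -2, 1])
R2 = np.array([4, 6,  3, 2])
R3 = np.array([6, 9, -2, 3])
print(-13 * R1 - 4 * R2 + 7 * R3)  # [0 0 0 0], confirming the dependence
```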
This didn’t happen in Example 2.3, as the coefficient vectors were linearly independent; there was no non-trivial relationship between the vectors, and hence between the equations.
Example 2.5 Solve \[\begin{array}{rcrcrcrcl} 2x&+&3y&-&2z&=&1&&(1)\\ 4x&+&6y&+&3z&=&2&&(2)\\ 6x&+&9y&-&2z&=&7&&(3) \end{array}\]
We translate the equations into matrix form, as the system \[(A|b)=\begin{array}{l}R_1\\{R}_2\\{R}_3\end{array} \left( \begin{array}{rrr|r} \fbox{2}&3&-2&1\\ 4&6&3&2\\ 6&9&-2&7 \end{array} \right)\] As before, we pivot on the first entry of row \(R_1\) to get: \[\begin{array}{l} R_1\\{R}_{2a}=R_2-2R_1\\{R}_{3a}=R_3-3R_1 \end{array} \left( \begin{array}{rrr|r} 2&3&-2&1\\ 0&0&\fbox{7}&0\\ 0&0&4&4 \end{array} \right)\] Continuing as in the last example, we get \[\begin{array}{l} R_1\\{R}_{2a}\\{R}_{3b}=R_{3a}-\tfrac{4}{7}R_{2a} \end{array} \left( \begin{array}{rrr|r} 2&3&-2&1\\ 0&0&7&0\\ 0&0&0&4 \end{array} \right)\] Now the last equation says \(0x+0y+0z=4\), which obviously has no solution, and this means that there is no solution to the system of equations.
The problem here is that a row/equation has all of the coefficients equal to \(0\), but the constant on the other side is non-zero, unlike the last example.

As in Example 2.4, the coefficient vectors here are linearly dependent, and this causes problems for the solution of the system.
Let’s return to an example from above.

Example 2.6 Consider the three planes \[\begin{aligned} x+y+z&=2\\ 2x+3y-z&=1\\ x+4z&=5.\end{aligned}\] We claim that these intersect in a line, and that the normal vectors to the planes are linearly dependent.
To see that the three planes intersect in a line, we need to solve the equations simultaneously. The systematic way to do this is to use row reduction, with the augmented matrix \[\left(\begin{array}{ccc|c}\fbox{1}&1&1&2\\2&3&-1&1\\1&0&4&5\end{array}\right).\] We pivot on the top left entry, subtracting multiples of the top row from the other rows to make the first entries all vanish. We get \[\left(\begin{array}{ccc|c}1&1&1&2\\0&\fbox{1}&-3&-3\\0&-1&3&3\end{array}\right).\] Now we add the middle line to the bottom (this pivots on the middle entry of the array): \[\left(\begin{array}{ccc|c}1&1&1&2\\0&1&-3&-3\\0&0&0&0\end{array}\right).\] We see that we are reduced to just two equations, \[\begin{aligned} x+y+z&=2\\ y-3z&=-3.\end{aligned}\] We can put \(z=t\), an arbitrary parameter, and read off \(y=3t-3\), and then \(x=5-4t\), to get the solution \[\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}5-4t\\3t-3\\t\end{pmatrix}=\begin{pmatrix}5\\-3\\0\end{pmatrix}+t\begin{pmatrix}-4\\3\\1\end{pmatrix},\] which is a straight line passing through \(\begin{pmatrix}5\\-3\\0\end{pmatrix}\) in the direction \(\begin{pmatrix}-4\\3\\1\end{pmatrix}\).
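As a sanity check (a sketch using sympy’s symbolic arithmetic), every point of this line really does satisfy all three original equations:

```python
from sympy import symbols

t = symbols('t')
x, y, z = 5 - 4*t, 3*t - 3, t     # the parametric solution found above
print(x + y + z)                  # 2
print(2*x + 3*y - z)              # 1
print(x + 4*z)                    # 5
```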
But the normal vectors to the planes are \(\begin{pmatrix}1\\1\\1\end{pmatrix}\), \(\begin{pmatrix}2\\3\\-1\end{pmatrix}\) and \(\begin{pmatrix}1\\0\\4\end{pmatrix}\). Let’s see that these are linearly dependent. We need to try to solve \[\alpha\begin{pmatrix}1\\1\\1\end{pmatrix}+\beta\begin{pmatrix}2\\3\\-1\end{pmatrix}+\gamma\begin{pmatrix}1\\0\\4\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix}.\] So we need to solve \[\begin{aligned} \alpha+2\beta+\gamma&=0\\ \alpha+3\beta&=0\\ \alpha-\beta+4\gamma&=0,\end{aligned}\] and this means using the augmented matrix \[\left(\begin{array}{ccc|c}\fbox{1}&2&1&0\\1&3&0&0\\1&-1&4&0\end{array}\right).\] We can do this in the same way, pivoting on the top left entry: \[\left(\begin{array}{ccc|c}1&2&1&0\\0&\fbox{1}&-1&0\\0&-3&3&0\end{array}\right),\] and then on the middle entry to get \[\left(\begin{array}{ccc|c}1&2&1&0\\0&1&-1&0\\0&0&0&0\end{array}\right).\] So again there are really only two relations, \[\begin{aligned} \alpha+2\beta+\gamma&=0\\ \beta-\gamma&=0\end{aligned}\] and we see that \(\alpha=-3\), \(\beta=1\), \(\gamma=1\) gives a nontrivial relation. So the normal vectors are linearly dependent.

Notice that in this last calculation, if we start with \(0\)s in the constant column, then this won’t change under any of our manipulations, so we could simply omit the constant column.
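Again we can verify the relation directly (a sketch assuming numpy):

```python
import numpy as np

n1 = np.array([1, 1, 1])
n2 = np.array([2, 3, -1])
n3 = np.array([1, 0, 4])
print(-3 * n1 + n2 + n3)  # [0 0 0], so the normals are linearly dependent
```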
Let’s do another example to see linear independence.

Example 2.7 Show that \(\begin{pmatrix}1\\0\\1\end{pmatrix}\), \(\begin{pmatrix}2\\1\\1\end{pmatrix}\) and \(\begin{pmatrix}1\\1\\1\end{pmatrix}\) are linearly independent.
We need to solve \[\alpha\begin{pmatrix}1\\0\\1\end{pmatrix}+\beta\begin{pmatrix}2\\1\\1\end{pmatrix}+\gamma\begin{pmatrix}1\\1\\1\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix}.\]
Again, we consider the augmented matrix \[\left(\begin{array}{ccc|c}1&2&1&0\\0&1&1&0\\1&1&1&0\end{array}\right).\] As mentioned above, if the constants are all \(0\), we don’t need the last column (but try to remember that it is secretly there!), and work with \[\begin{pmatrix}\fbox{1}&2&1\\0&1&1\\1&1&1\end{pmatrix}.\] Pivoting on the top left entry gives \[\begin{pmatrix}1&2&1\\0&\fbox{1}&1\\0&-1&0\end{pmatrix}.\] Then pivoting on the middle entry gives \[\begin{pmatrix}1&2&1\\0&1&1\\0&0&1\end{pmatrix}.\] The bottom line means \(\gamma=0\) (remembering the secret column of \(0\)s), the previous one gives \(\beta+\gamma=0\), and the first means \(\alpha+2\beta+\gamma=0\); together these imply that \(\alpha=\beta=\gamma=0\).

In general, given \(n\) equations in \(n\) unknowns, we have an \(n\times n\)-matrix of coefficients. If the rows of this matrix (i.e., the vectors of coefficients of the simultaneous equations) are linearly independent, then Gaussian elimination will reduce the system as in Example 2.3 to one which has a unique solution. If, however, the rows of the matrix are linearly dependent, this means that we can get the left-hand side of one equation by combining other equations, and either the equation is then redundant (meaning that there are infinitely many solutions) or contradictory (so that there are no solutions).
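This kind of independence check is also a one-liner on a computer; a sketch assuming numpy (with the vectors as the columns of a matrix, maximal rank means independent):

```python
import numpy as np

V = np.array([[1, 2, 1],
              [0, 1, 1],
              [1, 1, 1]])
print(np.linalg.matrix_rank(V))  # 3: the three vectors are independent
```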
2.5 Complete elimination
Actually, we can do a bit more. In the examples so far, we’ve just used the operation of adding/subtracting multiples of one row from another, but there are other operations we might want to do.
Indeed, there are three natural operations we use when solving a system of equations; each is illustrated in the Python sketch after the list.
We can exchange the order of the equations, for example, swapping two equations. This corresponds to exchanging two rows of the augmented matrix.
We can scale one of the equations by some non-zero constant. This corresponds to multiplying one row of the augmented matrix by a non-zero number.
We can add or subtract some multiple of one equation from another. This corresponds to adding or subtracting a multiple of one row of the augmented matrix to another row.
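In numpy, say, these three operations look like this (an illustrative sketch, not tied to the example below):

```python
import numpy as np

M = np.array([[1.0,  2.0, 3.0, 1.0],
              [2.0,  2.0, 1.0, 4.0],
              [1.0, -1.0, 1.0, 4.0]])

M[[1, 2]] = M[[2, 1]]  # 1. exchange two rows
M[0] *= 2.0            # 2. scale a row by a non-zero constant
M[2] -= 2 * M[0]       # 3. subtract a multiple of one row from another
```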
If we consider the three equations: \[\begin{array}{rcrcrcl} x&+&2y&+&3z&=&1\\ 2x&+&2y&+& z&=&4\\ x&-& y&+& z&=&4, \end{array}\] we can summarise the linear system by the augmented matrix \[\left( \begin{array}{ccc|c} \fbox{1}&2&3&1\\ 2&2&1&4\\ 1&-1&1&4 \end{array} \right).\] We would then eliminate \(x\), by taking the second equation from twice the first to get \[2y+5z=-2,\] and then taking the third equation from the first to get \[3y+2z=-3.\] This corresponds to combining multiples of the first row with the other two rows, and we replace the second and third rows in the augmented matrix by what we get from these operations: \[\left( \begin{array}{ccc|c} 1&2&3&1\\ 0&\fbox{2}&5&-2\\ 0&3&2&-3 \end{array} \right).\] We could then do some elimination in the bottom two equations, multiplying the first equation by 3 and the second by 2 to get \[\begin{array}{rcrcl} 6y&+&15z&=&-6\\ 6y&+&4z&=&-6 \end{array}\] or \[\left( \begin{array}{ccc|c} 1&2&3&1\\ 0&6&15&-6\\ 0&6&4&-6 \end{array} \right).\] Here we have scaled rows by a constant (which corresponds to scaling equations by a constant).
Then we take the last equation from the previous one to get \(11z=0\), or \[\left( \begin{array}{ccc|c} 1&2&3&1\\ 0&6&15&-6\\ 0&0&11&0 \end{array} \right).\] We then read off that \(z=0\). Then we can substitute back into the two equations in two variables to read off that \(y=-1\), and then substitute back into the three equations in three variables to get \(x=3\).
Let’s just emphasise that our operations on equations correspond to these operations on the rows of the matrices.
What we want to do is to transform the augmented matrix into one where we can read off the solutions really easily!
Ideally, we’d like to get to the augmented matrix \[\left( \begin{array}{ccc|c} 1&0&0&?_1\\ 0&1&0&?_2\\ 0&0&1&?_3 \end{array} \right)\] since this corresponds to the equations \(x=?_1\), \(y=?_2\), \(z=?_3\). In practice, if this can be done, the following procedure works:
Find a non-zero entry in the first column. Scale the row/equation so that this entry becomes \(1\). Then add/subtract an appropriate multiple of that row/equation so that the other entries in that column all become \(0\).
Find a non-zero entry in the second column which is not in the same row as the non-zero entry in the first column. Scale the row/equation so that this entry becomes \(1\). Then add/subtract an appropriate multiple of that row/equation so that the other entries in that column all become \(0\).
Find a non-zero entry in the third column which is not in the same row as the non-zero entry in the first or second column. Scale the row/equation so that this entry becomes \(1\). Then add/subtract an appropriate multiple of that row/equation so that the other entries in that column all become \(0\).
Let’s do this mechanically for the system above: \[\begin{aligned} &&\left( \begin{array}{ccc|c} 1&2&3&1\\ 2&2&1&4\\ 1&-1&1&4 \end{array} \right) \longrightarrow \left( \begin{array}{ccc|c} \fbox{1}&2&3&1\\ 2&2&1&4\\ 1&-1&1&4 \end{array} \right) \longrightarrow \left( \begin{array}{ccc|c} 1&2&3&1\\ 0&-2&-5&2\\ 0&-3&-2&3 \end{array} \right)\\ &\longrightarrow& \left( \begin{array}{ccc|c} 1&2&3&1\\ 0&\fbox{$-2$}&-5&2\\ 0&-3&-2&3 \end{array} \right) \longrightarrow \left( \begin{array}{ccc|c} 1&2&3&1\\ 0&\fbox{1}&5/2&-1\\ 0&-3&-2&3 \end{array} \right) \longrightarrow \left( \begin{array}{ccc|c} 1&0&-2&3\\ 0&1&5/2&-1\\ 0&0&11/2&0 \end{array} \right)\\ &\longrightarrow& \left( \begin{array}{ccc|c} 1&0&-2&3\\ 0&1&5/2&-1\\ 0&0&\fbox{11/2}&0 \end{array} \right) \longrightarrow \left( \begin{array}{ccc|c} 1&0&-2&3\\ 0&1&5/2&-1\\ 0&0&\fbox{1}&0 \end{array} \right) \longrightarrow \left( \begin{array}{ccc|c} 1&0&0&3\\ 0&1&0&-1\\ 0&0&1&0 \end{array} \right).\end{aligned}\] Remember that this corresponds to a system of equations, but this system is just: \[\begin{array}{ccccr} x& & &=&3\\ &y& &=&-1\\ & &z&=&0, \end{array}\] so we can read off \(x=3\), \(y=-1\), \(z=0\).
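The sympy library (which works with exact fractions rather than decimal approximations) can carry out this complete elimination in one step; its `rref` method returns the fully reduced matrix together with the positions of the pivots:

```python
from sympy import Matrix

M = Matrix([[1, 2, 3, 1],
            [2, 2, 1, 4],
            [1, -1, 1, 4]])
R, pivots = M.rref()
print(R)       # Matrix([[1, 0, 0, 3], [0, 1, 0, -1], [0, 0, 1, 0]])
print(pivots)  # (0, 1, 2): a pivot in each of the three variable columns
```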
Generally, if we have a system of equations/matrix, such as \[\begin{pmatrix} 1&2&3&4&5\\ 0&0&2&1&0\\ 0&0&0&1&3\\ 0&0&0&0&0 \end{pmatrix},\] we can continue the reduction further by selecting the leading coefficient of each row in turn, scaling it until it is \(1\), and adding/subtracting an appropriate multiple of the row so that all other entries in the same column as this leading entry are \(0\). For the matrix above, we would continue as follows:
The first row already begins with a \(1\), so no scaling is necessary, and the other entries in that column are all already \(0\).
The second row has a \(2\) as its first non-zero entry. So divide the row/equation by \(2\) to get \[\begin{pmatrix} 1&2&3&4&5\\ 0&0&\fbox{1}&\tfrac{1}{2}&0\\ 0&0&0&1&3\\ 0&0&0&0&0 \end{pmatrix}.\] Now we note that there is a \(3\) in the top row in the same column as the first entry of the second row. So we take \(3\) times this new second row from the top row to make it \(0\) (i.e., take \(3\) times the second equation from the first): \[\begin{pmatrix} 1&2&0&\tfrac{5}{2}&5\\ 0&0&1&\tfrac{1}{2}&0\\ 0&0&0&\fbox{1}&3\\ 0&0&0&0&0 \end{pmatrix}.\]
The third row already starts with a \(1\), so no scaling is needed. We need to take \(5/2\) times this row from the top row, and \(1/2\) of this row from the second row, to make all the other entries in the fourth column \(0\): \[\begin{pmatrix} 1&2&0&0&-\tfrac{5}{2}\\ 0&0&1&0&-\tfrac{3}{2}\\ 0&0&0&1&3\\ 0&0&0&0&0 \end{pmatrix}.\]
The fourth row is all zero, so there is nothing more we can do.
The result is the reduced row echelon form of the original matrix. Note that if this corresponded to a set of equations, we would get: \[\begin{aligned} x+2y&=-5/2\\ z&=-3/2\\ w&=3.\end{aligned}\] We assign arbitrary constants to those columns which don’t have leading coefficients – so we put \(y=\lambda\), say, and then the remaining terms are easily read off in terms of these parameters. In this case, the general solution is \((-5/2-2\lambda,\lambda,-3/2,3)\).
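sympy can also hand us the parametric solution directly; a sketch, where `linsolve` expresses the free variable in terms of its own symbol rather than a \(\lambda\):

```python
from sympy import Matrix, linsolve, symbols

x, y, z, w = symbols('x y z w')
A = Matrix([[1, 2, 3, 4],
            [0, 0, 2, 1],
            [0, 0, 0, 1],
            [0, 0, 0, 0]])
b = Matrix([5, 0, 3, 0])
print(linsolve((A, b), [x, y, z, w]))  # {(-2*y - 5/2, y, -3/2, 3)}
```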
Here’s a more formal definition:

Definition 2.3 A matrix is in reduced row echelon form if
it is in row echelon form;
every leading coefficient is \(1\), and is the only non-zero entry in its column.
The following matrix is in reduced row echelon form: \[\begin{pmatrix}1&0&a&0&b\\0&1&c&0&d\\0&0&0&1&e\end{pmatrix},\] but neither of the following is: \[\begin{pmatrix}1&0&a&0&b\\0&1&0&c&d\\0&0&0&0&1\end{pmatrix},\qquad\begin{pmatrix}1&0&a&b\\0&0&2&c\\0&0&0&1\end{pmatrix}.\] When solving equations using Gaussian elimination, we make a complete elimination of the augmented matrix. Each column corresponds to a variable, and the nicest way to present the answers is to make all the variables corresponding to a column which doesn’t contain a leading coefficient into parameters, and then the remaining variables will be simply expressed in terms of these parameters.
Example 2.8 Suppose that the result of Gaussian elimination was an augmented matrix \[\left(\begin{array}{ccccc|c}1&0&-1&0&2&2\\0&1&4&0&1&3\\0&0&0&1&2&1\end{array}\right),\] corresponding to equations \[\begin{array}{ccccccccccc} x&&&-&z&&&+&2u&=&2\\ &&y&+&4z&&&+&u&=&3\\ &&&&&&t&+&2u&=&1. \end{array}\] We look for the variables which do not occur as leading entries in any row. The top row has its leading entry being the first variable (\(x\)), the second row has its leading entry being the second variable (\(y\)), and the third row has its leading entry being the fourth variable (\(t\)).
So the variables \(z\) and \(u\) are not leading entries in any row.
We will make them parameters, perhaps \(z=\lambda\), \(u=\mu\). Then the equations are \[\begin{array}{ccccccccccc} x&&&-&\lambda&&&+&2\mu&=&2\\ &&y&+&4\lambda&&&+&\mu&=&3\\ &&&&&&t&+&2\mu&=&1. \end{array}\] We read off \(x=2+\lambda-2\mu\), \(y=3-4\lambda-\mu\), \(t=1-2\mu\). So the complete solution is \[\begin{pmatrix}x\\y\\z\\t\\u\end{pmatrix}=\begin{pmatrix}2+\lambda-2\mu\\3-4\lambda-\mu\\\lambda\\1-2\mu\\\mu\end{pmatrix}=\begin{pmatrix}2\\3\\0\\1\\0\end{pmatrix}+\lambda\begin{pmatrix}1\\-4\\1\\0\\0\end{pmatrix}+\mu\begin{pmatrix}-2\\-1\\0\\-2\\1\end{pmatrix}.\] Notice that this means that there are two parameters in the solution, and that the solution is therefore somehow two-dimensional (recall the parametric form of a plane from the first chapter).

2.6 Matrices and linear equations
We have already “abbreviated” systems of linear equations by means of matrices. Actually, a little more is going on. We recall that the dot product of \((a~b~c)\) and \((x~y~z)\) is \(ax+by+cz\), which is a typical left-hand side for a linear equation. If we make this the definition of the product \(\begin{pmatrix}a&b&c\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}\), then we can write the equation \(ax+by+cz=d\) in the form \[\begin{pmatrix}a&b&c\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=d.\] Extending this, we can work with the whole matrix of coefficients, and define \[\begin{pmatrix}a_1&b_1&c_1\\a_2&b_2&c_2\\a_3&b_3&c_3\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}a_1x+b_1y+c_1z\\a_2x+b_2y+c_2z\\a_3x+b_3y+c_3z\end{pmatrix},\] so that we get all the left-hand sides of equations.
Example 2.9 Determine \(A\mathbf{x}\) where \[A=\begin{pmatrix} 3&-1&5\\ 6&4&7\\ 2&-3&0 \end{pmatrix},\quad\mathbf{x}=\begin{pmatrix} x\\ y\\ z \end{pmatrix}.\]
With our multiplication rules above, we have \[A\mathbf{x}=\begin{pmatrix} 3&-1&5\\ 6&4&7\\ 2&-3&0 \end{pmatrix}\begin{pmatrix} x\\ y\\ z \end{pmatrix}=\begin{pmatrix} 3x-y+5z \\ 6x+4y+7z\\ 2x-3y \end{pmatrix}.\]

So this product gives a column vector of the sort that one might expect to see in a system of linear equations!
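This is exactly the multiplication that sympy performs if we feed it a symbolic vector; a quick sketch:

```python
from sympy import Matrix, symbols

x, y, z = symbols('x y z')
A = Matrix([[3, -1, 5],
            [6, 4, 7],
            [2, -3, 0]])
print(A * Matrix([x, y, z]))
# Matrix([[3*x - y + 5*z], [6*x + 4*y + 7*z], [2*x - 3*y]])
```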
In particular, given a set of linear equations \[\begin{array}{rcrcrcrcl} 3x&-& y&+&5z&=&1&&(1)\\ 6x&+&4y&+&7z&=&-3&&(2)\\ 2x&-&3y& & &=&-5&&(3) \end{array}\] we can view this alternatively as a single matrix equation \(A\mathbf{x}=\mathbf{b}\), where \[A=\begin{pmatrix} 3&-1&5\\ 6&4&7\\ 2&-3&0 \end{pmatrix},\quad\mathbf{x}=\begin{pmatrix} x\\y\\z\end{pmatrix},\quad\mathbf{b}=\begin{pmatrix}1\\-3\\-5\end{pmatrix}.\]
This reinterpretation of systems of linear equations as matrix equations will prove fruitful in what follows.