Why is the rule as it is for inverting 2×2 matrices?

The inverse of the 2×2 matrix

a b
c d

is

d −b
−c a

divided by the determinant, ad − bc.
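Here is a minimal Python sketch of the rule as a quick sanity check. It assumes a matrix is stored as nested tuples ((a, b), (c, d)), and it uses `fractions.Fraction` so that dividing by ad − bc stays exact. `inverse_2x2` and `matmul_2x2` are illustrative names, not library functions.

```python
from fractions import Fraction

def matmul_2x2(m, n):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def inverse_2x2(m):
    """Swap the leading diagonal, negate the other diagonal, divide by ad - bc."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: ad - bc = 0")
    det = Fraction(det)
    return ((d / det, -b / det),
            (-c / det, a / det))

A = ((4, 7), (2, 6))           # ad - bc = 24 - 14 = 10
A_inv = inverse_2x2(A)
print(A_inv)                   # entries 3/5, -7/10, -1/5, 2/5 (as Fraction objects)
print(matmul_2x2(A, A_inv))    # the identity matrix, with Fraction entries
```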

I’ll try to give a rough idea why. See also: Why are there minuses in calculating determinants?

Dividing by the determinant is logical enough. The determinant of a matrix is the factor by which it magnifies areas (a negative determinant means it also flips them over), and the inverse must reverse that magnification, so its determinant must be 1/(ad − bc).
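One step is worth making explicit. Swapping and minus’ing alone leaves the determinant unchanged, since da − (−b)(−c) = ad − bc. Dividing all four entries of a 2×2 matrix by k divides its determinant by k². So dividing by ad − bc leaves the inverse with determinant 1/(ad − bc), which exactly undoes A's area factor. Below is a small exact check. The helper names are illustrative, and the sketch assumes ad − bc ≠ 0.

```python
from fractions import Fraction

def det_2x2(m):
    """Signed area scale factor of a 2x2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def inverse_2x2(m):
    """The 2x2 rule; assumes ad - bc != 0."""
    (a, b), (c, d) = m
    det = Fraction(det_2x2(m))
    return ((d / det, -b / det), (-c / det, a / det))

A = ((3, 1), (2, 4))             # det = 3*4 - 1*2 = 10: areas grow 10x
swapped = ((4, -1), (-2, 3))     # swapped and minus'd, not yet divided
print(det_2x2(A))                # 10
print(det_2x2(swapped))          # 10, the same as A
print(det_2x2(inverse_2x2(A)))   # 1/10: areas shrink back by 10x
```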

Roughly speaking, the elements on the non-leading diagonal are minus’d to reverse the matrix’s turning effect, and the elements on the leading diagonal are swapped to reverse its scaling effect.

The elements on the non-leading diagonal tell us about the matrix turning things. (If there’s no turning, they’re zero.) And to reverse a turn of size x, we need a similar turn of size −x.
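Rotations are the cleanest example of turning. A rotation by θ has cos θ in both leading-diagonal entries and −sin θ and sin θ on the other diagonal. Its determinant is cos²θ + sin²θ = 1. So the rule only flips the signs of the sin θ terms (swapping the two equal cos θ entries changes nothing), and that gives the rotation by −θ. Here is a floating-point sketch. The helpers `rotation`, `inverse_2x2` and `close` are hypothetical, written just for this check.

```python
import math

def rotation(theta):
    """Anticlockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def inverse_2x2(m):
    """The 2x2 rule: swap leading diagonal, negate the other, divide by ad - bc."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def close(m, n, tol=1e-12):
    """Entrywise comparison that allows for floating-point rounding."""
    return all(math.isclose(x, y, abs_tol=tol)
               for row_m, row_n in zip(m, n) for x, y in zip(row_m, row_n))

theta = 0.7
print(close(inverse_2x2(rotation(theta)), rotation(-theta)))  # True
```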

If the matrix shears by 1 to the right, then it is

1   1
0   1

The inverse of shearing by 1 to the right is to shear by −1 to the right, so the inverse must be

1  −1
0   1

Similarly, the matrix

1   0
1   1

shears by 1 upwards, and its inverse, which shears by −1 upwards, is

1   0
−1  1
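A shear has 1s on its leading diagonal, so its determinant is 1 and swapping the leading diagonal changes nothing. The rule therefore reduces to minus’ing the shear amount. The sketch below checks this for a few shear amounts using exact integers. `shear_right`, `shear_up` and `adjugate_2x2` are hypothetical helper names, not taken from any library.

```python
def matmul_2x2(m, n):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def adjugate_2x2(m):
    """Swap the leading diagonal and negate the other one (no division)."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))

def shear_right(k):
    return ((1, k), (0, 1))  # (x, y) -> (x + k*y, y)

def shear_up(k):
    return ((1, 0), (k, 1))  # (x, y) -> (x, y + k*x)

I = ((1, 0), (0, 1))
for k in (1, 3, -2):
    # det = 1*1 - k*0 = 1, so the inverse is just the swapped-and-negated matrix
    assert adjugate_2x2(shear_right(k)) == shear_right(-k)
    assert adjugate_2x2(shear_up(k)) == shear_up(-k)
    # and shearing by -k really does undo shearing by k
    assert matmul_2x2(shear_right(k), shear_right(-k)) == I
    assert matmul_2x2(shear_up(k), shear_up(-k)) == I
print("shear inverses check out")
```

The first two asserts in the loop are the rule itself. The last two confirm that the result undoes the shear.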

(We look at shearings rather than rotations because every 2-D rotation can be produced by shearings, and every invertible 2-D linear transformation can be produced by multiplying together scalings along the x and y axes and shearings parallel to those axes.)
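Both claims can be checked in a few lines. The rotation part uses the standard three-shear identity: shear right by −tan(θ/2), shear up by sin θ, then shear right by −tan(θ/2) again, valid when θ is not an odd multiple of π. That identity is brought in here and is not stated in the text above. For the second claim, one factorisation, assuming a ≠ 0, is A = (shear up by c/a) · (scale x by a, y by (ad − bc)/a) · (shear right by b/a). The sketch inverts each factor the easy way, by minus’ing the shear or taking reciprocals of the scales. Multiplying those inverses in reverse order lands exactly on the rule. The helper names are illustrative.

```python
import math
from fractions import Fraction

def matmul(*ms):
    """Multiply any number of 2x2 matrices, left to right."""
    result = ((1, 0), (0, 1))
    for (e, f), (g, h) in ms:
        (a, b), (c, d) = result
        result = ((a * e + b * g, a * f + b * h),
                  (c * e + d * g, c * f + d * h))
    return result

def shear_right(k):
    return ((1, k), (0, 1))

def shear_up(k):
    return ((1, 0), (k, 1))

def scale(sx, sy):
    return ((sx, 0), (0, sy))

# 1. Rotation by theta as three shears (theta not an odd multiple of pi).
theta = 1.2
k = -math.tan(theta / 2)
r = matmul(shear_right(k), shear_up(math.sin(theta)), shear_right(k))
cos_t, sin_t = math.cos(theta), math.sin(theta)
assert all(math.isclose(x, y, abs_tol=1e-12)
           for x, y in zip(sum(r, ()), (cos_t, -sin_t, sin_t, cos_t)))

# 2. A = shear_up(c/a) * scale(a, det/a) * shear_right(b/a), assuming a != 0.
a, b, c, d = map(Fraction, (4, 7, 2, 6))
det = a * d - b * c
factors = [shear_up(c / a), scale(a, det / a), shear_right(b / a)]
assert matmul(*factors) == ((a, b), (c, d))

# Undo each factor (negate shears, take reciprocals of scales), in reverse order.
undo = [shear_right(-b / a), scale(1 / a, a / det), shear_up(-c / a)]
A_inv = matmul(*undo)
assert A_inv == ((d / det, -b / det), (-c / det, a / det))  # the 2x2 rule
print(A_inv)
```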

The elements on the leading diagonal tell us a lot about the matrix scaling things. If the matrix just scales up 2× along the x-direction, and 3× along the y-direction, it is

2 0
0 3

and the inverse must be

½ 0
0 ⅓

which is

3 0
0 2

divided by the determinant, i.e. 6.

So swapping the elements on the leading diagonal reverses the scaling, or at least helps to. (When we invert, we divide the non-leading diagonal elements by the determinant as well as minus’ing them, and that helps to reverse the scaling too.)
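The scaling example can be checked with exact fractions. The same sketch shows why swapping and minus’ing only helps. For any 2×2 matrix, the swapped-and-minus’d matrix times A equals (ad − bc) times the identity, so dividing by the determinant is what finishes the job. That swapped-and-minus’d matrix is usually called the adjugate, and `adjugate_2x2` is just an illustrative name for it.

```python
from fractions import Fraction

def matmul_2x2(m, n):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def adjugate_2x2(m):
    """Swap the leading diagonal, negate the non-leading diagonal."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))

for A in [((2, 0), (0, 3)), ((2, 5), (1, 3)), ((1, 4), (2, -1))]:
    (a, b), (c, d) = A
    det = a * d - b * c
    adj = adjugate_2x2(A)
    # Swapping and negating alone overshoots: the product is det times I.
    assert matmul_2x2(adj, A) == ((det, 0), (0, det))
    # Dividing every entry by det fixes the overshoot.
    inv = tuple(tuple(Fraction(x, det) for x in row) for row in adj)
    assert matmul_2x2(inv, A) == ((1, 0), (0, 1))
    print(A, "det =", det, "inverse =", [[str(x) for x in row] for row in inv])
```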

There’s a neat way of seeing why the rule works, but unfortunately it needs more matrix algebra than you’re likely to learn before the rule is already stuck in your memory.

By the Cayley-Hamilton theorem, the matrix A satisfies its characteristic equation. That is, if you take the quadratic equation in λ defined by

det (A−λI)=0

or (a−λ)(d−λ)−bc=0

then substituting A for λ in that quadratic (multiplying the constant term by the identity matrix I) gives the zero matrix.
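For 2×2 matrices the theorem is easy to check by brute force. Expanding (a − λ)(d − λ) − bc gives λ² − (a + d)λ + (ad − bc). The sketch below substitutes A for λ, multiplies the constant by I, and confirms that the result is the zero matrix for many random integer matrices. The arithmetic is exact integer arithmetic, so rounding is not a concern. `char_poly_at_A` is a hypothetical helper name.

```python
import random

def matmul_2x2(m, n):
    """Multiply two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def char_poly_at_A(m):
    """Evaluate A^2 - (a + d)A + (ad - bc)I, the characteristic quadratic at A."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    sq = matmul_2x2(m, m)
    return tuple(
        tuple(sq[i][j] - trace * m[i][j] + (det if i == j else 0)
              for j in range(2))
        for i in range(2))

random.seed(0)
for _ in range(1000):
    A = tuple(tuple(random.randint(-9, 9) for _ in range(2)) for _ in range(2))
    assert char_poly_at_A(A) == ((0, 0), (0, 0))
print("Cayley-Hamilton holds for 1000 random integer matrices")
```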

Let t be the trace of A, the total of its leading diagonal: t = a + d.

Then the characteristic equation is:

λ² − tλ + (ad − bc) = 0

Substitute A for λ, multiplying the constant term by I:

A² − tA + (ad − bc)I = 0

Right-multiply by A⁻¹:

A − tI + (ad − bc)A⁻¹ = 0
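Rearranging the right-multiplied equation gives A⁻¹ = (tI − A)/(ad − bc). Because t = a + d, the matrix tI − A is exactly (d, −b; −c, a), the swapped-and-minus’d matrix. Here is a minimal sketch comparing the two routes to the inverse. The function names are illustrative.

```python
from fractions import Fraction

def inverse_via_trace(m):
    """A^-1 = (tI - A) / det, from rearranging A - tI + det*A^-1 = 0."""
    (a, b), (c, d) = m
    t = a + d
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("no inverse: ad - bc = 0")
    t_I_minus_A = ((t - a, -b), (-c, t - d))  # equals ((d, -b), (-c, a))
    return tuple(tuple(x / det for x in row) for row in t_I_minus_A)

def inverse_by_rule(m):
    """Swap the leading diagonal, minus the other, divide by ad - bc."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return ((d / det, -b / det), (-c / det, a / det))

for A in [((4, 7), (2, 6)), ((2, 0), (0, 3)), ((1, 1), (0, 1))]:
    assert inverse_via_trace(A) == inverse_by_rule(A)
print(inverse_via_trace(((4, 7), (2, 6))))
```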