Untransforming

In which you learn how to undo all the transformations that you learned about earlier.

The mouse cursor skates over the flat window, knowing nothing about the complex 3D scene behind it. It operates in pixel space. If you wish to figure out what object the mouse is clicking on, you'll have to convert the 2D mouse position into one of the preceding 3D spaces.

Recall that a vertex passes through this chain of spaces:

model space → world space → eye space → clip space → normalized space → pixel space

Generally a vertex moves from one space to another by way of a matrix multiplication. The only exception is the transition from clip space to normalized space, which is the result of the perspective divide. All together, this is the mathematical gauntlet through which you send each vertex:

$$ \begin{aligned} \mathbf{p}_\mathrm{world} &= \mathrm{worldFromModel} \times \mathbf{p}_\mathrm{model} \\ \mathbf{p}_\mathrm{eye} &= \mathrm{eyeFromWorld} \times \mathbf{p}_\mathrm{world} \\ \mathbf{p}_\mathrm{clip} &= \mathrm{clipFromEye} \times \mathbf{p}_\mathrm{eye} \\ \mathbf{p}_\mathrm{norm} &= \frac{\mathbf{p}_\mathrm{clip}}{w_\mathrm{clip}} \\ \mathbf{p}_\mathrm{pixel} &= \mathrm{pixelFromNormalized} \times \mathbf{p}_\mathrm{norm} \\ \end{aligned} $$

To get the mouse from pixel space into one of the earlier spaces, you must work backward through these operations.

From Pixel Space to Normalized Space

The gauntlet above shows a matrix named \(\mathrm{pixelFromNormalized}\). You don't have that matrix in your renderers because WebGL takes care of that final transformation for you. The job of this matrix is to turn normalized coordinates in \([-1, 1]\) into pixel coordinates in \([0, \mathrm{width}]\) and \([0, \mathrm{height}]\).

The matrix is a scale and translation. It transforms the normalized coordinates into proportions in \([0, 1]\) and then applies the proportions to the viewport dimensions. This transformation can be written in terms of vector operations instead of matrices:

$$ \mathbf{p}_\mathrm{pixel} = \frac{\mathbf{p}_\mathrm{norm} + 1}{2} \times \begin{bmatrix}\mathrm{width} & \mathrm{height}\end{bmatrix} $$

On a mouse event, you have \(\mathbf{p}_\mathrm{pixel}\). You want to solve for \(\mathbf{p}_\mathrm{norm}\), so you undo the vector operations. Here's how you'd solve for the normalized x-coordinate:

$$ \begin{aligned} x_\mathrm{pixel} &= \frac{x_\mathrm{norm} + 1}{2} \times \mathrm{width} \\ \frac{x_\mathrm{pixel}}{\mathrm{width}} &= \frac{x_\mathrm{norm} + 1}{2} \\ 2 \times \frac{x_\mathrm{pixel}}{\mathrm{width}} &= x_\mathrm{norm} + 1 \\ 2 \times \frac{x_\mathrm{pixel}}{\mathrm{width}} - 1 &= x_\mathrm{norm} \end{aligned} $$

The y-coordinate is computed similarly.
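The derivation above can be sketched as a small function. The name `normalizedFromPixel` and its parameters are hypothetical, and the sketch assumes the pixel origin sits at the bottom-left of the viewport, as the formula does.

```javascript
// Hypothetical helper: converts a pixel space position to normalized space.
// Assumes the pixel origin is at the bottom-left corner of the viewport.
function normalizedFromPixel(xPixel, yPixel, width, height) {
  const xNorm = 2 * xPixel / width - 1;
  const yNorm = 2 * yPixel / height - 1;
  return [xNorm, yNorm];
}

// The center of an 800x600 viewport lands at the origin of normalized space.
const center = normalizedFromPixel(400, 300, 800, 600);
console.log(center); // x and y are both 0
```

Browser mouse events typically measure y downward from the top of the element, so you would likely need to pass `height - y` rather than the raw y-coordinate.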

Soon you'll read about an input device that operates in normalized space. Since it doesn't need to visit any earlier spaces, its work of untransforming the mouse position stops at this stage.

From Normalized Space to Clip Space

Sometimes you need to go farther back in the transformation pipeline. Perhaps you are trying to move a vertex on a 3D model to a new position in model space. Maybe you are trying to place a building or character in world space. To get farther back, surely you must undo the perspective divide, which divided the clip space position by its homogeneous coordinate:

$$ \mathbf{p}_\mathrm{norm} = \frac{\mathbf{p}_\mathrm{clip}}{w_\mathrm{clip}} \\ $$

Solve for \(\mathbf{p}_\mathrm{clip}\) to undo the perspective divide:

$$ \mathbf{p}_\mathrm{norm} \times w_\mathrm{clip} = \mathbf{p}_\mathrm{clip} \\ $$

This is problematic. You don't have \(w_\mathrm{clip}\). If you had the clip space position, you wouldn't be trying to solve for it. To get around this circular dependency, you momentarily pretend that clip space doesn't exist. You will untransform from normalized space directly into eye space.
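A small sketch may make the problem concrete. Different clip space positions can collapse to the same normalized position, so the normalized position alone cannot tell you which \(w_\mathrm{clip}\) produced it. The helper name `perspectiveDivide` is hypothetical.

```javascript
// Hypothetical helper: performs the perspective divide on an [x, y, z, w] array.
function perspectiveDivide(clip) {
  const [x, y, z, w] = clip;
  return [x / w, y / w, z / w];
}

// Two different clip space positions with different w components.
const first = perspectiveDivide([1, 2, 3, 1]);
const second = perspectiveDivide([2, 4, 6, 2]);

console.log(first, second); // both hold 1, 2, 3
```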

From Normalized Space to Eye, World, or Model Space

Untransforming into eye, world, or model space is a matter of undoing the matrix multiplications that got you into the space you're in. But what does it mean to undo a matrix multiplication?

To undo a scalar multiplication, you perform a scalar division. To undo a matrix multiplication, you perform another matrix multiplication, which seems odd. The matrix by which you multiply must apply the opposite transformation.

A translation matrix adds an offset to a position vector and has this form:

$$ \mathbf{T} = \begin{bmatrix} 1 & 0 & 0 & \textrm{offset}_x \\ 0 & 1 & 0 & \textrm{offset}_y \\ 0 & 0 & 1 & \textrm{offset}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ $$

The opposite or inverse of \(\mathbf{T}\) subtracts the offset and therefore has this form:

$$ \mathbf{T}^{-1} = \begin{bmatrix} 1 & 0 & 0 & -\textrm{offset}_x \\ 0 & 1 & 0 & -\textrm{offset}_y \\ 0 & 0 & 1 & -\textrm{offset}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ $$

A scale matrix multiplies by scale factors and has this form:

$$ \mathbf{S} = \begin{bmatrix} \textrm{factor}_x & 0 & 0 & 0 \\ 0 & \textrm{factor}_y & 0 & 0 \\ 0 & 0 & \textrm{factor}_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

The inverse of \(\mathbf{S}\) effectively divides by the scale factors and has this form:

$$ \mathbf{S}^{-1} = \begin{bmatrix} \frac{1}{\textrm{factor}_x} & 0 & 0 & 0 \\ 0 & \frac{1}{\textrm{factor}_y} & 0 & 0 \\ 0 & 0 & \frac{1}{\textrm{factor}_z} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ $$
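Here is a minimal sketch that checks the translation and scale inverses on a sample point. It uses plain row-major arrays rather than your `Matrix4` class, and the helpers `translation`, `scale`, and `transform` are hypothetical names invented for this illustration.

```javascript
// Hypothetical helpers that build 4x4 matrices as row-major nested arrays.
function translation(x, y, z) {
  return [
    [1, 0, 0, x],
    [0, 1, 0, y],
    [0, 0, 1, z],
    [0, 0, 0, 1],
  ];
}

function scale(x, y, z) {
  return [
    [x, 0, 0, 0],
    [0, y, 0, 0],
    [0, 0, z, 0],
    [0, 0, 0, 1],
  ];
}

// Multiplies a 4x4 matrix by a 4-element column vector.
function transform(m, p) {
  return m.map(row => row.reduce((sum, value, i) => sum + value * p[i], 0));
}

const p = [1, 2, 3, 1];

// Translate, then apply the inverse built from the negated offsets.
const moved = transform(translation(5, -2, 4), p);
const unmoved = transform(translation(-5, 2, -4), moved);

// Scale, then apply the inverse built from the reciprocal factors.
const scaled = transform(scale(2, 4, 8), p);
const unscaled = transform(scale(1 / 2, 1 / 4, 1 / 8), scaled);

console.log(unmoved);  // the original p: 1, 2, 3, 1
console.log(unscaled); // the original p: 1, 2, 3, 1
```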

As for rotation, you've seen four different rotation matrices at this point. If you know the axis and angle that were used to build the matrix, you can build the inverse in the exact same way but using the negated angle. For example, to undo a rotation of 45 degrees around the x-axis, you perform a rotation of -45 degrees around the x-axis.

If you don't know the axis or angle, you can make use of this handy mathematical truth: the inverse of a rotation matrix is its transpose. You transpose a matrix by flipping its components about the diagonal running from its top-left to its bottom-right. The rows of the original matrix become the columns of the transpose. For example, suppose you have this rotation matrix:

$$ \mathbf{R} = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

Its transpose is flipped about the diagonal:

$$ \mathbf{R}^{T} = \mathbf{R}^{-1} = \begin{bmatrix} a & d & g & 0 \\ b & e & h & 0 \\ c & f & i & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

The elements on the diagonal do not move.
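Both claims about rotation can be checked numerically. The sketch below assumes the usual right-handed rotation about the x-axis and uses plain nested arrays; `rotationAroundX`, `transpose`, and `multiply` are hypothetical helpers, not methods from your `Matrix4` class.

```javascript
// Builds a rotation about the x-axis as a row-major nested array.
function rotationAroundX(degrees) {
  const radians = degrees * Math.PI / 180;
  const c = Math.cos(radians);
  const s = Math.sin(radians);
  return [
    [1, 0, 0, 0],
    [0, c, -s, 0],
    [0, s, c, 0],
    [0, 0, 0, 1],
  ];
}

// Flips a matrix about its diagonal: row i of the result is column i of m.
function transpose(m) {
  return m[0].map((_, c) => m.map(row => row[c]));
}

// Multiplies two 4x4 matrices.
function multiply(a, b) {
  return a.map(row =>
    b[0].map((_, c) => row.reduce((sum, value, k) => sum + value * b[k][c], 0))
  );
}

const rotation = rotationAroundX(45);
const flipped = transpose(rotation);

// The transpose should match a rotation by the negated angle.
const negated = rotationAroundX(-45);

// The transpose times the original should be the identity matrix,
// up to floating-point rounding.
const product = multiply(flipped, rotation);
console.log(product);
```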

Knowing the inverses of the three standard transformations is of limited use. Normally you are applying a complex chain of translations, scales, and rotations, not just a single transformation. Consider this chain of three transformations:

$$ \mathbf{p}' = \mathbf{A} \times \mathbf{B} \times \mathbf{C} \times \mathbf{p} $$

Suppose you have \(\mathbf{p'}\) and want to solve for \(\mathbf{p}\). You start chipping away at the right-hand side by multiplying both sides by the inverse of \(\mathbf{A}\):

$$ \begin{aligned} \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{A}^{-1} \times \mathbf{A} \times \mathbf{B} \times \mathbf{C} \times \mathbf{p} \\ \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{B} \times \mathbf{C} \times \mathbf{p} \\ \end{aligned} $$

Multiplying a matrix by its inverse cancels it out. After applying inverses a few more times, you arrive at \(\mathbf{p}\):

$$ \begin{aligned} \mathbf{B}^{-1} \times \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{C} \times \mathbf{p} \\ \mathbf{C}^{-1} \times \mathbf{B}^{-1} \times \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{p} \\ \end{aligned} $$

This shows that if you know \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\), then you may compute the inverse of their product by computing the product of their inverses in reversed order. However, you probably don't have your matrices broken down into separate transformations like this. You've been accumulating them into a single matrix.
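The importance of the reversed order shows up in a small experiment. In this sketch, \(\mathbf{A}\) is a translation, \(\mathbf{B}\) is a scale, and \(\mathbf{C}\) is a 90-degree rotation about the z-axis. These matrices and the `transform` helper are choices made for illustration only.

```javascript
// Multiplies a 4x4 row-major matrix by a 4-element column vector.
function transform(m, p) {
  return m.map(row => row.reduce((sum, value, i) => sum + value * p[i], 0));
}

// A translates by 3 along x. Its inverse negates the offset.
const a = [[1, 0, 0, 3], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
const aInverse = [[1, 0, 0, -3], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];

// B scales by 2. Its inverse uses the reciprocal factors.
const b = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]];
const bInverse = [[0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 0.5, 0], [0, 0, 0, 1]];

// C rotates 90 degrees about z. Its inverse is its transpose.
const c = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
const cInverse = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];

// Forward: p' = A * B * C * p. The rightmost matrix is applied first.
const start = [1, 0, 0, 1];
const forward = transform(a, transform(b, transform(c, start)));

// Backward in reversed order: C^-1 * B^-1 * A^-1 * p'.
const recovered = transform(cInverse, transform(bInverse, transform(aInverse, forward)));

// Backward in the original order: A^-1 * B^-1 * C^-1 * p'.
const wrong = transform(aInverse, transform(bInverse, transform(cInverse, forward)));

console.log(forward);   // 3, 2, 0, 1
console.log(recovered); // back at the start: 1, 0, 0, 1
console.log(wrong);     // -2, -1.5, 0, 1, which is not the start
```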

What you really need is a magic method that will invert any invertible matrix without prior knowledge of how it was constructed. Here is that method, offered without explanation:

class Matrix4 {
  // ...

  inverse() {
    let m = new Matrix4();

    let a0 = this.get(0, 0) * this.get(1, 1) - this.get(0, 1) * this.get(1, 0);
    let a1 = this.get(0, 0) * this.get(1, 2) - this.get(0, 2) * this.get(1, 0);
    let a2 = this.get(0, 0) * this.get(1, 3) - this.get(0, 3) * this.get(1, 0);

    let a3 = this.get(0, 1) * this.get(1, 2) - this.get(0, 2) * this.get(1, 1);
    let a4 = this.get(0, 1) * this.get(1, 3) - this.get(0, 3) * this.get(1, 1);
    let a5 = this.get(0, 2) * this.get(1, 3) - this.get(0, 3) * this.get(1, 2);

    let b0 = this.get(2, 0) * this.get(3, 1) - this.get(2, 1) * this.get(3, 0);
    let b1 = this.get(2, 0) * this.get(3, 2) - this.get(2, 2) * this.get(3, 0);
    let b2 = this.get(2, 0) * this.get(3, 3) - this.get(2, 3) * this.get(3, 0);

    let b3 = this.get(2, 1) * this.get(3, 2) - this.get(2, 2) * this.get(3, 1);
    let b4 = this.get(2, 1) * this.get(3, 3) - this.get(2, 3) * this.get(3, 1);
    let b5 = this.get(2, 2) * this.get(3, 3) - this.get(2, 3) * this.get(3, 2);

    let determinant = a0 * b5 - a1 * b4 + a2 * b3 + a3 * b2 - a4 * b1 + a5 * b0;

    if (determinant !== 0) {
      let inverseDeterminant = 1 / determinant;
      m.set(0, 0, (+this.get(1, 1) * b5 - this.get(1, 2) * b4 + this.get(1, 3) * b3) * inverseDeterminant);
      m.set(0, 1, (-this.get(0, 1) * b5 + this.get(0, 2) * b4 - this.get(0, 3) * b3) * inverseDeterminant);
      m.set(0, 2, (+this.get(3, 1) * a5 - this.get(3, 2) * a4 + this.get(3, 3) * a3) * inverseDeterminant);
      m.set(0, 3, (-this.get(2, 1) * a5 + this.get(2, 2) * a4 - this.get(2, 3) * a3) * inverseDeterminant);
      m.set(1, 0, (-this.get(1, 0) * b5 + this.get(1, 2) * b2 - this.get(1, 3) * b1) * inverseDeterminant);
      m.set(1, 1, (+this.get(0, 0) * b5 - this.get(0, 2) * b2 + this.get(0, 3) * b1) * inverseDeterminant);
      m.set(1, 2, (-this.get(3, 0) * a5 + this.get(3, 2) * a2 - this.get(3, 3) * a1) * inverseDeterminant);
      m.set(1, 3, (+this.get(2, 0) * a5 - this.get(2, 2) * a2 + this.get(2, 3) * a1) * inverseDeterminant);
      m.set(2, 0, (+this.get(1, 0) * b4 - this.get(1, 1) * b2 + this.get(1, 3) * b0) * inverseDeterminant);
      m.set(2, 1, (-this.get(0, 0) * b4 + this.get(0, 1) * b2 - this.get(0, 3) * b0) * inverseDeterminant);
      m.set(2, 2, (+this.get(3, 0) * a4 - this.get(3, 1) * a2 + this.get(3, 3) * a0) * inverseDeterminant);
      m.set(2, 3, (-this.get(2, 0) * a4 + this.get(2, 1) * a2 - this.get(2, 3) * a0) * inverseDeterminant);
      m.set(3, 0, (-this.get(1, 0) * b3 + this.get(1, 1) * b1 - this.get(1, 2) * b0) * inverseDeterminant);
      m.set(3, 1, (+this.get(0, 0) * b3 - this.get(0, 1) * b1 + this.get(0, 2) * b0) * inverseDeterminant);
      m.set(3, 2, (-this.get(3, 0) * a3 + this.get(3, 1) * a1 - this.get(3, 2) * a0) * inverseDeterminant);
      m.set(3, 3, (+this.get(2, 0) * a3 - this.get(2, 1) * a1 + this.get(2, 2) * a0) * inverseDeterminant);
    } else {
      throw Error('Matrix is singular.');
    }

    return m;
  }
}

Add this method to your Matrix4 class.
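One way to gain confidence in a method you can't easily verify by eye is to multiply a matrix by its supposed inverse and confirm that the product is the identity. The sketch below is a possible sanity check, not part of the method itself. The function `isInversePair` and the stand-in factory `fromRows` are hypothetical. The only thing assumed about your matrices is the `get(row, column)` accessor that `inverse` already uses.

```javascript
// Returns true if a times b is the identity matrix, within a tolerance.
// Both arguments only need a get(row, column) method.
function isInversePair(a, b, epsilon = 1e-6) {
  for (let r = 0; r < 4; ++r) {
    for (let c = 0; c < 4; ++c) {
      let sum = 0;
      for (let k = 0; k < 4; ++k) {
        sum += a.get(r, k) * b.get(k, c);
      }
      const expected = r === c ? 1 : 0;
      if (Math.abs(sum - expected) > epsilon) {
        return false;
      }
    }
  }
  return true;
}

// Hypothetical stand-in for Matrix4 so that this sketch runs on its own.
function fromRows(rows) {
  return { get: (r, c) => rows[r][c] };
}

const t = fromRows([[1, 0, 0, 5], [0, 1, 0, -2], [0, 0, 1, 4], [0, 0, 0, 1]]);
const tInverse = fromRows([[1, 0, 0, -5], [0, 1, 0, 2], [0, 0, 1, -4], [0, 0, 0, 1]]);

console.log(isInversePair(t, tInverse)); // true
console.log(isInversePair(t, t));        // false
```

With your own class, you would presumably call something like `isInversePair(m, m.inverse())` on a few matrices whose inverses you can predict.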

The big takeaway is that you can turn positions from a later space into an earlier space by multiplying the coordinates in the later space by the matrix inverses. If you have a normalized space position, you can turn it into an eye, world, or model space position with this JavaScript:

const eyeFromClip = clipFromEye.inverse();
const worldFromEye = eyeFromWorld.inverse();
const modelFromWorld = worldFromModel.inverse();

const normalizedPosition = ...;
const eyePosition = eyeFromClip.multiplyMatrix(normalizedPosition);
const worldPosition = worldFromEye.multiplyMatrix(eyePosition);
const modelPosition = modelFromWorld.multiplyMatrix(worldPosition);

Observe how the spaces in the names of the matrices flip when inverting them.
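One caveat deserves a sketch. With a perspective projection, multiplying a normalized position by the inverse projection generally yields a result whose w component is not 1, and a final divide by that w appears to be what recovers the eye space position. The code below assumes an OpenGL-style perspective matrix for a symmetric frustum, which may differ from the one in your renderer. It uses plain arrays and a hand-derived closed-form inverse in place of `inverse()`. The helpers `transform` and `perspective` are hypothetical.

```javascript
// Multiplies a 4x4 row-major matrix by a 4-element column vector.
function transform(m, p) {
  return m.map(row => row.reduce((sum, value, i) => sum + value * p[i], 0));
}

// Builds an assumed OpenGL-style perspective matrix and its inverse.
function perspective(fovDegrees, aspect, near, far) {
  const f = 1 / Math.tan(fovDegrees * Math.PI / 360);
  const c = (far + near) / (near - far);
  const d = 2 * far * near / (near - far);
  return {
    clipFromEye: [
      [f / aspect, 0, 0, 0],
      [0, f, 0, 0],
      [0, 0, c, d],
      [0, 0, -1, 0],
    ],
    // Closed-form inverse, standing in for clipFromEye.inverse().
    eyeFromClip: [
      [aspect / f, 0, 0, 0],
      [0, 1 / f, 0, 0],
      [0, 0, 0, -1],
      [0, 0, 1 / d, c / d],
    ],
  };
}

const { clipFromEye, eyeFromClip } = perspective(90, 1, 1, 10);

// Forward: eye space to clip space to normalized space.
const eyePosition = [1, 2, -5, 1];
const clipPosition = transform(clipFromEye, eyePosition);
const normalizedPosition = clipPosition.map(v => v / clipPosition[3]);

// Backward: multiply by the inverse, then divide by the resulting w.
const scaledEye = transform(eyeFromClip, normalizedPosition);
const recoveredEye = scaledEye.map(v => v / scaledEye[3]);

// recoveredEye matches eyePosition, up to floating-point rounding.
console.log(recoveredEye);
```

With an orthographic projection, w stays at 1 throughout, so the extra divide changes nothing.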