*In which you learn how the GPU can compute four dot products at once by forging vectors into a matrix.*

You rewrote the transformations as dot products because GPUs eat dot products for breakfast. In fact, the GPU can compute several dot products at once. For translation, you want the GPU to compute these three dot products:

$$ \begin{aligned} p'_x &= \begin{bmatrix}1 & 0 & 0 & \textrm{offset}_x\end{bmatrix} \cdot \mathbf{p} \\ p'_y &= \begin{bmatrix}0 & 1 & 0 & \textrm{offset}_y\end{bmatrix} \cdot \mathbf{p} \\ p'_z &= \begin{bmatrix}0 & 0 & 1 & \textrm{offset}_z\end{bmatrix} \cdot \mathbf{p} \end{aligned} $$

You parallelize their evaluation by arranging the vectors as the rows of a matrix:

\begin{aligned} \begin{bmatrix} 1 & 0 & 0 & \textrm{offset}_x \\ 0 & 1 & 0 & \textrm{offset}_y \\ 0 & 0 & 1 & \textrm{offset}_z \end{bmatrix} \end{aligned}

For the time being, you should think of a matrix as nothing more than a machine for computing dot products. Each row represents a vector that will be dotted with \(\mathbf{p}\).
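To make the "machine for computing dot products" idea concrete, here is a minimal CPU-side sketch in plain Python. The names `dot` and `rows_dot` and the sample offsets and point are hypothetical, chosen only for illustration. The list comprehension evaluates the rows one after another, whereas the GPU would evaluate them together.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def rows_dot(matrix, p):
    """Dot every row of the matrix with p; each row yields one output component."""
    return [dot(row, p) for row in matrix]

# Hypothetical offsets and point, for illustration only.
offset_x, offset_y, offset_z = 5, -2, 3
translate = [
    [1, 0, 0, offset_x],
    [0, 1, 0, offset_y],
    [0, 0, 1, offset_z],
]
p = [10, 20, 30, 1]  # the trailing 1 is the homogeneous coordinate

p_prime = rows_dot(translate, p)
print(p_prime)  # [15, 18, 33]
```

Each output component is the matching input component plus its offset, because the 1 in each row picks out one coordinate and the offset is multiplied by the homogeneous 1.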

To compute the dot products, you perform a matrix-vector multiplication, which has this mathematical notation:

\begin{aligned} \mathbf{p'} &= \begin{bmatrix} 1 & 0 & 0 & \textrm{offset}_x \\ 0 & 1 & 0 & \textrm{offset}_y \\ 0 & 0 & 1 & \textrm{offset}_z \end{bmatrix} \times \mathbf{p} \\ \end{aligned}

When you explode the vectors, you see that \(\mathbf{p}\) has a homogeneous coordinate, but \(\mathbf{p'}\) does not:

\begin{aligned} \begin{bmatrix}p'_x \\ p'_y \\ p'_z\end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 & \textrm{offset}_x \\ 0 & 1 & 0 & \textrm{offset}_y \\ 0 & 0 & 1 & \textrm{offset}_z \end{bmatrix} \times \begin{bmatrix}p_x \\ p_y \\ p_z \\ 1\end{bmatrix} \\ \end{aligned}

The output from one transformation will often become input to the next, so you want \(\mathbf{p'}\) to have a homogeneous coordinate too. Instead of tacking on a 1 in a separate operation, you give the matrix a bottom row that is dotted with \(\mathbf{p}\) to produce a 1. That row has three zeroes to cancel out \(p_x\), \(p_y\), and \(p_z\), followed by a 1 that passes the homogeneous coordinate of \(\mathbf{p}\) through unchanged:

\begin{aligned} \begin{bmatrix}p'_x \\ p'_y \\ p'_z \\ 1\end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 & \textrm{offset}_x \\ 0 & 1 & 0 & \textrm{offset}_y \\ 0 & 0 & 1 & \textrm{offset}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix}p_x \\ p_y \\ p_z \\ 1\end{bmatrix} \\ \end{aligned}
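The payoff of the bottom row can be checked with a small sketch, again in plain Python on the CPU. The helper names `mat_vec` and `translation` and the sample numbers are assumptions made for this example, not part of any graphics API. Because the output keeps its homogeneous coordinate, it can be fed straight into a second translation.

```python
def mat_vec(matrix, p):
    """Matrix-vector multiplication: dot each row of the matrix with p."""
    return [sum(m * c for m, c in zip(row, p)) for row in matrix]

def translation(offset_x, offset_y, offset_z):
    """Build the 4x4 translation matrix, including the bottom row [0, 0, 0, 1]."""
    return [
        [1, 0, 0, offset_x],
        [0, 1, 0, offset_y],
        [0, 0, 1, offset_z],
        [0, 0, 0, 1],
    ]

p = [10, 20, 30, 1]  # hypothetical point with homogeneous coordinate 1

once = mat_vec(translation(5, -2, 3), p)
twice = mat_vec(translation(1, 1, 1), once)  # output reused as input

print(once)   # [15, 18, 33, 1]
print(twice)  # [16, 19, 34, 1]
```

The last component stays 1 after every multiplication, which is exactly what the bottom row is for.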

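Even though you're dealing with three-dimensional coordinates, you use a matrix with four rows and four columns to accommodate the homogeneous coordinate. The fourth column lines up with the homogeneous coordinate of \(\mathbf{p}\), and the fourth row reproduces it in \(\mathbf{p'}\).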

You now have a way to make the GPU translate very quickly. Next up is making it scale quickly.